Introduction
Goals:
- Add an agent to Minecraft that performs as a player
- Evaluate the feasibility of having an automated Minecraft companion
- Assist with building tasks
- Assist with gathering materials
- Assist with fighting off mobs
- Experiment with custom trained and generic models
Mindcraft
Setup and Configuration
- Set up a 1.21.6 server using Crafty Controller
- Cloned the repository from GitHub
- Installed the required modules with `npm install`
- Updated the `andy.json` configuration file to point to the various hosts and models
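Mindcraft turned out to be sensitive to the Node major version in my testing (see Results below), so a small pre-flight check before launch can save a debugging session. This is a hypothetical helper of my own, not part of Mindcraft:

```javascript
// Hypothetical pre-flight check: Mindcraft threw exceptions for me on newer
// Node majors, while Node 20.x ran cleanly, so warn before launching.
function isSupportedNode(version) {
  const major = Number(version.split(".")[0]);
  return major === 20;
}

if (!isSupportedNode(process.versions.node)) {
  console.warn(
    `Node ${process.versions.node} detected; Mindcraft ran cleanly for me on 20.x`
  );
}
```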
andy.json:
```json
{
  "name": "clem",
  "model": {
    "api": "ollama",
    "model": "ollama/sweaterdog/andy-4:q8_0",
    "url": "http://banana:11434"
  },
  "embedding": {
    "api": "ollama",
    "model": "ollama/embeddinggemma",
    "url": "http://star:11434"
  },
  "code_model": {
    "api": "ollama",
    "model": "ollama/qwen3-coder-next:latest",
    "url": "http://star:11434"
  },
  "vision_model": {
    "api": "ollama",
    "model": "ollama/qwen3.5:latest",
    "url": "http://banana:11434"
  }
}
```

Results
- Code execution regularly threw exceptions in the console output
- Downgrading to Node 20.20.1 with `nvm` (`nvm install 20.20.1 && nvm use 20.20.1`) resolved the exceptions thrown when executing code
- Switching to the reasoning prompts from the bots/andy-4-reasoning.json example results in longer response times
- Qwen models
- Qwen-3.5:35b was problematic after the first few prompts were sent to the server. I’ve observed this behavior with Qwen 3.5 before but have not tracked down the issue.
- Qwen-3.5:latest is also problematic when configured as the main model and running on Ollama with an Nvidia 4070 Ti.
- gpt-oss:120b worked decently for coding
- Ministral models
- I’m trying ministral-3:8b next as a moderate sized reasoning model
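The andy.json profile above maps each role (main model, embedding, code, vision) to its own host and model. A minimal sketch of reading such a profile — this is an illustrative helper of mine, not Mindcraft's actual loader:

```javascript
// Sketch of selecting the endpoint for a given role from a Mindcraft-style
// profile object. Illustrative only; not part of the Mindcraft codebase.
function endpointFor(profile, role) {
  // role: "model" | "embedding" | "code_model" | "vision_model"
  const entry = profile[role];
  if (!entry) throw new Error(`no "${role}" section in profile`);
  return { api: entry.api, model: entry.model, url: entry.url };
}

// Trimmed-down profile matching the configuration above
const profile = {
  model: {
    api: "ollama",
    model: "ollama/sweaterdog/andy-4:q8_0",
    url: "http://banana:11434",
  },
  embedding: {
    api: "ollama",
    model: "ollama/embeddinggemma",
    url: "http://star:11434",
  },
};

const main = endpointFor(profile, "model");
```

Splitting roles across hosts like this let me keep the heavy main model on one machine while embedding and code generation ran elsewhere.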
Mindcraft-CE
Mindcraft-CE is a fork of Mindcraft with updates and additional functionality.
Configuration
andy.json:
```json
{
  "name": "pom",
  "model": {
    "api": "openai",
    "model": "andy-4.1",
    "url": "http://localhost:1234/v1"
  },
  "embedding": {
    "api": "ollama",
    "model": "ollama/embeddinggemma",
    "url": "http://star:11434"
  },
  "code_model": {
    "api": "ollama",
    "model": "ollama/qwen3-coder-next:latest",
    "url": "http://star:11434"
  },
  "vision_model": {
    "api": "ollama",
    "model": "ollama/qwen3.5:latest",
    "url": "http://banana:11434"
  }
}
```

Results
- Using both the dev branch and the main branch resulted in several JSON errors when calling the llama.cpp server.
- I’m abandoning this approach and going back to Mindcraft, using the andy-4.1 model for vision and then trying ministral.
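The JSON errors appeared when handling responses from the llama.cpp server. A defensive parse along these lines — a hypothetical helper, not Mindcraft-CE code — is one way to keep the bot alive when a response comes back malformed:

```javascript
// Hypothetical guard: try to parse an OpenAI-compatible response body and
// fall back to the raw text if the JSON is malformed or truncated.
function extractContent(body) {
  try {
    const data = JSON.parse(body);
    // OpenAI-compatible servers nest the reply under choices[0].message.content
    return data.choices?.[0]?.message?.content ?? String(body);
  } catch {
    // Truncated or invalid JSON: return the raw text rather than crashing
    return String(body);
  }
}
```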
Model Configurations and Results
Prompts used
I experimented with the default prompts as well as the reasoning prompts provided by Mindcraft. The prompts I used were:
```json
"conversing": "You are a playful Minecraft bot named $NAME that can converse with players, see, move, mine, build, and interact with the world by using commands.\n$SELF_PROMPT Act human-like as if you were a typical Minecraft player, rather than an AI. Be very brief in your responses, don't apologize constantly, don't give instructions or make lists unless asked, and don't refuse requests. Think in high amounts before responding. Don't pretend to act, use commands immediately when requested. Do NOT say this: 'Sure, I've stopped.', instead say this: 'Sure, I'll stop. !stop'. Do NOT say this: 'On my way! Give me a moment.', instead say this: 'On my way! !goToPlayer(\"playername\", 3)'. Respond only as $NAME, never output '(FROM OTHER BOT)' or pretend to be someone else. If you have nothing to say or do, respond with an just a tab '\t'. This is extremely important to me, take a deep breath and have fun :)\nSummarized memory:'$MEMORY'\n$STATS\n$INVENTORY\n$COMMAND_DOCS\n$EXAMPLES\nReason before responding. Conversation Begin:",
"coding": "You are an intelligent mineflayer bot $NAME that plays minecraft by writing javascript codeblocks. Given the conversation, use the provided skills and world functions to write a js codeblock that controls the mineflayer bot ``` // using this syntax ```. The code will be executed and you will receive it's output. If an error occurs, write another codeblock and try to fix the problem. Be maximally efficient, creative, and correct. Be mindful of previous actions. Do not use commands !likeThis, only use codeblocks. The code is asynchronous and MUST USE AWAIT for all async function calls, and must contain at least one await. You have `Vec3`, `skills`, and `world` imported, and the mineflayer `bot` is given. Do not import other libraries. Think deeply before responding. Do not use setTimeout or setInterval. Do not speak conversationally, only use codeblocks. Do any planning in comments. This is extremely important to me, think step-by-step, take a deep breath and good luck! \n$SELF_PROMPT\nSummarized memory:'$MEMORY'\n$STATS\n$INVENTORY\n$CODE_DOCS\n$EXAMPLES\nConversation:",
"saving_memory": "You are a minecraft bot named $NAME that has been talking and playing minecraft by using commands. Update your memory by summarizing the following conversation and your old memory in your next response. Prioritize preserving important facts, things you've learned, useful tips, and long term reminders. Do Not record stats, inventory, or docs! Only save transient information from your chat history. You're limited to 500 characters, so be extremely brief, think about what you will summarize before responding, minimize words, and provide your summarization in Chinese. Compress useful information. \nOld Memory: '$MEMORY'\nRecent conversation: \n$TO_SUMMARIZE\nSummarize your old memory and recent conversation into a new memory, and respond only with the unwrapped memory text: ",
"bot_responder": "You are a minecraft bot named $NAME that is currently in conversation with another AI bot. Both of you can take actions with the !command syntax, and actions take time to complete. You are currently busy with the following action: '$ACTION' but have received a new message. Decide whether to 'respond' immediately or 'ignore' it and wait for your current action to finish. Be conservative and only respond when necessary, like when you need to change/stop your action, or convey necessary information. Example 1: You:Building a house! !newAction('Build a house.').\nOther Bot: 'Come here!'\nYour decision: ignore\nExample 2: You:Collecting dirt !collectBlocks('dirt',10).\nOther Bot: 'No, collect some wood instead.'\nYour decision: respond\nExample 3: You:Coming to you now. !goToPlayer('billy',3).\nOther Bot: 'What biome are you in?'\nYour decision: respond\nActual Conversation: $TO_SUMMARIZE\nDecide by outputting ONLY 'respond' or 'ignore', nothing else. Your decision:"
```

Andy-4
```json
"model": {
  "api": "ollama",
  "model": "ollama/sweaterdog/andy-4:q8_0",
  "url": "http://banana:11434"
},
"embedding": {
  "api": "ollama",
  "model": "ollama/embeddinggemma",
  "url": "http://star:11434"
}
```

Andy-4.1
I was only successful running Andy-4.1 with llama.cpp on Apple silicon. Running llama.cpp in ROCm and Nvidia containers resulted in a crash on the first prompt, and the model also crashed under LM Studio on Apple silicon.
Install instructions
- Install llama.cpp with Homebrew: `brew install llama.cpp`
- Launch the llama.cpp server with the F16 model: `llama-server -hf Mindcraft-CE/Andy-4.1-GGUF:F16 --image-min-tokens 1024 --port 8082 --host 0.0.0.0 --ctx-size 16384`
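Before pointing Mindcraft at the server, it is worth hitting the OpenAI-compatible endpoint directly to confirm it responds. A sketch of building that request, with the host, port, and model name assumed from the launch command above:

```javascript
// Build a chat request for llama-server's OpenAI-compatible API.
// URL and model name are taken from the llama-server invocation above.
function buildChatRequest(userText) {
  return {
    url: "http://localhost:8082/v1/chat/completions",
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: "Mindcraft-CE/Andy-4.1-GGUF:F16",
        messages: [{ role: "user", content: userText }],
        max_tokens: 64,
      }),
    },
  };
}

const req = buildChatRequest("hello");
// To actually send it: const res = await fetch(req.url, req.options);
```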
Configuration
I used this both for the vision model and the main model with the following configuration:
```json
"model": {
  "api": "openai",
  "model": "Mindcraft-CE/Andy-4.1-GGUF:F16",
  "url": "http://localhost:8082/v1"
},
"vision_model": {
  "api": "openai",
  "model": "Mindcraft-CE/Andy-4.1-GGUF:F16",
  "url": "http://localhost:8082/v1"
}
```

Ministral
I experimented with Ministral-3:8b using the reasoning prompts provided with Mindcraft. The chat interactions were overly verbose, but basic task execution was reasonable. Given the task “Collect 64 wood” near a tree farm I set up, the bot used dirt to tower up to the treetops and harvested the wood by hand. It did not craft an axe or use any other tools to speed up the process, and it needed regular prompting from the player to complete a task.
```json
"model": {
  "api": "openai",
  "model": "ministral-3:8b",
  "url": "http://banana:11434/v1"
}
```

I also experimented with the 14b variant of Ministral.
```json
"model": {
  "api": "openai",
  "model": "ministral-3:14b",
  "url": "http://banana:11434/v1"
}
```

Qwen-3.5 and Qwen-3-coder
I tried various quantizations of Qwen-3.5 running on Ollama for the model and vision configurations, with poor results. The larger 35b variant ran too slowly to be useful for a bot.
```json
"code_model": {
  "api": "ollama",
  "model": "ollama/qwen3-coder-next:latest",
  "url": "http://star:11434"
}
```

GPT-OSS
Historically I’ve had good results from running gpt-oss:120b on Ollama. The model is stable and reasonably responsive running on an M3 Mac Studio Ultra.
```json
"code_model": {
  "api": "ollama",
  "model": "ollama/gpt-oss:120b",
  "url": "http://star:11434"
}
```