So many people asking about clawdbot offline...

**NOTICE:** running this locally is going to be slow, and probably somewhat insecure. The information here is for POC purposes. OpenClaw even calls out that models will run horribly slow without some major GPUs. Still worth testing out for fun though!

Yeah so, I was answering questions for people on YouTube channels about clawdbot, and quite a few brought up running it offline, which makes sense. Really this just comes down to understanding where the model host is and where the client is.

At a high level:


Your gaming PC is the model host - it hosts the AI model, and an API server in this case.
Your laptop is the client - it runs clawdbot and connects to that API server (with an API key, if you configure one).


Then you could run it "offline", if by that you mean not connecting to the wider internet - the two machines will still need to be on the same network. Either way, there is a lot of benefit to running your own AI model locally.

I'll walk you through it. You'll need something to run your models on, and ideally it should also expose a popular API server style. We'll use vLLM for this since it checks the boxes, but I'm sure you could get it working with Ollama or something else.

Prerequisites:
(Gaming PC) Install Python + PyTorch (CUDA in this case) - https://pytorch.org/get-started/locally/

(Gaming PC) Install vLLM - make sure you verify you have the right installation type for your GPU. - https://docs.vllm.ai/en/latest/

(Laptop) Install Clawdbot - this is the client. See Phase 3 for more information.

Create a Python virtual env with either uv or venv (the vLLM docs call for uv, but venv works as well). Activate the environment, and you should be ready to run vllm serve.
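As a quick sketch, the venv route looks like this (the `.venv` directory name is just an example):

```shell
# Create a virtual environment with the stock venv module, then activate it.
# (uv users: "uv venv .venv" does roughly the same thing, faster.)
python3 -m venv .venv
. .venv/bin/activate

# Installs would go here -- pick the vLLM build that matches your GPU:
# pip install vllm

# Sanity check: python should now resolve from inside .venv
python -c "import sys; print(sys.prefix)"
```

If the last line prints a path ending in `.venv`, the environment is active.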

-- Phase 1: Gaming PC --
A couple notes here:
I ran into an issue after I installed vLLM. By default GPU memory utilization is set to 90%, and I was already using more than 90% of my 16 GB of VRAM, so the engine failed to start up. You can set --gpu-memory-utilization on the vllm serve command to work around this. The command below uses 70%, but you could go higher or lower depending on your situation.

PyTorch manages blocks of GPU memory, and to maintain speed it uses a caching allocator rather than constantly allocating and freeing. I exported the following so the blocks can expand and contract, which helps avoid hitting "OOM" on the GPU.


export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

(newer PyTorch releases also accept the shorter name PYTORCH_ALLOC_CONF)

If you run into any issues with PyTorch, just run a sanity check to make sure it's working.

python - <<EOF
import torch
print(torch.cuda.is_available())
print(torch.cuda.get_device_name(0))
free, total = torch.cuda.mem_get_info()
print(f"Free VRAM: {free/1024**3:.2f} GiB / {total/1024**3:.2f} GiB")
EOF

Then we can start the vllm server and model.

vllm serve Qwen/Qwen2.5-7B-Instruct-AWQ \
  --host 0.0.0.0 \
  --port 8000 \
  --quantization awq \
  --gpu-memory-utilization 0.70 \
  --max-model-len 8192

You should see something like this at the end:

(APIServer pid=19861) INFO: Started server process [19861]
(APIServer pid=19861) INFO: Waiting for application startup.
(APIServer pid=19861) INFO: Application startup complete.

Open a terminal on the gaming rig and make a test call.
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model":"Qwen/Qwen2.5-7B-Instruct-AWQ",
    "messages":[{"role":"user","content":"Reply with exactly: OK"}]
  }'

You'll get back a JSON response; look for the key/value pair "content":"OK" in there - that's what we are looking for!
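If you'd rather pull the content field out than eyeball the JSON, a Python one-liner does it (the response below is trimmed to just the part we care about - real responses have more fields):

```shell
# Parse the assistant's reply out of a chat/completions-style response.
# RESPONSE stands in for the output of the curl call above.
RESPONSE='{"choices":[{"message":{"role":"assistant","content":"OK"}}]}'
printf '%s' "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
```

In practice you'd just pipe the curl output straight into that one-liner.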

-- Phase 2: Firewall configuration (skippable - if the Phase 3 curl test works, you don't need this) --
Check to see if you have a firewall rule set on the gaming machine. Depending on your system this is going to be different; for my Pop!_OS environment it's...

sudo ufw status

If it's inactive, that's fine. If you have something active like me...

sudo ufw allow 8000/tcp

Either way, grab the IP address of the gaming PC running vLLM.

ip addr


-- Phase 3: Laptop and Clawdbot --

Alright! Next, validate you can hit the gaming rig from your laptop:
curl http://<gaming_computer_ip>:8000/v1/models

FOR THE INSTALLATION:
Use manual mode, see the wizard steps here
You want to point your openclaw instance to the gaming machine on port 8000. You can set the model to whatever it is running in the vllm step. If its not listed in the wizard just make the changes in your .openclaw configuration file (normally located at ~/.openclaw/openclaw.json). In this case you are going to not have any auth either (everything is local, while I wouldnt consider this production ready it'll satisfy the POC). Make sure to check out the docs if you are going to consider testing.

Point the openclaw client to your host and you should be good to go!