ClawBench

Standardized benchmarks for OpenClaw forks running on real hardware constraints. Each test runs in a Docker container matching the device's CPU and memory profile.

Total Runs: 178

Forks Tested: 8

Best Avg Score: 68.2

Fork Performance Summary

NanoClaw

MimiClaw

Nanobot

PicoClaw

IronClaw

OpenClaw

ZeroClaw

Moltworker

Filtered Results (9 results)

#	Device	Fork	Score	Cold Start	Memory	Disk	Caps
1	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	77	1.1s	128MB	-	3/3
2	Cloud VPS (4GB) / Cloud	TypeScript / Moltworker	72	1.8m	554MB	41MB	8/8
3	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3
4	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3
5	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3
6	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3
7	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3
8	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3
9	Cloud VPS (2GB) / Cloud	TypeScript / Moltworker	7.5	1.2s	512MB	-	2/3

Open Source Benchmark Suite

How ClawBench Works

A repeatable, containerized pipeline that tests every fork under real hardware constraints.

Containerize

The fork is cloned into a Docker container whose CPU and RAM limits match the target device.
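As a rough illustration of this step, the sketch below builds a `docker run` command line with resource limits. The `DeviceProfile` type, the 1-CPU figure for the VPS, and the image name are assumptions made for the example, not ClawBench's actual configuration; `--cpus` and `--memory` are standard Docker flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceProfile:
    """Hypothetical description of a target device's resource limits."""
    name: str
    cpus: float      # CPU cores the container may use (assumed value)
    memory_mb: int   # hard RAM limit in megabytes


def docker_run_args(profile: DeviceProfile, image: str) -> list:
    """Build a `docker run` command line that applies the profile's limits."""
    return [
        "docker", "run", "--rm",
        f"--cpus={profile.cpus}",          # CPU quota
        f"--memory={profile.memory_mb}m",  # hard memory limit
        image,
    ]


# Example profile; the CPU count is a guess, only the 2GB figure is from the table.
vps = DeviceProfile(name="Cloud VPS (2GB)", cpus=1.0, memory_mb=2048)
print(" ".join(docker_run_args(vps, "clawbench-fork:latest")))
```

Returning the arguments as a list rather than a single string makes it straightforward to pass them to `subprocess.run` without shell quoting issues.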

Build

Dependencies are installed with the fork's native toolchain: Go, Rust, Python, TypeScript, or C.
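One plausible way to pick the toolchain is to look for a well-known manifest file in the repository root. The sketch below does that; the manifest names checked and the install commands chosen are assumptions for illustration, and ClawBench may detect toolchains differently.

```python
from pathlib import Path
from typing import Optional

# Manifest file -> install command. Both columns are illustrative assumptions,
# not taken from the ClawBench source.
INSTALL_COMMANDS = [
    ("go.mod", ["go", "mod", "download"]),
    ("Cargo.toml", ["cargo", "fetch"]),
    ("package.json", ["npm", "install"]),
    ("requirements.txt", ["pip", "install", "-r", "requirements.txt"]),
    ("Makefile", ["make"]),
]


def detect_install_command(repo: Path) -> Optional[list]:
    """Return the install command for the first manifest found, else None."""
    for manifest, command in INSTALL_COMMANDS:
        if (repo / manifest).is_file():
            return command
    return None
```

The list is ordered, so a repository containing both `package.json` and a `Makefile` resolves to `npm install` in this sketch.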

Measure

The entry point is detected, cold start is timed, peak memory is read from the container's cgroup, and disk usage is measured.
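For the memory part of this step, the Linux kernel records a high-water mark per cgroup. The sketch below reads it, assuming the container's cgroup directory is already known; it tries `memory.peak` (cgroup v2 on recent kernels) and falls back to `memory.max_usage_in_bytes` (cgroup v1). The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Kernel files holding peak memory usage in bytes:
# cgroup v2 (recent kernels) first, then cgroup v1.
PEAK_FILES = ("memory.peak", "memory.max_usage_in_bytes")


def read_peak_memory_bytes(cgroup_dir: Path) -> Optional[int]:
    """Return the cgroup's peak memory usage in bytes, or None if unavailable."""
    for name in PEAK_FILES:
        path = cgroup_dir / name
        if path.is_file():
            return int(path.read_text().strip())
    return None


def to_megabytes(num_bytes: int) -> float:
    """Convert bytes to MB using 1 MB = 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)
```

Reading the kernel's own counter avoids sampling gaps that a polling loop would have, which matters for short-lived startup spikes.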

Score

Results are combined into a 0–100 composite score weighted across four dimensions.

Scoring Breakdown

Total: 100 pts

Capabilities: 40

Messaging, browser, code exec, memory, files, search, MCP, tool use

Latency: 30

Cold start time: clone + install + startup. Under 5s = full marks

Size: 20

Disk footprint after install. Under 20MB = full marks

Build Success: 10

5 pts for dependency install, 5 pts for successful startup
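The breakdown above can be expressed as a small function. The page states only the thresholds for full marks on latency and size, not how points fall off beyond them, so this sketch takes those two sub-scores as inputs rather than guessing a curve. It uses the page's rule of 5 pts per passed capability test; the function and parameter names are hypothetical.

```python
CAPABILITY_MAX = 40
LATENCY_MAX = 30
SIZE_MAX = 20
POINTS_PER_CAPABILITY = 5  # 8 tests x 5 pts = 40


def clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def composite_score(caps_passed: int, latency_pts: float, size_pts: float,
                    installed: bool, started: bool) -> float:
    """Sum the four dimensions, capping each at its stated maximum."""
    capabilities = clamp(caps_passed * POINTS_PER_CAPABILITY, 0, CAPABILITY_MAX)
    latency = clamp(latency_pts, 0, LATENCY_MAX)
    size = clamp(size_pts, 0, SIZE_MAX)
    build = (5 if installed else 0) + (5 if started else 0)
    return capabilities + latency + size + build


# A fork passing all 8 tests with full latency, size, and build marks.
print(composite_score(8, 30, 20, True, True))
```

Capping each dimension separately keeps a strong result in one area from compensating beyond its weight.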

Verdicts

85–100

Runs Great

60–84

Runs OK

30–59

Barely Runs

0–29

Won’t Run
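The verdict bands translate directly into a lookup. The page lists integer ranges only, so treating each lower bound as inclusive for fractional scores (for example, 84.5 falls in "Runs OK") is an assumption of this sketch, as is the function name.

```python
def verdict(score: float) -> str:
    """Map a 0-100 composite score to its verdict label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 85:
        return "Runs Great"
    if score >= 60:
        return "Runs OK"
    if score >= 30:
        return "Barely Runs"
    return "Won’t Run"


# Scores taken from the results table.
for s in (77, 72, 7.5):
    print(s, verdict(s))
```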

8 Capability Tests

Messaging

Browser

Code Exec

Memory

Files

Web Search

MCP

Tool Use

Detected via static source analysis and runtime module probing. Each passed test contributes 5 pts to the capabilities score.
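Runtime module probing could look something like the sketch below for a Python fork: check whether any module associated with a capability can be located, without importing it. The capability-to-module mapping here is entirely hypothetical, and forks in other languages would need a different probe.

```python
import importlib.util

# Hypothetical mapping from capability to modules whose presence suggests it.
CAPABILITY_MODULES = {
    "Files": ["pathlib"],
    "Browser": ["playwright", "selenium"],
}


def module_available(name: str) -> bool:
    """True if the module can be located without actually importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def probe_capabilities(candidates: dict) -> dict:
    """Mark a capability as present if any of its candidate modules is found."""
    return {
        capability: any(module_available(m) for m in modules)
        for capability, modules in candidates.items()
    }


def capability_points(results: dict) -> int:
    """5 pts per detected capability, as stated above."""
    return 5 * sum(1 for present in results.values() if present)
```

A probe like this only shows that a module is installed, not that the fork uses it, which is presumably why the page pairs it with static source analysis.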

View ClawBench on GitHub →

Run benchmarks locally or contribute improvements