ClawBench

Standardized benchmarks for OpenClaw forks running on real hardware constraints. Each test runs in a Docker container matching the device's CPU and memory profile.

Total Runs: 178

Forks Tested: 8

Best Avg Score: 68.2


Open Source Benchmark Suite

How ClawBench Works

A repeatable, containerized pipeline that tests every fork under real hardware constraints.

Containerize

Fork is cloned into a Docker container with CPU and RAM limits matching the target device.
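As a rough illustration of the containerize step, the sketch below builds a `docker run` command with CPU and RAM limits. The `--cpus` and `--memory` flags are standard Docker options; the image name, container name, and the 1 CPU / 512 MB profile are hypothetical placeholders, not values taken from ClawBench.

```python
import shlex


def docker_run_command(image, cpus, memory_mb, name="clawbench-run"):
    """Build a `docker run` argument list with CPU and RAM limits.

    `image` and `name` are hypothetical; a real pipeline would supply the
    image built for the fork under test.
    """
    return [
        "docker", "run", "--rm",
        "--name", name,
        f"--cpus={cpus}",          # fractional CPU limit
        f"--memory={memory_mb}m",  # hard memory limit in megabytes
        image,
    ]


# Example device profile (assumed values, for illustration only).
cmd = docker_run_command("fork-under-test:latest", cpus=1.0, memory_mb=512)
print(shlex.join(cmd))
```

Returning a list rather than a shell string keeps the command safe to pass to `subprocess.run` without shell quoting issues.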

Build

Dependencies installed via the native toolchain — Go, Rust, Python, TypeScript, or C.

Measure

Entry point detected, cold start timed, peak memory tracked via cgroup, disk usage measured.
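The measurements above could be collected roughly as follows. This is a sketch under stated assumptions, not ClawBench's actual code: it assumes peak memory is read from the container's cgroup directory (`memory.peak` on cgroup v2, `memory.max_usage_in_bytes` on cgroup v1), that disk usage is the sum of apparent file sizes, and that cold start is wall-clock time around a caller-supplied function. All function names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path


def read_peak_memory_bytes(cgroup_dir):
    """Return peak memory in bytes from a cgroup directory, or None."""
    # cgroup v2 exposes memory.peak; cgroup v1 uses memory.max_usage_in_bytes.
    for name in ("memory.peak", "memory.max_usage_in_bytes"):
        path = Path(cgroup_dir) / name
        if path.is_file():
            return int(path.read_text().strip())
    return None


def disk_usage_bytes(root):
    """Sum apparent sizes of regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def time_call(fn):
    """Return wall-clock seconds taken by fn(), e.g. clone + install + start."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


# Demo against temporary directories standing in for a cgroup and an install.
with tempfile.TemporaryDirectory() as fake_cgroup, \
        tempfile.TemporaryDirectory() as fake_install:
    Path(fake_cgroup, "memory.peak").write_text("12345\n")
    Path(fake_install, "binary").write_bytes(b"x" * 100)
    print(read_peak_memory_bytes(fake_cgroup), disk_usage_bytes(fake_install))
```

Apparent file size can differ from the blocks actually allocated on disk, so a real harness might prefer a block-based measure.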

Score

Results combined into a 0–100 composite score weighted across four dimensions.

Scoring Breakdown

100 pts

Capabilities: 40 pts

Messaging, browser, code exec, memory, files, search, MCP, tool use

Latency: 30 pts

Cold start time — clone + install + startup. Under 5s = full marks

Size: 20 pts

Disk footprint after install. Under 20MB = full marks

Build Success: 10 pts

5 pts for dependency install, 5 pts for successful startup
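The breakdown above can be sketched as a single scoring function. The point weights, the 5 s and 20 MB full-mark thresholds, and the 5 + 5 build split come from the breakdown; how points fall off beyond those thresholds is not stated, so the linear falloff and the `latency_zero_at_s` and `size_zero_at_mb` cutoffs below are assumptions made purely for illustration.

```python
def threshold_points(value, full_below, zero_at, max_points):
    """Full marks below `full_below`; ASSUMED linear falloff to 0 at `zero_at`."""
    if value < full_below:
        return float(max_points)
    if value >= zero_at:
        return 0.0
    return max_points * (zero_at - value) / (zero_at - full_below)


def composite_score(caps_passed, cold_start_s, disk_mb, install_ok, startup_ok,
                    latency_zero_at_s=60.0, size_zero_at_mb=500.0):
    """Combine the four dimensions into a 0-100 score.

    The two *_zero_at_* cutoffs are hypothetical parameters, not documented
    ClawBench values.
    """
    capabilities = 5 * min(caps_passed, 8)                  # 40 pts max
    latency = threshold_points(cold_start_s, 5.0, latency_zero_at_s, 30)
    size = threshold_points(disk_mb, 20.0, size_zero_at_mb, 20)
    build = 5 * bool(install_ok) + 5 * bool(startup_ok)     # 10 pts max
    return round(capabilities + latency + size + build, 1)


# A fork passing 4 of 8 capability tests, fast and small, that installs
# but fails to start: 20 + 30 + 20 + 5 points.
print(composite_score(4, cold_start_s=3.0, disk_mb=10.0,
                      install_ok=True, startup_ok=False))  # prints 75.0
```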

Verdicts

85–100

Runs Great

60–84

Runs OK

30–59

Barely Runs

0–29

Won’t Run
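The verdict bands map to a simple lookup. The sketch below assumes that fractional scores (such as an average of 68.2) are assigned by lower bound, so 84.5 still counts as "Runs OK"; the bands as listed only cover whole numbers, so that boundary handling is an assumption.

```python
def verdict(score):
    """Map a 0-100 composite score to its verdict label."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    # Lower bounds of each band, checked from highest to lowest.
    if score >= 85:
        return "Runs Great"
    if score >= 60:
        return "Runs OK"
    if score >= 30:
        return "Barely Runs"
    return "Won’t Run"


print(verdict(68.2))  # prints "Runs OK"
```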

8 Capability Tests

Messaging

Browser

Code Exec

Memory

Files

Web Search

MCP

Tool Use

Detected via static source analysis and runtime module probing. Each passed test contributes 5 pts to the capabilities score.
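To make the arithmetic concrete, here is a minimal sketch of turning detection results into the capabilities score at 5 pts per passed test. The identifier spellings and the dict-of-booleans input format are assumptions; how ClawBench represents detection results internally is not described here.

```python
# Hypothetical identifiers for the eight capability tests listed above.
CAPABILITIES = (
    "messaging", "browser", "code_exec", "memory",
    "files", "web_search", "mcp", "tool_use",
)
POINTS_PER_CAPABILITY = 5


def capability_points(detected):
    """Return (points, passed_names) for a {capability: bool} mapping."""
    unknown = set(detected) - set(CAPABILITIES)
    if unknown:
        raise ValueError(f"unknown capabilities: {sorted(unknown)}")
    # Capabilities missing from the mapping are treated as not detected.
    passed = [name for name in CAPABILITIES if detected.get(name, False)]
    return POINTS_PER_CAPABILITY * len(passed), passed


points, passed = capability_points(
    {"messaging": True, "files": True, "mcp": True, "browser": False}
)
print(points, passed)  # prints 15 ['messaging', 'files', 'mcp']
```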

View ClawBench on GitHub →

Run benchmarks locally or contribute improvements
