Benchmark

This page compares hardware and models using a single repeatable prompt, so that different setups can be evaluated on equal footing.

Benchmark prompt

Each run uses the exact prompt below. Duration is wall-clock time, measured from submitting the prompt until a working tictactoe.html is available.

time situ -p "Create a single, production-ready file named tictactoe.html containing all HTML, CSS, and JavaScript; this file must implement a polished, full-screen Tic-Tac-Toe game featuring a Human (X) vs. Computer (O) mode with the human starting, automated computer logic, win/draw detection across all axes, and a post-game result display with a 'Restart' button—output the full source code only so it is ready for immediate local saving and execution."
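
For repeated runs, the same measurement can be scripted. The following is a minimal sketch, not part of the benchmark itself: `time_command` and `format_duration` are hypothetical helper names, and the sketch only times the command until it exits. Checking that the resulting tictactoe.html actually works remains a manual step.

```python
import subprocess
import time


def time_command(argv, runs=1):
    """Run argv `runs` times and return the wall-clock seconds of each run."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a failed run is never recorded as a valid timing.
        subprocess.run(argv, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def format_duration(seconds):
    """Format seconds in the `MMm SSs` style used on this page."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes:02d}m {secs:02d}s"
```

A call such as `time_command(["situ", "-p", prompt], runs=3)`, with `prompt` holding the benchmark prompt as a string, would return three timings that can then be passed to `format_duration`.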

Results

| Hardware | OS | Configuration | Duration | Comment |
| --- | --- | --- | --- | --- |
| MacBook Air M3, 24GB RAM | macOS 26.x, Podman 5.8.2 | gemma-4-E4B-it-Q4_K_M.gguf | 05m 50s | |
| AMD Ryzen 7 9700X (8 cores) 32GB RAM | Ubuntu | gemma-4-E4B-it-Q4_K_M.gguf | 04m 07s | |
| AMD Ryzen 9 9900X (8 cores) 32GB RAM / NVIDIA RTX 5070 Ti 16GB | Ubuntu | gemma-4-E4B-it-Q4_K_M.gguf, llama.cpp CUDA | 00m 23s | |
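
To compare the local results numerically, the durations can be converted to seconds and divided by a baseline. The sketch below is illustrative only: `parse_duration`, `speedups`, and the shortened row labels are hypothetical names, and the values are copied from the results above.

```python
import re


def parse_duration(text):
    """Convert a duration such as '05m 50s' into whole seconds."""
    match = re.fullmatch(r"(\d+)m (\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


# Durations copied from the local results; the labels are shortened for the sketch.
local_results = {
    "MacBook Air M3": "05m 50s",
    "Ryzen 7 9700X": "04m 07s",
    "Ryzen 9 9900X + RTX 5070 Ti (CUDA)": "00m 23s",
}


def speedups(results, baseline):
    """Return how many times faster each entry is than the baseline entry."""
    base = parse_duration(results[baseline])
    return {name: base / parse_duration(value) for name, value in results.items()}


ratios = speedups(local_results, "MacBook Air M3")
```

With the MacBook Air as the baseline, the CUDA run comes out at roughly 15 times faster (350 seconds against 23 seconds). Each figure comes from the single timing listed above, so small differences between rows should be read with caution.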

Cloud Benchmarks

Claude Sonnet 4.6 — 01m 12s
Claude Haiku 4.5 — 00m 30s
Gemini 3 Flash — 00m 30s
