Benchmark
This page compares hardware and models against each other using a single repeatable prompt, so different setups can be evaluated on equal footing.
Benchmark prompt
Each run uses the exact prompt below. Duration is the wall-clock time from prompt submission until a working tictactoe.html exists.
time situ -p "Create a single, production-ready file named tictactoe.html containing all HTML, CSS, and JavaScript; this file must implement a polished, full-screen Tic-Tac-Toe game featuring a Human (X) vs. Computer (O) mode with the human starting, automated computer logic, win/draw detection across all axes, and a post-game result display with a 'Restart' button—output the full source code only so it is ready for immediate local saving and execution."
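The Duration values on this page are written as minutes and seconds (for example 05m 50s). As a minimal sketch of how a run could be timed and reported in that same format, the helpers below wrap an arbitrary command. The names format_duration and time_run are hypothetical and not part of situ, and the sketch assumes a date command that supports +%s (true for GNU and BSD/macOS date).

```shell
# Hypothetical helper: print whole seconds as "MMm SSs",
# the format used in the Duration column on this page.
format_duration() {
  # $1: elapsed whole seconds
  printf '%02dm %02ds\n' "$(( $1 / 60 ))" "$(( $1 % 60 ))"
}

# Hypothetical wrapper: run any command, then print its wall-clock
# duration to stderr and pass the command's exit status through.
time_run() {
  _start=$(date +%s)      # seconds since the epoch before the run
  "$@"
  _status=$?
  _end=$(date +%s)        # seconds since the epoch after the run
  format_duration "$(( _end - _start ))" >&2
  return "$_status"
}
```

With the prompt saved to a file (prompt.txt is an assumed name), a run could then look like `time_run situ -p "$(cat prompt.txt)"`. The resolution is one second, which matches the precision of the results reported here.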
Results
| Hardware / OS | Configuration | Duration | Comment |
|---|---|---|---|
| MacBook Air M3, 24GB RAM, macOS 26.x, Podman 5.8.2 | gemma-4-E4B-it-Q4_K_M.gguf | 05m 50s | |
| AMD Ryzen 7 9700X (8 cores), 32GB RAM, Ubuntu | gemma-4-E4B-it-Q4_K_M.gguf | 04m 07s | |
| AMD Ryzen 9 9900X (8 cores), 32GB RAM / NVIDIA RTX 5070 Ti 16GB, Ubuntu | gemma-4-E4B-it-Q4_K_M.gguf, llama.cpp CUDA | 00m 23s | |
Cloud Benchmarks
Claude Sonnet 4.6 — 01m 12s
Claude Haiku 4.5 — 00m 30s
Gemini 3 Flash — 00m 30s
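To compare two results directly, the durations can be converted to seconds and divided. The following is a rough sketch of that arithmetic, not part of the benchmark itself. The function names to_seconds and speedup are hypothetical, and the input is assumed to use the "MMm SSs" format shown on this page.

```shell
# Hypothetical helper: convert "MMm SSs" to whole seconds.
to_seconds() {
  # Strip the trailing "m" and "s" unit letters, then combine.
  printf '%s\n' "$1" | awk '{ sub(/m$/, "", $1); sub(/s$/, "", $2); print $1 * 60 + $2 }'
}

# Hypothetical helper: how many times faster the second duration is
# than the first (baseline), rounded to one decimal place.
speedup() {
  # $1: baseline duration, $2: faster duration
  awk -v base="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
    'BEGIN { printf "%.1fx\n", base / fast }'
}
```

For example, `speedup "04m 07s" "00m 23s"` compares the CPU-only Ryzen 7 run (247 seconds) with the CUDA run (23 seconds) and prints 10.7x.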
Related
- Tuning — raise the Podman memory limit to unlock larger models for better local LLM inference.
- CUDA Setup — GPU acceleration configuration for faster local AI coding inference.
- Configuration Reference — model selection, context size, and all
situ.conf settings.