
Benchmark

This page compares hardware and models against each other using a single repeatable prompt, so different setups can be evaluated on equal footing.

Benchmark prompt

Each run uses the exact prompt below. Duration is wall-clock time, measured from submitting the prompt until a working tictactoe.html exists.

time situ -p "Create a single, production-ready file named tictactoe.html containing all HTML, CSS, and JavaScript; this file must implement a polished, full-screen Tic-Tac-Toe game featuring a Human (X) vs. Computer (O) mode with the human starting, automated computer logic, win/draw detection across all axes, and a post-game result display with a 'Restart' button—output the full source code only so it is ready for immediate local saving and execution."
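As a minimal sketch, the measurement could be scripted so that the elapsed time comes out in the same "MMm SSs" form used in the results. This assumes a POSIX shell with `date +%s`. The helper names `fmt_duration` and `time_cmd` are hypothetical and are not part of situ.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of situ.

# Format a whole number of seconds as "MMm SSs", e.g. 350 -> "05m 50s".
fmt_duration() {
  printf '%02dm %02ds\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Run any command, then print its wall-clock duration on stdout.
# The wrapped command's exit status is passed through.
time_cmd() {
  _tc_start=$(date +%s)
  "$@"
  _tc_status=$?
  _tc_end=$(date +%s)
  fmt_duration $(( _tc_end - _tc_start ))
  return $_tc_status
}
```

With the benchmark prompt stored in a variable, a run would look like `time_cmd situ -p "$PROMPT"`. The wrapper only times the command; whether the resulting tictactoe.html actually works still has to be checked by opening it in a browser.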

Results

Hardware	OS	Configuration	Duration
MacBook Air M3, 24GB RAM	macOS 26.x, Podman 5.8.2	gemma-4-E4B-it-Q4_K_M.gguf	05m 50s
AMD Ryzen 7 9700X (8 cores), 32GB RAM	Ubuntu	gemma-4-E4B-it-Q4_K_M.gguf	04m 07s
AMD Ryzen 9 9900X (8 cores), 32GB RAM / NVIDIA RTX 5070 Ti 16GB	Ubuntu	gemma-4-E4B-it-Q4_K_M.gguf, llama.cpp CUDA	00m 23s
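
To compare rows, the durations can be converted to seconds and divided. The following is a rough sketch, assuming a POSIX shell and awk. The function names `to_seconds` and `speedup` are hypothetical, and the ratios describe single runs only.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of situ.

# Convert a duration in the table's "MMm SSs" form to seconds.
# The body runs in a subshell so its variables do not leak.
to_seconds() (
  m=${1%%m*}   # text before the first "m", e.g. "05"
  s=${1##* }   # text after the last space, e.g. "50s"
  s=${s%s}     # drop the trailing "s"
  # Strip one leading zero so "08" is not read as an octal number.
  echo $(( ${m#0} * 60 + ${s#0} ))
)

# Print how many times faster the second duration is than the first.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", a / b }'
}
```

For example, `speedup "04m 07s" "00m 23s"` prints 10.7, the ratio between the CPU-only Ryzen 7 run and the CUDA run in the table.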

Cloud Benchmarks

Claude Sonnet 4.6 — 01m 12s
Claude Haiku 4.5 — 00m 30s
Gemini 3 Flash — 00m 30s

Related

Tuning: raise the Podman memory limit to unlock larger models for better local LLM inference.
CUDA Setup: GPU acceleration configuration for faster local AI coding inference.
Configuration Reference: model selection, context size, and all situ.conf settings.
