First Steps

Configuration

SITU reads its settings from situ.conf, located in the same directory as the launch script (~/.situ/situ.conf). All parameters are optional except MODEL.

The configuration file

Open ~/.situ/situ.conf in any text editor. Each setting is a simple KEY=VALUE line.

| Parameter | Content | Default | Description |
| --- | --- | --- | --- |
| CONTAINER_ENGINE | podman or docker | podman | The container engine SITU uses to create and manage containers. Set to docker if you are running Docker instead of Podman. |
| MODE | RESTRICTED or NETWORK | RESTRICTED | Controls external network access. RESTRICTED creates an internal container network with no external routes; NETWORK allows external connections. See Restricted Mode and Network Mode. |
| MOUNTPOINT | Absolute directory path | Current working directory | The host directory mounted into the SITU container as the workspace. Set this to an encrypted volume or a project root to control exactly what the agent can read and write. |
| LLAMA_IMAGE | Container image reference | ghcr.io/ggml-org/llama.cpp:server (CPU) | The llama.cpp server image used as the model sidecar. Switch to a CUDA variant (e.g. ghcr.io/ggml-org/llama.cpp:server-cuda) to use GPU acceleration. Ignored when LM_HOST is set. |
| MODEL | GGUF filename | none (required) | Path to the model file relative to the ~/.situ/models/ directory. The file must exist before starting the agent. Example: gemma-4-E4B-it-Q4_K_M.gguf. |
| CTX_SIZE | Integer (tokens) | 0 | Context window size passed to the llama.cpp server. 0 automatically uses the model's own training context size, which is the recommended default. Set an explicit value to cap memory use on hardware-constrained machines or to extend context beyond the training limit with RoPE scaling. |
| TEMPERATURE | Float | 0.1 | Sampling temperature passed to the llama.cpp sidecar. Lower values produce more deterministic output; higher values increase creativity. Only applies to the local sidecar; ignored when LM_HOST is set. |
| MAX_TOKENS | Integer (tokens) | 16384 | Per-call generation budget for the agent. Caps how many tokens the model may produce in a single LLM call, including any reasoning tokens. Increase for tasks that require very long single-step outputs. |
| REASONING | true or false | false | Enables the model's extended thinking (reasoning) mode. When true, the agent requests a reasoning trace before each response, which improves output quality at the cost of additional tokens. Set to false to disable reasoning and reduce token usage. |
| REASONING_BUDGET_MAXPERCENT | Integer (%) | 25 | Caps the server-side thinking budget at this percentage of MAX_TOKENS. At the default of 25% with MAX_TOKENS=16384, the model may use up to 4096 tokens for reasoning before being guided to produce its answer. Set to 0 to disable thinking at the server level (also forced when REASONING=false). Set to -1 for no limit. |
| REASONING_BUDGET_MESSAGE | String | Let me now write the solution. | Message injected by the llama.cpp server immediately before the </think> tag when the reasoning budget is exhausted. This guides the model to transition cleanly from reasoning to its answer rather than being cut off mid-thought. |
| PARALLEL | Integer | 1 | Number of parallel request slots in the llama.cpp server. The default of 1 is correct for single-user use and minimises KV cache memory. Increase this only when multiple SITU pods share a single llamaservice instance. |
| LMS_READY_TIMEOUT | Integer (seconds) | 180 | How long SITU waits for the llama.cpp server to finish loading the model before aborting. Increase this for very large models on slow storage. |
| LM_HOST | Hostname or IP address | none (pod sidecar is used) | Connect to an existing LM server on the network instead of spinning up the built-in llama.cpp sidecar. Requires MODE=NETWORK. Useful when a more powerful machine on the local network runs the model. |
| LM_PORT | Integer (port number) | 8080 | Port of the external LM server. Only relevant when LM_HOST is set. |
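
The relationship between MAX_TOKENS, REASONING and REASONING_BUDGET_MAXPERCENT can be worked through with a small sketch. The function name `reasoning_budget` is hypothetical and not part of SITU, and rounding down to a whole token is an assumption; only the 0, -1 and percentage rules come from the table.

```python
def reasoning_budget(max_tokens: int, max_percent: int, reasoning: bool) -> int:
    """Illustrative thinking-token cap: 0 means disabled, -1 means no limit.

    Hypothetical helper, not SITU's implementation. Integer floor
    division is an assumption about how fractions are rounded.
    """
    # REASONING=false forces the budget to 0, as does an explicit 0 percent.
    if not reasoning or max_percent == 0:
        return 0
    # -1 is documented as "no limit"; pass it through unchanged.
    if max_percent == -1:
        return -1
    # Otherwise the cap is a percentage of the per-call generation budget.
    return max_tokens * max_percent // 100


# Documented example: 25% of 16384 tokens.
print(reasoning_budget(16384, 25, reasoning=True))  # 4096
```

Under these assumptions, raising MAX_TOKENS while leaving the percentage unchanged also raises the reasoning cap proportionally.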

Lines beginning with # are comments and have no effect. The file ships with every parameter commented out, and each comment shows that parameter's default value.
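
To make the file format concrete, here is a minimal sketch of reading KEY=VALUE lines, skipping # comments, and checking the two constraints stated on this page (MODEL is required, and LM_HOST requires MODE=NETWORK). The functions `parse_conf` and `check_conf` and the sample text are hypothetical. How SITU itself handles quoting, inline comments or malformed lines is not documented here, so that behaviour is an assumption.

```python
def parse_conf(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments.

    Hypothetical parser for illustration only; SITU's real parsing
    rules (quoting, inline comments) may differ.
    """
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: lines without '=' are skipped
        settings[key.strip()] = value.strip()
    return settings


def check_conf(settings: dict[str, str]) -> list[str]:
    """Return a list of problems based on the rules stated in the table."""
    problems = []
    if not settings.get("MODEL"):
        problems.append("MODEL is required")
    # MODE defaults to RESTRICTED when it is not set.
    if settings.get("LM_HOST") and settings.get("MODE", "RESTRICTED") != "NETWORK":
        problems.append("LM_HOST requires MODE=NETWORK")
    return problems


# Illustrative file content: defaults left commented out, only MODEL set.
SAMPLE = """\
# CONTAINER_ENGINE=podman
# MODE=RESTRICTED
MODEL=gemma-4-E4B-it-Q4_K_M.gguf
"""

settings = parse_conf(SAMPLE)
print(settings)
print(check_conf(settings))
```

With the sample above, only MODEL is active because every other line is a comment, so all remaining parameters fall back to their defaults.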

Related