Advanced Topics

CUDA Setup

SITU defaults to a CPU-only llama.cpp image. Switching to GPU acceleration takes a one-line change in situ.conf, provided the host has a compatible NVIDIA driver and Podman is configured for GPU access. Nothing else in the SITU configuration changes.

How the sidecar works

When a session starts, SITU spins up a llama.cpp container alongside the agent container inside the same Podman pod. This sidecar loads the model and serves it over the pod-internal network. The image used for that sidecar is controlled by the LLAMA_IMAGE parameter in situ.conf.

The default image uses CPU inference:

LLAMA_IMAGE=ghcr.io/ggml-org/llama.cpp:server
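
To confirm which image the next session is likely to use, the value can be read back from the config file. The following is a minimal sketch, not part of SITU: it assumes situ.conf consists of plain KEY=VALUE lines and that the last LLAMA_IMAGE assignment wins, and the function name llama_image is hypothetical. When the key is absent it prints the default image named above.

```shell
#!/bin/sh
# Hypothetical helper: print the LLAMA_IMAGE value from a situ.conf-style file.
# Assumption: the file holds plain KEY=VALUE lines and the last assignment wins.
llama_image() {
    conf="${1:-$HOME/.situ/situ.conf}"
    default="ghcr.io/ggml-org/llama.cpp:server"
    value=""
    if [ -f "$conf" ]; then
        # Keep only uncommented LLAMA_IMAGE= lines and take the last one.
        value=$(sed -n 's/^LLAMA_IMAGE=//p' "$conf" | tail -n 1)
    fi
    # Fall back to the documented CPU image when nothing is set.
    printf '%s\n' "${value:-$default}"
}
```

Calling llama_image with no argument reads ~/.situ/situ.conf; pass a path to inspect a different file.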

Prerequisites

Configure Podman for GPU access

Once the NVIDIA Container Toolkit is installed, generate a CDI specification so Podman knows how to expose the GPU to containers:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
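
Before moving on, it may be worth checking that the spec file was actually written. The sketch below is an illustration rather than an official check: it assumes the generated file declares `kind: nvidia.com/gpu`, which matches the device name used with Podman in this section, and the function name cdi_spec_ok is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the CDI spec exists, is non-empty,
# and declares the NVIDIA GPU kind (assumed to appear as "kind: nvidia.com/gpu").
cdi_spec_ok() {
    spec="${1:-/etc/cdi/nvidia.yaml}"
    [ -s "$spec" ] || { echo "missing or empty: $spec" >&2; return 1; }
    grep -q 'kind: *nvidia\.com/gpu' "$spec" || {
        echo "no nvidia.com/gpu kind in $spec" >&2
        return 1
    }
}
```

Run cdi_spec_ok with no argument to check /etc/cdi/nvidia.yaml. Recent toolkit versions also offer `nvidia-ctk cdi list`, which prints the device names Podman can request; treat its availability as dependent on your installed version.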

Verify that Podman can reach the GPU before launching SITU:

podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi

If nvidia-smi prints the GPU table, SITU will pick up the GPU the next time you start a session with a CUDA image.
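
For scripted setups, the same check can be reduced to an exit status. This is a rough sketch under the assumption that the `ubuntu` image and the `nvidia.com/gpu=all` device name work on your host exactly as in the manual command; the function name podman_gpu_ok and the return code 2 are hypothetical choices, not SITU conventions.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if a container started by Podman can run
# nvidia-smi, 2 if podman is not installed, and podman's own status otherwise.
podman_gpu_ok() {
    if ! command -v podman >/dev/null 2>&1; then
        echo "podman not found on PATH" >&2
        return 2
    fi
    # Same command as the manual check; output is discarded because only
    # the exit status matters here.
    podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi >/dev/null 2>&1
}
```

A wrapper script could call `podman_gpu_ok || echo "GPU not visible to Podman" >&2` before editing situ.conf, so a misconfigured host is caught before a session is started.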

Enabling GPU acceleration

The llama.cpp project publishes a separate image with CUDA support. To use it, open ~/.situ/situ.conf and set:

LLAMA_IMAGE=ghcr.io/ggml-org/llama.cpp:server-cuda

That is the only change required. On the next session start, SITU pulls the CUDA image (first run only) and the sidecar runs inference on the GPU.
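
If you prefer to script the edit, something like the following could work. It is a sketch under assumptions: situ.conf is treated as plain KEY=VALUE lines ending in a newline, the function name set_llama_image is hypothetical, and the image reference must not contain `|` or `&` because it is passed through sed. A backup copy is written next to the file as situ.conf.bak.

```shell
#!/bin/sh
# Hypothetical helper: set LLAMA_IMAGE in a situ.conf-style file.
# Replaces an existing LLAMA_IMAGE line in place, or appends one if none exists.
set_llama_image() {
    conf="$1"
    image="$2"
    # Create the file if it does not exist yet, then keep a backup copy.
    [ -f "$conf" ] || : > "$conf"
    cp "$conf" "$conf.bak"
    if grep -q '^LLAMA_IMAGE=' "$conf.bak"; then
        # Rewrite from the backup so the original is never read and written at once.
        sed "s|^LLAMA_IMAGE=.*|LLAMA_IMAGE=$image|" "$conf.bak" > "$conf"
    else
        printf 'LLAMA_IMAGE=%s\n' "$image" >> "$conf"
    fi
}
```

For example, `set_llama_image ~/.situ/situ.conf ghcr.io/ggml-org/llama.cpp:server-cuda` switches to the CUDA image. Running `podman pull ghcr.io/ggml-org/llama.cpp:server-cuda` beforehand is optional and only moves the download ahead of the first session start.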

macOS note

macOS does not support NVIDIA CUDA, so the CUDA image cannot be used there. On Apple Silicon, keep the default CPU image; inference runs on the CPU. Metal/MPS support may come in a future llama.cpp image variant.

Related