services.llmhop.llama-cpp.enable
Whether to enable llama.cpp model serving via systemd, fronted by llmhop.
Type: boolean
Default:
false
Example:
true
services.llmhop.llama-cpp.package
The llama-cpp package to use.
Type: package
Default:
pkgs.llama-cpp
services.llmhop.llama-cpp.environment
Environment variables set on every model service.
Merged with services.llmhop.llama-cpp.models.<name>.environment; per-model
entries take precedence.
Type: attribute set of string
Default:
{ }
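A minimal sketch of the merge, reusing the qwen3-8b model from the models example. LLAMA_CACHE is llama.cpp's download-cache variable; both cache paths are hypothetical and only illustrate which value wins:

```nix
{
  services.llmhop.llama-cpp = {
    enable = true;

    # Set on every model service.
    environment.LLAMA_CACHE = "/var/cache/llama-cpp"; # hypothetical path

    models."qwen3-8b" = {
      port = 18001;
      settings.hf-repo = "unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL";

      # Same key as the top-level entry: the per-model value takes
      # precedence for this service only.
      environment.LLAMA_CACHE = "/var/cache/llama-cpp/qwen3-8b"; # hypothetical path
    };
  };
}
```

Other models would keep the top-level value.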
services.llmhop.llama-cpp.environmentFile
File in KEY=VALUE format forwarded to every service.
Use for secrets managed by sops-nix/agenix, e.g. a file containing
HF_TOKEN=<token> to access gated Hugging Face repositories.
Loaded before services.llmhop.llama-cpp.models.<name>.environmentFile, so
per-model files override these entries.
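One possible wiring with sops-nix, as a sketch only. It assumes sops-nix is already imported and configured (keys, defaultSopsFile); the secret name llama-cpp-env is hypothetical, and its decrypted content is expected to be KEY=VALUE lines:

```nix
{ config, ... }:
{
  # Hypothetical secret whose decrypted content is a line such as
  # HF_TOKEN=<token>.
  sops.secrets."llama-cpp-env" = { };

  # Point the option at the decrypted file that sops-nix provides.
  services.llmhop.llama-cpp.environmentFile =
    config.sops.secrets."llama-cpp-env".path;
}
```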
Type: null or absolute path
Default:
null
Example:
"/etc/llama-cpp/.env"
services.llmhop.llama-cpp.modelSettings
CLI flags forwarded to the model server for every model.
A value of true is rendered as a bare --<key> flag; null and false omit the
flag entirely. To pass a negated flag, write the negated key explicitly
(e.g. "no-mmap" = true;) when the upstream CLI registers a --no-<key> form.
Merged with services.llmhop.llama-cpp.models.<name>.settings; per-model
entries take precedence.
Type: attribute set of anything
Default:
{ }
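The rendering rules above can be sketched as follows. The keys are existing llama-server flags chosen for illustration. How non-boolean values are passed is an assumption, since the description only covers true, false and null:

```nix
{
  services.llmhop.llama-cpp.modelSettings = {
    jinja = true;     # rendered as --jinja
    mlock = false;    # dropped: no flag is emitted
    threads = null;   # dropped: no flag is emitted
    "no-mmap" = true; # negated form written out explicitly: --no-mmap

    # Assumption: other values become the flag's argument,
    # i.e. --ctx-size 8192.
    ctx-size = 8192;
  };
}
```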
services.llmhop.llama-cpp.models
Models to serve.
Each entry produces one systemd service running llama-server; the
attribute name is the routing key surfaced through llmhop and the OpenAI
model field.
GPU selection is done through build-specific environment variables set in
environment (top-level or per-model). llama.cpp runs as a host process, so
no CDI is involved. Common variables: CUDA_VISIBLE_DEVICES (CUDA),
HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES (ROCm),
GGML_VK_VISIBLE_DEVICES (Vulkan), ZE_AFFINITY_MASK (SYCL).
Type: attribute set of (submodule)
Default:
{ }
Example:
{
  "qwen3-8b" = {
    port = 18001;
    settings = {
      hf-repo = "unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL";
      temperature = 1.0;
      top-k = 20;
    };
    # Pin this model to a specific GPU. The right variable depends on
    # the llama.cpp build: CUDA_VISIBLE_DEVICES for CUDA,
    # HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES for ROCm,
    # GGML_VK_VISIBLE_DEVICES for Vulkan, ZE_AFFINITY_MASK for SYCL.
    environment.CUDA_VISIBLE_DEVICES = "0";
  };
}
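With more than one model, each entry gets its own unit, port and routing key. A sketch assuming a CUDA build with two GPUs; the second model's name and file path are hypothetical:

```nix
{
  services.llmhop.llama-cpp = {
    enable = true;
    models = {
      # Unit llama-cpp-qwen3-8b; clients send "qwen3-8b" in the model field.
      "qwen3-8b" = {
        port = 18001;
        settings.hf-repo = "unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL";
        environment.CUDA_VISIBLE_DEVICES = "0";
      };

      # Unit llama-cpp-local-gguf, on a different port and GPU.
      "local-gguf" = {
        port = 18002; # must differ from every other enabled model
        settings.model = "/var/lib/models/example.gguf"; # hypothetical path
        environment.CUDA_VISIBLE_DEVICES = "1";
      };
    };
  };
}
```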
services.llmhop.llama-cpp.models.<name>.enable
Whether to enable serving of model ‹name›.
Type: boolean
Default:
true
Example:
true
services.llmhop.llama-cpp.models.<name>.environment
Additional environment variables set on this model’s service.
Merged with services.llmhop.llama-cpp.environment; per-model entries
take precedence.
Type: attribute set of string
Default:
{ }
services.llmhop.llama-cpp.models.<name>.environmentFile
File in KEY=VALUE format forwarded to this model’s service.
Loaded after services.llmhop.llama-cpp.environmentFile, so its entries
override global ones. The file must be readable by systemd when the unit starts.
Type: null or absolute path
Default:
null
services.llmhop.llama-cpp.models.<name>.name
Canonical identifier for this model. Used for the unit name
(llama-cpp-<name>) and as the routing key registered with llmhop
(clients select the backend by sending this value in the OpenAI
model field).
Defaults to the attribute key; when name is left unset, the key itself must match the required label format.
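If the attribute key does not fit the pattern [[:alnum:]][[:alnum:].-]*, setting name explicitly is one way around it. This sketch assumes the module validates only the resolved name and not the key; the key qwen3_8b is hypothetical:

```nix
{
  services.llmhop.llama-cpp.models."qwen3_8b" = {
    # The key contains an underscore, which the pattern does not allow,
    # so a conforming name is given here instead of relying on the default.
    name = "qwen3-8b";
    port = 18001;
    settings.hf-repo = "unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL";
  };
}
```

Under that assumption the unit would be llama-cpp-qwen3-8b, and clients would send qwen3-8b in the model field.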
Type: string matching the pattern [[:alnum:]][[:alnum:].-]*
Default:
"‹name›"
services.llmhop.llama-cpp.models.<name>.port
Loopback host port that llama-server binds to. Must be unique per
enabled model; the gateway (llmhop) reaches each backend at
http://127.0.0.1:<port>.
Type: 16 bit unsigned integer; between 0 and 65535 (both inclusive)
services.llmhop.llama-cpp.models.<name>.settings
CLI flags forwarded to the model server for this model.
A value of true is rendered as a bare --<key> flag; null and false omit the
flag entirely. To pass a negated flag, write the negated key explicitly
(e.g. "no-mmap" = true;) when the upstream CLI registers a --no-<key> form.
Merged with services.llmhop.llama-cpp.modelSettings; per-model entries
take precedence.
Type: attribute set of anything
Default:
{ }
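A sketch of how shared and per-model flags might combine, with illustrative values. The effective set in the comments follows from the merge rule in the description, not from inspecting the module:

```nix
{
  services.llmhop.llama-cpp = {
    # Shared by every model.
    modelSettings = {
      ctx-size = 8192;
      jinja = true;
    };

    models."qwen3-8b" = {
      port = 18001;
      settings = {
        hf-repo = "unsloth/Qwen3-8B-GGUF:UD-Q4_K_XL";
        # Same key as in modelSettings: the per-model value wins.
        ctx-size = 32768;
      };
      # Effective settings for this model:
      #   ctx-size = 32768, jinja = true, hf-repo = "unsloth/...".
    };
  };
}
```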
services.llmhop.llama-cpp.openFilesLimit
File descriptor limit (LimitNOFILE) applied to every llama-cpp systemd unit.
Increase if the server logs accept: Too many open files under concurrent load.
Type: positive integer, meaning >0
Default:
1048576