vllm - llmhop

services.llmhop.vllm.enable

Whether to enable vLLM model serving via Quadlet, fronted by llmhop.

Type: boolean

Default:

false

Example:

true

services.llmhop.vllm.cacheDir

Host directory bind-mounted as the Hugging Face cache for every worker.

Type: absolute path

Default:

"/var/cache/vllm"

services.llmhop.vllm.dataDir

Home directory of services.llmhop.vllm.user. Rootless Podman keeps its container storage here (~/.local/share/containers), so the directory must live on a filesystem that supports overlayfs.

Type: absolute path

Default:

"/var/lib/vllm"

services.llmhop.vllm.devices

Devices exposed to every model container, passed verbatim as Quadlet AddDevice= lines. Accepts both CDI references (recommended: nvidia.com/gpu=…, amd.com/gpu=…, intel.com/gpu=…, …) and raw host device paths (e.g. /dev/dri/renderD128). For CDI, the corresponding spec must be generated on the host (e.g. nvidia-ctk cdi generate). Defaults to [ "nvidia.com/gpu=all" ] when hardware.nvidia-container-toolkit.enable is set, otherwise empty (CPU-only). A per-model devices setting overrides this list.

Type: list of string

Default:

if config.hardware.nvidia-container-toolkit.enable then
  [ "nvidia.com/gpu=all" ]
else
  [ ]

Example:

[
  "amd.com/gpu=all"
]

services.llmhop.vllm.environment

Environment variables set on every model service. Merged with services.llmhop.vllm.models.<name>.environment; per-model entries take precedence.

Type: attribute set of string

Default:

{ }
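
The merge rule above can be sketched as follows. This is a minimal, illustrative fragment: the variable values are assumptions chosen for the example, and the model entry reuses the one from the models example.

```nix
{
  services.llmhop.vllm = {
    # Set on every model service.
    environment = {
      VLLM_LOGGING_LEVEL = "INFO";
    };

    models."qwen2-5-7b" = {
      model = "Qwen/Qwen2.5-7B-Instruct";
      port = 18001;
      # Per-model entry takes precedence over the global value,
      # so this service should see VLLM_LOGGING_LEVEL=DEBUG.
      environment.VLLM_LOGGING_LEVEL = "DEBUG";
    };
  };
}
```

Other models without their own entry would keep the global value.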

services.llmhop.vllm.environmentFile

File in KEY=VALUE format forwarded to every service. Use for secrets managed by sops-nix/agenix, e.g. a file containing HF_TOKEN=<token> to access gated Hugging Face repositories. Loaded before services.llmhop.vllm.models.<name>.environmentFile, so per-model files override these entries.

Type: null or absolute path

Default:

null

Example:

"/etc/vllm/.env"

services.llmhop.vllm.gid

Host GID assigned to services.llmhop.vllm.group and used as the inner-to-outer mapping target in --gidmap. Defaults to uid.

Type: unsigned integer, meaning >=0

Default:

config.services.llmhop.vllm.uid

services.llmhop.vllm.group

Primary group for services.llmhop.vllm.user. Defaults to the user name (matching the typical 1:1 user/group layout).

Type: string

Default:

config.services.llmhop.vllm.user

services.llmhop.vllm.image

Container image used for every model worker.

Type: string

Default:

"docker.io/vllm/vllm-openai"

services.llmhop.vllm.modelSettings

CLI flags forwarded to the model server for every model. true collapses to --<key>; null and false are dropped (write the negated key explicitly, e.g. "no-mmap" = true;, when the upstream CLI registers a --no-<key> form). Merged with services.llmhop.vllm.models.<name>.settings; per-model entries take precedence.

Type: attribute set of anything

Default:

{ }
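
As a rough sketch of how the flag rules read in practice: the keys below are existing vLLM CLI flags, but the values are illustrative, and the exact rendering of non-boolean values (for example as --gpu-memory-utilization 0.85) is an assumption, since only the true, false, and null cases are described above.

```nix
{
  services.llmhop.vllm = {
    modelSettings = {
      # true collapses to a bare flag: --enable-prefix-caching
      enable-prefix-caching = true;
      # false (like null) is dropped: no flag is emitted for this key.
      enforce-eager = false;
      # Non-boolean value; assumed to be passed as the flag's argument.
      gpu-memory-utilization = 0.85;
    };

    models."llama-3-8b" = {
      model = "meta-llama/Meta-Llama-3-8B-Instruct";
      port = 18002;
      settings = {
        # Only set for this model.
        max-model-len = 8192;
        # Per-model entry wins over the modelSettings value above.
        gpu-memory-utilization = 0.5;
      };
    };
  };
}
```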

services.llmhop.vllm.models

Models to serve. Each entry produces one Quadlet container, and the attribute name is the routing key. Enabled entries are sorted by ascending port.

Type: attribute set of (submodule)

Default:

{ }

Example:

{
  "qwen2-5-7b" = {
    model = "Qwen/Qwen2.5-7B-Instruct";
    port = 18001;
  };
  "llama-3-8b" = {
    model = "meta-llama/Meta-Llama-3-8B-Instruct";
    port = 18002;
    settings.max-model-len = 8192;
  };
}

services.llmhop.vllm.models.<name>.enable

Whether to enable serving of model ‹name›.

Type: boolean

Default:

true

Example:

true

services.llmhop.vllm.models.<name>.devices

Devices exposed to this model’s container — passed verbatim as Quadlet AddDevice= lines. Replaces (does not extend) services.llmhop.vllm.devices for this model. Use to pin a model to specific device indices (e.g. [ "nvidia.com/gpu=0" ]).

Type: list of string

Default:

config.services.llmhop.vllm.devices

Example:

[
  "nvidia.com/gpu=0"
]
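
To illustrate the replace-not-extend behaviour, here is a minimal sketch that assumes a host with NVIDIA CDI specs already generated. The model names and ports are taken from the models example; the device assignment is hypothetical.

```nix
{
  services.llmhop.vllm = {
    # Global default for models that do not set their own devices.
    devices = [ "nvidia.com/gpu=all" ];

    models = {
      "qwen2-5-7b" = {
        model = "Qwen/Qwen2.5-7B-Instruct";
        port = 18001;
        # Replaces the global list: this container only gets GPU 0.
        devices = [ "nvidia.com/gpu=0" ];
      };
      "llama-3-8b" = {
        model = "meta-llama/Meta-Llama-3-8B-Instruct";
        port = 18002;
        # No devices set: falls back to the global list.
      };
    };
  };
}
```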

services.llmhop.vllm.models.<name>.digest

Immutable digest of the container image (e.g. sha256:…). Mutually exclusive with tag.

Type: null or string

Default:

null

Example:

"sha256:a73fb0b9046fee099f7c1829d2548e6cc1740f4c2776a6855fa659ae5d0deb49"

services.llmhop.vllm.models.<name>.environment

Additional environment variables set on this model’s service. Merged with services.llmhop.vllm.environment; per-model entries take precedence.

Type: attribute set of string

Default:

{ }

services.llmhop.vllm.models.<name>.environmentFile

File in KEY=VALUE format forwarded to this model’s service. Loaded after services.llmhop.vllm.environmentFile, so its entries override global ones. Must be readable by the user under which systemd reads the file.

Type: null or absolute path

Default:

null

services.llmhop.vllm.models.<name>.model

Hugging Face repo id (or local path) passed to the model server.

Type: string

Example:

"Qwen/Qwen2.5-7B-Instruct"

services.llmhop.vllm.models.<name>.name

Canonical identifier for this model. Used for the unit name (vllm-<name>) and as the routing key registered with llmhop (clients select the backend by sending this value in the OpenAI model field).

Defaults to the attribute key, so the key itself must match the required label format.

Type: string matching the pattern [[:alnum:]][[:alnum:].-]*

Default:

"‹name›"

services.llmhop.vllm.models.<name>.port

Loopback host port forwarded to the container’s vLLM API. Must be unique per model.

Type: 16 bit unsigned integer; between 0 and 65535 (both inclusive)

services.llmhop.vllm.models.<name>.settings

CLI flags forwarded to the model server for this model. true collapses to --<key>; null and false are dropped (write the negated key explicitly, e.g. "no-mmap" = true;, when the upstream CLI registers a --no-<key> form). Merged with services.llmhop.vllm.modelSettings; per-model entries take precedence.

Type: attribute set of anything

Default:

{ }

services.llmhop.vllm.models.<name>.shmSize

Size of the container’s private /dev/shm tmpfs. PyTorch and related libraries use shared memory for NCCL/tensor-parallel inference; upstream recommends 32g (or --ipc=host). A private tmpfs is preferred for isolation. Raise the value for larger models or higher tensor-parallel sizes.

Type: string

Default:

"32g"

Example:

"64g"

services.llmhop.vllm.models.<name>.tag

Tag of the container image used for this model. Mutually exclusive with digest.

Type: null or string

Default:

null
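
The interplay between the global services.llmhop.vllm.tag and the per-model tag and digest options might look like the sketch below. The digest is the one from the digest example; the per-model tag value, the third model, and its port are illustrative assumptions.

```nix
{
  services.llmhop.vllm = {
    image = "docker.io/vllm/vllm-openai";
    # Used by models that set neither tag nor digest.
    tag = "v0.11.0";

    models = {
      "qwen2-5-7b" = {
        model = "Qwen/Qwen2.5-7B-Instruct";
        port = 18001;
        # Neither tag nor digest: falls back to the global tag.
      };
      "llama-3-8b" = {
        model = "meta-llama/Meta-Llama-3-8B-Instruct";
        port = 18002;
        # Illustrative per-model tag.
        tag = "v0.10.2";
      };
      "mistral-7b" = {
        model = "mistralai/Mistral-7B-Instruct-v0.3";
        port = 18003;
        # Pinned by digest; tag stays unset because the two are mutually exclusive.
        digest = "sha256:a73fb0b9046fee099f7c1829d2548e6cc1740f4c2776a6855fa659ae5d0deb49";
      };
    };
  };
}
```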

services.llmhop.vllm.openFilesLimit

File descriptor limit (LimitNOFILE) applied to every vllm systemd unit. Increase if the server logs accept: Too many open files under concurrent load.

Type: positive integer, meaning >0

Default:

services.llmhop.vllm.startupOrdering

Whether to chain enabled model services by ascending port during startup. Without chaining, GPU-memory profiling is racy: two workers booting on the same device each see it as fully free and try to claim their share, leading to OOM. Disable only when every model pins itself to a dedicated device via its own devices option.

Type: boolean

Default:

true
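
A configuration where disabling the chain is plausibly safe could look like this sketch, assuming a host with two GPUs exposed through CDI. The device indices are hypothetical.

```nix
{
  services.llmhop.vllm = {
    # Every model below owns a dedicated GPU, so no two workers
    # profile the same device while booting.
    startupOrdering = false;

    models = {
      "qwen2-5-7b" = {
        model = "Qwen/Qwen2.5-7B-Instruct";
        port = 18001;
        devices = [ "nvidia.com/gpu=0" ];
      };
      "llama-3-8b" = {
        model = "meta-llama/Meta-Llama-3-8B-Instruct";
        port = 18002;
        devices = [ "nvidia.com/gpu=1" ];
      };
    };
  };
}
```

If any two models share a device, leaving startupOrdering at its default is the safer choice.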

services.llmhop.vllm.subGidCount

Size of the subordinate GID range mapped into every container. Defaults to subUidCount.

Type: positive integer, meaning >0

Default:

config.services.llmhop.vllm.subUidCount

services.llmhop.vllm.subGidStart

First host GID of the subordinate range mapped into every container. Defaults to subUidStart — most setups keep the UID and GID ranges aligned.

Type: unsigned integer, meaning >=0

Default:

config.services.llmhop.vllm.subUidStart

services.llmhop.vllm.subUidCount

Size of the subordinate UID range mapped into every container. 65536 covers the full unprivileged ID space inside the namespace.

Type: positive integer, meaning >0

Default:

services.llmhop.vllm.subUidStart

First host UID of the subordinate range mapped into every container. Container UIDs ≥1 are mapped to subUidCount consecutive host IDs starting here. Required — pick a value clear of NixOS system users (<1000), regular login UIDs, and other backends’ subordinate ranges on the same host.

Type: unsigned integer, meaning >=0

Example:

services.llmhop.vllm.tag

Default tag of the container image used for models that do not set their own tag or digest.

Type: string

Example:

"v0.11.0"

services.llmhop.vllm.uid

Host UID assigned to services.llmhop.vllm.user and used as the inner-to-outer mapping target in --uidmap. Required — pick a value that does not clash with other system users on the host.

Type: unsigned integer, meaning >=0

Example:

services.llmhop.vllm.user

Dedicated system user that owns the vllm cache directory and that container root is mapped to via --uidmap. Defaults to the backend name; override to point at a user the deployer manages externally (in which case the matching users.users.<name> and users.groups.<name> declarations become the deployer’s responsibility).

Type: string

Default:

backend
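
Taken together, the identity options could be set as in the following sketch. All numeric values are hypothetical placeholders, not recommendations; choose IDs that are free on the target host. The mapping comments follow the option descriptions above.

```nix
{
  services.llmhop.vllm = {
    # Host UID for the service user; container root (UID 0) maps to it.
    uid = 2100;

    # Container UIDs >= 1 map to consecutive host IDs starting here:
    #   container UID 1     -> host UID 2200000
    #   container UID 65536 -> host UID 2265535
    subUidStart = 2200000;
    subUidCount = 65536;

    # gid, subGidStart and subGidCount are left unset, so they follow
    # uid, subUidStart and subUidCount respectively.
  };
}
```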
