Multimodal Model Serving

Deploy multimodal models with image, video, and audio support in Dynamo
View as Markdown

Dynamo supports multimodal inference across multiple LLM backends, enabling models to process images, video, and audio alongside text.

Security Requirement: Multimodal processing must be explicitly enabled at startup. See the relevant backend documentation (vLLM, SGLang, TRT-LLM) for the necessary flags. This prevents unintended processing of multimodal data from untrusted sources.

Key Features

Dynamo provides support for improving latency and throughput for vision-and-language workloads through the following features, that can be used together or separately, depending on your workload characteristics:

FeatureDescription
Embedding CacheCPU-side LRU cache that skips re-encoding repeated images
Encoder DisaggregationSeparate vision encoder worker for independent scaling
Multimodal KV RoutingMM-aware KV cache routing for optimal worker selection

Support Matrix

StackImageVideoAudio
vLLM🧪🧪
TRT-LLM
SGLang🧪

Status: ✅ Supported | 🧪 Experimental | ❌ Not supported

Security: URL Validation

All multimodal loaders route remote fetches through a shared URL policy (dynamo.common.multimodal.url_validator). Only https:// and data: URLs are allowed by default, private / internal IPs are blocked, and local file access is disabled. Every HTTP redirect hop is re-validated against the policy.

Two environment variables loosen the defaults for non-public deployments:

VariableDefaultEffect
DYN_MM_ALLOW_INTERNAL0Set to 1 to allow http:// and private / internal IP targets. Intended for on-prem or local-dev setups where media lives on an internal network.
DYN_MM_LOCAL_PATH(empty)Absolute directory prefix. When set, file:// URIs and bare paths are allowed if they resolve inside this prefix.

Never set DYN_MM_ALLOW_INTERNAL=1 on public-facing deployments. It opens SSRF paths to cloud metadata endpoints (AWS IMDS, GCE, Azure) and other internal services.

Example Workflows

Reference implementations for deploying multimodal models:

Backend Documentation

Detailed deployment guides, configuration, and examples for each backend: