
Dynamo v1.0.2

@dagil-nvidia dagil-nvidia released this 23 Apr 03:21
e3cbfde

Dynamo v1.0.2 - Release Notes

Summary

Dynamo v1.0.2 is a patch release focusing on Frontend correctness fixes, DGDR-driven Kubernetes deployment robustness, rolling-update flexibility, and guided-decoding input hardening.

Key fixes restore real stream metadata in non-streaming responses with tool calls, correct Kimi K2.5 tokenizer special-token handling that caused TensorRT-LLM to reject requests, and add byte-length and nesting-depth caps to the OpenAI guided-decoding path.

On the deployment side, DGDR-created DynamoGraphDeployments now derive their name from the parent DGDR, DGDR-managed ConfigMaps cascade-delete with their parent, the Operator no longer thrashes on foreground cascading deletion, and per-WorkerSet MDC checksum validation enables rolling updates with divergent worker configuration under the same Model.

Base Branch: release/1.0.1

Key Dependencies

| Dynamo | SGLang | TensorRT-LLM   | vLLM   | NIXL   |
|--------|--------|----------------|--------|--------|
| v1.0.2 | 0.5.9  | 1.3.0rc5.post1 | 0.16.0 | 0.10.1 |

For container images, wheels, Helm charts, and Rust crates, see Dynamo Release Artifacts.
For full version compatibility information, see Dynamo Support Matrix.

Full Changelog

Kubernetes Deployment

  • DGDR-Driven DGD Naming: Fixed Profiler-generated DynamoGraphDeployment naming so that DGDs derive their name from the parent DynamoGraphDeploymentRequest (<DGDR>-dgd) instead of from topology alone (<backend>-<agg/disagg>) (#7835), eliminating namespace-level name collisions when multiple DGDRs share the same backend/topology and respecting user-provided names from spec.overrides when present.
  • DGDR ConfigMap Owner References: Added Kubernetes owner references to ConfigMaps created by DGDR (#7881) so that DGDR-managed ConfigMaps are cascade-deleted with their parent.

Runtime

  • Per-WorkerSet MDC Checksum Validation: Scoped Model Discovery Card checksum validation from per-Model to per-WorkerSet (#8278), enabling rolling updates where different WorkerSets under the same Model can carry different configuration (e.g. tool-call parser) without draining existing workers first. Mismatches are still rejected when a new worker joins an existing WorkerSet, but cross-WorkerSet checksum drift is no longer a hard error.
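The per-WorkerSet scoping can be pictured with a small sketch. The class and method names here are hypothetical, not Dynamo's actual runtime API; the behavior they model (intra-WorkerSet mismatches rejected, cross-WorkerSet drift tolerated) is taken from the description above.

```python
class MdcChecksumRegistry:
    """Hypothetical sketch of per-WorkerSet MDC checksum validation.

    Checksums are keyed by (model, worker_set) instead of by model
    alone, so two WorkerSets under the same Model may carry different
    configuration during a rolling update.
    """

    def __init__(self) -> None:
        self._checksums: dict[tuple[str, str], str] = {}

    def validate_join(self, model: str, worker_set: str, checksum: str) -> bool:
        key = (model, worker_set)
        # First worker in a WorkerSet establishes the expected checksum.
        expected = self._checksums.setdefault(key, checksum)
        # Reject only when a new worker disagrees with its own WorkerSet;
        # checksum drift across WorkerSets of the same Model is allowed.
        return expected == checksum
```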

Bug Fixes

  • DGD Cascading Deletion Thrashing: Fixed Operator behavior under foreground cascading deletion of DynamoGraphDeployments (#8212) so the Operator no longer thrashes the resource during teardown, ensuring clean DGD deletion in Kubernetes garbage-collection scenarios.
  • Stream Metadata Preservation: Fixed OpenAI Frontend stream finalization that overwrote real id, model, and created fields with hardcoded placeholders (stream-end, unknown, 0) when a tool-call parser combined streamed chunks into a non-streaming response (#8281), restoring correct response metadata for non-streaming tool-call requests.
  • Per-Node GPU Topology in DGD Builder: Fixed thorough-mode MoE config enumeration in the Planner/Profiler that ignored numGpusPerNode and produced unschedulable candidate DGDs on multi-node clusters (#8281). Worker GPU resource limits are now clamped per node and multinode.nodeCount is set for workers that span multiple nodes.
  • Kimi Tokenizer Special Tokens: Fixed Rust tiktoken tokenizer handling of reserved-token fallback names for Kimi K2.5 (#7898), resolving prompt-token inflation that caused TensorRT-LLM to reject requests with negative default_max_tokens and enabling correct serving of nvidia/Kimi-K2.5-NVFP4 and other Kimi K2.5 models.
  • Guided-Decoding Input Bounds: Added byte-length and nesting-depth caps to OpenAI guided-decoding input validation (#8349) — guided_grammar 64 KiB, guided_regex 32 KiB, guided_whitespace_pattern 1 KiB, guided_json 256 KiB serialized with a nesting-depth cap of 64 — bounding pathological inputs before they reach the downstream guided-decoding backend.
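The stream-finalization fix above can be illustrated with a minimal merge sketch. The field names follow the OpenAI chat-completion schema; the helper itself is hypothetical, not Dynamo's Frontend code. The point is that the final non-streaming object should carry the real `id`, `model`, and `created` values from the stream rather than placeholders like `stream-end`, `unknown`, and `0`.

```python
def merge_stream_chunks(chunks: list[dict]) -> dict:
    """Hypothetical sketch: fold streamed chunks into one response.

    Metadata is taken from the first chunk (which carries the real
    stream values) instead of being overwritten with placeholders.
    """
    first, last = chunks[0], chunks[-1]
    return {
        "id": first["id"],            # preserve the real stream id
        "model": first["model"],      # ... and the real model name
        "created": first["created"],  # ... and the real timestamp
        "choices": last["choices"],   # final accumulated choices
    }
```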
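The guided-decoding caps can be sketched as a small validator. The byte limits and nesting-depth cap come from the release note; the function and constant names are assumptions, not Dynamo's actual validation code.

```python
import json

# Byte-length caps per guided-decoding field (from the release note);
# the constant names are illustrative.
LIMITS = {
    "guided_grammar": 64 * 1024,
    "guided_regex": 32 * 1024,
    "guided_whitespace_pattern": 1 * 1024,
    "guided_json": 256 * 1024,
}
MAX_JSON_DEPTH = 64


def _depth(value, level=1):
    """Nesting depth of a parsed JSON value (scalars count as depth 1)."""
    if isinstance(value, dict):
        return max((_depth(v, level + 1) for v in value.values()), default=level)
    if isinstance(value, list):
        return max((_depth(v, level + 1) for v in value), default=level)
    return level


def validate_guided_input(field: str, value) -> None:
    """Hypothetical sketch: reject oversized or over-nested inputs
    before they reach the downstream guided-decoding backend."""
    serialized = json.dumps(value) if field == "guided_json" else value
    if len(serialized.encode()) > LIMITS[field]:
        raise ValueError(f"{field} exceeds {LIMITS[field]} bytes")
    if field == "guided_json" and _depth(value) > MAX_JSON_DEPTH:
        raise ValueError(f"guided_json nesting exceeds {MAX_JSON_DEPTH} levels")
```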

Full Changelog: v1.0.1...v1.0.2