Dynamo v1.0.2 - Release Notes
Summary
Dynamo v1.0.2 is a patch release focusing on Frontend correctness fixes, DGDR-driven Kubernetes deployment robustness, rolling-update flexibility, and guided-decoding input hardening.
Key fixes restore real stream metadata in non-streaming responses with tool calls, correct Kimi K2.5 tokenizer special-token handling that caused TensorRT-LLM to reject requests, and add byte-length and nesting-depth caps to the OpenAI guided-decoding path.
On the deployment side, DGDR-created DynamoGraphDeployments now derive their name from the parent DGDR, DGDR-managed ConfigMaps cascade-delete with their parent, the Operator no longer thrashes on foreground cascading deletion, and per-WorkerSet MDC checksum validation enables rolling updates with divergent worker configuration under the same Model.
Base Branch: release/1.0.1
Key Dependencies
| Dynamo | SGLang | TensorRT-LLM | vLLM | NIXL |
|---|---|---|---|---|
| v1.0.2 | 0.5.9 | 1.3.0rc5.post1 | 0.16.0 | 0.10.1 |
For container images, wheels, Helm charts, and Rust crates, see Dynamo Release Artifacts.
For full version compatibility information, see Dynamo Support Matrix.
Full Changelog
Kubernetes Deployment
- DGDR-Driven DGD Naming: Fixed Profiler-generated DynamoGraphDeployment naming so that DGDs derive their name from the parent DynamoGraphDeploymentRequest (`<DGDR>-dgd`) instead of from topology alone (`<backend>-<agg/disagg>`) (#7835), eliminating namespace-level name collisions when multiple DGDRs share the same backend/topology and respecting user-provided names from `spec.overrides` when present (see the naming sketch below).
- DGDR ConfigMap Owner References: Added Kubernetes owner references to ConfigMaps created by DGDR (#7881) so that DGDR-managed ConfigMaps are cascade-deleted with their parent.
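A minimal sketch of the new naming rule, assuming a hypothetical `derive_dgd_name` helper (illustrative only, not the Profiler's actual code):

```python
def derive_dgd_name(dgdr_name: str, override_name: str | None = None) -> str:
    """Derive a DynamoGraphDeployment name from its parent DGDR.

    A user-provided name from spec.overrides wins when present; otherwise the
    DGD is named after the parent DGDR with a '-dgd' suffix, so two DGDRs that
    share a backend/topology no longer collide on the DGD name.
    """
    if override_name:
        return override_name
    return f"{dgdr_name}-dgd"


# Two DGDRs with the same backend/topology now produce distinct DGD names.
assert derive_dgd_name("llama-agg") == "llama-agg-dgd"
assert derive_dgd_name("llama-agg-canary") == "llama-agg-canary-dgd"
assert derive_dgd_name("llama-agg", override_name="my-custom-dgd") == "my-custom-dgd"
```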
Runtime
- Per-WorkerSet MDC Checksum Validation: Scoped Model Discovery Card checksum validation from per-Model to per-WorkerSet (#8278), enabling rolling updates where different WorkerSets under the same Model can carry different configuration (e.g. tool-call parser) without draining existing workers first. Mismatches are still rejected when a new worker joins an existing WorkerSet, but cross-WorkerSet checksum drift is no longer a hard error.
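The scope change can be pictured with a small sketch; the registry shape and function name below are hypothetical, not the actual runtime API:

```python
# Hypothetical sketch: checksums are compared only within a WorkerSet, so two
# WorkerSets under the same Model may carry different MDC checksums.
class ChecksumMismatch(Exception):
    pass


def validate_worker_join(registry: dict, model: str, worker_set: str, checksum: str) -> None:
    """Reject a joining worker only if its MDC checksum differs from the one
    already registered for the same WorkerSet; cross-WorkerSet drift is allowed."""
    worker_sets = registry.setdefault(model, {})
    existing = worker_sets.get(worker_set)
    if existing is None:
        worker_sets[worker_set] = checksum  # first worker defines the set's checksum
    elif existing != checksum:
        raise ChecksumMismatch(f"{checksum} != {existing} for WorkerSet {worker_set!r}")


registry: dict = {}
validate_worker_join(registry, "my-model", "workerset-old", "sha256:aaa")
validate_worker_join(registry, "my-model", "workerset-new", "sha256:bbb")  # allowed: different WorkerSet
```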
Bug Fixes
- DGD Cascading Deletion Thrashing: Fixed Operator behavior under foreground cascading deletion of DynamoGraphDeployments (#8212) so the Operator no longer thrashes the resource during teardown, ensuring clean DGD deletion in Kubernetes garbage-collection scenarios.
- Stream Metadata Preservation: Fixed OpenAI Frontend stream finalization that overwrote real `id`, `model`, and `created` fields with hardcoded placeholders (`stream-end`, `unknown`, `0`) when a tool-call parser combined streamed chunks into a non-streaming response (#8281), restoring correct response metadata for non-streaming tool-call requests (see the folding sketch after this list).
- Per-Node GPU Topology in DGD Builder: Fixed thorough-mode MoE config enumeration in the Planner/Profiler that ignored `numGpusPerNode` and produced unschedulable candidate DGDs on multi-node clusters (#8281). Worker GPU resource limits are now clamped per node and `multinode.nodeCount` is set for workers that span multiple nodes (see the clamping sketch below).
- Kimi Tokenizer Special Tokens: Fixed Rust tiktoken tokenizer handling of reserved-token fallback names for Kimi K2.5 (#7898), resolving prompt-token inflation that caused TensorRT-LLM to reject requests with negative `default_max_tokens` and enabling correct serving of `nvidia/Kimi-K2.5-NVFP4` and other Kimi K2.5 models (the arithmetic behind the failure is sketched below).
- Guided-Decoding Input Bounds: Added byte-length and nesting-depth caps to OpenAI guided-decoding input validation (#8349): `guided_grammar` 64 KiB, `guided_regex` 32 KiB, `guided_whitespace_pattern` 1 KiB, and `guided_json` 256 KiB serialized, with a nesting-depth cap of 64, bounding pathological inputs before they reach the downstream guided-decoding backend (see the validator sketch below).
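For the stream-metadata fix, the intended folding behavior can be sketched as follows; the chunk shape is simplified to content-only deltas (tool-call deltas omitted) and `fold_stream` is a hypothetical helper, not the Frontend's actual code:

```python
def fold_stream(chunks: list[dict]) -> dict:
    """Fold streamed chat-completion chunks into one non-streaming response,
    keeping the real id/model/created instead of the placeholders
    ('stream-end', 'unknown', 0) the old finalization wrote."""
    first = chunks[0]
    content = "".join(c["choices"][0]["delta"].get("content", "") for c in chunks)
    return {
        "id": first["id"],            # previously hardcoded to "stream-end"
        "model": first["model"],      # previously hardcoded to "unknown"
        "created": first["created"],  # previously hardcoded to 0
        "choices": [{"index": 0, "message": {"role": "assistant", "content": content}}],
    }


chunks = [
    {"id": "chatcmpl-123", "model": "my-model", "created": 1700000000,
     "choices": [{"index": 0, "delta": {"content": "Hello"}}]},
    {"id": "chatcmpl-123", "model": "my-model", "created": 1700000000,
     "choices": [{"index": 0, "delta": {"content": " world"}}]},
]
print(fold_stream(chunks)["id"])  # "chatcmpl-123", not "stream-end"
```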
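The per-node GPU topology fix boils down to clamping a worker's GPU limit to the node size and recording how many nodes it spans; a rough sketch of that arithmetic, with illustrative field names:

```python
import math


def plan_worker_gpus(gpus_needed: int, num_gpus_per_node: int) -> dict:
    """Clamp the per-node GPU limit to numGpusPerNode and derive the node span
    (multinode.nodeCount) so a candidate DGD stays schedulable on multi-node clusters."""
    node_count = math.ceil(gpus_needed / num_gpus_per_node)
    return {
        "gpu_limit_per_node": min(gpus_needed, num_gpus_per_node),
        "multinode_node_count": node_count if node_count > 1 else None,
    }


# A 16-GPU worker on 8-GPU nodes becomes 8 GPUs per node across 2 nodes,
# rather than an unschedulable single-node request for 16 GPUs.
print(plan_worker_gpus(16, 8))  # {'gpu_limit_per_node': 8, 'multinode_node_count': 2}
```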
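The Kimi tokenizer symptom is easiest to see as arithmetic. Assuming `default_max_tokens` is derived as the model context length minus the prompt length (the numbers below are purely illustrative), an inflated prompt count drives it negative and the request is rejected:

```python
# Illustrative numbers only; the real context length and inflation depend on
# the model and on how far the reserved-token fallback blew up the count.
context_len = 131_072
real_prompt_tokens = 1_200
inflated_prompt_tokens = 140_000   # mis-handled reserved tokens inflated the prompt

before_fix = context_len - inflated_prompt_tokens  # -8928: negative default_max_tokens, rejected
after_fix = context_len - real_prompt_tokens       # 129872: request served normally
print(before_fix, after_fix)
```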
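The new guided-decoding bounds can be summarized with a small validator sketch; the function and constant names are hypothetical, but the limits are the ones listed above:

```python
import json

GUIDED_LIMITS = {                       # byte-length caps added in this release
    "guided_grammar": 64 * 1024,
    "guided_regex": 32 * 1024,
    "guided_whitespace_pattern": 1 * 1024,
    "guided_json": 256 * 1024,          # measured on the serialized schema
}
MAX_JSON_DEPTH = 64


def json_depth(value, depth: int = 1) -> int:
    """Nesting depth of an already-parsed JSON value (dicts and lists)."""
    if isinstance(value, dict):
        return max((json_depth(v, depth + 1) for v in value.values()), default=depth)
    if isinstance(value, list):
        return max((json_depth(v, depth + 1) for v in value), default=depth)
    return depth


def validate_guided_field(name: str, value) -> None:
    """Reject guided-decoding inputs that exceed the byte caps, and reject
    guided_json schemas nested deeper than MAX_JSON_DEPTH levels."""
    serialized = json.dumps(value) if name == "guided_json" else value
    if len(serialized.encode("utf-8")) > GUIDED_LIMITS[name]:
        raise ValueError(f"{name} exceeds {GUIDED_LIMITS[name]} bytes")
    if name == "guided_json" and json_depth(value) > MAX_JSON_DEPTH:
        raise ValueError(f"guided_json nesting exceeds {MAX_JSON_DEPTH} levels")


validate_guided_field("guided_regex", r"[a-z]+\d{3}")       # within bounds
validate_guided_field("guided_json", {"type": "object"})    # within bounds
```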
Full Changelog: v1.0.1...v1.0.2