
Dynamo v1.0.2

@dagil-nvidia dagil-nvidia released this 23 Apr 03:21
e3cbfde

Dynamo v1.0.2 - Release Notes

Summary

Dynamo v1.0.2 is a patch release focusing on Frontend correctness fixes, DGDR-driven Kubernetes deployment robustness, rolling-update flexibility, and guided-decoding input hardening.

Key fixes restore real stream metadata in non-streaming responses with tool calls, correct Kimi K2.5 tokenizer special-token handling that caused TensorRT-LLM to reject requests, and add byte-length and nesting-depth caps to the OpenAI guided-decoding path.

On the deployment side, DGDR-created DynamoGraphDeployments now derive their name from the parent DGDR, DGDR-managed ConfigMaps cascade-delete with their parent, the Operator no longer thrashes on foreground cascading deletion, and per-WorkerSet MDC checksum validation enables rolling updates with divergent worker configuration under the same Model.

Base Branch: release/1.0.1

Key Dependencies

| Dynamo | SGLang | TensorRT-LLM   | vLLM   | NIXL   |
|--------|--------|----------------|--------|--------|
| v1.0.2 | 0.5.9  | 1.3.0rc5.post1 | 0.16.0 | 0.10.1 |

For container images, wheels, Helm charts, and Rust crates, see Dynamo Release Artifacts.
For full version compatibility information, see Dynamo Support Matrix.

Full Changelog

Kubernetes Deployment

  • DGDR-Driven DGD Naming: Fixed Profiler-generated DynamoGraphDeployment naming so that DGDs derive their name from the parent DynamoGraphDeploymentRequest (<DGDR>-dgd) instead of from topology alone (<backend>-<agg/disagg>) (#7835), eliminating namespace-level name collisions when multiple DGDRs share the same backend/topology and respecting user-provided names from spec.overrides when present.
  • DGDR ConfigMap Owner References: Added Kubernetes owner references to ConfigMaps created by DGDR (#7881) so that DGDR-managed ConfigMaps are cascade-deleted with their parent.

Runtime

  • Per-WorkerSet MDC Checksum Validation: Scoped Model Discovery Card checksum validation from per-Model to per-WorkerSet (#8278), enabling rolling updates where different WorkerSets under the same Model can carry different configuration (e.g. tool-call parser) without draining existing workers first. Mismatches are still rejected when a new worker joins an existing WorkerSet, but cross-WorkerSet checksum drift is no longer a hard error.
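The per-WorkerSet scoping can be pictured with a small sketch. The class and method names here are hypothetical, not Dynamo's actual runtime API; the behavior they model (intra-WorkerSet mismatches rejected, cross-WorkerSet drift tolerated) is taken from the description above.

```python
class MdcChecksumRegistry:
    """Hypothetical sketch of per-WorkerSet MDC checksum validation.

    Checksums are keyed by (model, worker_set) instead of by model
    alone, so two WorkerSets under the same Model may carry different
    configuration during a rolling update.
    """

    def __init__(self) -> None:
        self._checksums: dict[tuple[str, str], str] = {}

    def validate_join(self, model: str, worker_set: str, checksum: str) -> bool:
        key = (model, worker_set)
        # First worker in a WorkerSet establishes the expected checksum.
        expected = self._checksums.setdefault(key, checksum)
        # Reject only when a new worker disagrees with its own WorkerSet;
        # checksum drift across WorkerSets of the same Model is allowed.
        return expected == checksum
```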

Bug Fixes

  • DGD Cascading Deletion Thrashing: Fixed Operator behavior under foreground cascading deletion of DynamoGraphDeployments (#8212) so the Operator no longer thrashes the resource during teardown, ensuring clean DGD deletion in Kubernetes garbage-collection scenarios.
  • Stream Metadata Preservation: Fixed OpenAI Frontend stream finalization that overwrote real id, model, and created fields with hardcoded placeholders (stream-end, unknown, 0) when a tool-call parser combined streamed chunks into a non-streaming response (#8281), restoring correct response metadata for non-streaming tool-call requests.
  • Per-Node GPU Topology in DGD Builder: Fixed thorough-mode MoE config enumeration in the Planner/Profiler that ignored numGpusPerNode and produced unschedulable candidate DGDs on multi-node clusters (#8281). Worker GPU resource limits are now clamped per node and multinode.nodeCount is set for workers that span multiple nodes.
  • Kimi Tokenizer Special Tokens: Fixed Rust tiktoken tokenizer handling of reserved-token fallback names for Kimi K2.5 (#7898), resolving prompt-token inflation that caused TensorRT-LLM to reject requests with negative default_max_tokens and enabling correct serving of nvidia/Kimi-K2.5-NVFP4 and other Kimi K2.5 models.
  • Guided-Decoding Input Bounds: Added byte-length and nesting-depth caps to OpenAI guided-decoding input validation (#8349) — guided_grammar 64 KiB, guided_regex 32 KiB, guided_whitespace_pattern 1 KiB, guided_json 256 KiB serialized with a nesting-depth cap of 64 — bounding pathological inputs before they reach the downstream guided-decoding backend.
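The stream-finalization fix above can be illustrated with a minimal merge sketch. The field names follow the OpenAI chat-completion schema; the helper itself is hypothetical, not Dynamo's Frontend code. The point is that the final non-streaming object should carry the real `id`, `model`, and `created` values from the stream rather than placeholders like `stream-end`, `unknown`, and `0`.

```python
def merge_stream_chunks(chunks: list[dict]) -> dict:
    """Hypothetical sketch: fold streamed chunks into one response.

    Metadata is taken from the first chunk (which carries the real
    stream values) instead of being overwritten with placeholders.
    """
    first, last = chunks[0], chunks[-1]
    return {
        "id": first["id"],            # preserve the real stream id
        "model": first["model"],      # ... and the real model name
        "created": first["created"],  # ... and the real timestamp
        "choices": last["choices"],   # final accumulated choices
    }
```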
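The guided-decoding caps can be sketched as a small validator. The byte limits and nesting-depth cap come from the release note; the function and constant names are assumptions, not Dynamo's actual validation code.

```python
import json

# Byte-length caps per guided-decoding field (from the release note);
# the constant names are illustrative.
LIMITS = {
    "guided_grammar": 64 * 1024,
    "guided_regex": 32 * 1024,
    "guided_whitespace_pattern": 1 * 1024,
    "guided_json": 256 * 1024,
}
MAX_JSON_DEPTH = 64


def _depth(value, level=1):
    """Nesting depth of a parsed JSON value (scalars count as depth 1)."""
    if isinstance(value, dict):
        return max((_depth(v, level + 1) for v in value.values()), default=level)
    if isinstance(value, list):
        return max((_depth(v, level + 1) for v in value), default=level)
    return level


def validate_guided_input(field: str, value) -> None:
    """Hypothetical sketch: reject oversized or over-nested inputs
    before they reach the downstream guided-decoding backend."""
    serialized = json.dumps(value) if field == "guided_json" else value
    if len(serialized.encode()) > LIMITS[field]:
        raise ValueError(f"{field} exceeds {LIMITS[field]} bytes")
    if field == "guided_json" and _depth(value) > MAX_JSON_DEPTH:
        raise ValueError(f"guided_json nesting exceeds {MAX_JSON_DEPTH} levels")
```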

Full Changelog: v1.0.1...v1.0.2