Hi Dynamo developers!
We would like to share the key timelines and focus areas for Dynamo in H1 2026.
🚨 Update to Minor Release Process
We plan three major releases (0.8, 0.9, and 1.0), with 1.0 targeted for GTC '26. Dynamo will continue to ship on a biweekly cadence as before, but we are changing our approach to minor releases.
Previously, minor releases (e.g., 0.x.1) were cut from main just like our major releases (e.g., 0.x.0), so each one included every change merged to main between the code freezes of the two releases.
Going forward, minor releases (e.g., 0.8.1) will be based on the previous major release (e.g., 0.8.0) rather than main, which lets us limit them to critical bug fixes and important feature updates.
📅 Timeline
Planned dates for future releases are shown below.
| v0.8 | v0.8.1 | v0.9.0 | v0.9.1 | v1.0 | v1.0+ |
| --- | --- | --- | --- | --- | --- |
| 1/14 | 1/28 | 2/11 | 2/25 | 3/11 | Dates to be shared after GTC |
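
As a quick sanity check on the cadence, the sketch below regenerates the planned dates from the first release date and a 14-day step, and labels which branch each release would be cut from under the new process. It assumes the dates fall in 2026; the names `planned_schedule` and `is_minor_release` are illustrative only and are not part of any Dynamo tooling.

```python
from datetime import date, timedelta

# Assumption: all dates in the table are in 2026 and releases are 14 days apart.
RELEASES = ["v0.8", "v0.8.1", "v0.9.0", "v0.9.1", "v1.0"]
FIRST_RELEASE = date(2026, 1, 14)
CADENCE = timedelta(days=14)


def planned_schedule(first=FIRST_RELEASE, cadence=CADENCE, names=RELEASES):
    """Return {release name: planned date} for a fixed release cadence."""
    return {name: first + i * cadence for i, name in enumerate(names)}


def is_minor_release(version):
    """True for versions with a non-zero third component, e.g. v0.8.1."""
    parts = version.lstrip("v").split(".")
    return len(parts) == 3 and parts[2] != "0"


schedule = planned_schedule()
for name, day in schedule.items():
    # New process: minor releases branch from the previous major release.
    base = "previous major release" if is_minor_release(name) else "main"
    print(f"{name:7} {day.month}/{day.day}  based on {base}")
```

The generated dates line up with the table (1/14, 1/28, 2/11, 2/25, 3/11), which confirms the plan keeps the biweekly cadence through 1.0.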
We will share more details about each major release in GitHub issues pinned alongside this overall H1'26 roadmap.
🎯 H1'26 Focus Areas
Our goal is for Dynamo to provide a seamless configuration and deployment experience. To achieve this, we are focused on five key areas:
- Performance
- Production Grade Serving & Scaling
- Core (including Router and KV Caching)
- Agents
- Multimodality and Diffusion (Omni)

Performance
- AIConfigurator
  - Improve prediction accuracy for all LLM inference engines (SGLang, TRT-LLM, vLLM)
    - Special thanks to the Mooncake team for contributing SGLang support to AIC.
  - Support for popular models, such as upcoming DeepSeek models
  - Support for Blackwell GPUs
- Multi-Feature Recipes
  - Add more recipes that combine KV-aware routing, disaggregated serving, and KV cache offloading to maximize performance for the following use cases:
    - Agents (Qwen3 32B)
    - Coding (Qwen3 235B or DSV3)
    - Multimodality (Qwen3-VL 30B)
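
Production Grade Serving & Scaling
- Planner
- Fault Tolerance
- Grove: Kubernetes-native AI inference orchestration
- ModelExpress: reduces latency of artifact downloads and writes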

Core (including Router and KV Caching)
- Removing NATS and etcd Dependencies from Dynamo
  - As of 0.8.0, NATS and etcd are optional for the request and discovery planes. They are replaced by transport-agnostic requests over TCP and Kubernetes-native service discovery via EndpointSlices. Removal of the NATS requirement for the KV events plane, which is used for KV-aware routing, is in progress.
- Router
  - Hierarchical routing to enable a high-performance downstream module that integrates with upstream schedulers by exchanging real-time metadata and granular feedback metrics
- KV Caching
  - Performant KV offloading from HBM to host memory and SSD; performance optimization for remote storage is in progress.
  - Distributed KV cache management across multiple nodes via P2P mesh or global object and file storage
  - Laying groundwork for CUDA Memory Extension (CME) support to enable future hardware to efficiently share KV cache across nodes via unified memory access over NVLink fabric
  - Support for SGLang
- Multi-LoRA Support
  - An initial implementation is available, and we will complete the implementation of the design outlined here.
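
To make the HBM to host memory to SSD offloading idea concrete, here is a minimal toy sketch of a tiered cache that demotes the least recently used block to the next slower tier when a tier fills up, and promotes a block back to the fastest tier on access. This is only an illustration under simple assumptions (LRU eviction, capacities counted in blocks); the class `TieredKVCache` is hypothetical and does not reflect Dynamo's actual KV cache manager.

```python
from collections import OrderedDict


class TieredKVCache:
    """Toy LRU cache that demotes blocks to slower tiers when a tier is full."""

    def __init__(self, capacities):
        # capacities: list of (tier_name, max_blocks), fastest tier first.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities]

    def _insert(self, block_id, data, level):
        if level >= len(self.tiers):
            return  # Evicted from the slowest tier: the block is dropped.
        _, cap, store = self.tiers[level]
        store[block_id] = data
        if len(store) > cap:
            # Demote the least recently used block to the next tier.
            victim, victim_data = store.popitem(last=False)
            self._insert(victim, victim_data, level + 1)

    def put(self, block_id, data):
        # Remove any stale copy first so a block lives in exactly one tier.
        for _, _, store in self.tiers:
            store.pop(block_id, None)
        self._insert(block_id, data, 0)

    def get(self, block_id):
        for _, _, store in self.tiers:
            if block_id in store:
                data = store[block_id]
                self.put(block_id, data)  # Promote to the fastest tier.
                return data
        return None

    def location(self, block_id):
        for name, _, store in self.tiers:
            if block_id in store:
                return name
        return None


cache = TieredKVCache([("hbm", 2), ("host", 2), ("ssd", 2)])
for i in range(5):
    cache.put(f"b{i}", f"kv-{i}")
print(cache.location("b0"))  # oldest block has been pushed down two tiers
```

A real system would move tensors asynchronously and choose eviction victims using richer signals than recency; the sketch only shows the demote-and-promote flow between tiers.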

Agents
- Predictive Routing
  - Proactive Load Balancing: Decisions are informed by expected future load rather than just current system saturation.
  - Intelligent Cache Retention: The Router prioritizes retaining KV cache blocks that NeMo Agentic Toolkit predicts will have high reuse, rather than relying on standard eviction policies.
  - Nuanced Session Affinity: Instead of binary "stickiness," the Router can maintain affinity for sessions with high predicted reuse or allow migration for sessions nearing completion.
- KV cache offloading and prefetching for tool calls
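
The predictive routing ideas above can be illustrated with a small scoring sketch. It assumes the router receives a forecast load per worker and a predicted KV reuse score per session (both hypothetical inputs here), blends current and forecast load into a cost, and gives the session's current worker a discount proportional to predicted reuse. The names `Worker`, `pick_worker`, and `reuse_weight` are invented for this example and are not Dynamo APIs.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    current_load: float    # fraction of capacity in use now, 0..1
    predicted_load: float  # hypothetical forecast of near-future load, 0..1


def pick_worker(workers, sticky_worker, predicted_reuse, reuse_weight=1.0):
    """Return the name of the lowest-cost worker for a session.

    sticky_worker: name of the worker currently holding the session's KV cache.
    predicted_reuse: 0..1 score; high means the cached blocks will likely be reused.
    """
    def cost(worker):
        # Proactive load balancing: blend current and expected future load.
        load = 0.5 * worker.current_load + 0.5 * worker.predicted_load
        # Nuanced affinity: the discount scales with predicted reuse,
        # so it is neither always sticky nor never sticky.
        bonus = reuse_weight * predicted_reuse if worker.name == sticky_worker else 0.0
        return load - bonus

    return min(workers, key=cost).name


workers = [Worker("A", 0.8, 0.9), Worker("B", 0.3, 0.4)]
print(pick_worker(workers, sticky_worker="A", predicted_reuse=0.9))  # stays on A
print(pick_worker(workers, sticky_worker="A", predicted_reuse=0.1))  # migrates to B
```

With high predicted reuse the session stays on its busy worker to keep the cache hit; with low predicted reuse (for example, a session nearing completion) it migrates to the less loaded worker.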

Multimodality and Diffusion
- Multimodality
  - Multimodal hash router support for vLLM and SGLang (already enabled for TRT-LLM)
  - Encode/Prefill/Decode (E/P/D) disaggregation performance optimization
- Diffusion
  - Support for SGLang Diffusion/Omni and vLLM Omni
  - Extend Planner to support autoscaling for Omni models (e.g., UniVideo)

Please let us know in the comments if there are additional features the Dynamo team should prioritize. Thank you for your ongoing feedback; we will do our best to deliver the best possible Dynamo for the community.