mGrowTech

NVIDIA Releases Dynamo v0.9.0: A Massive Infrastructure Overhaul Featuring FlashIndexer, Multi-Modal Support, and the Removal of NATS and etcd

by Josh
February 20, 2026
in AI, Analytics and Automation


NVIDIA has released Dynamo v0.9.0, the most significant infrastructure upgrade to date for its distributed inference framework. The update simplifies how large-scale models are deployed and managed, with a focus on removing heavy dependencies and improving how GPUs handle multi-modal data.

The Great Simplification: Removing NATS and etcd

The biggest change in v0.9.0 is the removal of NATS and etcd. In previous versions, these tools handled messaging and service discovery. However, they imposed an ‘operational tax’ by requiring developers to run and manage extra clusters.

NVIDIA replaced these with a new Event Plane and a Discovery Plane. The system now uses ZMQ (ZeroMQ) for high-performance transport and MessagePack for data serialization. For teams using Kubernetes, Dynamo now supports Kubernetes-native service discovery. This change makes the infrastructure leaner and easier to maintain in production environments.
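
To make the serialization choice more concrete, here is a minimal sketch of how a small event could be packed in MessagePack's binary format and compared with its JSON form. The event fields (`worker_id`, `kv_blocks`) and the `pack` helper are hypothetical and are not Dynamo's actual event schema or API. The encoder covers only the handful of MessagePack types this example needs.

```python
import json


def pack(obj) -> bytes:
    """Encode a tiny subset of MessagePack: bools, ints in [0, 65535],
    strings up to 31 bytes, and maps with up to 15 entries."""
    if isinstance(obj, bool):
        return b"\xc3" if obj else b"\xc2"            # true / false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])                       # positive fixint
        if 0 <= obj <= 0xFF:
            return b"\xcc" + bytes([obj])             # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + obj.to_bytes(2, "big")   # uint 16
        raise ValueError("integer out of range for this sketch")
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) > 31:
            raise ValueError("string too long for this sketch")
        return bytes([0xA0 | len(data)]) + data       # fixstr
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("map too large for this sketch")
        out = bytes([0x80 | len(obj)])                # fixmap
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Hypothetical event a worker might publish on an event plane.
event = {"worker_id": 7, "kv_blocks": 300}
packed = pack(event)
as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
print(len(packed), "bytes as MessagePack vs", len(as_json), "bytes as JSON")
```

In a real deployment a maintained MessagePack library would do the encoding, and the resulting bytes would be the payload sent over a ZMQ socket.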

Multi-Modal Support and the E/P/D Split

Dynamo v0.9.0 expands multi-modal support across 3 main backends: vLLM, SGLang, and TensorRT-LLM. This allows models to process text, images, and video more efficiently.

A key feature in this update is the E/P/D (Encode/Prefill/Decode) split. In standard setups, a single GPU often handles all 3 stages. This can cause bottlenecks during heavy video or image processing. v0.9.0 introduces Encoder Disaggregation. You can now run the Encoder on a separate set of GPUs from the Prefill and Decode workers. This allows you to scale your hardware based on the specific needs of your model.
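
One way to reason about the split is to size each stage's GPU pool on its own. The sketch below is a back-of-the-envelope calculation. The stage names follow the article, but the per-GPU throughput figures and the `gpus_needed` helper are illustrative assumptions, not Dynamo settings or measured numbers.

```python
import math


def gpus_needed(request_rate: float, per_gpu_rate: dict) -> dict:
    """Return the GPU count per stage needed to sustain request_rate (req/s)."""
    return {
        stage: math.ceil(request_rate / rate)
        for stage, rate in per_gpu_rate.items()
    }


# Hypothetical throughput of a single GPU at each stage, in requests per second.
per_gpu_rate = {"encode": 4.0, "prefill": 10.0, "decode": 2.5}

pools = gpus_needed(20.0, per_gpu_rate)
print(pools)
```

With these assumed numbers the three pools come out to different sizes, which is the point of disaggregation: a monolithic deployment would have to scale every stage together.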

Sneak Preview: FlashIndexer

This release includes a sneak preview of FlashIndexer. This component is designed to solve latency issues in distributed KV cache management.

When working with large context windows, moving Key-Value (KV) data between GPUs is a slow process. FlashIndexer improves how the system indexes and retrieves these cached tokens. This results in a lower Time to First Token (TTFT). While still a preview, it represents a major step toward making distributed inference feel as fast as local inference.
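
FlashIndexer's internals are not described in this article, so the following is only a generic sketch of what a KV cache index does. It maps hashed blocks of prompt tokens to the workers that already hold them, so a request can go to the worker with the longest cached prefix. The `PrefixIndex` class, the block size, and the worker names are all hypothetical.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 4  # tokens per cached block (illustrative value)


def block_hashes(tokens):
    """Hash each full block chained with the previous hash,
    so every hash identifies the entire prefix up to that block."""
    hashes, prev = [], ""
    full = len(tokens) - len(tokens) % BLOCK_SIZE
    for i in range(0, full, BLOCK_SIZE):
        block = ",".join(map(str, tokens[i:i + BLOCK_SIZE]))
        prev = hashlib.sha256((prev + block).encode()).hexdigest()
        hashes.append(prev)
    return hashes


class PrefixIndex:
    def __init__(self):
        self._owners = defaultdict(set)  # prefix hash -> workers holding it

    def record(self, worker, tokens):
        """Register that `worker` has cached KV data for these tokens."""
        for h in block_hashes(tokens):
            self._owners[h].add(worker)

    def best_worker(self, tokens):
        """Return (worker, matched_blocks) for the longest cached prefix,
        or (None, 0) when no worker holds even the first block."""
        best = (None, 0)
        for depth, h in enumerate(block_hashes(tokens), start=1):
            holders = self._owners.get(h)
            if not holders:
                break
            best = (min(holders), depth)  # min() only makes ties deterministic
        return best


index = PrefixIndex()
index.record("worker-a", [1, 2, 3, 4, 5, 6, 7, 8])
index.record("worker-b", [1, 2, 3, 4, 9, 9, 9, 9])
print(index.best_worker([1, 2, 3, 4, 5, 6, 7, 8, 10, 11]))
```

A longer matched prefix means fewer tokens need a fresh prefill, which is where the TTFT reduction would come from.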

Smart Routing and Load Estimation

Managing traffic across hundreds of GPUs is difficult. Dynamo v0.9.0 introduces a smarter Planner that uses predictive load estimation.

The system uses a Kalman filter to predict the future load of a request based on past performance. It also supports routing hints from the Kubernetes Gateway API Inference Extension (GAIE). This allows the network layer to communicate directly with the inference engine. If a specific GPU group is overloaded, the system can route new requests to idle workers with higher precision.
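
As a rough illustration of the idea, a one-dimensional Kalman filter can smooth noisy load measurements into a running estimate. This is a textbook scalar filter with a random-walk load model, not the Planner's actual implementation. The class name, noise parameters, and sample values are assumptions.

```python
class ScalarKalman:
    """1-D Kalman filter assuming the load drifts slowly (random-walk model)."""

    def __init__(self, initial: float, process_var: float = 1.0,
                 measurement_var: float = 25.0):
        self.estimate = initial              # current load estimate
        self.variance = 1e3                  # large initial uncertainty
        self.process_var = process_var       # how much the true load may drift
        self.measurement_var = measurement_var  # how noisy each observation is

    def update(self, measurement: float) -> float:
        # Predict: the load is assumed unchanged, but uncertainty grows.
        self.variance += self.process_var
        # Correct: blend prediction and measurement using the Kalman gain.
        gain = self.variance / (self.variance + self.measurement_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= 1.0 - gain
        return self.estimate


kf = ScalarKalman(initial=0.0)
# Hypothetical noisy load observations (for example, tokens in flight).
for observed in [102.0, 97.0, 105.0, 99.0, 101.0, 98.0]:
    predicted = kf.update(observed)
print(round(predicted, 2))
```

Under the random-walk assumption, the filter's current estimate also serves as its forecast for the next step. A scheduler could compare such forecasts across worker groups before placing a request.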

The Technical Stack at a Glance

The v0.9.0 release updates several core components to their latest stable versions. Here is the breakdown of the supported backends and libraries:

Component       Version
vLLM            v0.14.1
SGLang          v0.5.8
TensorRT-LLM    v1.3.0rc1
NIXL            v0.9.0
Rust Core       dynamo-tokens crate

The inclusion of the dynamo-tokens crate, written in Rust, ensures that token handling remains high-speed. For data transfer between GPUs, Dynamo continues to leverage NIXL (NVIDIA Inference Transfer Library) for RDMA-based communication.

Key Takeaways

  1. Infrastructure Decoupling (Goodbye NATS and etcd): The release completes the modernization of the communication architecture. By replacing NATS and etcd with a new Event Plane (using ZMQ and MessagePack) and Kubernetes-native service discovery, the system removes the ‘operational tax’ of managing external clusters.
  2. Full Multi-Modal Disaggregation (E/P/D Split): Dynamo now supports a complete Encode/Prefill/Decode (E/P/D) split across all 3 backends (vLLM, SGLang, and TRT-LLM). This allows you to run vision or video encoders on separate GPUs, preventing compute-heavy encoding tasks from bottlenecking the text generation process.
  3. FlashIndexer Preview for Lower Latency: The ‘sneak preview’ of FlashIndexer introduces a specialized component to optimize distributed KV cache management. It is designed to make the indexing and retrieval of conversation ‘memory’ significantly faster, with the aim of further reducing the Time to First Token (TTFT).
  4. Smarter Scheduling with Kalman Filters: The system now uses predictive load estimation powered by Kalman filters. This allows the Planner to forecast GPU load more accurately and handle traffic spikes proactively, supported by routing hints from the Kubernetes Gateway API Inference Extension (GAIE).

Check out the GitHub release for full details.



