Following the Blackwell architecture, NVIDIA formally announced at CES 2026 that its next-generation AI computing platform, codenamed “Rubin,” has entered full-scale mass production. NVIDIA CEO Jensen Huang emphasized that Rubin was conceived to meet the demands of the next generation of AI factories, particularly complex workloads such as agentic AI, Mixture-of-Experts (MoE) models, and long-context reasoning. Through what NVIDIA calls “extreme co-design,” the Rubin platform can cut the token generation cost of AI inference by as much as tenfold.
At the heart of the Rubin platform lies a suite of six newly designed chips, with the Rubin GPU and Vera CPU commanding particular attention. The Rubin GPU is manufactured using TSMC’s 3nm process and incorporates a third-generation Transformer Engine. It delivers up to 50 PFLOPS of NVFP4 AI inference performance—five times that of the previous Blackwell architecture—while training performance sees a 3.5× improvement.
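NVFP4 is NVIDIA’s block-scaled 4-bit floating-point format, in which small blocks of E2M1 values share a scale factor so that very narrow numbers can still cover a useful dynamic range. As a rough illustration of how such a format works, here is a minimal NumPy sketch of block-scaled FP4 quantization; the 16-element block size and E2M1 value grid follow public descriptions of the format, but the function name and rounding details are illustrative assumptions, not NVIDIA’s implementation.

```python
import numpy as np

# Toy model of block-scaled FP4 (NVFP4-style) quantization, for illustration
# only. E2M1 can represent these magnitudes; 16 elements share one scale.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a 16-element block: pick a shared scale, round to the grid."""
    scale = float(np.abs(block).max()) / E2M1_GRID[-1]
    if scale == 0.0:
        scale = 1.0  # all-zero block: any scale works
    scaled = block / scale
    # Round each element's magnitude to the nearest representable value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx], scale

rng = np.random.default_rng(0)
x = rng.standard_normal(16).astype(np.float32)
q, s = quantize_block(x)
print("relative error:", float(np.abs(x - q * s).max() / np.abs(x).max()))
```

The hard part, which the Transformer Engine handles in hardware, is deciding when and where FP4 precision is safe to use; the sketch above shows only the storage-side arithmetic.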
The Vera CPU was purpose-built to complement these GPUs. NVIDIA describes it as an Arm-based CPU optimized specifically for AI inference, featuring 88 custom Olympus cores. Compared with the earlier Grace CPU, Vera doubles performance and offers up to 1.2 TB/s of memory bandwidth, so it can feed data to the GPUs without becoming a bottleneck. To keep these components working in concert, NVIDIA introduced the NVLink 6 Switch, which provides up to 3.6 TB/s of bandwidth per GPU, an essential capability for training large-scale MoE models. On the networking front, the platform is supported by the ConnectX-9 SuperNIC and Spectrum-6 Ethernet switches, delivering end-to-end connectivity of up to 800 Gb/s to keep data flowing across AI factories.
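To see why per-GPU NVLink bandwidth matters so much for MoE training, where tokens are routed all-to-all between experts on different GPUs, a back-of-the-envelope comparison helps. Only the two bandwidth figures below come from the announcement; the payload size is an invented example.

```python
# Rough timing comparison: moving an (assumed) 40 GB expert-exchange payload
# over NVLink 6 versus an 800 Gb/s Ethernet link. Payload size is illustrative.
NVLINK6_BYTES_PER_S = 3.6e12   # 3.6 TB/s per GPU, per the announcement
ETH_BYTES_PER_S = 800e9 / 8    # 800 Gb/s link, converted to bytes/s

payload_bytes = 40e9           # hypothetical all-to-all payload

print(f"NVLink 6: {payload_bytes / NVLINK6_BYTES_PER_S * 1e3:6.1f} ms")
print(f"800 GbE:  {payload_bytes / ETH_BYTES_PER_S * 1e3:6.1f} ms")
# Roughly 11 ms vs. 400 ms: a ~36x gap, which is why the all-to-all expert
# exchange stays inside the NVLink domain while Ethernet handles scale-out.
```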
Alongside the new silicon, NVIDIA’s DGX SuperPOD supercomputer architecture has also been updated for the Rubin era.
- DGX Vera Rubin NVL72: A rack-scale solution engineered for extreme performance. Each NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs; interconnected via NVLink 6, the GPUs in a rack operate as a single massive GPU with a unified memory space, making the system particularly well suited to ultra-large models. A full DGX SuperPOD combines eight such systems, for a total of 576 Rubin GPUs.
- DGX Rubin NVL8: Designed for enterprises that need flexible deployment, the NVL8 retains a more compact, liquid-cooled form factor and pairs eight Rubin GPUs with x86 CPUs, making it easier to integrate into existing environments. To address the KV cache bottleneck in large-model inference, NVIDIA also introduced the Inference Context Memory Storage Platform, built on the BlueField-4 DPU. It lets multiple GPUs share context memory at high speed, boosting inference throughput and energy efficiency by up to five times; a toy sketch of the idea follows this list.
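The KV cache holds the attention keys and values already computed for a conversation’s context; without a shared store, every GPU that picks up a request must recompute them from scratch. The sketch below is a deliberately simplified, in-process stand-in for that idea: all class and method names are hypothetical, and the real platform pools memory over BlueField-4 DPUs rather than in a Python dict.

```python
import hashlib
from typing import Dict, Optional

class ContextStore:
    """Toy shared KV cache: look up a context's KV blob instead of
    re-running prefill. Purely illustrative; not an NVIDIA API."""

    def __init__(self) -> None:
        self._cache: Dict[str, bytes] = {}  # key -> serialized KV tensors

    @staticmethod
    def key(prompt_prefix: str) -> str:
        # Content-address the cached context by its prompt prefix.
        return hashlib.sha256(prompt_prefix.encode()).hexdigest()

    def get(self, prompt_prefix: str) -> Optional[bytes]:
        return self._cache.get(self.key(prompt_prefix))

    def put(self, prompt_prefix: str, kv_blob: bytes) -> None:
        self._cache[self.key(prompt_prefix)] = kv_blob

store = ContextStore()
store.put("system: you are a helpful agent...", b"<serialized KV tensors>")
# A different GPU serving the same conversation skips prefill on a cache hit:
hit = store.get("system: you are a helpful agent...")
print("prefill skipped" if hit else "recompute prefill")
```

The claimed efficiency gain comes from exactly this substitution: a fast network read of cached context in place of recomputing prefill attention over a long context.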
On the security front, the Rubin platform integrates solutions from partners such as Armis, Check Point, and F5, delivering hardware-accelerated, real-time protection via BlueField DPUs to safeguard AI workloads.
The NVIDIA Rubin platform has already garnered broad industry support. Major cloud providers—including Microsoft, AWS, Google Cloud, and Oracle—have announced plans to adopt Rubin-based systems. Microsoft, in particular, will deploy Vera Rubin NVL72 systems in its next-generation “Fairwater” AI super-factory, while AI compute specialist CoreWeave is also among the first wave of adopters.