Following the Blackwell architecture, NVIDIA formally announced at CES 2026 that its next-generation AI computing platform, codenamed “Rubin,” has entered full-scale mass production. NVIDIA CEO Jensen Huang emphasized that Rubin was conceived to meet the demands of the next generation of AI factories, particularly complex workloads such as agentic AI, Mixture-of-Experts (MoE) models, and long-context reasoning. Through what NVIDIA calls “extreme co-design,” the Rubin platform can cut the token generation cost of AI inference by as much as tenfold.
At the heart of the Rubin platform lies a suite of six newly designed chips, with the Rubin GPU and Vera CPU commanding particular attention. The Rubin GPU is manufactured using TSMC’s 3nm process and incorporates a third-generation Transformer Engine. It delivers up to 50 PFLOPS of NVFP4 AI inference performance—five times that of the previous Blackwell architecture—while training performance sees a 3.5× improvement.
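NVFP4 is NVIDIA’s block-scaled 4-bit floating-point format, in which small blocks of E2M1 values share a scale factor so that very narrow numbers can still cover a useful dynamic range. As a rough illustration of how such a format works, here is a minimal NumPy sketch of block-scaled FP4 quantization; the 16-element block size and E2M1 value grid follow public descriptions of the format, but the function name and rounding details are illustrative assumptions, not NVIDIA’s implementation.

```python
import numpy as np

# Toy model of block-scaled FP4 (NVFP4-style) quantization, for illustration
# only. E2M1 can represent these magnitudes; 16 elements share one scale.
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a 16-element block: pick a shared scale, round to the grid."""
    scale = float(np.abs(block).max()) / E2M1_GRID[-1]
    if scale == 0.0:
        scale = 1.0  # all-zero block: any scale works
    scaled = block / scale
    # Round each element's magnitude to the nearest representable value.
    idx = np.abs(np.abs(scaled)[:, None] - E2M1_GRID[None, :]).argmin(axis=1)
    return np.sign(scaled) * E2M1_GRID[idx], scale

rng = np.random.default_rng(0)
x = rng.standard_normal(16).astype(np.float32)
q, s = quantize_block(x)
print("relative error:", float(np.abs(x - q * s).max() / np.abs(x).max()))
```

The hard part, which the Transformer Engine handles in hardware, is deciding when and where FP4 precision is safe to use; the sketch above shows only the storage-side arithmetic.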
The Vera CPU was purpose-built to complement these GPUs. NVIDIA describes it as an Arm-based CPU optimized specifically for AI inference, featuring 88 custom Olympus cores. Compared with the earlier Grace CPU, Vera doubles performance and offers up to 1.2 TB/s of memory bandwidth, so it can feed data to the GPUs without becoming a bottleneck. To keep these components working in concert, NVIDIA introduced the NVLink 6 Switch, which provides up to 3.6 TB/s of bandwidth per GPU, an essential capability for training large-scale MoE models. On the networking front, the platform is supported by the ConnectX-9 SuperNIC and Spectrum-6 Ethernet switches, delivering end-to-end connectivity of up to 800 Gb/s to keep data flowing across AI factories.
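To see why per-GPU NVLink bandwidth matters so much for MoE training, where tokens are routed all-to-all between experts on different GPUs, a back-of-the-envelope comparison helps. Only the two bandwidth figures below come from the announcement; the payload size is an invented example.

```python
# Rough timing comparison: moving an (assumed) 40 GB expert-exchange payload
# over NVLink 6 versus an 800 Gb/s Ethernet link. Payload size is illustrative.
NVLINK6_BYTES_PER_S = 3.6e12   # 3.6 TB/s per GPU, per the announcement
ETH_BYTES_PER_S = 800e9 / 8    # 800 Gb/s link, converted to bytes/s

payload_bytes = 40e9           # hypothetical all-to-all payload

print(f"NVLink 6: {payload_bytes / NVLINK6_BYTES_PER_S * 1e3:6.1f} ms")
print(f"800 GbE:  {payload_bytes / ETH_BYTES_PER_S * 1e3:6.1f} ms")
# Roughly 11 ms vs. 400 ms: a ~36x gap, which is why the all-to-all expert
# exchange stays inside the NVLink domain while Ethernet handles scale-out.
```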
Alongside the new silicon, NVIDIA’s DGX SuperPOD supercomputer architecture has also been updated for the Rubin era.
- DGX Vera Rubin NVL72: A rack-scale solution engineered for extreme performance. Each NVL72 rack integrates 72 Rubin GPUs and 36 Vera CPUs; interconnected via NVLink 6, the GPUs in a rack operate as a single massive GPU with a unified memory space, making the system particularly well suited to ultra-large models. A full DGX SuperPOD combines eight such systems, for a total of 576 Rubin GPUs.
- DGX Rubin NVL8: Designed for enterprises that need flexible deployment, the NVL8 retains a more compact, liquid-cooled form factor and pairs eight Rubin GPUs with x86 CPUs, making it easier to integrate into existing environments. To address the KV cache bottleneck in large-model inference, NVIDIA also introduced the Inference Context Memory Storage Platform, built on the BlueField-4 DPU. It lets multiple GPUs share context memory at high speed, boosting inference throughput and energy efficiency by up to five times; a toy sketch of the idea follows this list.
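The KV cache holds the attention keys and values already computed for a conversation’s context; without a shared store, every GPU that picks up a request must recompute them from scratch. The sketch below is a deliberately simplified, in-process stand-in for that idea: all class and method names are hypothetical, and the real platform pools memory over BlueField-4 DPUs rather than in a Python dict.

```python
import hashlib
from typing import Dict, Optional

class ContextStore:
    """Toy shared KV cache: look up a context's KV blob instead of
    re-running prefill. Purely illustrative; not an NVIDIA API."""

    def __init__(self) -> None:
        self._cache: Dict[str, bytes] = {}  # key -> serialized KV tensors

    @staticmethod
    def key(prompt_prefix: str) -> str:
        # Content-address the cached context by its prompt prefix.
        return hashlib.sha256(prompt_prefix.encode()).hexdigest()

    def get(self, prompt_prefix: str) -> Optional[bytes]:
        return self._cache.get(self.key(prompt_prefix))

    def put(self, prompt_prefix: str, kv_blob: bytes) -> None:
        self._cache[self.key(prompt_prefix)] = kv_blob

store = ContextStore()
store.put("system: you are a helpful agent...", b"<serialized KV tensors>")
# A different GPU serving the same conversation skips prefill on a cache hit:
hit = store.get("system: you are a helpful agent...")
print("prefill skipped" if hit else "recompute prefill")
```

The claimed efficiency gain comes from exactly this substitution: a fast network read of cached context in place of recomputing prefill attention over a long context.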
On the security front, the Rubin platform integrates solutions from partners such as Armis, Check Point, and F5, delivering hardware-accelerated, real-time protection via BlueField DPUs to safeguard AI workloads.
The NVIDIA Rubin platform has already garnered broad industry support. Major cloud providers—including Microsoft, AWS, Google Cloud, and Oracle—have announced plans to adopt Rubin-based systems. Microsoft, in particular, will deploy Vera Rubin NVL72 systems in its next-generation “Fairwater” AI super-factory, while AI compute specialist CoreWeave is also among the first wave of adopters.