AWS has announced the launch of its Amazon EC2 Trn3 UltraServers, powered by its new 3 nm AI chip, Trainium3, aiming to resolve the mounting cost and compute bottlenecks faced by modern AI training and inference workloads. According to AWS's own figures, Trainium3 delivers up to 4.4× the performance of the previous generation while improving energy efficiency by 40%, and the platform is positioned at a more competitive price point to make large-scale AI model training infrastructure accessible to a broader range of enterprises.
At the heart of the EC2 Trn3 UltraServer is its highly integrated architecture: a single system can host up to 144 Trainium3 chips, offering 362 FP8 PFLOPs of AI compute. The UltraServers are available starting today.
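As a rough sanity check on those published figures (not an official AWS breakdown), the per-chip FP8 throughput implied by the numbers above can be derived as follows:

```python
# Back-of-the-envelope check derived from the figures quoted above:
# 362 PFLOPs of FP8 compute spread across 144 Trainium3 chips per UltraServer.
ULTRASERVER_FP8_PFLOPS = 362
CHIPS_PER_ULTRASERVER = 144

per_chip_pflops = ULTRASERVER_FP8_PFLOPS / CHIPS_PER_ULTRASERVER
print(f"Implied FP8 throughput per Trainium3 chip: {per_chip_pflops:.2f} PFLOPs")
# -> roughly 2.5 PFLOPs of FP8 compute per chip (an inference from the
#    announced totals, not a figure AWS has stated directly).
```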
To address the communication bottlenecks that plague distributed computing, AWS has introduced the new NeuronSwitch-v1 along with an enhanced Neuron Fabric interconnect, reducing chip-to-chip latency to below 10 microseconds, a critical improvement for data-intensive agentic AI workloads and Mixture-of-Experts (MoE) architectures.
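To see why sub-10 μs chip-to-chip latency matters for MoE-style workloads, consider a deliberately simplified latency-plus-bandwidth cost model for the all-to-all token dispatch that MoE routing performs. The message size, peer count, and link bandwidth below are illustrative assumptions, not Trn3 or Neuron Fabric specifications:

```python
# Minimal, serialized cost model for one MoE all-to-all token dispatch.
# All parameters are illustrative assumptions, not published Trn3 numbers.

def all_to_all_us(msg_bytes: int, peers: int, latency_us: float, bandwidth_gbps: float) -> float:
    """Serialized per-peer latency plus total transfer time for one exchange."""
    transfer_us = (msg_bytes * peers * 8) / (bandwidth_gbps * 1_000)  # Gb/s -> bits per microsecond
    return peers * latency_us + transfer_us

# Example: 64 KiB of expert activations to each of 63 peers over an assumed 400 Gb/s link.
for latency in (10, 50):
    total = all_to_all_us(64 * 1024, 63, latency, 400)
    print(f"per-hop latency {latency} us -> all-to-all ~{total:.0f} us")
# With small per-peer messages the exchange is latency-dominated in this model,
# which is why pushing chip-to-chip latency below 10 us pays off for MoE routing.
```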
AWS further notes that through its EC2 UltraClusters 3.0 design, customers can interconnect thousands of UltraServers, scaling up to ultra-large clusters containing one million Trainium chips, a tenfold increase over the previous generation.
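For a sense of scale, a quick unofficial arithmetic check shows how many UltraServers that million-chip ceiling implies, using the 144-chips-per-system figure quoted above:

```python
# Unofficial arithmetic: UltraServers needed to reach the quoted million-chip scale.
CHIPS_PER_ULTRASERVER = 144        # Trn3 UltraServer figure quoted above
TARGET_CHIPS = 1_000_000           # quoted EC2 UltraClusters 3.0 ceiling

ultraservers_needed = -(-TARGET_CHIPS // CHIPS_PER_ULTRASERVER)  # ceiling division
print(f"~{ultraservers_needed:,} UltraServers")  # ~6,945, i.e. several thousand systems
```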
In real-world deployment examples, AWS highlighted multiple customer successes:
- Decart, a company specializing in generative AI video, reported 4× faster inference for real-time video generation on Trainium3 at half the cost of GPU-accelerated systems.
- Customers such as Anthropic, Karakuri, and Ricoh have similarly cut training and inference costs by up to 50% using Trainium-based compute.
AWS also revealed new progress on its collaborative initiative with Anthropic, Project Rainier, which now links more than 500,000 Trainium2 chips, making it one of the largest AI compute clusters in the world—five times larger than the infrastructure Anthropic used to train its previous-generation model.
Looking ahead, AWS confirmed that it is already developing the next-generation Trainium4 chip. Expected improvements include 6× higher performance (in FP4 operations), 4× greater memory bandwidth, and 2× expanded memory capacity, coupled with support for NVIDIA’s NVLink Fusion high-speed interconnect.
This means that future Trainium4 and Graviton processors will be able to operate seamlessly alongside NVIDIA GPUs within a unified MGX rack architecture, dissolving the longstanding divide between in-house silicon and the GPU ecosystem, and giving customers a far more flexible and heterogeneous compute landscape.