NVIDIA has released urgent software updates to address a set of critical vulnerabilities in Triton Inference Server, its widely used open-source AI serving platform. The flaws, disclosed in collaboration with Wiz Research, expose systems to remote code execution (RCE), data tampering, and denial-of-service (DoS) attacks, all without requiring user interaction or authentication.
At the heart of the discovery lies a three-stage vulnerability chain, tracked as CVE-2025-23319, CVE-2025-23320, and CVE-2025-23334, that lets remote attackers take control of Triton servers by abusing the shared memory mechanism in the Python backend.
In its technical breakdown, Wiz Research explains how the exploit chain begins with a seemingly minor error-handling flaw in the Python backend. By submitting a malformed request, attackers can provoke an error message that leaks the unique name of the backend’s internal shared memory region:
“The returned error message appears as follows: {"error":"Failed to increase the shared memory pool size for key 'triton_python_backend_shm_region_4f50c226-b3d0-46e8-ac59-d4690b28b859'…"}”
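The leaked value has a recognizable shape, which makes it a useful indicator when reviewing logs or proxied API responses. The helpers below are a hypothetical sketch, not part of Triton; they assume the key is always the fixed prefix `triton_python_backend_shm_region_` followed by a UUID, based only on the single example quoted above.

```python
import re

# Assumed format of the Python backend's internal shared memory key, modeled on
# the one example above: a fixed prefix followed by a standard UUID.
SHM_KEY_RE = re.compile(
    r"triton_python_backend_shm_region_"
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def find_leaked_shm_keys(text: str) -> list[str]:
    """Return every internal shared memory key that appears in `text`."""
    return SHM_KEY_RE.findall(text)


def redact_shm_keys(text: str) -> str:
    """Replace internal shared memory keys with a fixed placeholder."""
    return SHM_KEY_RE.sub("triton_python_backend_shm_region_<redacted>", text)


if __name__ == "__main__":
    error = ("Failed to increase the shared memory pool size for key "
             "'triton_python_backend_shm_region_4f50c226-b3d0-46e8-ac59-d4690b28b859'")
    print(find_leaked_shm_keys(error))
    print(redact_shm_keys(error))
```

Redacting the key at a reverse proxy is at best a stopgap; upgrading Triton remains the fix.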
This leak sets the stage for a deeper compromise. Using Triton’s legitimate shared memory APIs, which are intended to optimize inference performance, attackers can register the backend’s private memory region and manipulate it, bypassing internal isolation mechanisms.
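Triton’s shared memory extension exposes a status endpoint, `GET v2/systemsharedmemory/status`, that returns every registered system shared memory region with its `name`, `key`, `offset`, and `byte_size`. A rough post-hoc check is to flag any registered region whose key contains the Python backend’s internal prefix. The sketch below assumes that legitimate clients never register such a key; the function names are hypothetical, and a region the attacker has already unregistered would not show up.

```python
import json
import urllib.request

# Substring of the backend-internal key; a leading "/" may or may not be present.
INTERNAL_KEY_MARKER = "triton_python_backend_shm_region"


def fetch_system_shm_status(base_url: str) -> list[dict]:
    """GET the list of registered system shared memory regions from Triton's HTTP API."""
    url = f"{base_url.rstrip('/')}/v2/systemsharedmemory/status"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def suspicious_regions(status: list[dict]) -> list[str]:
    """Return names of registered regions whose key points at backend-internal memory."""
    return [region["name"] for region in status
            if INTERNAL_KEY_MARKER in region.get("key", "")]


if __name__ == "__main__":
    # Assumes a server on the default HTTP port 8000.
    print(suspicious_regions(fetch_system_shm_status("http://localhost:8000")))
```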
“This provides the attacker with powerful read and write primitives into the Python backend’s private memory… performed through standard, legitimate API calls,” Wiz Research explains.
The final stage involves exploiting this memory access to execute arbitrary code. By corrupting internal data structures, injecting malicious messages into the Inter-Process Communication (IPC) queue, or triggering out-of-bounds memory accesses, an attacker could achieve full remote control of the server.
NVIDIA’s bulletin confirms multiple vulnerabilities, three of which carry a CVSS score of 9.8 (“Critical”), reflecting near-maximum exploitability and impact:
- CVE-2025-23310: Stack buffer overflow triggered by crafted input, leading to RCE, DoS, or information disclosure
- CVE-2025-23311: Stack overflow via HTTP requests
- CVE-2025-23317: Remote shell via crafted HTTP request
Each of these can be triggered remotely, requires no authentication, and poses a serious threat to any organization deploying AI/ML pipelines with Triton.
NVIDIA explicitly states:
“A successful exploit of this vulnerability might lead to remote code execution, denial of service, information disclosure, and data tampering.”
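For context on the 9.8 rating: under CVSS v3.1, a network-reachable, low-complexity flaw that needs no privileges or user interaction and has high confidentiality, integrity, and availability impact scores exactly 9.8 when scope is unchanged. That is the highest score possible without a scope change; 10.0 requires Scope: Changed. The sketch below implements the v3.1 base-score formula for the Scope: Unchanged case only, using the metric weights from the FIRST specification. The vector `AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H` is an assumption inferred from the descriptions above, not quoted from NVIDIA’s bulletin.

```python
import math

# CVSS v3.1 metric weights (Privileges Required values are the Scope: Unchanged ones).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value, float-safe."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute a CVSS v3.1 base score for a Scope: Unchanged vector string."""
    metrics = dict(part.split(":") for part in vector.removeprefix("CVSS:3.1/").split("/"))
    if metrics["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    # Impact Sub-Score combines the three C/I/A impacts.
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = (8.22 * AV[metrics["AV"]] * AC[metrics["AC"]]
                      * PR[metrics["PR"]] * UI[metrics["UI"]])
    return roundup(min(impact + exploitability, 10))


if __name__ == "__main__":
    print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # → 9.8
```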
Triton is frequently deployed in production AI environments, including in data centers, edge servers, and enterprise inference pipelines. An attacker exploiting these vulnerabilities could:
- Steal proprietary AI models
- Alter inference results, compromising trust in machine learning outputs
- Exfiltrate sensitive training data
- Use the compromised server as a foothold into internal networks
NVIDIA has responded swiftly, shipping fixes across three Triton releases:
- 25.05: Fixes CVE-2025-23323 to CVE-2025-23327, and CVE-2025-23335
- 25.06: Fixes CVE-2025-23322 and CVE-2025-23331
- 25.07: Fixes CVE-2025-23310, CVE-2025-23311, CVE-2025-23317, and CVE-2025-23318
You can access the updates on the Triton Inference Server GitHub page.
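To triage a fleet, the release list above can be turned into a simple lookup. This sketch assumes Triton releases use the `YY.MM` tags shown above, that each release includes every fix from earlier ones, and that “CVE-2025-23323 to CVE-2025-23327” denotes the five consecutive IDs in that range. It covers only the CVEs in that list, so the Wiz chain (CVE-2025-23319, CVE-2025-23320, CVE-2025-23334) is not modeled; the function names are hypothetical.

```python
# CVE fixes per release, transcribed from the list above. The 23323-23327 range
# is expanded on the assumption that it covers five consecutive IDs.
FIXED_IN = {
    "25.05": [f"CVE-2025-{n}" for n in range(23323, 23328)] + ["CVE-2025-23335"],
    "25.06": ["CVE-2025-23322", "CVE-2025-23331"],
    "25.07": ["CVE-2025-23310", "CVE-2025-23311", "CVE-2025-23317", "CVE-2025-23318"],
}


def parse_release(tag: str) -> tuple[int, int]:
    """Turn a 'YY.MM' release tag into a comparable (year, month) tuple."""
    year, month = tag.split(".")
    return int(year), int(month)


def unpatched_cves(installed: str) -> list[str]:
    """List CVEs from FIXED_IN whose fixing release is newer than `installed`.

    Assumes releases are cumulative: a newer release contains every earlier fix.
    """
    current = parse_release(installed)
    missing = [
        cve
        for release, cves in FIXED_IN.items()
        if parse_release(release) > current
        for cve in cves
    ]
    return sorted(missing)


if __name__ == "__main__":
    print(unpatched_cves("25.06"))
    # → ['CVE-2025-23310', 'CVE-2025-23311', 'CVE-2025-23317', 'CVE-2025-23318']
```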