A newly disclosed high-severity vulnerability in vLLM—one of the fastest-growing open-source inference engines for large language models—allows attackers to crash servers or potentially execute arbitrary code simply by submitting malicious prompt embeddings. The flaw, tracked as CVE-2025-62164 and rated CVSS 8.8, affects vLLM versions 0.10.2 and later, putting numerous AI deployments and LLM-powered applications at significant risk.
According to the advisory, “A memory corruption vulnerability that [leads] to a crash (denial-of-service) and potentially remote code execution (RCE) exists in vLLM versions 0.10.2 and later, in the Completions API endpoint.”
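The vulnerability stems from insufficient validation when deserializing user-supplied embeddings inside the Completions API. The affected code loads tensors with: torch.load(tensor, weights_only=True). The weights_only=True flag restricts which object types can be unpickled, but it does not by itself guarantee that a sparse tensor's indices are consistent with its declared shape.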
But as the advisory warns, “Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled by default… Maliciously crafted tensors can bypass internal bounds checks and trigger an out-of-bounds memory write during the call to to_dense().”
This out-of-bounds write is what makes the vulnerability so dangerous: it can lead not only to server crashes but also, potentially, to arbitrary code execution.
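To make the failure mode concrete, here is a minimal, illustrative sketch in plain Python of the kind of integrity check that a sparse (COO) tensor needs before it is converted to a dense buffer. It is not vLLM or PyTorch code: the names validate_coo and to_dense, the two-dimensional shape, and the check_invariants parameter are all hypothetical, chosen only to mirror the idea described in the advisory.

```python
from typing import List, Sequence, Tuple

Index = Tuple[int, int]


def validate_coo(indices: Sequence[Index],
                 values: Sequence[float],
                 shape: Tuple[int, int]) -> None:
    """Raise ValueError if any (row, col) index falls outside `shape`.

    Hypothetical helper: stands in for a sparse tensor integrity check.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    rows, cols = shape
    for row, col in indices:
        if not (0 <= row < rows and 0 <= col < cols):
            raise ValueError(f"index ({row}, {col}) is outside shape {shape}")


def to_dense(indices: Sequence[Index],
             values: Sequence[float],
             shape: Tuple[int, int],
             check_invariants: bool = True) -> List[float]:
    """Scatter COO entries into a flat, row-major dense buffer."""
    if check_invariants:
        validate_coo(indices, values, shape)
    rows, cols = shape
    dense = [0.0] * (rows * cols)
    for (row, col), value in zip(indices, values):
        # In native code this store is a raw memory write. If the check
        # above is skipped, an attacker-chosen index selects an offset
        # outside the allocated buffer.
        dense[row * cols + col] = value
    return dense


# A well-formed tensor converts normally.
benign = to_dense([(0, 1), (1, 0)], [3.0, 4.0], (2, 2))

# A crafted tensor whose index exceeds the declared shape is rejected
# before any write happens.
try:
    to_dense([(7, 0)], [1.0], (2, 2))
except ValueError as err:
    rejected_reason = str(err)
```

Python's own list bounds checking would stop the bad write even with check_invariants=False, which is exactly the protection a native scatter loop lacks. PyTorch exposes a comparable switch as torch.sparse.check_sparse_tensor_invariants; how a given deployment enables it is outside the scope of this sketch.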
The fix has been merged in vLLM pull request #27204.
As AI infrastructure continues to scale, supply-chain weaknesses and low-level tensor manipulation flaws will become increasingly attractive targets. Organizations running vLLM should upgrade immediately and audit any external-facing model-serving interfaces.