
A critical security vulnerability has been disclosed in vLLM, a popular open-source library used for high-performance inference and serving of large language models (LLMs). Tracked as CVE-2025-32444, this vulnerability carries the highest possible CVSS score of 10.0, signifying a severe remote code execution (RCE) risk for deployments using its Mooncake integration.
With over 46,000 stars on GitHub, vLLM is a widely adopted LLM serving library, trusted for its speed and flexibility in both academic research and enterprise-grade AI systems. As LLM-based tools proliferate across industries, security within the model-serving stack is critical.
The flaw lies in how vLLM’s Mooncake integration handles serialized data across the network: it deserializes incoming messages with Python’s pickle module over unsecured ZeroMQ sockets. Because unpickling attacker-controlled bytes can execute arbitrary code, any peer that can reach those sockets can compromise the host.
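To see why this pattern is dangerous, recall that pickle reconstructs objects by running whatever rebuild logic they specify. The sketch below is a generic illustration, not vLLM code; the Payload class and the echoed command are invented for demonstration:

```python
import os
import pickle

class Payload:
    # pickle calls __reduce__ to learn how to rebuild an object;
    # a malicious sender can return any callable plus its arguments.
    def __reduce__(self):
        return (os.system, ("echo code ran during unpickling",))

blob = pickle.dumps(Payload())

# Merely unpickling the bytes invokes os.system() -- the receiver
# never has to call a method on the resulting object.
pickle.loads(blob)
```

Anything with network access to the deserializing socket can therefore run commands with the privileges of the vLLM process.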
The problem is specifically located in the recv_pyobj() call within the vllm/vllm/distributed/kv_transfer/kv_pipe/mooncake_pipe.py file. recv_pyobj() implicitly invokes pickle.loads() on whatever bytes arrive over the ZeroMQ sockets, deserializing untrusted network data directly.
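A minimal sketch of the vulnerable shape, using pyzmq directly (the socket type, address, and port here are assumptions chosen for illustration, not vLLM’s actual wiring):

```python
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.PULL)
sock.bind("tcp://0.0.0.0:5555")  # reachable by any network peer

# pyzmq's recv_pyobj() is a convenience wrapper that feeds the raw
# message bytes straight into pickle.loads(), so whoever can send
# to this socket decides what gets deserialized -- and executed.
obj = sock.recv_pyobj()
```

Paired with a payload like the one above, a single message to the socket yields code execution on the receiving host.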
This vulnerability impacts all vLLM instances running version 0.6.5 or later that actively use the Mooncake integration. Deployments that do not use Mooncake for distributed KV-cache transfer are not exposed to this specific vulnerability.
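For a quick exposure check on a pip-based install, the running version can be read directly; vllm.__version__ is the package’s standard version attribute:

```python
# Prints the installed vLLM version (assumes a standard pip install).
import vllm

print(vllm.__version__)
# Exposed when Mooncake is in use and the version is >= 0.6.5 and
# < 0.8.5; upgrade with: pip install --upgrade "vllm>=0.8.5"
```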
The vLLM team has addressed this critical issue swiftly: the patched version, v0.8.5, is now available. All affected vLLM deployments should be upgraded to this version immediately to mitigate the risk of remote code execution.