A critical remote code execution (RCE) vulnerability has been uncovered in SGLang, a popular open-source framework used to serve advanced models like DeepSeek and Mistral. The flaw, officially tracked as CVE-2026-5760, carries a CVSS score of 9.8, signaling a “Critical” risk level for AI infrastructure.
The vulnerability resides specifically within SGLang’s reranking endpoint (/v1/rerank), which is designed to sort documents based on their relevance to a search query.
The attack chain begins when a threat actor creates a “malicious GPT Generated Unified Format (GGUF) model file”. By crafting a specific metadata field known as the tokenizer.chat_template—which defines how text is structured before processing—the attacker can embed a Jinja2 server-side template injection (SSTI) payload.
According to the vulnerability note, “The victim then downloads and loads the model in SGLang, and when a request hits the /v1/rerank endpoint, the malicious template is rendered, executing the attacker’s arbitrary Python code on the server”.
At the technical level, the issue is traced back to a single function: get_jinja_env(). This function is responsible for setting up the environment that renders chat templates.
The security breakdown occurs because the framework utilizes jinja2.Environment() without any form of sandboxing. Because it “fails to restrict the execution of arbitrary Python code,” any malicious model loaded into the service can effectively take full control of the underlying host.
The potential fallout from a successful exploitation is severe. Attackers could leverage this RCE primitive for:
- Host Compromise: Gaining full access to the server running the AI service.
- Lateral Movement: Using the compromised server as a beachhead to attack other systems in the network.
- Data Exfiltration: Stealing sensitive weights, training data, or user queries.
- Denial-of-Service (DoS): Shutting down critical AI infrastructure.
Nervously for the community, the advisory notes that “no response was obtained from the project maintainers during coordination,” meaning a formal patch is not yet available through official channels.
Until an official update is released, security researchers recommend a manual fix for those self-hosting SGLang. The core recommendation is to “use ImmutableSandboxedEnvironment instead of jinja2.Environment() to render the chat templates”. This change creates a restricted execution environment that prevents the rendering engine from running arbitrary Python commands.
Support Our Threat Intelligence
If you find our CVE report and cybersecurity news helpful, consider supporting our work.