
A recent report by Trend Research found that NVIDIA’s September 2024 security update for a critical vulnerability (CVE-2024-0132) in the NVIDIA Container Toolkit was incomplete, leaving AI infrastructure and data at significant risk.
The incomplete patch leaves systems vulnerable to container escape attacks. Researchers also discovered a denial-of-service (DoS) vulnerability affecting Docker on Linux. Together, these flaws could let attackers access sensitive host data or disrupt operations by exhausting host resources.
Successful exploitation could lead to:
- Unauthorized access to sensitive host data.
- Theft of proprietary AI models or intellectual property.
- Severe operational disruptions.
- Prolonged downtime due to resource exhaustion or system inaccessibility.
Organizations using the NVIDIA Container Toolkit or Docker in AI, cloud, or containerized environments are directly affected, especially those running default configurations or enabling specific toolkit features introduced in recent versions. This includes companies deploying AI workloads or Docker-based container infrastructure.
Analysis of CVE-2024-0132 revealed a time-of-check to time-of-use (TOCTOU) vulnerability in the NVIDIA Container Toolkit that allows a specially crafted container to access the host file system. Default configurations of versions 1.17.3 and earlier remain vulnerable, while version 1.17.4 is vulnerable only when the “allow-cuda-compat-libs-from-container” feature is explicitly enabled. This issue has been disclosed under ZDI-25-087.
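The version rule above can be encoded as a quick triage check. The sketch below is a hypothetical helper, not an NVIDIA tool: it assumes plain dotted version strings such as "1.17.3" and a boolean saying whether allow-cuda-compat-libs-from-container is enabled. Because the report only describes 1.17.4 and earlier, it returns None for newer versions rather than guessing.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Convert a plain dotted version such as "1.17.3" into (1, 17, 3)."""
    return tuple(int(part) for part in version.strip().split("."))


def exposed_to_cve_2024_0132(version: str, cuda_compat_from_container: bool) -> Optional[bool]:
    """Hypothetical triage helper based only on the version rule in the report.

    Returns True if exposed, False if not, and None for versions the
    report does not cover (newer than 1.17.4); check those against
    NVIDIA's own advisory.
    """
    v = parse_version(version)
    if v <= (1, 17, 3):
        # Default configurations of 1.17.3 and earlier are vulnerable.
        return True
    if v == (1, 17, 4):
        # 1.17.4 is exposed only when the opt-in feature is enabled.
        return cuda_compat_from_container
    return None
```

Returning None instead of False for unlisted versions keeps the helper from implying a clean bill of health the report never gave.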
A related performance issue can cause a denial of service (DoS) on the host machine, affecting Docker on Linux systems. The Docker security team acknowledged that “docker engine has been affected by exponentially grow in mount table record which affect system directly with DOS”. Docker treats its API as a privileged interface, since any user with access to it effectively gains root-level privileges on the host, and triggering the DoS requires such elevated privileges. The root cause of this issue is still under investigation.
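Because the symptom is runaway growth of the mount table, one way to watch for it is to sample the number of mount entries and compare each sample against a baseline. This is a minimal sketch, assuming a Linux host where each line of /proc/self/mountinfo describes one mount in the reading process's mount namespace (run it in the host namespace to see host mounts). The function names and the 2x threshold are illustrative assumptions, not values from Docker or the report.

```python
from typing import List

MOUNTINFO = "/proc/self/mountinfo"  # Linux only; one line per mount in this namespace


def count_mounts(mountinfo_text: str) -> int:
    """Count mount entries, one per non-empty line of mountinfo."""
    return sum(1 for line in mountinfo_text.splitlines() if line.strip())


def read_mount_count(path: str = MOUNTINFO) -> int:
    """Read the current number of mount entries from procfs."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_mounts(fh.read())


def flag_abnormal_growth(samples: List[int], max_ratio: float = 2.0) -> List[int]:
    """Return indices of samples exceeding max_ratio times the first (baseline) sample.

    The 2.0 default is an illustrative threshold, not a recommended value.
    """
    if not samples or samples[0] <= 0:
        return []
    baseline = samples[0]
    return [i for i, count in enumerate(samples) if count > baseline * max_ratio]
```

In practice you might call `read_mount_count()` on a timer, append each value to a list, and alert when `flag_abnormal_growth()` returns anything. Comparing against the first sample catches steady exponential growth that a sample-to-sample ratio could miss.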
The report outlines a potential attack scenario:
- An attacker creates two malicious container images connected via a volume symlink.
- The attacker runs these images on the victim’s platform.
- This allows the attacker to access the host file system via a race condition.
- The attacker can then access the container runtime’s Unix sockets to execute arbitrary commands with root privileges, gaining full remote control of the host.
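The race condition in this scenario follows the classic time-of-check to time-of-use pattern. The sketch below is a generic Python illustration, not code from the NVIDIA Container Toolkit. The function names are hypothetical, the `race` callback deterministically stands in for an attacker winning the timing window, and it assumes a POSIX system where `os.O_NOFOLLOW` is available. The unsafe reader validates a path and then opens it in a separate lookup, so a symlink swapped in between redirects the read to a "host" file; the safer reader opens once and validates the resulting descriptor.

```python
import os
import stat
import tempfile
from typing import Callable, Optional, Tuple


def read_regular_file_unsafe(path: str, race: Optional[Callable[[], None]] = None) -> bytes:
    """Check-then-use: the check and the open resolve the path separately."""
    # Time of check: lstat() does not follow symlinks, so a symlink is rejected here.
    if not stat.S_ISREG(os.lstat(path).st_mode):
        raise PermissionError(f"{path} is not a regular file")
    if race is not None:
        race()  # deterministic stand-in for an attacker winning the race window
    # Time of use: open() follows whatever the path points to *now*.
    with open(path, "rb") as fh:
        return fh.read()


def read_regular_file_safer(path: str, race: Optional[Callable[[], None]] = None) -> bytes:
    """Open once, then check the descriptor that was actually opened."""
    if race is not None:
        race()
    # O_NOFOLLOW makes the open fail if the final path component is a symlink.
    fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
    with os.fdopen(fd, "rb") as fh:
        # fstat() inspects the open file itself, so a later path swap cannot change it.
        if not stat.S_ISREG(os.fstat(fh.fileno()).st_mode):
            raise PermissionError(f"{path} is not a regular file")
        return fh.read()


def demo() -> Tuple[bytes, bool]:
    """Swap a checked file for a symlink to a 'host' file between check and use."""
    with tempfile.TemporaryDirectory() as root:
        host_secret = os.path.join(root, "host_secret")
        shared = os.path.join(root, "shared_file")

        def reset() -> None:
            # Restore the shared path to an ordinary file.
            if os.path.lexists(shared):
                os.remove(shared)
            with open(shared, "wb") as fh:
                fh.write(b"container data")

        def swap() -> None:
            # Replace the file with a symlink pointing at the "host" file.
            os.remove(shared)
            os.symlink(host_secret, shared)

        with open(host_secret, "wb") as fh:
            fh.write(b"host data")

        reset()
        leaked = read_regular_file_unsafe(shared, race=swap)

        reset()
        try:
            read_regular_file_safer(shared, race=swap)
            rejected = False
        except OSError:
            rejected = True
        return leaked, rejected
```

`O_NOFOLLOW` only guards the final path component; symlinks in intermediate directories are still followed. On Linux, `openat2()` with `RESOLVE_IN_ROOT` or `RESOLVE_NO_SYMLINKS` can constrain the whole path resolution.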
To mitigate these vulnerabilities, the report recommends several security best practices:
- Restrict Docker API access and privileges: Limit API access to authorized personnel and avoid granting unnecessary root-level permissions.
- Disable non-essential features: Explicitly disable optional features in NVIDIA Container Toolkit 1.17.4, such as “allow-cuda-compat-libs-from-container”, unless they are required.
- Implement container image admission controls: Enforce strong admission control policies and automatically scan and block vulnerable container images.
- Monitor the Linux mount table: Regularly inspect for abnormal growth, which can indicate exploitation attempts.
- Regularly audit container-to-host interactions: Limit and monitor file system bindings, volume mounts, and socket connections.
- Deploy runtime anomaly detection: Use tools to identify unusual container activities.
- Conduct patch validation: Thoroughly verify security patches to ensure effective mitigation.
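To make the container-to-host audit concrete, the sketch below scans the JSON that `docker inspect` prints for each container, whose `Mounts` entries carry `Source`, `Destination`, and `RW` fields, and flags mounts of the host root or of Unix sockets, the runtime access path described in the attack scenario. The path list, the `*.sock` heuristic, and the helper names are hypothetical assumptions to adapt to your own environment.

```python
import json
from dataclasses import dataclass
from typing import List

# Illustrative, not exhaustive: host paths whose exposure inside a container
# could give it control over the host or its container runtime.
SENSITIVE_SOURCES = {
    "/",
    "/var/run/docker.sock",
    "/run/docker.sock",
    "/run/containerd/containerd.sock",
}


@dataclass
class MountFinding:
    container: str
    source: str
    destination: str
    writable: bool


def risky_mounts(inspect_json: str) -> List[MountFinding]:
    """Flag mounts in `docker inspect` output whose host source looks sensitive."""
    findings: List[MountFinding] = []
    for container in json.loads(inspect_json):  # docker inspect emits a JSON array
        for mount in container.get("Mounts") or []:
            source = mount.get("Source", "")
            # Treat any *.sock path as a potential runtime or daemon socket.
            if source in SENSITIVE_SOURCES or source.endswith(".sock"):
                findings.append(
                    MountFinding(
                        container=container.get("Name", "?"),
                        source=source,
                        destination=mount.get("Destination", ""),
                        writable=bool(mount.get("RW", False)),
                    )
                )
    return findings
```

For example, save the output of `docker inspect $(docker ps -q)` to a file and pass its contents to `risky_mounts()`; anything it returns deserves a manual review.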