The architecture of PyPitfall | Image: The researchers
A study from the New Jersey Institute of Technology has exposed a massive web of hidden vulnerabilities lurking deep within Python's package ecosystem. Titled "PyPitfall," the paper reveals how complex and deeply nested package dependencies are silently propagating known security flaws across thousands of projects.
"Vulnerabilities in one package can propagate through its dependencies, potentially affecting downstream packages and applications," the researchers warned.
Python's rise as one of the most popular programming languages is fueled by its rich ecosystem of open-source libraries hosted on PyPI (Python Package Index). With over 627,000 packages and more than 6 million releases, developers often import functionality without realizing the security baggage that comes along.
The PyPitfall study analyzed the dependencies of 378,573 packages and found that:
- 4,655 packages explicitly require known-vulnerable versions (Guaranteed Exposure).
- 141,044 packages allow installation of potentially vulnerable versions (Potential Exposure).
"Any successful installation of [these packages] will inevitably result in a vulnerable version [...] being installed," the authors explain in their definition of Guaranteed Exposure.
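One way to read the two categories: a requirement is a Guaranteed Exposure when every version it permits is vulnerable, and a Potential Exposure when only some are. The sketch below illustrates that reading on a hypothetical library. The function names, the release list, and the simplified tuple-based version parsing are assumptions made for illustration, not the authors' implementation.

```python
from typing import Callable, Iterable, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a plain dotted release such as '1.26.4' (no pre/post/dev tags)."""
    return tuple(int(part) for part in text.split("."))


def classify_exposure(
    available: Iterable[Version],
    allowed: Callable[[Version], bool],
    vulnerable: Callable[[Version], bool],
) -> str:
    """Classify one dependency requirement against a known-vulnerable range."""
    # Versions a resolver could pick for this requirement.
    candidates = [v for v in available if allowed(v)]
    if not candidates:
        return "unresolvable"
    hits = [v for v in candidates if vulnerable(v)]
    if len(hits) == len(candidates):
        return "guaranteed"  # every installable version is vulnerable
    if hits:
        return "potential"  # some installable versions are vulnerable
    return "none"


# Hypothetical library: four releases, with a flaw fixed in 1.2.0.
releases = [parse_version(v) for v in ("1.0.0", "1.1.0", "1.2.0", "2.0.0")]


def is_vulnerable(version: Version) -> bool:
    return version < parse_version("1.2.0")


pinned = classify_exposure(releases, lambda v: v == parse_version("1.1.0"), is_vulnerable)
ranged = classify_exposure(releases, lambda v: v >= parse_version("1.0.0"), is_vulnerable)
patched = classify_exposure(releases, lambda v: v >= parse_version("1.2.0"), is_vulnerable)
print(pinned, ranged, patched)  # guaranteed potential none
```

A real audit would parse full version specifiers rather than comparing integer tuples, which ignore pre-release and post-release tags.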
The study highlighted extreme complexity in the PyPI ecosystem, including packages with over 20 layers of transitive dependencies and 1 million+ circular dependencies. One notable case was a package (square-0-5) that ended up depending on itself after 75 recursive jumps, causing pip install to enter an infinite loop.
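A circular dependency of this kind can be found with a depth-first search that tracks which packages are on the current path. The sketch below is a minimal illustration using a made-up graph of 75 hypothetical packages. It is not the method used in the paper, and the graph format (a dict mapping each package name to its direct dependencies) is an assumption.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]], start: str) -> Optional[List[str]]:
    """Return a dependency path that loops back on itself, or None."""
    path: List[str] = []  # packages on the current DFS path, in order
    on_path = set()       # same packages, for O(1) membership checks
    done = set()          # packages fully explored without finding a cycle

    def visit(node: str) -> Optional[List[str]]:
        if node in on_path:
            # Reached a package already on the path: slice out the loop.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for dep in graph.get(node, []):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


# Hypothetical chain: pkg-0 -> pkg-1 -> ... -> pkg-74 -> pkg-0 (75 hops).
graph = {f"pkg-{i}": [f"pkg-{(i + 1) % 75}"] for i in range(75)}
cycle = find_cycle(graph, "pkg-0")
print(len(cycle) - 1, cycle[0], cycle[-1])  # 75 pkg-0 pkg-0
```

The recursive form is fine for chains of this length. A scan across the whole index would more likely use an iterative traversal to avoid Python's recursion limit.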
"The chain of dependencies can be long and complex. A single package may depend on hundreds of others, forming a deep software supply chain labyrinth."
These deeply nested relationships often mask the presence of vulnerable components, allowing them to remain hidden even as they spread widely.
To uncover the vulnerabilities, the team used a tool called Johnnydep to perform dry-run installations of PyPI packages, collecting their dependency trees without installing them. They cross-referenced these trees against a curated set of 67 known CVEs, focusing only on those that affected PyPI-hosted libraries.
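The cross-referencing step can be sketched as a walk over a resolved dependency tree that reports every node whose pinned version appears in an advisory list, along with the chain that led to it. The tree layout, the package names, and the advisory dict below are all hypothetical. They do not reproduce Johnnydep's output format or the paper's CVE dataset.

```python
from typing import Dict, Iterator, Set, Tuple

Chain = Tuple[str, ...]


def find_vulnerable(
    tree: dict,
    advisories: Dict[str, Set[str]],
    chain: Chain = (),
) -> Iterator[Tuple[Chain, str]]:
    """Yield (dependency chain, version) for each vulnerable node in the tree."""
    chain = chain + (tree["name"],)
    if tree["version"] in advisories.get(tree["name"], set()):
        yield chain, tree["version"]
    # Assumes the tree is acyclic; a cyclic input would recurse forever.
    for child in tree.get("dependencies", []):
        yield from find_vulnerable(child, advisories, chain)


# Hypothetical resolved tree for a made-up application.
tree = {
    "name": "example-app", "version": "0.1.0",
    "dependencies": [
        {"name": "example-http", "version": "2.0.0",
         "dependencies": [
             {"name": "example-url", "version": "1.0.0", "dependencies": []},
         ]},
        {"name": "example-util", "version": "3.1.0", "dependencies": []},
    ],
}
# Hypothetical advisory data: package name -> affected versions.
advisories = {"example-url": {"1.0.0", "1.0.1"}}

findings = list(find_vulnerable(tree, advisories))
for found_chain, version in findings:
    depth = len(found_chain) - 1  # hops from the root package
    print(" -> ".join(found_chain), version, "depth", depth)
```

Recording the full chain rather than just the package name is what makes a depth figure computable: the depth of a finding is the number of hops between the root package and the vulnerable one.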
This approach helped them discover severe vulnerabilities in widely used packages like urllib3, which alone was responsible for 41.4% of all Guaranteed Exposures found.
Urllib3, a core component in Python's HTTP stack and a dependency of requests, was highlighted as a high-impact example:
- Found in 407,333 dependency chains.
- Introduced vulnerabilities such as CVE-2024-37891 and CVE-2023-43804.
- Caused 1,926 packages to have guaranteed exposures.
Key Insights
- Average Depth of Vulnerability Exposure: 4.1 (Guaranteed), 6.2 (Potential)
- Setuptools is the most depended-upon package, appearing in over 7 million dependency chains.
- 1,075,559 circular dependencies were detected, further complicating vulnerability tracking.
- Many packages fail to follow PEP 440 versioning consistently, introducing parsing issues during resolution.
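To illustrate the versioning point, the check below uses the canonical-form regular expression published in PEP 440's appendix to flag version strings that are not in normalized form. It is stricter than what installers actually accept, since PEP 440 also defines normalization rules for many alternate spellings, and it does not cover local version labels. The sample strings are invented for illustration.

```python
import re

# Canonical public version pattern from PEP 440, Appendix B.
_CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"
    r"((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?"
    r"(\.dev(0|[1-9][0-9]*))?$"
)


def is_canonical(version: str) -> bool:
    """Return True if the string is already in PEP 440 canonical form."""
    return _CANONICAL.match(version) is not None


# Invented sample version strings.
samples = ["1.2.3", "1.0rc1", "1.0.post1.dev2", "v1.0", "2024.01", "1.0-final"]
non_canonical = [v for v in samples if not is_canonical(v)]
print(non_canonical)  # ['v1.0', '2024.01', '1.0-final']
```

Strings such as `v1.0` can still be normalized by a lenient parser, while something like `1.0-final` cannot, which is the kind of inconsistency that complicates resolution at scale.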
The study calls for:
- Improved tooling to audit dependencies before installation.
- Better awareness of transitive vulnerabilities during development.
- A broader scan of PyPI using a more comprehensive CVE dataset in future research.
The authors have disclosed their findings to the Python Packaging Authority, emphasizing the urgency of addressing this systemic risk in Python's open-source ecosystem.
"Our findings underscore the need for enhanced security awareness in the Python software supply chain," the report concludes.