Path Traversal at Scale: Study Uncovers 1,756 Vulnerable GitHub Projects and LLM Contamination

A study titled “Eradicating the Unseen” reveals how widespread a critical path traversal vulnerability (CWE-22) is across open-source projects hosted on GitHub. Led by researchers from Leiden University and Technical and Vocational University, Iran, the work detects, exploits, and remediates the vulnerability with a fully automated pipeline, and it sheds light on the systemic security risks posed by code reuse and by the influence of vulnerable patterns on large language models (LLMs).

“Using our pipeline, we identified 1,756 vulnerable open-source projects, some of which are very influential,” the researchers write. “For many of the affected projects, the vulnerability is critical (CVSS score higher than 9.0).”

The study focuses on a deceptively simple yet dangerous pattern used in Node.js applications to serve static files. This common snippet can be exploited for directory traversal attacks, allowing attackers to access sensitive files such as /etc/passwd or induce denial-of-service conditions by overloading system memory.
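The article does not reproduce the snippet itself, so the following is only a minimal sketch of this class of bug, not the exact code the researchers studied. The function names `naiveResolve` and `safeResolve` and the web root `/srv/www/public` are hypothetical, chosen for illustration. The first function shows the risky shape (request-controlled text joined onto a root directory), and the second shows one common containment check.

```javascript
const path = require('node:path');

// Risky shape: request-controlled text is joined onto the web root and the
// result is used as-is. path.join() collapses "../" segments, so the final
// path can land outside the root.
function naiveResolve(rootDir, requestPath) {
  return path.join(rootDir, requestPath);
}

// Hedged alternative: resolve to an absolute path, then require that the
// result is the root itself or sits beneath it. Returns null when rejected.
function safeResolve(rootDir, requestPath) {
  const root = path.resolve(rootDir);

  let decoded;
  try {
    decoded = decodeURIComponent(requestPath); // handles %2e%2e%2f style input
  } catch {
    return null; // malformed percent-encoding
  }
  if (decoded.includes('\0')) return null; // reject embedded NUL bytes

  // Prefixing "." keeps a leading "/" from being treated as an absolute path.
  const candidate = path.resolve(root, '.' + path.sep + decoded);
  const prefix = root.endsWith(path.sep) ? root : root + path.sep;
  return candidate === root || candidate.startsWith(prefix) ? candidate : null;
}

// Hypothetical web root and request paths, for illustration only.
const root = '/srv/www/public';
console.log(naiveResolve(root, '/../../../etc/passwd')); // escapes the root
console.log(safeResolve(root, '/../../../etc/passwd'));  // null (rejected)
console.log(safeResolve(root, '/css/site.css'));         // stays under the root
```

This sketch assumes `requestPath` is only the path portion of the URL, with any query string already removed. It does not follow symbolic links, so a real server would likely also compare `fs.realpath` results. Streaming the file rather than reading it fully into memory is a separate measure against the memory-exhaustion risk mentioned above.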

Figure: Overall flowchart of the proposed pipeline (credit: the researchers)

The vulnerable code pattern has existed for over 15 years and has been replicated thousands of times across projects, Stack Overflow answers, and tutorials. It has even contaminated the output of major AI coding assistants.

“We traced how the code pattern migrated between different reputable community platforms and developer learning resources,” the paper explains. “Developers voicing security concerns were not supported by the majority… the current LLM chatbots, however, are in a far worse situation.”

To scale the analysis, the authors built a pipeline capable of:

Detecting vulnerable patterns using GitHub code search
Confirming exploitation via Dockerized runtime environments
Auto-patching code using GPT-4
Calculating CVSS scores for each confirmed case
Reporting responsibly to maintainers
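To make the scoring step in the list above more concrete, here is a rough sketch of the base-score arithmetic. It assumes CVSS v3.1 with Scope: Unchanged. The article does not say which CVSS version or which metric vectors the researchers used, so the vectors below are hypothetical, and `baseScore` is an illustrative name rather than part of their pipeline.

```javascript
// CVSS v3.1 base-score arithmetic, Scope: Unchanged only (an assumption).
// Metric weights are taken from the CVSS v3.1 specification.
const WEIGHTS = {
  AV: { N: 0.85, A: 0.62, L: 0.55, P: 0.2 },
  AC: { L: 0.77, H: 0.44 },
  PR: { N: 0.85, L: 0.62, H: 0.27 }, // values for unchanged scope
  UI: { N: 0.85, R: 0.62 },
  CIA: { H: 0.56, L: 0.22, N: 0 },
};

// "Roundup" as defined in the v3.1 specification: smallest one-decimal
// number greater than or equal to the input, computed on scaled integers.
function roundUp(value) {
  const scaled = Math.round(value * 100000);
  if (scaled % 10000 === 0) return scaled / 100000;
  return (Math.floor(scaled / 10000) + 1) / 10;
}

function baseScore({ AV, AC, PR, UI, C, I, A }) {
  const w = WEIGHTS;
  const iss = 1 - (1 - w.CIA[C]) * (1 - w.CIA[I]) * (1 - w.CIA[A]);
  const impact = 6.42 * iss;
  if (impact <= 0) return 0;
  const exploitability = 8.22 * w.AV[AV] * w.AC[AC] * w.PR[PR] * w.UI[UI];
  return roundUp(Math.min(impact + exploitability, 10));
}

// Hypothetical vectors for a remotely reachable, unauthenticated flaw.
const remote = { AV: 'N', AC: 'L', PR: 'N', UI: 'N' };
console.log(baseScore({ ...remote, C: 'H', I: 'N', A: 'N' })); // 7.5
console.log(baseScore({ ...remote, C: 'H', I: 'N', A: 'H' })); // 9.1
console.log(baseScore({ ...remote, C: 'H', I: 'H', A: 'H' })); // 9.8
```

Under these assumptions, file disclosure alone scores 7.5, while adding a high availability impact (for example, the memory-exhaustion scenario) pushes the score above 9.0. Whether this matches how the researchers scored individual projects is not stated in the article.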

Their pipeline led to 1,600 valid patches and 63 publicly fixed repositories. However, many projects remain unpatched due to low maintainer responsiveness or unmaintained codebases.

The research further demonstrated that popular LLMs like ChatGPT, Copilot, and others generate the same vulnerable code pattern—even when explicitly instructed to write secure code. In some test cases:

95% of responses contained vulnerable code when the prompt asked for a simple static server
70% were still vulnerable even after the model was asked to make the code secure

“Popular LLM chatbots have learned the vulnerable code pattern and can confidently generate insecure code snippets,” the authors warn.

The study's message is clear: even a simple copy-pasted snippet can create widespread security risk. Moreover, unless LLMs are retrained on better-curated data, they risk becoming vulnerability propagation engines.

“Our study emphasizes that popular vulnerable code patterns need to be eradicated not only from open-source projects and developers’ resources but also from LLMs.”

As software supply chain threats continue to rise, the study makes a compelling case for scalable, automated vulnerability management, improved developer awareness, and secure code education—especially in the LLM era.
