The widely used Apache Tika toolkit, a powerful library for detecting and extracting metadata and text from over a thousand file formats, has been found vulnerable to a critical XML External Entity (XXE) injection flaw. The vulnerability, tracked as CVE-2025-54988, impacts the toolkit’s PDF parser module and could allow attackers to access sensitive data or pivot into internal networks.
Apache Tika’s PDF parser module (org.apache.tika:tika-parser-pdf-module) processes PDF documents, including those that embed XML Forms Architecture (XFA) forms. According to the advisory, the flaw arises from improper handling of crafted XFA data within PDF files.
When exploited, this weakness enables XXE injection, in which the XML parser resolves external entities declared in attacker-supplied XML. An attacker can use this to:
- Read sensitive files on the host system.
- Trigger server-side request forgery (SSRF), sending malicious requests to internal resources.
- Exfiltrate data to attacker-controlled servers.
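To illustrate the general defense against this class of attack, here is a minimal sketch of a JAXP parser configured to refuse DOCTYPE declarations, which prevents external entities from being declared at all. This is not the Tika patch itself; the class and method names (XxeHardeningSketch, hardenedBuilder, isRejected) are hypothetical, and the sketch assumes the JDK's built-in Xerces-based parser, which supports the disallow-doctype-decl feature.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical illustration class; not part of Apache Tika.
class XxeHardeningSketch {

    // Builds a DOM parser that rejects any document containing a DOCTYPE.
    static DocumentBuilder hardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Assumption: the underlying parser is Xerces-based (the JDK default).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory.newDocumentBuilder();
    }

    // Returns true if the hardened parser refuses the given XML.
    static boolean isRejected(String xml) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            hardenedBuilder().parse(new ByteArrayInputStream(bytes));
            return false;
        } catch (SAXException e) {
            // Raised when the document is refused, e.g. because it has a DOCTYPE.
            return true;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With this configuration, a classic payload such as `<!DOCTYPE x [<!ENTITY e SYSTEM "file:///etc/passwd">]><x>&e;</x>` is refused before the entity is ever resolved, while ordinary XML without a DOCTYPE still parses. The trade-off is that legitimate documents relying on a DTD are rejected as well.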
The issue affects versions 1.13 through 3.2.1 of the PDF parser module.
The vulnerability is rated critical, and its impact is amplified by Apache Tika’s widespread use in applications such as:
- Search engine indexing
- Content analysis pipelines
- Automated translation systems
- Enterprise document processing
What makes the issue particularly concerning is its reach: the vulnerable tika-parser-pdf-module is used as a dependency in several other Tika packages, including:
- tika-parsers-standard-modules
- tika-parsers-standard-package
- tika-app
- tika-grpc
- tika-server-standard
This means that many organizations relying on Tika for large-scale document ingestion and analysis may unknowingly be exposed.
The Apache Tika team has released version 3.2.2, which includes a patch that resolves the XXE injection flaw. Users and organizations are strongly advised to upgrade immediately to mitigate the risk of exploitation.
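For teams auditing their dependencies, the check below is a rough sketch of how a resolved tika-parser-pdf-module version could be compared against the affected range reported above (1.13 through 3.2.1). The class and method names are hypothetical, and the parsing is deliberately simplified: qualifiers such as -BETA or -SNAPSHOT are ignored, so the advisory remains the authoritative source for edge cases.

```java
import java.util.Arrays;

// Hypothetical helper; not part of Apache Tika or any build tool.
class TikaPdfModuleVersionCheck {

    // Parses "major.minor.patch" into three ints; missing parts default to 0.
    // Simplification: anything after the first '-' (e.g. "-BETA") is dropped.
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] nums = new int[3];
        for (int i = 0; i < Math.min(parts.length, 3); i++) {
            nums[i] = Integer.parseInt(parts[i]);
        }
        return nums;
    }

    // Lexicographic comparison of the numeric components.
    static int compare(String a, String b) {
        return Arrays.compare(parse(a), parse(b));
    }

    // True if the version lies in the range named in the article: 1.13 to 3.2.1 inclusive.
    static boolean isAffected(String version) {
        return compare(version, "1.13") >= 0 && compare(version, "3.2.1") <= 0;
    }
}
```

Under these assumptions, `isAffected("2.9.4")` and `isAffected("3.2.1")` return true, while `isAffected("1.12")` and `isAffected("3.2.2")` return false. Because the PDF module is pulled in transitively by the packages listed earlier, the version to check is the one that actually resolves in the build, not only the one declared directly.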