The Apache Tika project, maintainer of a widely used toolkit for detecting and extracting metadata and text from over a thousand file types, has issued a maximum-severity alert. A critical XML External Entity (XXE) vulnerability, tracked as CVE-2025-66516, puts applications that rely on Tika for content analysis at serious risk. With a CVSS score of 10.0, the flaw lets an attacker read sensitive data from a server, or send requests on its behalf, simply by uploading a crafted PDF for Tika to parse.
The vulnerability stems from how the toolkit handles XFA (XML Forms Architecture) data within PDF files. As stated in the advisory: “Critical XXE in Apache Tika… allows an attacker to carry out XML External Entity injection via a crafted XFA file inside of a PDF.”
This advisory is notable because it corrects an earlier disclosure, CVE-2025-54988, which underestimated the vulnerability's scope.
The Apache team found that patching only the PDF module was insufficient. “While the entrypoint for the vulnerability was the tika-parser-pdf-module as reported in CVE-2025-54988, the vulnerability and its fix were in tika-core.”
This means many administrators who thought they were safe after the last patch cycle may still be exposed. The report starkly warns: “Users who upgraded the tika-parser-pdf-module but did not upgrade tika-core to >=3.2.2 would still be vulnerable.”
Because the flaw lives in tika-core, Tika's core engine, its scope is wider than first reported. It affects the legacy 1.x tika-parsers package as well as the PDF parser module used in the 2.x and 3.x lines.
Affected components include:
- Apache Tika Core (tika-core): versions 1.13 through 3.2.1
- Apache Tika Parsers (tika-parsers): versions 1.13 up to, but not including, 2.0.0
- Apache Tika PDF Parser Module (tika-parser-pdf-module): versions 2.0.0 through 3.2.1
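To make these ranges concrete, here is a minimal Python sketch that checks a component version against them. It makes several assumptions. It expects plain numeric Maven-style versions such as `3.2.1` and ignores any qualifier after a hyphen. It keys the table on the Maven artifact IDs. The names `AFFECTED_RANGES`, `is_affected` and `still_vulnerable` are hypothetical helpers, not part of Tika or any official tool.

```python
"""Hypothetical helper: check Tika component versions against the
CVE-2025-66516 ranges listed in the advisory. Not an official tool."""

# artifact -> (lowest affected version, upper bound, is upper bound inclusive?)
AFFECTED_RANGES = {
    "tika-core": ("1.13", "3.2.1", True),
    "tika-parsers": ("1.13", "2.0.0", False),  # legacy 1.x parsers only
    "tika-parser-pdf-module": ("2.0.0", "3.2.1", True),
}


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.2.1' into (3, 2, 1); assumes numeric parts, drops '-qualifier'."""
    core = version.split("-", 1)[0]
    parts = [int(p) for p in core.split(".")]
    # Pad to three components so '1.13' compares equal to '1.13.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_affected(artifact: str, version: str) -> bool:
    """True if this artifact/version falls inside its listed affected range."""
    low, high, inclusive = AFFECTED_RANGES[artifact]
    v = parse_version(version)
    if v < parse_version(low):
        return False
    return v <= parse_version(high) if inclusive else v < parse_version(high)


def still_vulnerable(resolved: dict[str, str]) -> bool:
    """The fix lives in tika-core, so its version alone decides exposure.

    Assumes `resolved` maps artifact IDs to the versions your build
    actually resolves, and that tika-core is among them.
    """
    return is_affected("tika-core", resolved["tika-core"])
```

For example, `still_vulnerable({"tika-core": "3.2.1", "tika-parser-pdf-module": "3.2.2"})` returns `True`. That is the situation the advisory warns about: the PDF module was upgraded but tika-core was not.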
Tika is designed as a universal parser. It handles over a thousand file types through a single interface, which makes it useful for search engine indexing, content analysis, translation, and more.
However, this universality becomes a liability with complex formats like PDF. An attacker can embed a malicious XML payload in the XFA section of a PDF, declaring an external entity that points at a local file or a URL. When tika-core processes that XML and resolves the entity, the server reads the file or issues the request on the attacker's behalf. This can lead to the disclosure of confidential data, denial of service, or server-side request forgery (SSRF).
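The attack depends on the XML parser honouring a DOCTYPE that declares an external entity. The sketch below is a generic illustration of the defensive idea. It is not Tika's code and not its actual fix, which ships in tika-core 3.2.2. It uses Python's standard-library expat bindings to refuse any document that carries a DOCTYPE, which is the same approach libraries such as defusedxml take. The XFA-like payload and the names `DoctypeForbidden` and `parse_without_dtd` are hypothetical.

```python
import xml.parsers.expat

# Hypothetical XXE payload of the kind that could be hidden in an XFA stream:
# the entity "xxe" asks the parser to pull in a local file.
PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE xdp [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<xdp>&xxe;</xdp>"""


class DoctypeForbidden(ValueError):
    """Raised when a document tries to declare a DOCTYPE."""


def parse_without_dtd(data: str) -> str:
    """Parse XML and return its character data, rejecting any DOCTYPE.

    Without a DOCTYPE no custom entity can be declared, so nothing
    external is fetched and nothing is expanded.
    """
    parser = xml.parsers.expat.ParserCreate()
    chunks: list[str] = []

    def forbid_doctype(name, sysid, pubid, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # exceptions raised in handlers propagate here
    return "".join(chunks)
```

Rejecting DTDs outright is the bluntest option. Java code usually does the equivalent by enabling the `http://apache.org/xml/features/disallow-doctype-decl` feature on its parser factory. Either way, the dependable remedy for this CVE is upgrading tika-core, not hardening your own parsing code.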
The confusion around the previous CVE makes this update mandatory. If your application's dependencies, including transitive ones, resolve tika-core 1.13 through 3.2.1, you are at risk.
You must upgrade your dependencies immediately to ensure tika-core is at version 3.2.2 or higher.