
A critical security flaw has been identified in Apache Parquet Java, the Java implementation of the popular open-source Parquet columnar storage format, which is widely used in data-intensive applications and analytics pipelines. Tracked as CVE-2025-46762, the vulnerability affects the parquet-avro module and exposes systems to arbitrary code execution when they process malicious Avro schemas embedded in Parquet file metadata.
Apache Parquet is a column-oriented file format that offers efficient data compression and encoding schemes, making it a core component in the big data ecosystem. It is heavily utilized by frameworks like Apache Spark, Hive, and Flink and is supported across multiple programming languages including Java, Python, and C++.
The flaw affects all versions of Apache Parquet Java up to and including 1.15.1. According to the advisory: “Schema parsing in the parquet-avro module of Apache Parquet 1.15.0 and previous versions allows bad actors to execute arbitrary code.”
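As a rough illustration of the affected range, the sketch below flags any version string at or below 1.15.1. The class and method names are hypothetical and not part of Apache Parquet, and the parsing assumes plain "major.minor.patch" strings, ignoring any qualifier such as "-SNAPSHOT".

```java
import java.util.Arrays;

// Hypothetical helper, not part of Apache Parquet: flags parquet-avro versions
// in the affected range described above (up to and including 1.15.1).
final class ParquetVersionCheck {
    // Last affected release according to the article.
    private static final int[] LAST_AFFECTED = {1, 15, 1};

    // Parses "major.minor.patch"; a qualifier after '-' is ignored (assumption).
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // True when the version sorts at or before 1.15.1 (lexicographic int compare).
    static boolean isAffected(String version) {
        return Arrays.compare(parse(version), LAST_AFFECTED) <= 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.13.1", "1.15.0", "1.15.1", "1.15.2"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

In practice the version would come from a build manifest or dependency report rather than a hard-coded list; the comparison logic is the only part this sketch is meant to show.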
At the heart of the issue is how Avro schemas are deserialized from metadata stored in Parquet files. When the “specific” or “reflect” model is used for schema resolution, the library may inadvertently deserialize objects of classes from trusted Java packages, potentially allowing an attacker to trigger remote code execution (RCE).
While version 1.15.1 introduced restrictions on untrusted packages, the default list of trusted packages remains permissive, meaning that threat actors could still exploit this vulnerability using classes from whitelisted packages.
“The exploit is only applicable if the client code of parquet-avro uses the ‘specific’ or the ‘reflect’ models deliberately for reading Parquet files. (‘generic’ model is not impacted),” the advisory notes.
This vulnerability is not exploitable by default but can be triggered under specific usage patterns where:
- Applications use parquet-avro to read Parquet files,
- They employ the “specific” or “reflect” Avro data models,
- And they process untrusted or user-supplied Parquet files.
If all three conditions are met, a specially crafted Parquet file could include a malicious Avro schema that results in unauthorized code execution during schema parsing.
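The three conditions above can be expressed as a simple triage check. This is a minimal sketch for reasoning about exposure, not a detection tool: the class, enum, and parameter names are hypothetical, and the answers to each question would have to come from a review of the application's own code and data sources.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical triage helper: encodes the three exposure conditions listed above.
final class ExposureTriage {
    // The Avro data models named in the advisory.
    enum AvroModel { GENERIC, SPECIFIC, REFLECT }

    // Only the "specific" and "reflect" models are impacted; "generic" is not.
    private static final Set<AvroModel> IMPACTED =
            EnumSet.of(AvroModel.SPECIFIC, AvroModel.REFLECT);

    // Exposed only when all three conditions hold at once.
    static boolean isExposed(boolean readsWithParquetAvro,
                             AvroModel model,
                             boolean acceptsUntrustedFiles) {
        return readsWithParquetAvro
                && IMPACTED.contains(model)
                && acceptsUntrustedFiles;
    }

    public static void main(String[] args) {
        // Reflect model on user-supplied files: all three conditions met.
        System.out.println(isExposed(true, AvroModel.REFLECT, true));
        // Generic model: not impacted even with untrusted input.
        System.out.println(isExposed(true, AvroModel.GENERIC, true));
    }
}
```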
Apache recommends two paths for mitigation:
- Upgrade to Apache Parquet Java 1.15.2, which ships hardened default settings that prevent code execution through classes in trusted but potentially dangerous packages.
- For users on 1.15.1, explicitly set the system property: