CVE-2023-47248 – PyArrow Arbitrary Code Execution Vulnerability: A Critical Threat to Data Analysts
Apache Arrow is a popular cross-language development platform for in-memory data, used by data scientists and engineers around the world. It provides a high-performance columnar memory format for storing and processing data, and its Python library offers first-class integration with NumPy, pandas, and other popular Python libraries.
However, a recently discovered vulnerability in PyArrow, the Python bindings for Apache Arrow, could allow attackers to execute arbitrary code on vulnerable systems. This vulnerability, tracked as CVE-2023-47248, is rated critical (CVSS v3 base score 9.8) in the National Vulnerability Database (NVD).
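The vulnerability lies in how PyArrow deserializes untrusted data. When PyArrow reads Arrow IPC, Feather, or Parquet data from an untrusted source, such as a user-supplied input file, a maliciously crafted file can trick it into executing attacker-chosen code. The root cause is the "arrow.py_extension_type" extension type, whose deserialization relied on Python's pickle module.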
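Why does deserializing a file lead to code execution? The danger of unpickling untrusted bytes can be shown with the standard library alone. The sketch below is not a PyArrow exploit; it is a harmless illustration, assuming only the stdlib pickle module, in which the hypothetical class Payload makes pickle.loads() call os.getpid() while loading. A malicious file could name a far more dangerous callable in exactly the same way.

```python
import os
import pickle


class Payload:
    """Hypothetical stand-in for attacker-controlled data (harmless here)."""

    def __reduce__(self):
        # Instead of the object's state, pickle records
        # "call os.getpid() with no arguments"; loading performs that call.
        return (os.getpid, ())


# Serialize the object: these bytes are what an attacker would distribute.
data = pickle.dumps(Payload())

# Merely loading the bytes invokes whatever callable their author chose.
result = pickle.loads(data)
print(result == os.getpid())  # True
```

The point is that pickle data is a small program, not passive data, which is why the Python documentation warns never to unpickle data from an untrusted source.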
This vulnerability is particularly concerning for data scientists, who often work with data from untrusted sources. For example, they may need to analyze public datasets or files provided by external collaborators.
An attacker who exploits this vulnerability can run code with the privileges of the process that reads the file, which may amount to full control of the system. This could allow them to steal data, install malware, or launch further attacks.
The CVE-2023-47248 flaw lies in the deserialization performed by the IPC and Parquet readers in PyArrow versions 0.14.0 through 14.0.0. In simple terms, if an application using an affected PyArrow version processes a maliciously crafted data file from an untrusted source, an attacker could run arbitrary code on the system where the application executes.
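To check whether a given PyArrow version falls inside the affected range (0.14.0 through 14.0.0), comparing (major, minor, patch) tuples is enough. This is a sketch under simplifying assumptions: the helper names release_tuple and is_affected are hypothetical, and pre-release suffixes such as ".dev1" or "rc0" are simply ignored, so borderline development builds deserve a manual look.

```python
import re
from importlib import metadata

# Affected range as stated for CVE-2023-47248: 0.14.0 through 14.0.0 inclusive.
FIRST_AFFECTED = (0, 14, 0)
LAST_AFFECTED = (14, 0, 0)


def release_tuple(version: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a version string.

    Missing components default to 0; suffixes such as '.dev1' or 'rc0'
    are ignored (a simplification).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(g or 0) for g in match.groups())
    return (major, minor, patch)


def is_affected(version: str) -> bool:
    """Return True if the version lies in the range affected by CVE-2023-47248."""
    return FIRST_AFFECTED <= release_tuple(version) <= LAST_AFFECTED


if __name__ == "__main__":
    # Report on the PyArrow installed in the current environment, if any.
    try:
        installed = metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        print("pyarrow is not installed")
    else:
        verdict = "AFFECTED" if is_affected(installed) else "not in the affected range"
        print(f"pyarrow {installed}: {verdict}")
```

Run as a script, it looks up the installed distribution through importlib.metadata.version(), so it does not need to import PyArrow itself.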
If you are using PyArrow, upgrade to version 14.0.1 or later as soon as possible; that release fixes the vulnerability.
If you cannot upgrade to PyArrow 14.0.1, you can install the pyarrow-hotfix package and import pyarrow_hotfix in your application before it reads any untrusted data; this disables the vulnerable code path in older PyArrow versions. However, it is only a stopgap, and you should still upgrade to PyArrow 14.0.1 or later as soon as possible.
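The upgrade-or-hotfix advice above can be wired into a single check that runs at application startup. In this sketch, the function name secure_pyarrow and its status strings are hypothetical; the only assumption about pyarrow-hotfix is that importing the pyarrow_hotfix module applies the fix as a side effect, as the package's documentation describes. Call it before any code reads Arrow IPC, Feather, or Parquet data from untrusted sources.

```python
import importlib
import re
from importlib import metadata

FIRST_AFFECTED = (0, 14, 0)  # first release listed as affected
PATCHED = (14, 0, 1)         # first release containing the fix


def release_tuple(version: str) -> tuple[int, int, int]:
    """Parse the leading 'major.minor.patch' of a version string (suffixes ignored)."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(g or 0) for g in match.groups())
    return (major, minor, patch)


def secure_pyarrow() -> str:
    """Report, and where possible mitigate, CVE-2023-47248 exposure (hypothetical helper)."""
    try:
        installed = release_tuple(metadata.version("pyarrow"))
    except metadata.PackageNotFoundError:
        return "pyarrow not installed"
    if installed < FIRST_AFFECTED:
        return "not affected"
    if installed >= PATCHED:
        return "patched"
    try:
        # Importing pyarrow_hotfix applies the mitigation as a side effect.
        importlib.import_module("pyarrow_hotfix")
    except ImportError:
        return "vulnerable"
    return "hotfix applied"


if __name__ == "__main__":
    status = secure_pyarrow()
    print("PyArrow status:", status)
    if status == "vulnerable":
        print("Upgrade to pyarrow>=14.0.1 or install pyarrow-hotfix.")
```

A production service might refuse to start when the status is "vulnerable" rather than merely printing a warning.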