Apache Spark is a widely used, unified analytics engine designed for large-scale data processing. It offers high-level APIs in Scala, Java, Python, and R, along with an optimized engine that supports general execution graphs for data analysis. With a diverse range of higher-level tools, like Spark SQL, pandas API on Spark, MLlib, GraphX, and Structured Streaming, it’s no wonder that Spark has become an indispensable tool for many organizations.
However, like all software, Spark can have security vulnerabilities. One of them, a privilege-escalation flaw, is tracked as CVE-2023-22946.
CVE-2023-22946 affects Apache Spark versions prior to 3.4.0. Applications launched through spark-submit can specify a ‘proxy-user’ to run as, which is meant to limit the application to that user’s privileges. The vulnerability matters most for architectures that rely on proxy-user to separate users, such as those using Apache Livy to manage submitted applications.
The flaw lets a submitted application escape that restriction: by providing malicious configuration-related classes on the classpath, it can execute code with the privileges of the submitting user (the account that invokes spark-submit) rather than those of the proxy user. This is a privilege escalation and can ultimately lead to a compromise of the affected system.
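For context, a submission that uses proxy-user might be assembled as in the following Python sketch. The jar name, main class, and user name are hypothetical placeholders; --master, --deploy-mode, --proxy-user, and --class are standard spark-submit options. The sketch only shows what such a command looks like and does not reproduce the vulnerability.

```python
def build_submit_command(app_jar, main_class, proxy_user):
    """Assemble an illustrative spark-submit argument list (values are hypothetical)."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        # The application is meant to run with this user's (limited) privileges.
        "--proxy-user", proxy_user,
        "--class", main_class,
        app_jar,
    ]


command = build_submit_command("analytics-job.jar", "com.example.AnalyticsJob", "alice")
print(" ".join(command))
```

In a Livy-style setup, a service account typically issues a command of this shape on behalf of each end user, which is why a bypass of the proxy-user restriction is significant there.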
Credit for discovering CVE-2023-22946 goes to Hideyuki Furue, who identified the vulnerability and reported it to the Apache Spark team.
To protect your system from this vulnerability, follow these steps:
- Update to Apache Spark 3.4.0 or later: The vulnerability is present in all versions of Apache Spark prior to 3.4.0, so upgrading is the primary fix. After upgrading, also verify the configuration setting described in the next step.
- Keep spark.submit.proxyUser.allowCustomClasspathInClusterMode set to “false”: This is the default value. Make sure it has not been changed in your cluster configuration and is not overridden by submitted applications. Together with the upgrade, this prevents the vulnerability from being exploited.
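The two steps above can be turned into a quick check. The following Python sketch is a minimal illustration, not an official tool: the helper names (parse_version, conf_from_submit_args, audit) are hypothetical, and the argument parsing only understands the `--conf key=value` form. It flags a Spark version older than 3.4.0 and any attempt to set the classpath option to something other than “false”.

```python
import re

PROXY_CLASSPATH_KEY = "spark.submit.proxyUser.allowCustomClasspathInClusterMode"
FIXED_VERSION = (3, 4, 0)


def parse_version(version):
    """Return the leading (major, minor, patch) numbers of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Spark version: {version!r}")
    return tuple(int(part) for part in match.groups())


def conf_from_submit_args(argv):
    """Collect `--conf key=value` pairs from a spark-submit argument list."""
    conf = {}
    for flag, value in zip(argv, argv[1:]):
        if flag == "--conf" and "=" in value:
            key, _, val = value.partition("=")
            conf[key] = val
    return conf


def audit(spark_version, conf):
    """List reasons a deployment may still be exposed to CVE-2023-22946."""
    findings = []
    if parse_version(spark_version) < FIXED_VERSION:
        findings.append(f"Spark {spark_version} is older than 3.4.0")
    # An unset key is treated as the default value, "false".
    value = conf.get(PROXY_CLASSPATH_KEY, "false")
    if value.strip().lower() != "false":
        findings.append(f"{PROXY_CLASSPATH_KEY} is set to {value!r}")
    return findings


# Hypothetical submission that tries to override the setting.
submitted = [
    "spark-submit", "--proxy-user", "alice",
    "--conf", f"{PROXY_CLASSPATH_KEY}=true",
    "job.jar",
]
for finding in audit("3.3.2", conf_from_submit_args(submitted)):
    print(finding)
```

An empty list from audit means neither condition was detected. It does not replace reviewing the cluster’s actual configuration files or the way your submission service filters user-supplied options.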