Apache UIMA Java SDK, a popular Java implementation of the UIMA framework, has been found to contain a vulnerability that could allow for arbitrary code execution. This vulnerability, tracked as CVE-2023-39913, arises from the deserialization of untrusted data, specifically Java-serialized CAS objects, without proper input validation.
Apache UIMA (Unstructured Information Management Architecture) is a framework for building applications that analyze unstructured information such as text, audio, and video. It provides a set of APIs and tools for creating, processing, and managing that information. The Apache UIMA Java SDK is the framework's Java implementation.
The vulnerability stems from code that deserializes Java-serialized CAS objects without first verifying the data. The CAS (Common Analysis Structure) is the data structure UIMA uses to represent the artifact being analyzed together with the results of that analysis. Java serialization is a mechanism for converting Java objects into a byte stream that can be stored or transmitted and later reconstructed.
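When a Java-serialized CAS object is deserialized, the object graph is rebuilt from the stored bytes, and the byte stream itself names the classes to instantiate. If that data comes from an attacker, it can reference classes on the application's classpath whose deserialization logic has side effects, which can lead to arbitrary code execution within the UIMA application. This could allow an attacker to take control of the application or gain unauthorized access to sensitive data.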
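The mechanism can be illustrated without UIMA. The following is a minimal sketch, not the actual UIMA code path: the class `Payload` is a hypothetical stand-in for a class on the classpath with custom deserialization logic, and `deserializeUnchecked` is an illustrative helper that mirrors the general pattern of reading an object with no validation. The side effect here is harmless (it only sets a flag), but it shows that code runs during deserialization, before the caller ever inspects the result.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class DeserializationSideEffectDemo {

    /** Set by Payload.readObject to show that code ran during deserialization. */
    static volatile boolean sideEffectRan = false;

    /** Hypothetical stand-in for a class whose deserialization has side effects. */
    static final class Payload implements Serializable {
        private static final long serialVersionUID = 1L;

        // ObjectInputStream calls this method while reconstructing the object.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            sideEffectRan = true; // harmless here; attacker-chosen classes may do more
        }
    }

    /** Serializes an object to a byte array using standard Java serialization. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Deserializes with no validation at all (the risky pattern). */
    static Object deserializeUnchecked(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = serialize(new Payload());
        deserializeUnchecked(data);
        System.out.println("side effect ran: " + sideEffectRan);
    }
}
```

The receiving code never calls a method on the returned object, yet `readObject` has already executed. Real attacks typically chain together existing library classes rather than shipping a new class, but the underlying trigger is the same.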
CVE-2023-39913 affects Apache UIMA Java SDK versions before 3.5.0. Users are strongly advised to upgrade to version 3.5.0, which includes a fix for the vulnerability.
For users running UIMA on a Java version that supports ObjectInputFilters (Java 9 and later), a global or context-specific ObjectInputFilter can be set up to mitigate the vulnerability. The filter can be configured as an allow-list so that only trusted classes are deserialized; any other class named in the stream is rejected before it is instantiated, which prevents malicious code from running.
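As a rough sketch of the context-specific variant, the example below attaches an allow-list filter to a single `ObjectInputStream` using the standard `java.io.ObjectInputFilter` API (Java 9 and later). The class name `FilteredDeserializationSketch`, the helper methods, and the pattern in `ALLOW_LIST` are assumptions chosen for illustration; the pattern in particular does not reflect the set of classes a real Java-serialized CAS requires, which would have to be determined for the application at hand.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidClassException;
import java.io.ObjectInputFilter;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class FilteredDeserializationSketch {

    // Illustrative allow-list: permit classes directly in java.lang and
    // java.util, reject everything else ("!*"). Not a vetted list for UIMA.
    static final String ALLOW_LIST = "java.lang.*;java.util.*;!*";

    /** Serializes an object to a byte array using standard Java serialization. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Deserializes with a per-stream filter built from the given pattern. */
    static Object deserializeFiltered(byte[] data, String pattern)
            throws IOException, ClassNotFoundException {
        ObjectInputFilter filter = ObjectInputFilter.Config.createFilter(pattern);
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            in.setObjectInputFilter(filter); // must be set before the first read
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // An ArrayList of Strings matches the allow-list and is accepted.
        byte[] listBytes = serialize(new ArrayList<>(List.of("a", "b")));
        System.out.println("accepted: " + deserializeFiltered(listBytes, ALLOW_LIST));

        // java.math.BigInteger is not on the allow-list, so the stream rejects it.
        try {
            deserializeFiltered(serialize(BigInteger.TEN), ALLOW_LIST);
        } catch (InvalidClassException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A process-wide filter uses the same pattern syntax and can be supplied through the `jdk.serialFilter` system property or `ObjectInputFilter.Config.setSerialFilter`. This covers deserialization that happens inside library code the application does not control directly.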