pcodedmp.py – A VBA p-code disassembler
It is not widely known, but macros written in VBA (Visual Basic for Applications, the macro programming language used in Microsoft Office) exist in three different executable forms, each of which can be what is actually executed at run time, depending on the circumstances. These forms are:
- Source code. The original source code of the macro module is compressed and stored at the end of the module stream. This makes it relatively easy to locate and extract, and most free DFIR tools for macro analysis, such as oledump or olevba, and even many professional anti-virus tools, look only at this form. However, most of the time the source code is completely ignored by Office. In fact, it is possible to remove the source code (and therefore make all these tools think that there are no macros present), yet the macros will still execute without any problems. I have created a proof of concept illustrating this. Most tools will not see any macros in the documents in this archive, but if a document is opened with the corresponding Word version (the one that matches the document name), it will display a message and launch calc.exe. It is surprising that malware authors are not using this trick more widely.
- P-code. As each VBA line is entered into the VBA editor, it is immediately compiled into p-code (a pseudo code for a stack machine) and stored in a different place in the module stream. The p-code is precisely what is executed most of the time. In fact, even when you open the source of a macro module in the VBA editor, what is displayed is not the decompressed source code but the p-code decompiled into source. Only if the document is opened under a version of Office that uses a different VBA version from the one used to create the document is the stored compressed source code recompiled into p-code, and then that p-code is executed. This makes it possible to open a VBA-containing document on any version of Office that supports VBA and have the macros inside remain executable, despite the fact that the different versions of VBA use different (incompatible) p-code instructions.
- Execodes. When the p-code has been executed at least once, a further tokenized form of it is stored elsewhere in the document (in streams whose names begin with __SRP_, followed by a number). From there it can be executed much faster. However, the format of the execodes is extremely complex and is specific to the particular Office version (not VBA version) in which they were created. This makes them extremely non-portable. In addition, their presence is not necessary: they can be removed and the macros will run just fine (from the p-code).
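The layout described in the list above can be sketched in a few lines of Python. This is only an illustration, not the code pcodedmp uses: the function names split_module_stream and is_execode_stream are hypothetical, the sample bytes are made up, and the offset at which the compressed source begins is assumed to be known already (in real documents it is recorded in the dir stream).

```python
def split_module_stream(module_data, text_offset):
    """Split a raw module stream into (compiled area, compressed source).

    The compressed source sits at the end of the stream, starting at
    text_offset; everything before it is the compiled (p-code) part.
    """
    if not 0 <= text_offset <= len(module_data):
        raise ValueError("text_offset lies outside the module stream")
    return module_data[:text_offset], module_data[text_offset:]


def is_execode_stream(stream_name):
    """Return True for stream names that begin with the __SRP_ prefix."""
    return stream_name.startswith("__SRP_")


# Made-up bytes standing in for a module stream: 11 bytes of "p-code"
# followed by a placeholder for the compressed source.
fake_module = b"PCODE-BYTES" + b"\x01compressed-source"
compiled_part, source_part = split_module_stream(fake_module, 11)
print(len(compiled_part), len(source_part))

# Made-up stream names as they might appear in a VBA project storage.
names = ["dir", "_VBA_PROJECT", "ThisDocument", "__SRP_0", "__SRP_1"]
print([name for name in names if is_execode_stream(name)])
```

A tool that inspects only source_part would report no macros if that part were removed, even though compiled_part is still there to be executed.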
Since most of the time it is the p-code that determines what exactly a macro will do (even if neither the source code nor the execodes are present), it makes sense to have a tool that can display it. This is what prompted us to create this VBA p-code disassembler.
Changelog v1.2.6
- Changed it not to require the win_unicode_console module when it is not available – e.g., when not running on a Windows machine or when running under the PyPy implementation of Python, thanks to Philippe Lagadec.
Installation
pip install pcodedmp -U
or
git clone https://github.com/bontchev/pcodedmp.git
cd pcodedmp
pip install .
Usage
The script takes as a command-line argument a list of one or more names of files or directories. If the name is an OLE2 document, it will be inspected for VBA code and the p-code of each code module will be disassembled. If the name is a directory, all the files in this directory and its subdirectories will be similarly processed. In addition to the disassembled p-code, by default, the script also displays the parsed records of the dir stream, as well as the identifiers (variable and function names) used in the VBA modules and stored in the _VBA_PROJECT stream.
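The way file and directory arguments are handled can be illustrated with a minimal sketch. It is not the script's actual implementation: the helper names is_ole2 and iter_ole2_files are hypothetical, and the only check made here is the 8-byte signature at the start of every OLE2 file, which is an assumption about how a candidate document might be recognized.

```python
import os

# Signature found in the first 8 bytes of every OLE2 (compound file) document.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole2(path):
    """Return True if the file at path starts with the OLE2 signature."""
    try:
        with open(path, "rb") as handle:
            return handle.read(len(OLE2_MAGIC)) == OLE2_MAGIC
    except OSError:
        # Unreadable or missing files are simply not treated as documents.
        return False


def iter_ole2_files(names):
    """Yield OLE2 files named directly or found under the named directories."""
    for name in names:
        if os.path.isdir(name):
            # Directories are searched recursively, subdirectories included.
            for root, _dirs, files in os.walk(name):
                for filename in sorted(files):
                    candidate = os.path.join(root, filename)
                    if is_ole2(candidate):
                        yield candidate
        elif is_ole2(name):
            yield name
```

Calling list(iter_ole2_files(["samples"])) on a hypothetical samples directory would return the paths of the OLE2 files under it; each of those would then be a candidate for disassembly.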
The script supports VBA5 (Office 97, MacOffice 98), VBA6 (Office 2000 to Office 2009) and VBA7 (Office 2010 and higher).
Copyright (C) bontchev
Source: https://github.com/bontchev/