What is Static Malware Analysis?
Basic static analysis consists of examining a file without viewing its actual instructions. It can confirm whether a file is malicious, provide information about its functionality, and sometimes yield information that lets you produce simple network signatures. Basic static analysis is straightforward and can be quick, but it is largely ineffective against sophisticated malware, and it can miss important behaviors.
Microsoft Office Document
Office documents remain among the most common methods attackers use to trick users and execute malicious activity. Office documents can contain what are commonly known as macros: embedded program code written in the Visual Basic for Applications programming language, or VBA for short.
The problem with macros is that the term sounds safe and innocent, but VBA today is as powerful and as dangerous as C, C++, Delphi, Perl, Python, or any other programming language that’s associated with full-blown, standalone applications that you install and run locally.
VBA needs an Office application running (usually Word, Excel, or PowerPoint) to make it work, but once you agree to let VBA code run from inside an Office file, it has full access to your computer just as if the VBA program were running outside Office.
Many open source tools can be used to perform static analysis. Today we are going to look at a few of them:
OLEtools
Exiftool
pcodedmp
pdfid
pdf-parser
pdftk
What is OLEtools?
OLEtools is a Python package used to analyze Microsoft Office documents. Malicious VBA macros embedded in Office documents are a common malware distribution technique, so OLEtools is a very useful addition to your toolkit.
There are multiple tools and techniques for investigating files statically, but OLEtools is the recommended tool for demonstrating the fastest way to analyze Office documents.
Investigating a malicious Office file with OLEtools
First, we run ExifTool to obtain metadata about the file. This command returns some useful metadata about the document:
exiftool sechive.dotm
As we can see, this file contains macros, which we need to investigate. Let's try to extract those macros from the document:
olevba sechive.dotm
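A macro-enabled document such as a .dotm file is a ZIP container, and its macros are stored in a part named vbaProject.bin. As a rough pre-check before extraction, the sketch below uses only the Python standard library to look for that part. The function name find_vba_parts is hypothetical and is not part of OLEtools, and the demo builds a stand-in archive in memory because no real sample is assumed to be available.

```python
import io
import zipfile


def find_vba_parts(source):
    """Return archive member names that look like VBA project storage.

    `source` may be a path or a binary file-like object. An empty list is
    returned when the file is not a ZIP container (for example a legacy
    .doc file, which this sketch does not handle).
    """
    if not zipfile.is_zipfile(source):
        return []
    with zipfile.ZipFile(source) as archive:
        return [
            name for name in archive.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


if __name__ == "__main__":
    # Stand-in archive built in memory; a real run would pass a file path.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/vbaProject.bin", b"placeholder")
    buffer.seek(0)
    print(find_vba_parts(buffer))
```

A non-empty result only says that a VBA project is present. It says nothing about what the macros do, which is what olevba is for.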
What is pcodedmp?
Pcodedmp is a Python-based tool that lets researchers inspect the p-code inside an Office document, that is, the compiled form of the VBA macros that is stored alongside their source code.
Investigating a malicious Office file with pcodedmp
Now we can run pcodedmp against the file. With the -d option, the command returns only the disassembled p-code from the document:
pcodedmp -d sechive.dotm
Portable Document Format (PDF)
Malicious PDF files have recently been considered one of the most dangerous threats to system security. Because the PDF format is flexible and can carry code, an attacker can use it to execute malicious code on the victim's computer system.
A PDF file can describe documents that contain text, graphics, and images in a device-independent format and resolution. A PDF document can be defined as a collection of objects which describe how one or more pages must be displayed.
This collection of objects can also include additional interactive components and higher-level application data.
A PDF document consists of four main parts:
One-line header
Body
Cross-reference table
Trailer
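The four parts listed above can be located in the raw bytes of a file. The following is a minimal sketch, assuming a simple, non-incrementally-updated PDF with a classic cross-reference table. The names locate_pdf_parts and build_sample_pdf are hypothetical helpers introduced for illustration, and the sample document is hand-built rather than taken from a real file.

```python
import re


def build_sample_pdf() -> bytes:
    """Assemble a tiny hand-made PDF so the xref offset is known exactly."""
    body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    xref_offset = len(body)
    tail = (
        b"xref\n0 2\n0000000000 65535 f \n0000000009 00000 n \n"
        b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
        b"startxref\n" + str(xref_offset).encode("ascii") + b"\n%%EOF\n"
    )
    return body + tail


def locate_pdf_parts(data: bytes) -> dict:
    """Locate the header, body objects, xref table and trailer in PDF bytes."""
    parts = {"header": None, "body_objects": 0,
             "xref_offset": None, "trailer_offset": None}

    # One-line header: "%PDF-" followed by the version number.
    header = re.match(rb"%PDF-(\d\.\d)", data)
    if header:
        parts["header"] = header.group(1).decode("ascii")

    # Body: count "N G obj" markers that open indirect objects.
    parts["body_objects"] = len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data))

    # The number after the last "startxref" is the byte offset of the table.
    offsets = re.findall(rb"startxref\s+(\d+)", data)
    if offsets:
        offset = int(offsets[-1])
        if data[offset:offset + 4] == b"xref":
            parts["xref_offset"] = offset

    # Trailer dictionary: take the last occurrence of the keyword.
    trailer = data.rfind(b"trailer")
    if trailer != -1:
        parts["trailer_offset"] = trailer
    return parts


if __name__ == "__main__":
    print(locate_pdf_parts(build_sample_pdf()))
```

PDFs that use cross-reference streams instead of a table will leave xref_offset as None in this sketch.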
Risky PDF Format Tags:
/OpenAction and /AA specify the script or action to run automatically.
/JavaScript and /JS specify JavaScript to run.
/GoTo changes the view to a specified destination within the PDF or in another PDF file.
/Launch can launch a program or open a document.
/URI accesses a resource by its URL.
/SubmitForm and /GoToR can send data to a URL.
/RichMedia can be used to embed Flash in a PDF.
/ObjStm can hide objects inside an Object Stream.
So how can we find these notorious tags?
What is pdfid?
Pdfid scans a file for certain PDF keywords, allowing you to identify PDF documents that contain (for example) JavaScript or execute an action when opened. It is the tool that will help us list the risky PDF format tags described above.
Investigating a malicious PDF file with pdfid
pdfid.py sechive.pdf
As we can see, the tool reports 2 /JavaScript tags inside the suspicious PDF. Another interesting finding is the /OpenAction tag, which indicates that an action will be performed automatically when the document is opened.
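To make the idea of a keyword scan concrete, here is a rough sketch that counts the risky tags listed earlier in raw PDF bytes. It is not pdfid's implementation: the names RISKY_KEYWORDS and count_risky_keywords are hypothetical, the sample bytes are invented, and the sketch does not decode names obfuscated with #xx hex escapes, which pdfid does account for.

```python
import re

# Tags from the "Risky PDF Format Tags" list above.
RISKY_KEYWORDS = [
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/GoTo", "/Launch",
    "/URI", "/SubmitForm", "/GoToR", "/RichMedia", "/ObjStm",
]


def count_risky_keywords(data: bytes) -> dict:
    """Count occurrences of each risky name in raw PDF bytes."""
    counts = {}
    for keyword in RISKY_KEYWORDS:
        # Require that the next byte is not a letter or digit, so /JS does
        # not match a longer name and /GoTo does not also count /GoToR.
        pattern = re.escape(keyword.encode("ascii")) + rb"(?![A-Za-z0-9])"
        counts[keyword] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    # Invented sample bytes, not taken from a real document.
    sample = (
        b"<< /Type /Catalog /OpenAction 5 0 R >>\n"
        b"<< /S /JavaScript /JS (app.alert(1)) >>\n"
        b"<< /S /GoToR /F (other.pdf) >>\n"
    )
    for name, count in count_risky_keywords(sample).items():
        if count:
            print(name, count)
```

Because the scan works on raw bytes, tags hidden inside compressed object streams are not seen, which is one reason /ObjStm is itself on the list.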
What is pdf-parser?
Pdf-parser parses a PDF document to identify the fundamental elements used in the analyzed file. The raw option makes pdf-parser output raw data, which makes it convenient to investigate the data inside the PDF document.
Investigating a malicious PDF file with pdf-parser
pdf-parser.py --raw sechive.pdf
So, let's go step by step to understand each structure and try to find JavaScript (possibly malicious) within that PDF.
Object 1 is the root, objects 2 and 3 are its children, and so on. The trailer structure mentioned earlier is what points to the root object. Looking through all the objects within the PDF, we can find another reference: object 5, which contains JavaScript and leads us to the last tool.
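The walk through the objects described above can be sketched in a few lines. This is an illustrative approximation, not how pdf-parser works internally: the helper names index_objects, references and objects_with_javascript are hypothetical, and the sample bytes are invented to mirror the layout described (a root, its children, and an object 5 holding JavaScript).

```python
import re

OBJECT_PATTERN = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
REFERENCE_PATTERN = re.compile(rb"(\d+)\s+(\d+)\s+R\b")
JAVASCRIPT_PATTERN = re.compile(rb"/(JS|JavaScript)(?![A-Za-z0-9])")

# Invented sample that mirrors the structure described in the text.
SAMPLE = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R /OpenAction 5 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [3 0 R] /Count 1 >>\nendobj\n"
    b"3 0 obj\n<< /Type /Page /Parent 2 0 R >>\nendobj\n"
    b"5 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)


def index_objects(data: bytes) -> dict:
    """Map each object number to the raw bytes between obj and endobj."""
    return {int(m.group(1)): m.group(3) for m in OBJECT_PATTERN.finditer(data)}


def references(body: bytes) -> list:
    """Return the object numbers that a body refers to ("N G R")."""
    return [int(m.group(1)) for m in REFERENCE_PATTERN.finditer(body)]


def objects_with_javascript(objects: dict) -> list:
    """Return the numbers of objects whose body names /JS or /JavaScript."""
    return sorted(
        number for number, body in objects.items()
        if JAVASCRIPT_PATTERN.search(body)
    )


if __name__ == "__main__":
    objects = index_objects(SAMPLE)
    for number in sorted(objects):
        print(number, "->", references(objects[number]))
    print("JavaScript in:", objects_with_javascript(objects))
```

Following the references from the root is what reveals that the /OpenAction entry leads to the JavaScript object.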
What is pdftk?
Pdftk is a cross-platform tool for manipulating Portable Document Format (PDF) documents. It comes in three versions: PDFtk Server (open-source command-line tool), PDFtk Free, and PDFtk Pro.
Investigating a malicious PDF file with pdftk
To extract the content inside the PDF file, we can run the following command:
pdftk sechive.pdf output js.txt
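The tool then writes the result to a .txt file, where we can view the obfuscated JavaScript code hidden inside the PDF. If the streams in the PDF are compressed, appending pdftk's uncompress output option to the command makes them readable in a text editor.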
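When the JavaScript sits in a Flate-compressed stream, it has to be inflated before it can be read. A minimal sketch of that step is shown below, assuming only the FlateDecode filter is in use and ignoring filter chains, predictors and encryption. The name inflate_streams is a hypothetical helper, not part of pdftk.

```python
import re
import zlib

# Match the bytes between "stream" and "endstream"; the lookbehind keeps the
# tail of an "endstream" keyword from being mistaken for a stream start.
STREAM_PATTERN = re.compile(
    rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL
)


def inflate_streams(data: bytes) -> list:
    """Return the contents of every stream, inflated where possible."""
    decoded = []
    for match in STREAM_PATTERN.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates a clipped trailing byte, which can
            # happen when the regex treats it as part of the line ending.
            decoded.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            # Not zlib data (or another filter is used): keep the raw bytes.
            decoded.append(raw)
    return decoded


if __name__ == "__main__":
    # Invented sample: one object whose stream holds compressed JavaScript.
    script = b"app.alert('hello');"
    sample = (
        b"5 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(script)
        + b"\nendstream\nendobj\n"
    )
    for content in inflate_streams(sample):
        print(content.decode("latin-1"))
```

The output is still whatever the author embedded, so obfuscated JavaScript stays obfuscated. This step only removes the compression layer.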