May 21, 2023

Peepdf | Malicious PDF analysis tool

A recent trend in cyberattacks is the use of crafted malicious PDF files to target victims. A PDF-based exploit is a technique or piece of code that abuses vulnerabilities in the software that parses and renders PDF files. By successfully exploiting these vulnerabilities, attackers can execute unauthorized actions, gain control over systems, or deliver malware to unsuspecting users.

Peepdf is a Python-based command-line tool for PDF analysis. It provides the components a security researcher typically needs when analyzing a malicious PDF.

Components of Peepdf | SOC Analyst tool

The main functionalities of peepdf are listed below:

Decodings: hexadecimal, octal, name objects
References in objects and where an object is referenced
Strings search (including streams)
Physical structure (offsets)
Logical tree structure
Metadata
Modifications between versions (changelog)
Compressed objects (object streams)
Analysis and modification of Javascript (PyV8): unescape, replace, join
Shellcode analysis (Libemu python wrapper, pylibemu)
Variables (set command)
Extraction of old versions of the document
Easy extraction of objects, JavaScript code, shellcodes (>, >>, $>, $>>)
Checking hashes on VirusTotal

Configuration & Installation

Clone or download peepdf from its GitHub repository: https://github.com/jesparza/peepdf

Run peepdf.py with the PDF sample as its argument. In interactive mode, peepdf opens a console where you enter analysis commands.
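As a rough sketch of the launch step, the helper below only assembles the command line and does not run anything. The function name `build_peepdf_command` and the interpreter name are assumptions made for illustration; the `-f` (force), `-l` (loose), and `-i` (interactive) options are the ones I understand peepdf to accept, so check `peepdf.py -h` on your copy before relying on them.

```python
def build_peepdf_command(pdf_path, script="peepdf.py", interpreter="python",
                         interactive=True, force=True, loose=True):
    """Return an argv list for launching peepdf (hypothetical helper)."""
    cmd = [interpreter, script]
    if force:
        cmd.append("-f")   # assumed: keep parsing even if errors are found
    if loose:
        cmd.append("-l")   # assumed: tolerate malformed objects
    if interactive:
        cmd.append("-i")   # assumed: open the interactive console
    cmd.append(pdf_path)
    return cmd


if __name__ == "__main__":
    # Print the command instead of executing it; pass the list to
    # subprocess.run() once you have confirmed the options locally.
    print(" ".join(build_peepdf_command("sample.pdf")))
```

Malicious samples are often deliberately malformed, which is why force and loose parsing are switched on by default in this sketch.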

Working

Peepdf offers many functions, such as examining strings, scripts, OS commands, and metadata. It inspects the PDF file and extracts as much data from it as possible, which is useful during an investigation.

In its initial output, peepdf automatically reports the file's hashes, size, PDF version, and the number of objects and streams.
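To make that summary concrete, here is a minimal sketch of how similar facts could be derived from the raw bytes of a file. It is not peepdf's implementation: the function name `summarize_pdf` is hypothetical, and the object and stream counts come from simple pattern matching, so they can differ from peepdf's numbers on malformed files or files that use object streams.

```python
import hashlib
import re


def summarize_pdf(data: bytes) -> dict:
    """Collect basic triage facts from raw PDF bytes (illustrative only)."""
    header = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "version": header.group(1).decode() if header else None,
        # Count "N G obj" markers and "stream" keywords followed by a newline.
        "objects": len(re.findall(rb"\b\d+\s+\d+\s+obj\b", data)),
        "streams": len(re.findall(rb"\bstream\r?\n", data)),
    }


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as handle:
        for key, value in summarize_pdf(handle.read()).items():
            print(f"{key}: {value}")
```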

Peepdf can extract metadata from PDF samples. Metadata is information about a file that describes its creation, modification, and other relevant attributes. For PDF files, metadata can offer valuable insight into the document's origin, history, and characteristics.
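As an illustration of the kind of metadata involved, the sketch below pulls a few common Info dictionary entries out of raw PDF bytes. It assumes the values are plain literal strings in parentheses without nested parentheses; hex strings, encrypted files, and XMP metadata are not handled. The helper name `extract_info_fields` is hypothetical and this is not how peepdf itself does it.

```python
import re

# Common keys of the PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Creator", b"Producer",
             b"CreationDate", b"ModDate")


def extract_info_fields(data: bytes) -> dict:
    """Return literal-string metadata values found in raw PDF bytes."""
    fields = {}
    for key in INFO_KEYS:
        # Match "/Key (value)", allowing backslash-escaped characters.
        pattern = rb"/" + key + rb"\s*\(((?:\\.|[^\\)])*)\)"
        match = re.search(pattern, data)
        if match:
            fields[key.decode()] = match.group(1).decode("latin-1")
    return fields
```

A creation date that does not fit the claimed origin of the document, or an unusual producer string, is often an early hint that a sample deserves a closer look.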

Peepdf can also extract and retrieve individual objects and streams from the PDF. Objects and streams are the fundamental components used to store and organize data within a PDF file.

Objects: In PDF, objects represent individual pieces of data, such as text, images, fonts, or metadata. Each indirect object is identified by an object number together with a generation number.
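The object number and generation number appear in the file as a header of the form `N G obj`, closed by `endobj`. A minimal sketch of listing such objects from raw bytes follows; the helper name `list_objects` is hypothetical, and the approach ignores objects stored inside compressed object streams.

```python
import re

# "N G obj ... endobj": object number, generation number, then the body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def list_objects(data: bytes):
    """Return (object_number, generation_number, body) tuples."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]
```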

Streams: Streams store large amounts of data within a PDF file and can be compressed for efficiency. A stream is essentially a sequence of bytes that can represent various types of data, such as image data, text content, or binary data.
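Compressed streams most commonly use the FlateDecode filter, which is zlib compression. The sketch below locates stream bodies and tries to inflate them, falling back to the raw bytes when that fails. It assumes a single FlateDecode filter and does not read the `/Length` or `/Filter` entries, so it is only a rough stand-in for what a real parser such as peepdf does; `inflate_streams` is a hypothetical name.

```python
import re
import zlib

# Stream body sits between "stream<EOL>" and "<EOL>endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\nendstream", re.DOTALL)


def inflate_streams(data: bytes):
    """Return each stream body, zlib-inflated when possible."""
    results = []
    for match in STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes such as a stray "\r".
            results.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            results.append(raw)  # not Flate data, or another filter is used
    return results
```

Malicious JavaScript is frequently hidden inside a compressed stream, so inflating streams is usually one of the first steps before searching for strings.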

If an attacker embeds malicious scripts, URLs, or similar content, it ends up in either an object or a stream. By investigating these individually, we can determine the root cause or the workflow of the malware.
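One simple way to decide which parts of a file to inspect first is to count dictionary keys that are often associated with active content. The key list and the helper name `count_suspicious_keys` below are illustrative assumptions, not peepdf's own detection logic, and a match only means "look here", not "this is malicious".

```python
import re

# Keys commonly tied to scripts, automatic actions, and external links.
SUSPICIOUS_KEYS = (b"/JS", b"/JavaScript", b"/OpenAction", b"/AA",
                   b"/Launch", b"/URI")


def count_suspicious_keys(data: bytes) -> dict:
    """Count occurrences of each suspicious key in raw PDF bytes."""
    counts = {}
    for key in SUSPICIOUS_KEYS:
        # The lookahead avoids matching a longer name with the same prefix.
        hits = re.findall(re.escape(key) + rb"(?![A-Za-z0-9])", data)
        if hits:
            counts[key.decode()] = len(hits)
    return counts
```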

It is recommended to go through each object that peepdf reports. In this sample, peepdf showed that one of the objects contains a suspicious URL.
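When reviewing objects one by one, a quick way to surface embedded links is to search each object's content for URL patterns. The sketch below is an assumption-laden illustration: the regular expression is deliberately simple, `extract_urls` is a hypothetical helper, and the example domain is made up.

```python
import re

# Stops at whitespace and at PDF delimiters such as parentheses.
URL_RE = re.compile(rb"(?:https?|ftp)://[^\s()<>]+", re.IGNORECASE)


def extract_urls(data: bytes):
    """Return unique URLs found in the given bytes, in order of appearance."""
    found = []
    for match in URL_RE.finditer(data):
        url = match.group(0).decode("latin-1")
        if url not in found:
            found.append(url)
    return found


if __name__ == "__main__":
    obj = b"<< /Type /Action /S /URI /URI (http://example.test/payload.exe) >>"
    print(extract_urls(obj))
```

Extracted URLs can then be checked against threat intelligence sources rather than opened directly.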

Note: Sometimes the contents of detected objects are encoded to evade detection.
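One common form of this encoding is writing characters of a PDF name as `#` followed by two hexadecimal digits, so that `/JavaScript` no longer appears literally in the file. A minimal sketch of reversing that, with the hypothetical helper name `decode_pdf_name`:

```python
import re


def decode_pdf_name(name: bytes) -> bytes:
    """Replace each #XX hex escape in a PDF name with the byte it encodes."""
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1).decode("ascii"), 16)]),
        name,
    )


if __name__ == "__main__":
    # An obfuscated name and its decoded form.
    print(decode_pdf_name(b"/J#61vaScript"))
```

Decoding names before searching for keywords prevents this trick from hiding keys such as `/JS` or `/OpenAction`.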

Thank you for taking the time to read this blog post. I hope it has been helpful to you. I'd love to hear your thoughts, so please comment below!

Reference:

https://github.com/jesparza/peepdf
https://eternal-todo.com/tools/peepdf-pdf-analysis-tool
