This article describes a new server-side vulnerability that I discovered in the PDF export process. Many servers are still vulnerable, ranging from social networks to financial and governmental websites.
Have you ever surfed the internet and seen a “Download as PDF” button? Over the past few years, many sites have added the option to export your personal data to an accessible format, such as PDF or Word.
As a penetration tester, I have tested many large web applications that include this conversion feature, and I wondered: what happens behind the scenes, and does this process broaden the attack surface?
After some quick research, I discovered that the process is very dangerous from a security perspective and, without appropriate filtering, can expose your application to many vulnerabilities.
In this article, I will try to explain the conversion process and the potential attacks.
1. The Conversion Process
When a website converts data to PDF, in most cases the following process takes place:
- The web application gets the client’s data from a database or directly from the client.
- It puts the data inside an HTML template.
- It sends the custom HTML to an external library.
- The external library gets the HTML, does its magic, and returns a PDF file
- The client downloads the PDF file.
The most interesting part is the conversion from the custom HTML to the PDF file by the external library.
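The steps above can be sketched in a few lines of Python. This is only an illustration: the template, the function names and the `html_to_pdf` stub are hypothetical and do not come from any real library. The point is that the user's value reaches the converter exactly as it was typed.

```python
from string import Template

# Hypothetical HTML template used by the export feature.
PROFILE_TEMPLATE = Template(
    "<html><body><h1>Profile</h1><p>Name: $name</p></body></html>"
)


def render_html(name: str) -> str:
    """Step 2: place the user's data into the template, with no encoding."""
    return PROFILE_TEMPLATE.substitute(name=name)


def html_to_pdf(html: str) -> bytes:
    """Steps 3-4: hypothetical stand-in for the external library.

    A real converter would parse the HTML and may fetch every resource
    that the markup references. This stub only wraps the markup in bytes.
    """
    return b"%PDF-1.4\n" + html.encode("utf-8")


def export_profile(name: str) -> bytes:
    """Steps 1-5 combined: user data in, 'PDF' bytes out."""
    return html_to_pdf(render_html(name))
```

Calling `render_html('<img src="http://127.0.0.1:8443/">')` returns markup that contains the tag verbatim, so the library on the server would treat it as part of the document rather than as text.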
I discovered that there are many players in the HTML to PDF market.
2. The attack vector
The common external libraries are full of features and support many HTML tags. Some of them even support CSS and JavaScript.
With this understanding, think about the following scenario: what would happen if an attacker succeeded in injecting a malicious HTML tag into the conversion process?
If the web application does not encode or filter the user’s input, the server is exposed to a wide range of vulnerabilities.
2.1. Arbitrary file download
If we can inject an HTML tag into the conversion process, then with some libraries we can download almost any file from the web server. For this attack vector, we should use these tags:
- iframe / frame
- object
- fonts (CSS)
Example from the real world:
1. The HTTP Request
2.2. Internal network exposure (SSRF)
The “Export Injection”, in all the libraries, gives us the option to obtain a lot of information about the server. Some techniques that have occurred to me:
- Internal port scanning: the delay of the response from the web server reveals whether a port is open or closed. For example, if we send a malicious IMG tag:
  - <img src="http://127.0.0.1:445"/>: delay of 2.3 seconds (the port is open)
  - <img src="http://127.0.0.1:666"/>: delay of 4.8 seconds (the port is closed)
- Internal resources access: we can use the Object, Iframe and Frame tags to access internal HTTP interfaces and view the responses. For example:
  - Injection of:
    - <object data="http://127.0.0.1:8443"/>
- Discover the real IP address of the website: we can make the site perform an HTTP request to any server on the internet, even to our own server. I used the “iplogger” site to log the IP address of the attacked website:
  - <img src="https://iplogger.com/113A.gif"/>
With this technique, we can expose the real IP of the web server, and perform an effective port scan.
2.3. Effective Denial of Service (DOS)
- <img src="http://download.thinkbroadband.com/1GB.zip"/>
  Causes the web application to download a heavy file.
- <iframe src="http://example.com/RedirectionLoop.aspx"/>
  Causes the web application to enter a long HTTP redirection loop.
The way to perform a DOS attack changes from library to library.
3. How to protect yourself?
As a rule, you should never pass user input to an external library without thinking about it first. Always ask yourself: “What would an attacker do?”
In this specific case, you should encode the input before passing it to the external conversion library.
HTML encoding should prevent these vulnerabilities in most cases.
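As a minimal sketch of that advice, assuming the export feature is written in Python and fills a simple template, the user's value can be passed through the standard library's `html.escape` before it is placed in the markup. The template and the function name `render_html_safely` are hypothetical.

```python
import html
from string import Template

# Hypothetical HTML template used by the export feature.
TEMPLATE = Template("<html><body><p>Name: $name</p></body></html>")


def render_html_safely(name: str) -> str:
    """HTML-encode the user's value before placing it in the template.

    html.escape replaces &, < and > with entities; with quote=True it also
    replaces double and single quotes, so an injected tag becomes plain text.
    """
    return TEMPLATE.substitute(name=html.escape(name, quote=True))
```

With this in place, an input such as `<img src="http://127.0.0.1:445"/>` reaches the converter as `&lt;img src=&quot;http://127.0.0.1:445&quot;/&gt;`, which is rendered as visible text and should not trigger a request. Encoding only helps for values placed in element content or quoted attributes. A value used as a URL or inside a style block would need its own validation.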
Vulnerable Libraries:
To find out which external library was used, open the PDF file in a hex editor and search for strings such as ‘Creator’ or ‘Author’.
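The same search can be done with a short script instead of a hex editor. The sketch below is an assumption-laden illustration: it scans the raw bytes for ‘Creator’, ‘Producer’ and ‘Author’ entries written as plain literal strings, and it will miss metadata that is compressed, encrypted, hex-encoded or stored as UTF-16. The function name `find_pdf_generators` is hypothetical.

```python
import re

# Matches entries such as "/Creator (Some Tool 1.0)" in the raw PDF bytes.
# Escaped characters (for example "\)") are allowed inside the value, but
# nested unescaped parentheses are not handled and would cut the value short.
_INFO_KEYS = re.compile(rb"/(Creator|Producer|Author)\s*\(((?:\\.|[^\\)])*)\)")


def find_pdf_generators(pdf_bytes: bytes) -> dict[str, str]:
    """Return the first Creator, Producer and Author values found, if any."""
    found: dict[str, str] = {}
    for key, value in _INFO_KEYS.findall(pdf_bytes):
        # Keep only the first occurrence of each key.
        found.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return found
```

For a downloaded export, something like `find_pdf_generators(open("export.pdf", "rb").read())` would return a dictionary of whichever of these entries are present, where `export.pdf` is a placeholder file name.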