Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

security - How can I use PHP's various XML libraries to get DOM-like functionality and avoid DoS vulnerabilities, like Billion Laughs or Quadratic Blowup?

I'm writing a web application that has an XML API in PHP, and I'm worried about three specific vulnerabilities, all related to inline DOCTYPE definitions: local file inclusion, quadratic entity blowup, and exponential entity blowup. I'd love to use PHP 5.3's built-in libraries, but I want to make sure I'm not susceptible to these.

I found I can eliminate LFI with libxml_disable_entity_loader, but this doesn't help with inline ENTITY declarations, including entities that refer to other entities.

The SimpleXML library (SimpleXMLElement, simplexml_load_string, etc.) is great because it's a DOM parser and all my inputs are fairly small; it lets me use XPath and manipulate the DOM pretty easily. However, I can't figure out how to stop ENTITY declarations. (I would be happy to disable all inline DOCTYPE definitions, if possible.)

The XML Parser library (xml_parser_create, xml_set_element_handler, etc.) lets me set the default handler, which also receives entities, with xml_set_default_handler. I can hack it so that for unrecognized entities it simply returns the original string (i.e., "&ent;"). This library is frustrating, though: because it is a SAX parser, I have to write a bunch of handlers (as many as nine).
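For comparison, here is a minimal sketch of the handler plumbing the SAX route needs just to get a DOM-like tree out. It assumes PHP 7.1 or later; sax_to_tree() and the stdClass node shape are hypothetical, and the xml_set_default_handler() entity trick described above is deliberately left out.

```php
<?php
// Hypothetical helper (name and node shape are illustrative): builds a
// DOM-like tree of stdClass nodes with the SAX-style XML Parser extension.
// Entity handling via xml_set_default_handler() is intentionally not shown.
function sax_to_tree(string $xml): ?stdClass
{
    $root = null;
    $stack = []; // currently open elements, innermost last

    $parser = xml_parser_create('UTF-8');
    // Case folding (upper-casing element names) is on by default; turn it off.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$root, &$stack): void {
            $node = (object) ['name' => $name, 'attrs' => $attrs, 'children' => [], 'text' => ''];
            if ($stack) {
                $parent = $stack[count($stack) - 1];
                $parent->children[] = $node; // objects are handles, so this updates the parent
            } else {
                $root = $node;
            }
            $stack[] = $node;
        },
        function ($p, string $name) use (&$stack): void {
            array_pop($stack);
        }
    );
    // Character data may arrive in several chunks, so append rather than assign.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$stack): void {
            if ($stack) {
                $stack[count($stack) - 1]->text .= $data;
            }
        }
    );

    $ok = xml_parse($parser, $xml, true) === 1;
    return $ok ? $root : null; // the parser is released when it goes out of scope
}

$tree = sax_to_tree('<data id="7"><item>a</item><item>b</item></data>');
echo $tree->children[1]->text, "\n"; // prints "b"
```

Even this stripped-down version needs three handlers plus parser options, which is the burden the question describes.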

So is it possible to use the built-in libraries, get DOM-like objects out, and protect myself from these DoS vulnerabilities? Thanks!

This page describes the three vulnerabilities and provides a solution... if only I were using .NET: http://msdn.microsoft.com/en-us/magazine/ee335713.aspx

UPDATE:

<?php
$s = <<<EOF
<?xml version="1.0"?>
<!DOCTYPE data [
<!ENTITY en "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa....">
]>
<data>&en;&en;&en;&en;&en;&en;&en;&en;&en;&en;&en;&en;.....</data>
EOF;
$doc = new DOMDocument();
$doc->loadXML($s);
var_dump($doc->lastChild->nodeValue);
?>

I tried loadXML($s, LIBXML_NOENT); as well. In both cases I end up dumping 300+ MB. Is there something I'm still missing?


Answer:

Note: If you create test cases from files containing the XML snippets below, be aware that editors may be vulnerable to these attacks as well and might freeze or crash.

Billion Laughs

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

When loading:

FATAL: #89: Detected an entity reference loop 1:7
... (the same error repeated; seven fatal errors in total)
FATAL: #89: Detected an entity reference loop 14:13

Result:

<?xml version="1.0"?>

Memory usage stays light; loading with DOMDocument does not raise the peak. Since this example produces 7 fatal errors, one can conclude (and it is indeed so) that this two-level version loads without errors:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol1 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol2 "&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;&lol1;">
]>
<lolz>&lol2;</lolz>
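Each level multiplies the expansion by ten, so a document whose deepest entity sits at level n expands to 3 × 10^n characters of "lol": 300 for the two-level version above, and 3 × 10^9 (about 3 GB) for the full nine-level payload. The following sketch generates such payloads at a chosen depth so the growth can be checked on small inputs. billion_laughs() and expanded_length() are hypothetical names, and the base entity is called lol0 rather than lol.

```php
<?php
// Hypothetical generator for the "billion laughs" payload with a configurable
// nesting depth. The base entity is named lol0 here (the original calls it
// lol); each further level references the previous one ten times.
function billion_laughs(int $levels): string
{
    $decls = ['  <!ENTITY lol0 "lol">'];
    for ($i = 1; $i <= $levels; $i++) {
        $refs = str_repeat('&lol' . ($i - 1) . ';', 10);
        $decls[] = sprintf('  <!ENTITY lol%d "%s">', $i, $refs);
    }
    return '<?xml version="1.0"?>' . "\n"
        . "<!DOCTYPE lolz [\n" . implode("\n", $decls) . "\n]>\n"
        . "<lolz>&lol{$levels};</lolz>\n";
}

// Expanded text length: "lol" (3 characters) times 10^levels.
function expanded_length(int $levels): int
{
    return 3 * 10 ** $levels;
}

// Only do this for small depths: LIBXML_NOENT really substitutes the text.
foreach ([1, 2] as $levels) {
    $doc = new DOMDocument();
    $doc->loadXML(billion_laughs($levels), LIBXML_NOENT);
    printf("%d levels: %d characters\n", $levels, strlen($doc->documentElement->nodeValue));
}
```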

Since entity substitution is not in effect and this works, let's try the next attack:

Quadratic Blowup

Here it is, shortened for readability (my test variants are about 27 kB and 11 kB):

<?xml version="1.0"?>
<!DOCTYPE kaboom [
  <!ENTITY a "aaaaaaaaaaaaaaaaaa...">
]>
<kaboom>&a;&a;&a;&a;&a;&a;&a;&a;&a;...</kaboom>

If you use $doc->loadXML($src, LIBXML_NOENT);, this does work as an attack: as I write this, the script is still loading. So it takes considerable time to load and consumes memory; you can experiment with this yourself. Without LIBXML_NOENT it loads flawlessly and fast.

But there is a caveat: if you read the nodeValue of an element, for example, you get the entities expanded even if you did not use that loading flag.
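Given that caveat, one option is to check for entity declarations before reading any text at all. A minimal sketch, assuming the document was loaded without LIBXML_NOENT: it looks at DOMDocument::$doctype and the DOCTYPE's entities map and flags any document that declares general entities. declares_entities() is a hypothetical name.

```php
<?php
// Hypothetical helper: reports whether a document loaded WITHOUT LIBXML_NOENT
// declares any general entities in its DOCTYPE. Call it before reading
// nodeValue/textContent, which (as reported above) expand entity references on read.
function declares_entities(DOMDocument $doc): bool
{
    $doctype = $doc->doctype;
    return $doctype !== null && $doctype->entities->length > 0;
}

// Example: a tiny quadratic-style document is flagged, a plain one is not.
$attack = new DOMDocument();
$attack->loadXML('<!DOCTYPE kaboom [<!ENTITY a "aaaa">]><kaboom>&a;&a;&a;</kaboom>');
$plain = new DOMDocument();
$plain->loadXML('<kaboom>text</kaboom>');
var_dump(declares_entities($attack), declares_entities($plain));
```

This is less strict than rejecting every DOCTYPE: a document with a DOCTYPE but no entity declarations still passes.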

A workaround for this issue is to remove the DocumentType node from the document:

$doc = new DOMDocument();
$doc->loadXML($s); // where $s is the Quadratic Blowup XML string above
// now remove the doctype node
foreach ($doc->childNodes as $child) {
    if ($child->nodeType === XML_DOCUMENT_TYPE_NODE) {
        $doc->removeChild($child);
        break; // stop iterating: the child list has just changed
    }
}
// Now the following is true:
assert($doc->doctype === null);
assert($doc->lastChild->nodeValue === '...');
// Note that entity references remain unexpanded in the output XML.
// This is not so good, since they are now undeclared, which makes the XML invalid.
// Better is a manual walk through all nodes looking for XML_ENTITY_REF_NODE.
assert($doc->saveXML() === '<?xml version="1.0"?>' . "\n"
    . "<kaboom>&a;&a;&a;&a;&a;&a;&a;&a;&a;...</kaboom>\n");
// However, canonicalization will produce warnings because it must resolve entities
assert($doc->C14N() === false);
// The warning will look like:
//    PHP Warning:  DOMNode::C14N(): Node XML_ENTITY_REF_NODE is invalid here

So while this workaround will prevent an XML document from consuming resources in a DoS, it makes it easy to generate invalid XML.
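The comment in the code above suggests a manual walk through all nodes as the better option. One way to read that idea, sketched here as an assumption rather than the answer's own code: replace every entity reference node with a text node containing the literal reference, much like the question's XML Parser trick of returning "&ent;" unchanged. The result is well-formed, nothing gets expanded, and the DOCTYPE can then be dropped safely. neutralize_entity_refs() is a hypothetical name.

```php
<?php
// Hypothetical helper: walks the element tree and replaces every entity
// reference node (XML_ENTITY_REF_NODE) with a text node holding the literal
// reference, e.g. "&a;", so nothing is expanded and no undeclared references remain.
function neutralize_entity_refs(DOMNode $node): int
{
    $replaced = 0;
    // Copy the child list first, because children are replaced while looping.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child->nodeType === XML_ENTITY_REF_NODE) {
            $literal = $node->ownerDocument->createTextNode('&' . $child->nodeName . ';');
            $node->replaceChild($literal, $child);
            $replaced++;
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            $replaced += neutralize_entity_refs($child);
        }
    }
    return $replaced;
}

$doc = new DOMDocument();
$doc->loadXML('<!DOCTYPE kaboom [<!ENTITY a "aaaa">]><kaboom>&a;&a;<b>&a;</b></kaboom>');
neutralize_entity_refs($doc->documentElement);
$doc->removeChild($doc->doctype); // the declarations are no longer referenced
echo $doc->saveXML();             // the references now serialize as "&amp;a;"
```

Entity references inside attribute values are not visited by this sketch; they would need a similar pass over each element's attributes.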

Some figures (I reduced the file size, otherwise it takes too long):

Step                                          LIBXML_NOENT disabled           LIBXML_NOENT enabled
Memory at start                               356 184 (peak 435 464)          356 280 (peak 435 464)
Loaded quadratic-blowup-2.xml into a string
Memory                                        368 400 (peak 435 464)          368 496 (peak 435 464)
DOMDocument loaded XML (11 881 bytes)         0.001368 s                      15.993627 s
Memory                                        369 088 (peak 435 464)          369 184 (peak 435 464)
Removed the load string
Memory                                        357 112 (peak 435 464)          357 208 (peak 435 464)
saveXML() length                              11 880                          11 165 132
nodeValue length (time)                       11 160 314 (11.060893 s)        11 160 314 (0.025360 s)
Memory                                        11 517 776 (peak 11 532 016)    11 517 872 (peak 22 685 360)
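The script behind these figures is not included here. A hypothetical harness of the same kind might look like the following: it builds a quadratic-blowup document and records load time, saveXML() length, nodeValue length and timing, and peak memory, with and without LIBXML_NOENT. quadratic_blowup() and measure() are illustrative names, and the sizes are deliberately tiny; scale them up with care.

```php
<?php
// Hypothetical timing harness for the quadratic blowup (not the original script).
function quadratic_blowup(int $entityLength, int $references): string
{
    return '<?xml version="1.0"?>' . "\n"
        . "<!DOCTYPE kaboom [\n  <!ENTITY a \"" . str_repeat('a', $entityLength) . "\">\n]>\n"
        . '<kaboom>' . str_repeat('&a;', $references) . "</kaboom>\n";
}

function measure(string $xml, int $options): array
{
    $doc = new DOMDocument();
    $start = microtime(true);
    $doc->loadXML($xml, $options);
    $loadSeconds = microtime(true) - $start;

    // Reading nodeValue is timed separately, as in the figures above.
    $start = microtime(true);
    $textLength = strlen($doc->documentElement->nodeValue);
    $textSeconds = microtime(true) - $start;

    return [
        'load_seconds' => $loadSeconds,
        'savexml_length' => strlen($doc->saveXML()),
        'nodevalue_length' => $textLength,
        'nodevalue_seconds' => $textSeconds,
        'peak_memory' => memory_get_peak_usage(),
    ];
}

// Small enough to finish instantly: 100 references to a 100-character entity.
$xml = quadratic_blowup(100, 100);
foreach (['LIBXML_NOENT disabled' => 0, 'LIBXML_NOENT enabled' => LIBXML_NOENT] as $label => $options) {
    echo $label, ': ', json_encode(measure($xml, $options)), "\n";
}
```

Input size grows linearly with both parameters while the expanded text grows with their product, which is what makes the attack quadratic.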

I have not yet settled on a protection strategy, but I now know that loading the Billion Laughs file into PHPStorm, for example, will freeze it; I stopped testing the latter because I didn't want it to freeze while I was writing this.
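The question states that rejecting every inline DOCTYPE would be acceptable. One protection strategy in that spirit, not taken from the answer, is sketched below: load without LIBXML_NOENT (which the figures above show is fast), refuse any document that has a DOCTYPE at all, and only then hand the tree to SimpleXML for XPath access. load_untrusted_xml() is a hypothetical name, and the sketch assumes PHP 7.1 or later. Note that libxml still parses the DTD before the check runs, so this relies on loading without entity substitution being cheap.

```php
<?php
// Hypothetical helper: parse untrusted XML, refuse any document that has a
// DOCTYPE (so no inline ENTITY declarations can exist), and return SimpleXML.
function load_untrusted_xml(string $xml): ?SimpleXMLElement
{
    if ($xml === '') {
        return null; // loadXML() rejects empty input (PHP 8 throws a ValueError)
    }
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $doc = new DOMDocument();
    // No LIBXML_NOENT, so entities are not substituted during loading;
    // LIBXML_NONET forbids network access while parsing.
    $loaded = $doc->loadXML($xml, LIBXML_NONET);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $doc->doctype !== null) {
        return null; // malformed, or carries a DOCTYPE: reject the whole document
    }
    return simplexml_import_dom($doc);
}

$data = load_untrusted_xml('<data><item id="1">a</item><item id="2">b</item></data>');
echo count($data->xpath('//item')), "\n"; // prints "2"
```

Because the check only looks at DOMDocument::$doctype, the same test also fits code that stays with DOMDocument instead of converting to SimpleXML.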


...