I'm trying to code a secure and lightweight white-list based HTML purifier which will use DOMDocument. In order to avoid unnecessary complexity I am willing to make the following compromises:
- HTML comments are removed
script
and style
tags are stripped all together
- only the child nodes of the
body
tag will be returned
- all HTML attributes that can trigger Javascript events will either be validated or removed
I've been reading a lot about on XSS attacks and prevention and I hope I'm not being too naive (if I am, please let me know!) in assuming that if I follow all the rules I mentioned above, I will be safe from XSS.
The problem is I am not sure what other tags and attributes (in any [X]HTML version and/or browser versions/implementations) can trigger Javascript events, besides the default Javascript event attributes:
onAbort
onBlur
onChange
onClick
onDblClick
onDragDrop
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseMove
onMouseOut
onMouseOver
onMouseUp
onMove
onReset
onResize
onSelect
onSubmit
onUnload
Are there any other non-default or proprietary event attributes that can trigger Javascript (or VBScript, etc...) events or code execution? I can think of href
, style
and action
, for instance:
<a href="javascript:alert(document.location);">XSS</a> // or
<b style="width: expression(alert(document.location));">XSS</b> // or
<form action="javascript:alert(document.location);"><input type="submit" /></form>
I will probably just remove any style
attributes in the HTML tags, the action
and href
attributes pose a bigger challenge but I think the following code is enough to make sure their value is either a relative or absolute URL and not some nasty Javascript code:
$value = $attribute->value;
if ((strpos($value, ':') !== false) && (preg_match('~^(?:(?:s?f|ht)tps?|mailto):~i', $value) == 0))
{
$node->removeAttributeNode($attribute);
}
So, my two obvious questions are:
- Am I missing any tags or attributes that can trigger events?
- Is there any attack vector that is not covered by these rules?
After a lot of testing, pondering and researching I've come up with the following (rather simple) implementation which, appears to be immune to any XSS attack vector I could throw at it.
I highly appreciate all your valuable answers, thanks.
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…