In my web app, users can input text data. This data can be shown to other users, and the original author can also go back and edit their data. I'm looking for the correct way to safely escape this data.
I'm only SQL-sanitizing on the way in, so everything is stored exactly as entered. Let's say I have "déjà vu" in the database. Or, to be more extreme, a `<script>` tag. It is possible that this may be valid, and not even maliciously intended, input.
I'm using `htmlentities()` on the way out to make sure everything is escaped. The problem is that HTML and input fields treat things differently. I want to make sure it's safe in HTML, but that the author, when editing the text, sees exactly what they typed in the input fields. I'm also using jQuery to fill form fields with the data dynamically.
If I do this:
<p><?=htmlentities("déjà vu");?></p>
<input type=text value="<?=htmlentities("déjà vu");?>">
The page source puts `d&eacute;j&agrave; vu` in both places (I had to backtick that or you would see "déjà vu"!). The problem is that the output in the `<p>` is correct, but the input just shows the escaped text. If the user resubmits their form, they double escape and ruin their input.
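To make the double-escaping failure concrete, here is a minimal sketch in plain JavaScript. Both `escapeHtml` and `unescapeHtml` are hypothetical helpers written for this illustration (they are not jQuery or PHP functions); `unescapeHtml` only stands in for the single decoding pass that an HTML parser performs on text and attribute values.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text
// and in quoted attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // first, so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical helper: undo exactly one layer of escapeHtml, standing in for
// the one decoding pass done by the HTML parser.
function unescapeHtml(html) {
  return String(html)
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

const typed = "déjà vu <script>";
const once = escapeHtml(typed);  // what a correctly escaped page contains
const twice = escapeHtml(once);  // what a resubmitted, re-escaped value contains

console.log(unescapeHtml(once) === typed); // one layer in, one layer out
console.log(unescapeHtml(twice));          // the user now sees entity text
```

Escaping once and decoding once returns the original text. Escaping twice leaves one layer of entities visible after decoding, which is the corruption described above.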
I know I still have to sanitize text that goes into the field, otherwise you can end the value quote and do bad things. The only solution I found is this. Again, I'm using jQuery.
var temp = $("<div></div>").html("<?=htmlentities("déjà vu");?>");
$("input").val(temp.html());
This works, as it causes the div to read the escaped text as encoded characters, and then jQuery copies those encoded characters to the input tag, properly preserved.
So my question: is this still safe, or is there a security hole somewhere? And more importantly, is this the only / correct way to do this? Am I missing something about how HTML and character encoding work that makes this a trivial issue to solve?
EDIT
This is actually wrong: I oversimplified my example to the point of it not working. The problem actually arises because I'm using jQuery's `val()` to insert the text into the field.
<input>
<script>$("input").val("<?=htmlentities("déjà vu");?>");</script>
The reason for this is that the form is dynamic - the user can add or remove fields at will and so they are generated after page load.
So it seems that jQuery is escaping the data that goes into the input, but it's not quite good enough: if I don't do anything myself, a user can still put in a `</script>` tag, killing my code and inserting malicious code. But there's another argument to be made here. Since only the original author can see the text in an input box anyway, should I even bother? Basically, the only person they could execute an XSS attack against is themselves.
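One way to look at the `</script>` problem is that the value sits inside a JavaScript string literal in an inline script, which is a different context from HTML text, so HTML entity encoding does not apply there. The sketch below shows the idea in plain JavaScript, under stated assumptions: `toScriptLiteral` is a hypothetical helper, and in PHP the corresponding step would likely be `json_encode()` with flags such as `JSON_HEX_TAG` and `JSON_HEX_AMP` rather than `htmlentities()`. Because `.val()` sets the field's value property without parsing HTML, the decoded string would then appear in the input exactly as typed.

```javascript
// Hypothetical helper: serialize a value as a JavaScript string literal that
// is safe to place inside an inline <script> block.
function toScriptLiteral(value) {
  return JSON.stringify(String(value))
    .replace(/</g, "\\u003c")      // prevents "</script>" from closing the block
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026")
    .replace(/\u2028/g, "\\u2028") // line separators that older JS engines
    .replace(/\u2029/g, "\\u2029"); // reject inside string literals
}

const stored = 'déjà vu </script><script>alert(1)</script> "quoted"';
const literal = toScriptLiteral(stored);

// What the server would emit into the page (assumed template shape).
const inlineScript = '$("input").val(' + literal + ");";

console.log(inlineScript);
console.log(JSON.parse(literal) === stored); // the literal decodes to the stored text
```

The emitted script contains no raw `<` or `>`, so the stored text cannot terminate the script block, and the string the browser evaluates is identical to what the author typed.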