We have a special requirement in a project: we have to parse a string of HTML (from an AJAX response) on the client side, using JavaScript only. That's right, no parsing in PHP or Java! I've been going through Stack Overflow this entire week and have not yet found an acceptable solution.
Some more details on the requirements:
We can use any library (preferably dojo and / or jQuery) or go native!
We need to parse an entire HTML document that we receive as a string, including the <head> and <body>.
We also need to serialise out the parsed DOM structures to strings at times.
Finally, we don't want to append the parsed DOM to the current document. Rather, we'll send it back to the server for permanent storage.
E.g., we need something like:
var dom = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
dom.getElementsByTagName('title')[0].innerHTML = "This is a new Title";
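To make the desired behaviour concrete, here is a minimal sketch of what such an HTMLtoDOM could look like on top of the browser's native DOMParser. It assumes a browser whose DOMParser accepts the 'text/html' type; the DOMtoHTML helper and the optional parser argument are hypothetical names introduced only for illustration, not part of any library.

```javascript
// Sketch only: assumes a browser whose DOMParser supports the 'text/html' type.
// The optional `parser` argument is a hypothetical hook so the function can be
// exercised outside a browser; it is not part of any library API.
function HTMLtoDOM(htmlString, parser) {
  parser = parser || new DOMParser();
  // Returns a standalone Document; nothing is attached to the current page.
  return parser.parseFromString(htmlString, 'text/html');
}

// Serialise the parsed document back to a string (the doctype is not included).
function DOMtoHTML(doc) {
  return doc.documentElement.outerHTML;
}

// Browser-only usage, skipped where DOMParser is unavailable.
if (typeof DOMParser !== 'undefined') {
  var parsed = HTMLtoDOM('<html><head><title> This is the old title. </title></head></html>');
  parsed.getElementsByTagName('title')[0].innerHTML = 'This is a new Title';
  console.log(DOMtoHTML(parsed));
}
```

Because the parsed result is a separate Document, the <head> and <body> stay intact and the current page is never touched.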
From my research, these are our options:
A TinyMCE Parser. The problem? I think we would have to include the whole editor. What about parsing HTML in places where we don't need an editor?
John Resig's Parser. This should be our best bet. Unfortunately, the parser crashes when it is given the entire content of a page!
The jQuery $(htmlString) or the dojo.toDom(htmlString). Both rely on DocumentFragment and hence gobble up <head> and <body>!
EDIT: We want to serialize the HTML so that we can catch certain custom HTML comments via RegExp. We also need to give users the opportunity to edit meta tags, title tags, etc., hence the HTML parser.
Oh, and I feel I will be murdered on Stack Overflow even if I just hint at parsing HTML via RegExp!!!
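To illustrate the comment-catching part, here is a rough sketch of pulling custom comments out of the serialized string with a RegExp. The "cms:" marker prefix and the findCustomComments name are purely hypothetical; our real comment format is different. The pattern only looks for comment delimiters, so it would also match comment-like text inside scripts or attribute values.

```javascript
// Hypothetical helper: collects the bodies of comments that start with `prefix`.
// The 'cms:' prefix used in the usage lines is an assumed marker format.
function findCustomComments(html, prefix) {
  // Escape the prefix so it is matched literally inside the pattern.
  var escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Lazy match up to the first closing delimiter; [\s\S] spans line breaks.
  var pattern = new RegExp('<!--\\s*' + escaped + '([\\s\\S]*?)-->', 'g');
  var results = [];
  var match;
  while ((match = pattern.exec(html)) !== null) {
    results.push(match[1].trim());
  }
  return results;
}

var sample = '<html><!-- cms:header --><body><!-- plain --><!--cms:footer--></body></html>';
console.log(findCustomComments(sample, 'cms:')); // logs the two marker names: header, footer
```

This only locates the comments; the structural edits (meta tags, title) would still go through the parsed DOM.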