I have a div set to contentEditable
and styled with "white-space:pre
" so it keeps things like linebreaks. In Safari, FF and IE, the div pretty much looks and works the same. All is well. What I want to do is extract the text from this div, but in such a way that will not lose the formatting -- specifically, the line breaks.
We are using jQuery, whose text()
function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.
I had a look at the html()
function, but it seems that all three browsers do different things with the actual HTML that gets generated behind the scenes in my contentEditable
div. Assuming I type this into my div:
1
2
3
These are the results:
Safari 4:
1
<div>2</div>
<div>3</div>
Firefox 3.6:
1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">
IE 8:
<P>1</P><P>2</P><P>3</P>
Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)
The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre
tag (which was alluded to on some pages I found using Google).
Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.
Update: I cribbed the getText
function from jQuery 1.4.2 and modified it to extract it with whitespace mostly intact (I only chnaged one line where I add a newline);
function extractTextWithWhitespace( elems ) {
var ret = "", elem;
for ( var i = 0; elems[i]; i++ ) {
elem = elems[i];
// Get the text from text nodes and CDATA nodes
if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
ret += elem.nodeValue + "
";
// Traverse everything else, except comment nodes
} else if ( elem.nodeType !== 8 ) {
ret += extractTextWithWhitespace2( elem.childNodes );
}
}
return ret;
}
I call this function and use its output to assign it to an XML node with jQuery, something like:
var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);
The resulting XML is eventually sent to a server via an AJAX call.
This works well in Safari and Firefox.
On IE, only the first '
' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):
return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );
Reading up on createTextNode
, it appears that IE's implementation may mash up the whitespace. Is this true or am I doing something wrong?
See Question&Answers more detail:
os