Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
377 views
in Technique[技术] by (71.8m points)

javascript - Extracting text from a contentEditable div

I have a div set to contentEditable and styled with "white-space:pre" so it keeps things like linebreaks. In Safari, FF and IE, the div pretty much looks and works the same. All is well. What I want to do is extract the text from this div, but in such a way that will not lose the formatting -- specifically, the line breaks.

We are using jQuery, whose text() function basically does a pre-order DFS and glues together all the content in that branch of the DOM into a single lump. This loses the formatting.

I had a look at the html() function, but it seems that all three browsers do different things with the actual HTML that gets generated behind the scenes in my contentEditable div. Assuming I type this into my div:

1
2
3

These are the results:

Safari 4:

1
<div>2</div>
<div>3</div>

Firefox 3.6:

1
<br _moz_dirty="">
2
<br _moz_dirty="">
3
<br _moz_dirty="">
<br _moz_dirty="" type="_moz">

IE 8:

<P>1</P><P>2</P><P>3</P>

Ugh. Nothing very consistent here. The surprising thing is that MSIE looks the most sane! (Capitalized P tag and all)

The div will have dynamically set styling (font face, colour, size and alignment) which is done using CSS, so I'm not sure if I can use a pre tag (which was alluded to on some pages I found using Google).

Does anyone know of any JavaScript code and/or jQuery plugin or something that will extract text from a contentEditable div in such a way as to preserve linebreaks? I'd prefer not to reinvent a parsing wheel if I don't have to.

Update: I cribbed the getText function from jQuery 1.4.2 and modified it to extract it with whitespace mostly intact (I only chnaged one line where I add a newline);

function extractTextWithWhitespace( elems ) {
    var ret = "", elem;

    for ( var i = 0; elems[i]; i++ ) {
        elem = elems[i];

        // Get the text from text nodes and CDATA nodes
        if ( elem.nodeType === 3 || elem.nodeType === 4 ) {
            ret += elem.nodeValue + "
";

        // Traverse everything else, except comment nodes
        } else if ( elem.nodeType !== 8 ) {
            ret += extractTextWithWhitespace2( elem.childNodes );
        }
    }

    return ret;
}

I call this function and use its output to assign it to an XML node with jQuery, something like:

var extractedText = extractTextWithWhitespace($(this));
var $someXmlNode = $('<someXmlNode/>');
$someXmlNode.text(extractedText);

The resulting XML is eventually sent to a server via an AJAX call.

This works well in Safari and Firefox.

On IE, only the first ' ' seems to get retained somehow. Looking into it more, it looks like jQuery is setting the text like so (line 4004 of jQuery-1.4.2.js):

return this.empty().append( (this[0] && this[0].ownerDocument || document).createTextNode( text ) );

Reading up on createTextNode, it appears that IE's implementation may mash up the whitespace. Is this true or am I doing something wrong?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Unfortunately you do still have to handle this for the pre case individually per-browser (I don't condone browser detection in many cases, use feature detection...but in this case it's necessary), but luckily you can take care of them all pretty concisely, like this:

var ce = $("<pre />").html($("#edit").html());
if($.browser.webkit) 
  ce.find("div").replaceWith(function() { return "
" + this.innerHTML; });    
if($.browser.msie) 
  ce.find("p").replaceWith(function() { return this.innerHTML  +  "<br>"; });
if($.browser.mozilla || $.browser.opera ||$.browser.msie )
  ce.find("br").replaceWith("
");

var textWithWhiteSpaceIntact = ce.text();

You can test it out here. IE in particular is a hassle because of the way is does &nbsp; and new lines in text conversion, that's why it gets the <br> treatment above to make it consistent, so it needs 2 passes to be handled correctly.

In the above #edit is the ID of the contentEditable component, so just change that out, or make this a function, for example:

function getContentEditableText(id) {
    var ce = $("<pre />").html($("#" + id).html());
    if ($.browser.webkit)
      ce.find("div").replaceWith(function() { return "
" + this.innerHTML; });
    if ($.browser.msie)
      ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
    if ($.browser.mozilla || $.browser.opera || $.browser.msie)
      ce.find("br").replaceWith("
");

    return ce.text();
}

You can test that here. Or, since this is built on jQuery methods anyway, make it a plugin, like this:

$.fn.getPreText = function () {
    var ce = $("<pre />").html(this.html());
    if ($.browser.webkit)
      ce.find("div").replaceWith(function() { return "
" + this.innerHTML; });
    if ($.browser.msie)
      ce.find("p").replaceWith(function() { return this.innerHTML + "<br>"; });
    if ($.browser.mozilla || $.browser.opera || $.browser.msie)
      ce.find("br").replaceWith("
");

    return ce.text();
};

Then you can just call it with $("#edit").getPreText(), you can test that version here.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...