Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

jquery - JavaScript : Find (and Replace) text that is NOT in a specific HTML element?

TL;DR; Summary

How do I inject a <span> around a specific word or phrase found in the HTML of the current page, but ignore any text that is ALREADY contained within the same kind of span I am trying to inject?

Because a large number of values is being processed, this must perform well.

Example:

Searching for "foo"

Should find a match:

<p>This sentence contains a foo bar value</p>

Should NOT find a match:

<p>This sentence contains a <span class='widget'>foo bar</span> value</p>

Background - i.e. Why?

I am looking into a specific problem of having to inject a <span class='widget'> element around specific text found on a page dynamically. The text I am looking for is in a large array.

  • Array of text strings to look for is in the thousands
  • Text values can contain phrases or words
  • phrases must take precedence over words

This last one is a killer. For example:

  • I have two values "foo bar" and "foo"
  • I want to process the sentence: "This is a foo bar sentence"

After processing, I want the following:

Desired Output

"This is a <span class='widget'>foo bar</span> sentence"

NOT Desired

"This is a <span class='widget'><span class='widget'>foo</span> bar</span> sentence"

The first step in achieving this is to sort my array by length, so that the longest values are processed first. The problem is that, after processing a phrase, my find-and-replace logic still finds the shorter word inside the (already processed) phrase.
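
To make the failure concrete, here is a minimal sketch of the naive approach, assuming a plain string replacement per value. The function name naiveWrap and the two-element array are illustrative only and not taken from the original code.

```javascript
// Hypothetical reproduction of the naive approach: longest values first,
// then one plain find-and-replace pass per value.
function naiveWrap(html, terms) {
    // Copy before sorting so the caller's array is left untouched.
    var sorted = terms.slice().sort(function (a, b) {
        return b.length - a.length;
    });
    sorted.forEach(function (term) {
        // split/join replaces every occurrence without any regex escaping.
        html = html.split(term).join("<span class='widget'>" + term + "</span>");
    });
    return html;
}

console.log(naiveWrap("This is a foo bar sentence", ["foo", "foo bar"]));
// "This is a <span class='widget'><span class='widget'>foo</span> bar</span> sentence"
```

The second pass has no way of knowing that "foo" already sits inside a widget span, so sorting by length alone does not prevent the nesting.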


1 Reply


If, and only if, there are no nested <span> tags, you can search for

/(<span[^>]*>[\s\S]*?<\/span>)|((?:foo|bar)(?:\s+(?:foo|bar))*)/g

and replace each match using the function

function matchEvaluator(_, span, word) {
    if (span) return span;
    return '<span class="widget">' + word + '</span>';
}

  • The part (<span[^>]*>[\s\S]*?<\/span>) matches a whole span element.
    This is the part where no nested <span> element is allowed. The matched text is returned unchanged; the only reason to match it is to consume all the characters inside the <span>.
  • <span[^>]*> matches the start tag. This may not be sufficient for your needs, so you may want something more specific, e.g. <span(?:\s+\w[\w-]*(?:=(?:"[^"]*"|'[^']*'|\S*)))*>
  • ((?:foo|bar)(?:\s+(?:foo|bar))*) matches the words "foo" and "bar".
    Once one of them is found, it looks for whitespace followed by another "foo" or "bar" (repeatedly). Since the <span> tags and all their content have already been consumed, "foo" and "bar" can only match outside a <span>.
  • The matchEvaluator function tests whether a span element was matched and, if so, simply returns the matched text. Otherwise a word was matched, and it is returned wrapped in the new span.
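
The pattern above hard-codes "foo" and "bar". For a long list of values, the same idea could be sketched as follows, assuming the values are non-empty plain strings. The helper names escapeRegExp and buildWrapper are hypothetical and not part of the answer above. Sorting the alternatives longest first lets a phrase win over a word it contains.

```javascript
// Escape regex metacharacters so each value is matched literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds one reusable replacer for a whole list of values.
// Assumes a non-empty array of non-empty strings.
function buildWrapper(terms) {
    var alternatives = terms
        .slice()                                               // keep the caller's array intact
        .sort(function (a, b) { return b.length - a.length; }) // longest first
        .map(escapeRegExp)
        .join("|");
    // Group 1 consumes existing (non-nested) spans, group 2 matches a value.
    var rx = new RegExp("(<span[^>]*>[\\s\\S]*?</span>)|(" + alternatives + ")", "g");
    return function (html) {
        return html.replace(rx, function (_, span, term) {
            if (span) return span; // already inside a span: leave untouched
            return '<span class="widget">' + term + "</span>";
        });
    };
}

var wrap = buildWrapper(["foo", "foo bar"]);
console.log(wrap("This is a foo bar sentence"));
// "This is a <span class="widget">foo bar</span> sentence"
```

This makes a single replace call over the text instead of one call per value. Whether that is fast enough with thousands of alternatives would need to be measured on real data.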

Test:

var texts = [
    "This is a foo bar sentence",
    "This sentence contains a <span class='widget'>foo bar</span> value"
];

var wordsOutsideSpan_rx = /(<span[^>]*>[\s\S]*?<\/span>)|((?:foo|bar)(?:\s+(?:foo|bar))*)/g;
function wrapInSpan(_, span, word) {
    if (span) return span;
    return '<span class="widget">' + word + '</span>';
}

texts.forEach(function (txt) {
    console.log(txt.replace(wordsOutsideSpan_rx, wrapInSpan));
});

// outputs
// "This is a <span class="widget">foo bar</span> sentence"
// "This sentence contains a <span class='widget'>foo bar</span> value"
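
The "no nested <span> tags" condition matters in practice. The following sketch uses a made-up input with an inner span to show what happens when the condition is violated: the lazy match stops at the first closing tag, so the text after the inner span is treated as if it were outside any span. The names rx, keepSpanOrWrap and nested are illustrative.

```javascript
var rx = /(<span[^>]*>[\s\S]*?<\/span>)|((?:foo|bar)(?:\s+(?:foo|bar))*)/g;

function keepSpanOrWrap(_, span, word) {
    if (span) return span; // existing span: return it unchanged
    return '<span class="widget">' + word + '</span>';
}

// Hypothetical input: "foo" is inside the outer span, but after an inner one.
var nested = "<span class='outer'>a <span>b</span> foo</span>";

console.log(nested.replace(rx, keepSpanOrWrap));
// "<span class='outer'>a <span>b</span> <span class="widget">foo</span></span>"
```

If nested spans can occur on the page, a regular expression of this shape is not enough on its own.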
