Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

javascript - Replacing a lot of text in browser's addon

I'm trying to develop a Firefox add-on that transliterates the text on any page into a specific language. The mapping is just a 2D array (Table) that I iterate over with this code:

function escapeRegExp(str) {
    return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

function replaceAll(find, replace) {
    return document.body.innerHTML.replace(new RegExp(escapeRegExp(find), 'g'), replace);
}

function convert2latin() {
    for (var i = 0; i < Table.length; i++) {
        document.body.innerHTML = replaceAll(Table[i][1], Table[i][0]);
    }
}

It works, and I can ignore HTML tags because they only contain English characters, but the problem is performance, which is very poor. As I have no experience with JS, I searched and found that a DocumentFragment might help.
Or should I use a different approach altogether?

1 Reply

Based on your comments, you have apparently already been told that the most expensive step is the DOM rebuild that happens when you replace the entire contents of the page, i.e. when you assign to document.body.innerHTML. You are currently doing that once per substitution, so Firefox re-parses and re-renders the whole page for every entry in Table. You only need to assign to document.body.innerHTML once, after all of the substitutions have been made.

The following should provide a first pass at making it faster:

function escapeRegExp(str) {
    return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

function convert2latin() {
    // Work on a string copy so the DOM is rebuilt only once, at the end.
    let newInnerHTML = document.body.innerHTML;
    for (let i = 0; i < Table.length; i++) {
        newInnerHTML = newInnerHTML.replace(new RegExp(escapeRegExp(Table[i][1]), 'g'), Table[i][0]);
    }
    document.body.innerHTML = newInnerHTML;
}

You mention in comments that there is no real need to use a RegExp for the match, so the following would be even faster:

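function convert2latin() {
    let newInnerHTML = document.body.innerHTML;
    for (let i = 0; i < Table.length; i++) {
        // split/join replaces every occurrence; replace() with a string
        // pattern would stop after the first match.
        newInnerHTML = newInnerHTML.split(Table[i][1]).join(Table[i][0]);
    }
    document.body.innerHTML = newInnerHTML;
}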
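
One detail to watch when dropping the RegExp: String.prototype.replace called with a string pattern substitutes only the first match, whereas the RegExp versions above use the 'g' flag and substitute all of them. The short sketch below illustrates the difference on a made-up sample string (not page content):

```javascript
// Made-up sample input; "X" stands in for a source string from Table.
const sample = "aXbXc";

// A string pattern only replaces the first match.
const firstOnly = sample.replace("X", "-");

// Splitting on the source string and joining with the replacement
// substitutes every occurrence without building a RegExp.
const everyMatch = sample.split("X").join("-");

console.log(firstOnly);  // a-bXc
console.log(everyMatch); // a-b-c
```

In engines that support it, String.prototype.replaceAll with a string pattern should give the same result as the split/join form.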

If you really need to use a RegExp for the match, and you are going to perform these exact substitutions multiple times, you are better off creating all of the RegExp prior to the first use (e.g. when Table is created/changed) and storing them (e.g. in Table[i][2]).
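
As a rough sketch of that caching idea: the Table contents below are invented sample data, and compileTable is a hypothetical helper name, not something from the question. Each RegExp is built once and stored in the third column, so repeated conversions reuse it.

```javascript
// Hypothetical sample table: [latin, source] pairs, as in the question.
const Table = [
    ["a", "а"],
    ["b", "б"],
    ["sh", "ш"],
];

function escapeRegExp(str) {
    // Escape every character that has a special meaning in a RegExp.
    return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile each pattern once and cache it in Table[i][2].
// Call this again whenever Table is created or changed.
function compileTable(table) {
    for (const row of table) {
        row[2] = new RegExp(escapeRegExp(row[1]), "g");
    }
}

// Apply the cached patterns in order; no RegExp is constructed here.
function convert2latin(text) {
    for (const row of Table) {
        text = text.replace(row[2], row[0]);
    }
    return text;
}

compileTable(Table);
console.log(convert2latin("баш")); // bash
```

Note that a "$" in a replacement string has special meaning to replace(); if Table could contain one, passing a function such as () => row[0] as the replacement would avoid that.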

However, assigning to document.body.innerHTML is a bad way to do this:

As the8472 mentioned, replacing the entire content of document.body.innerHTML is a very heavy-handed way to perform this task. It has significant disadvantages, including probably breaking other JavaScript in the page (the elements that scripts hold references to, or have attached event listeners to, are discarded and recreated) and potential security issues. A better solution is to change only the textContent of the text nodes.

One way to do this is to use a TreeWalker. The code could look something like this:

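function convert2latin(text) {
    for (let i = 0; i < Table.length; i++) {
        // split/join replaces every occurrence; replace() with a string
        // pattern would stop after the first match.
        text = text.split(Table[i][1]).join(Table[i][0]);
    }
    return text;
}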

//Create the TreeWalker
let treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT,{
    acceptNode: function(node) { 
        if(node.textContent.length === 0
            || node.parentNode.nodeName === 'SCRIPT' 
            || node.parentNode.nodeName === 'STYLE'
        ) {
            //Don't include 0 length, <script>, or <style> text nodes.
            return NodeFilter.FILTER_SKIP;
        } //else
        return NodeFilter.FILTER_ACCEPT;
    }
});
//Make a list of nodes prior to modifying the DOM. Once the DOM is modified the TreeWalker
//  can become invalid (i.e. stop after the first modification). Doing so is not needed
//  in this case, but is a good habit for when it is needed.
let nodeList=[];
while(treeWalker.nextNode()) {
    nodeList.push(treeWalker.currentNode);
}
//Iterate over all text nodes, changing the textContent of the text nodes 
nodeList.forEach(function(el){
    el.textContent = convert2latin(el.textContent);
});
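
If the per-node conversion itself turns out to be the bottleneck, one further variation (not part of the answer above, and only a sketch) is to combine every source string into a single alternation RegExp and look the replacement up in a Map, so each text node is scanned once instead of once per Table entry. The Table contents and the buildConverter name are assumptions made for illustration. Unlike the sequential loop, this never re-scans text that an earlier substitution produced, so results can differ if Table entries feed into one another.

```javascript
// Hypothetical sample table: [latin, source] pairs.
const Table = [
    ["sh", "ш"],
    ["shch", "щ"],
    ["a", "а"],
    ["b", "б"],
];

function escapeRegExp(str) {
    // Escape every character that has a special meaning in a RegExp.
    return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one lookup Map and one alternation RegExp for the whole table.
function buildConverter(table) {
    const lookup = new Map(table.map(([latin, source]) => [source, latin]));
    // Longer sources go first so they win over shorter ones that are
    // prefixes of them.
    const sources = [...lookup.keys()].sort((x, y) => y.length - x.length);
    if (sources.length === 0) {
        // Nothing to replace: return the text unchanged.
        return (text) => text;
    }
    const pattern = new RegExp(sources.map(escapeRegExp).join("|"), "g");
    // The callback receives the matched source string and returns its mapping.
    return (text) => text.replace(pattern, (match) => lookup.get(match));
}

const convert2latin = buildConverter(Table);
console.log(convert2latin("баш")); // bash
```

The returned function takes and returns a string, so it can be passed text node contents in the same way as the loop-based convert2latin.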
