performance - Is str.replace(..).replace(..) ad nauseam a standard idiom in Python?

Question

Welcome To Ask or Share your Answers For Others

performance - Is str.replace(..).replace(..) ad nauseam a standard idiom in Python?

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

performance - Is str.replace(..).replace(..) ad nauseam a standard idiom in Python?

For instance, say I wanted a function to escape a string for use in HTML (as in Django's escape filter):

    def escape(string):
        """
        Returns the given string with ampersands, quotes and angle 
        brackets encoded.
        """
        return string.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;').replace("'", '&#39;').replace('"', '&quot;')

This works, but it gets ugly quickly and appears to have poor algorithmic performance (in this example, the string is repeatedly traversed 5 times). What would be better is something like this:

    def escape(string):
        """
        Returns the given string with ampersands, quotes and angle 
        brackets encoded.
        """
        # Note that ampersands must be escaped first; the rest can be escaped in 
        # any order.
        return replace_multi(string.replace('&', '&amp;'),
                             {'<': '&lt;', '>': '&gt;', 
                              "'": '&#39;', '"': '&quot;'})

Does such a function exist, or is the standard Python idiom to use what I wrote before?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:17:21+0000

Do you have an application that is running too slow and you profiled it to find that a line like this snippet is causing it to be slow? Bottlenecks occur at unexpected places.

The current snippet traverses the string 5 times, doing one thing each time. You are suggesting traversing it once, probably doing doing five things each time (or at least doing something each time). It isn't clear that this will automatically do a better job to me. Currently the algorithm used is O(n*m) (assuming the length of the string is longer than the stuff in the rules), where n is the length of the string and m is the number of substitution rules. You could, I think, reduce the algorithmic complexity to something like O(n*log(m)) and in the specific case we're in—where the original things are all only one character (but not in the case of multiple calls to replace in general)—O(n), but this doesn't matter since m is 5 but n is unbounded.

If m is held constant, then, the complexity of both solutions really goes to O(n). It is not clear to me that it is going to be a worthy task to try to turn five simple passes into one complex one, the actual time of which I cannot guess at the current moment. If there was something about it that could make it scale better, I would have thought it was much more worthwhile task.

Doing everything on one pass rather than consecutive passes also demands questions be answered about what to do about conflicting rules and how they are applied. The resolution to these questions is clear with a chain of replace.

Categories

performance - Is str.replace(..).replace(..) ad nauseam a standard idiom in Python?

performance - Is str.replace(..).replace(..) ad nauseam a standard idiom in Python?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags