Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
489 views
in Technique[技术] by (71.8m points)

python - Is the time-complexity of iterative string append actually O(n^2), or O(n)?

I am working on a problem out of CTCI.

The third problem of chapter 1 has you take a string such as

'Mr John Smith '

and asks you to replace the intermediary spaces with %20:

'Mr%20John%20Smith'

The author offers this solution in Python, calling it O(n):

def urlify(string, length):
    '''function replaces single spaces with %20 and removes trailing spaces'''
    counter = 0
    output = ''
    for char in string:
        counter += 1
        if counter > length:
            return output
        elif char == ' ':
            output = output + '%20'
        elif char != ' ':
            output = output + char
    return output

My question:

I understand that this is O(n) in terms of scanning through the actual string from left to right. But aren't strings in Python immutable? If I have a string and I add another string to it with the + operator, doesn't it allocate the necessary space, copy over the original, and then copy over the appending string?

If I have a collection of n strings each of length 1, then that takes:

1 + 2 + 3 + 4 + 5 + ... + n = n(n+1)/2

or O(n^2) time, yes? Or am I mistaken in how Python handles appending?

Alternatively, if you'd be willing to teach me how to fish: How would I go about finding this out for myself? I've been unsuccessful in my attempts to Google an official source. I found https://wiki.python.org/moin/TimeComplexity but this doesn't have anything on strings.

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In CPython, the standard implementation of Python, there's an implementation detail that makes this usually O(n), implemented in the code the bytecode evaluation loop calls for + or += with two string operands. If Python detects that the left argument has no other references, it calls realloc to attempt to avoid a copy by resizing the string in place. This is not something you should ever rely on, because it's an implementation detail and because if realloc ends up needing to move the string frequently, performance degrades to O(n^2) anyway.

Without the weird implementation detail, the algorithm is O(n^2) due to the quadratic amount of copying involved. Code like this would only make sense in a language with mutable strings, like C++, and even in C++ you'd want to use +=.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...