Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

python - Find gaps in a sequence of Strings

I have a sequence of strings: 0000001, 0000002, 0000003, ... up to 2 million. They are not contiguous, meaning there are gaps: after 0000003 the next string might be 0000006. I need to find all of these gaps; in the example above, 0000004 and 0000005.

This is what I have done so far -

def get_gaps(curr_ids):
    gaps = []
    total = len(curr_ids)

    for i in range(total):
        tmp_id = str(i).zfill(7)
        # curr_ids is a list, so this membership test scans it every time.
        if tmp_id not in curr_ids:
            gaps.append(tmp_id)
    return gaps

But as you would have guessed, this is slow because curr_ids is a list, so every membership test scans it. If I pre-populate a dict from curr_ids, the lookups will be faster. But what is the complexity of populating a hash table? What is the fastest way to do this?


1 Reply

You could sort the list of ids and then step through it once only:

def find_gaps(ids):
    """Generate the gaps in the list of ids."""
    j = 1
    for id_i in sorted(ids):
        while True:
            id_j = '%07d' % j
            j += 1
            if id_j >= id_i:
                break
            yield id_j

>>> list(find_gaps(["0000001", "0000003", "0000006"]))
['0000002', '0000004', '0000005']

If the input list is already in order, you can drop the call to sorted (though it does little harm: Python's adaptive mergesort, Timsort, is O(n) if the list is already sorted).
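
On the hash-table part of the question: building a set from n ids takes O(n) time on average, and each membership test afterwards is O(1) on average, so a set-based version is also linear overall and needs no sorting. Here is a minimal sketch of that alternative. The name find_gaps_with_set and the width parameter are made up for illustration, and it assumes, like the generator above, that ids start at 1 and are zero-padded to a fixed width.

```python
def find_gaps_with_set(ids, width=7):
    """Return the ids missing between 1 and the largest id present.

    Assumes every id is a zero-padded decimal string of the same width.
    """
    present = set(ids)  # building the set is O(n) on average
    if not present:
        return []
    # With fixed-width padding, string order matches numeric order,
    # so max() on the strings finds the largest id.
    upper = int(max(present))
    gaps = []
    for n in range(1, upper):
        candidate = str(n).zfill(width)
        if candidate not in present:  # O(1) average lookup
            gaps.append(candidate)
    return gaps
```

>>> find_gaps_with_set(["0000001", "0000003", "0000006"])
['0000002', '0000004', '0000005']

The trade-off is memory: the set holds a second copy of the references to all ids, whereas the sorted generator only needs the sorted list and yields gaps one at a time.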

