python - How to find all the unique substrings of a very long string?

Question

posted Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a very long string and want to find all of its unique substrings. I wrote code that uses a Python set to store the substrings and ensure uniqueness. It gives correct results for many medium and large strings, but for very large strings I get a MemoryError. I googled a bit and found that the set data structure in Python has a large RAM footprint, which may be why I am getting the MemoryError.

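Here is my code:

a = set()  # every distinct substring seen so far
for _ in range(n):  # n: number of input strings (assumed to be read earlier)
    string = raw_input()
    # j is the substring length, from 1 up to len(string)
    for j in xrange(1, len(string) + 1):
        # slide a window of length j across the string
        for start in xrange(len(string) - j + 1):
            a.add(string[start:start + j])
print sorted(a)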

Is there a way to avoid this error for large strings? Or can anybody suggest a modification to my code to handle this issue?

P.S.: I do not have the option of switching between 32-bit and 64-bit versions.

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:39:35+0000

If you really need all the substrings in memory, you can try building a suffix tree. Tries are not exotic data structures, so there are probably good implementations available for a mainstream language like Python, and a trie can be used to implement a suffix tree. Marisa-Trie is reported to have low memory usage.

Create an empty trie.
For each n in [0, len(s)], add the suffix of length n to the trie.
Every path from the root of the trie then spells a substring of the string, no path spells anything that is not a substring, and each substring corresponds to exactly one path.
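
The steps above can be sketched in Python 3 roughly as follows. This is only an illustration under stated assumptions: the trie is built from plain nested dicts rather than with Marisa-Trie, and the names build_suffix_trie and iter_substrings are made up for this example. A dict-per-node trie is not compact, so it shows the idea rather than a guaranteed fix for the MemoryError; a memory-efficient trie library would still be needed for very long strings.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a nested-dict trie and return the root.

    Hypothetical helper: each node is a dict mapping a character to a child node.
    """
    root = {}
    for start in range(len(s)):
        node = root
        # Walk the suffix s[start:] character by character without copying it.
        for i in range(start, len(s)):
            node = node.setdefault(s[i], {})
    return root


def iter_substrings(root):
    """Yield each distinct non-empty substring once, in sorted order."""
    # An explicit stack avoids hitting the recursion limit on long strings.
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        if prefix:
            yield prefix
        # Push children in reverse so the smallest character is popped first.
        for ch in sorted(node, reverse=True):
            stack.append((prefix + ch, node[ch]))


if __name__ == "__main__":
    for sub in iter_substrings(build_suffix_trie("abab")):
        print(sub)
```

Because iter_substrings is a generator, the substrings can be printed or written out one at a time instead of being collected into a sorted list first, which avoids holding a second full copy of the result in memory.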
