algorithm - Anagram of String 2 is Substring of String 1

Question

Welcome To Ask or Share your Answers For Others

algorithm - Anagram of String 2 is Substring of String 1

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:35:30+0000

On edit: my first answer was quadratic in the worst case. I've tweaked it to be strictly linear:

Here is an approach based on the notion of a sliding window: Create a dictionary keyed by the letters of the first dictionary with frequency counts of the letters for the corresponding values. Think of this as a dictionary of targets which need to be matched by m consecutive letters in the second string, where m is the length of the first string.

Start by processing the first m letters in the second string. For each such letter if it appears as a key in the target dictionary decrease the corresponding value by 1. The goal is to drive all target values to 0. Define discrepancy to be the sum of the absolute values of the values after processing the first window of m letters.

Repeatedly do the following: check if discrepancy == 0 and return Trueif it does. Otherwise -- take the character m letters ago and check if it is a target key and if so -- increase the value by 1. In this case, this either increases or decreases the discrepancy by 1, adjust accordingly. Then get the next character of the second string and process it as well. Check if it is a key in the dictionary and if so adjust the value and the discrepancy as appropriate.

Since there are no nested loop and each pass through the main loop involves just a few dictionary lookups, comparisons, addition and subtractions, the overall algorithm is linear.

A Python 3 implementation (which shows the basic logic of how the window slides and the target counts and discrepancy are adjusted):

def subAnagram(s1,s2):
    m = len(s1)
    n = len(s2)
    if m > n: return false
    target = dict.fromkeys(s1,0)
    for c in s1: target[c] += 1

    #process initial window
    for i in range(m):
        c = s2[i]
        if c in target:
            target[c] -= 1
    discrepancy = sum(abs(target[c]) for c in target)

    #repeatedly check then slide:
    for i in range(m,n):
        if discrepancy == 0:
            return True
        else:
            #first process letter from m steps ago from s2
            c = s2[i-m]
            if c in target:
                target[c] += 1
                if target[c] > 0: #just made things worse
                    discrepancy +=1
                else:
                    discrepancy -=1
            #now process new letter:
            c = s2[i]
            if c in target:
                target[c] -= 1
                if target[c] < 0: #just made things worse
                    discrepancy += 1
                else:
                    discrepancy -=1
    #if you get to this stage:
    return discrepancy == 0

Typical output:

>>> subAnagram("rove", "stack overflow")
True
>>> subAnagram("rowe", "stack overflow")
False

To stress-test it, I downloaded the complete text of Moby Dick from Project Gutenberg. This has over 1 million characters. "Formosa" is mentioned in the book, hence an anagram of "moors" appears as a substring of Moby Dick. But, not surprisingly, no anagram of "stackoverflow" appears in Moby Dick:

>>> f = open("moby dick.txt")
>>> md = f.read()
>>> f.close()
>>> len(md)
1235186
>>> subAnagram("moors",md)
True
>>> subAnagram("stackoverflow",md)
False

The last call takes roughly 1 second to process the complete text of Moby Dick and verify that no anagram of "stackoverflow" appears in it.

Categories

algorithm - Anagram of String 2 is Substring of String 1

algorithm - Anagram of String 2 is Substring of String 1

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags