Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
294 views
in Technique[技术] by (71.8m points)

c# - Checking collision in filename search patterns with wildcards

I need to compare file system wildcard expressions to see whether their results would overlap, by only examining/comparing the expressions.

For sake of example, we are building a utility that would sort files from one (or more locations) into a separate folders based on file system wildcard expressions. For example: *.txt goes into folder a, *.doc goes into folder b, and so on. The wildcard characters we would support would be * and ?

I want to be able to determine from just analyzing the wildcard expressions, whether they would conflict/overlap.

For example, if I have the following expressions:

*.x.y
*.y

They would conflict (overlap) because the second expression *.y would include *.x.y results. (ex. A.x.y would match both expressions)

I am approaching this by building a tree structure using the using all of the expressions, figuring that the very act of building a tree will fail if the expressions conflict.

For example:
*.x
a.b
a.c
b.d

might create a tree like 

         +-*-.-x
         |
start +--+
         |     +-b
         |     |
         +-a-.-+-c
         |
         |
         +-b-.-d

If I try to add the pattern b.x, the tree would be successful following the *.x path, and thereby say that the pattern already exists.

Am I heading in the correct direction? Or is there an known algorithm for attacking this?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

To check whether two wildcard patterns could match the same filename, you can look at this problem as creating a grid of comparisons between pairs of characters, and then checking whether there exists a diagonal path. The illustration below shows how wildcard patterns ab?.c?? and a*bc.* can be checked for possible conflict:

wildcard conflict animation

When a match between two identical literal characters is found, you move diagonally to the next characters to check. (indicated with green arrow)

When a literal character and a single-character wild card ? are encountered, there are two possibilities: either the wild card matches the character (move diagonally), or the wildcard matches empty space, and you skip over it. (indicated with purple arrows)

When a multi-character wild card * is encountered, three possibilities need to be taken into account: the wild card matches an empty space before the other character, the wild card matches the other character, or the wild card matches multiple characters. (indicated with blue arrows)

wildcard conflict comparisons

code example 1 (iterative)

Below is a simple javascript implementation which iterates over the grid diagonally, marks cells which can be reached from the current cell, and then checks whether the bottom right cell is reachable. Run the code snippet to see a few examples. (update: top-to-bottom left-to-right will do fine instead of diagonally)

function wildConflict(wild1, wild2) {
    var grid = [[true]], width = wild1.length, height = wild2.length;
    for (var x = 1; x <= width; x++) grid[x] = [];
    for (var y = 0; y < height; y++) {
        for (var x = 0; x < width; x++) {
            if (grid[x][y]) {
                var a = wild1.charAt(x), b = wild2.charAt(y);
                if (a == '*' || b == '*' || a == '?' || b == '?' || a == b) grid[x + 1][y + 1] = true;
                if (a == '*' || b == '*' || a == '?') grid[x + 1][y] = true;
                if (a == '*' || b == '*' || b == '?') grid[x][y + 1] = true;
            }
        }
    }
    return grid[width][height] == true;
}

var a = ["a", "a", "a*", "abc", "a*", "*.x.y", "*.x.y", "a*b*", "a*bc.*", "a?c.de"];
var b = ["a", "b", "abc", "a?", "*b", "*.y", "*.x", "a*c*", "ab?.c??", "ac.d??"];
for (var i in a) document.write("&quot;" + a[i] + "&quot; &harr; &quot;" + b[i] + "&quot; &rarr; " + wildConflict(a[i], b[i]) + "<BR>");

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...