Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - Why does the order of alternatives matter in regex?

Code

using System;
using System.Text.RegularExpressions;

namespace RegexNoMatch {
    class Program {
        static void Main () {
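            string input = "a foobar& b";
            // Longer alternative first: "foobar" is tried before "foo".
            string regex1 = "(foobar|foo)&?";
            // Shorter alternative first: "foo" is tried before "foobar".
            string regex2 = "(foo|foobar)&?";
            // Replace each whole match with the text captured by group 1.
            string replace = "$1";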
            Console.WriteLine(Regex.Replace(input, regex1, replace));
            Console.WriteLine(Regex.Replace(input, regex2, replace));
            Console.ReadKey();
        }
    }
}

Expected output

a foobar b
a foobar b

Actual output

a foobar b
a foobar& b

Question

Why does the replacement stop working when the order of "foo" and "foobar" in the regex pattern is swapped? How can this be fixed?

Answer

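The .NET regular expression engine is a backtracking engine: it tries the alternatives in the order in which they are listed and accepts the first one that lets the rest of the pattern succeed. With (foo|foobar)&?, the foo alternative matches first. The next character is b rather than &, but &? is optional and matches the empty string, so the whole match is just foo. Replacing it with $1 writes foo back unchanged, and the search resumes at bar& b, which contains no further match, so the & is never removed.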
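A minimal sketch that makes the left-to-right order visible by printing the first match of each pattern with Regex.Match. It assumes a plain .NET console app; the class name AlternationOrderDemo and the helper FirstMatch are illustrative, not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative class name; not part of the original question.
static class AlternationOrderDemo
{
    // Returns the first match of `pattern` in `input`
    // (Match.Value is the empty string when nothing matches).
    public static string FirstMatch(string input, string pattern) =>
        Regex.Match(input, pattern).Value;

    static void Main()
    {
        const string input = "a foobar& b";

        // "foobar" is tried first, so the match runs on through the optional '&'.
        Console.WriteLine(FirstMatch(input, "(foobar|foo)&?")); // foobar&

        // "foo" is tried first; '&?' then matches the empty string before 'b',
        // so the engine accepts "foo" and never tries "foobar" at this position.
        Console.WriteLine(FirstMatch(input, "(foo|foobar)&?")); // foo
    }
}
```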

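In other words, because foo is a prefix of foobar and everything after the group is optional, (foo|foobar)&? never gets to try the foobar alternative: the foo alternative always produces a successful match first. The fix is to list the longer alternative first, as regex1 does, so that foobar is attempted before foo.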
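A sketch of two ways to make the replacement from the question remove the &, assuming the same input and "$1" replacement: list the longer alternative first, or use a single alternative with an optional suffix, foo(?:bar)?, which is greedy and therefore consumes bar when it is present. The helper name Strip is hypothetical. The last lines add a caveat: order only decides the result when the earlier alternative already lets the whole pattern succeed. If the & is required, the failed attempt after foo makes the engine backtrack and try foobar.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative class and helper names; not part of the original answer.
static class AlternationFixes
{
    const string Input = "a foobar& b";

    // Same Regex.Replace call as in the question, with a configurable pattern.
    public static string Strip(string pattern) => Regex.Replace(Input, pattern, "$1");

    static void Main()
    {
        // Fix 1: list the longer alternative first.
        Console.WriteLine(Strip("(foobar|foo)&?"));  // a foobar b

        // Fix 2: one alternative with an optional, greedy suffix.
        Console.WriteLine(Strip("(foo(?:bar)?)&?")); // a foobar b

        // Caveat: with a required '&', the character after "foo" is 'b', not '&',
        // so the engine backtracks and tries the "foobar" alternative instead.
        Match m = Regex.Match(Input, "(foo|foobar)&");
        Console.WriteLine(m.Groups[1].Value);        // foobar
    }
}
```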

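Occasionally this can even be a useful trick. In the pattern (o|a|(\w)), the alternatives o and a are tried before \w, so those two letters are matched without group 2 capturing anything, while every other word character is captured by group 2. Replacing with $2 therefore deletes every a and o and leaves all other characters in place: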

Regex.Replace("a foobar& b", @"(o|a|(\w))", "$2"); // returns " fbr& b" (note the leading space)
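To see why $2 is empty for a and o, here is a sketch that lists each match of (o|a|(\w)) together with Group.Success for group 2. It assumes a .NET console app; the class name CaptureTrickDemo and the helper DropAAndO are illustrative. In .NET, a group that did not participate in a match substitutes as empty text.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative names; not part of the original answer.
static class CaptureTrickDemo
{
    // Verbatim string so that \w reaches the regex engine unchanged.
    const string Pattern = @"(o|a|(\w))";

    // Replaces every match with group 2, which is empty for 'a' and 'o'.
    public static string DropAAndO(string input) => Regex.Replace(input, Pattern, "$2");

    static void Main()
    {
        foreach (Match m in Regex.Matches("a foobar& b", Pattern))
        {
            // Group 2 only participates when the \w alternative was the one used.
            Group g2 = m.Groups[2];
            string detail = g2.Success ? $"captured '{g2.Value}'" : "did not participate";
            Console.WriteLine($"'{m.Value}': group 2 {detail}");
        }

        string result = DropAAndO("a foobar& b");
        Console.WriteLine($"\"{result}\""); // prints " fbr& b" including the quotes
    }
}
```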
