Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c# - Why doesn't finite repetition in lookbehind work in some flavors?

I want to parse the 2 digits in the middle from a date in dd/mm/yy format but also allowing single digits for day and month.

This is what I came up with:

(?<=^[\d]{1,2}/)[\d]{1,2}

I want a 1 or 2 digit number [\d]{1,2} with a 1 or 2 digit number and slash ^[\d]{1,2}/ before it.

This fails on many inputs; I have tested 10/10/10, 11/12/13, etc.

But to my surprise (?<=^\d\d/)[\d]{1,2} worked.

But the [\d]{1,2} should also match if \d\d did, or am I wrong?


1 Reply


On lookbehind support

Major regex flavors differ in their support for lookbehind: some impose restrictions, and some don't support it at all.

  • JavaScript: not supported before ES2018 (no restriction from ES2018 on)
  • Python: fixed length only
  • Java: finite length only
  • .NET: no restriction


Python

In Python, where only fixed-length lookbehind is supported, your original pattern raises an error because \d{1,2} does not have a fixed length. You can "fix" this by alternating between two fixed-length lookbehinds, e.g. something like this:

(?<=^\d/)\d{1,2}|(?<=^\d\d/)\d{1,2}

Or you can put both lookbehinds as alternatives in a non-capturing group:

(?:(?<=^\d/)|(?<=^\d\d/))\d{1,2}

(Note that you can just use \d without the brackets.)

That said, it's probably much simpler to use a capturing group instead:

^\d{1,2}/(\d{1,2})

Note that findall returns what group 1 captures if the pattern has exactly one group. Capturing groups are more widely supported than lookbehind and often lead to a more readable pattern (as in this case).
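As a minimal sketch of the capturing-group approach, the helper below reads group 1 directly from a match object instead of relying on findall. The names `middle_number` and `_DATE_PREFIX` are made up for illustration and are not part of the original answer.

```python
import re

# Hypothetical helper for illustration only.
# One or two digits, a slash, then the captured one or two digits.
_DATE_PREFIX = re.compile(r'^\d{1,2}/(\d{1,2})')

def middle_number(date_text):
    """Return the 1-2 digit middle field of a d/m/y-style string, or None."""
    m = _DATE_PREFIX.match(date_text)
    return m.group(1) if m else None

print(middle_number("12/34/56"))
print(middle_number("1/23/45"))
print(middle_number("123/45/67"))
```

The first two calls print 34 and 23. The last one prints None, because the string does not start with one or two digits followed by a slash.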

The following snippet illustrates all of the above points:

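import re

# Two fixed-width lookbehinds as alternatives
p = re.compile(r'(?:(?<=^\d/)|(?<=^\d\d/))\d{1,2}')

print(p.findall("12/34/56"))   # ['34']
print(p.findall("1/23/45"))    # ['23']

# Capturing group: findall returns what group 1 captured
p = re.compile(r'^\d{1,2}/(\d{1,2})')

print(p.findall("12/34/56"))   # ['34']
print(p.findall("1/23/45"))    # ['23']

# Variable-width lookbehind is rejected at compile time
p = re.compile(r'(?<=^\d{1,2}/)\d{1,2}')
# raises re.error: look-behind requires fixed-width pattern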
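To observe the restriction without aborting a script, the failing compile can be wrapped in a try/except. This is only a sketch: the helper name `compiles` is made up for illustration, and the error text is deliberately not checked because it may differ between Python versions.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-width lookbehind (2 or 3 characters wide): rejected by re
print(compiles(r'(?<=^\d{1,2}/)\d{1,2}'))

# Two fixed-width lookbehinds as alternatives: accepted
print(compiles(r'(?:(?<=^\d/)|(?<=^\d\d/))\d{1,2}'))
```

The first call prints False and the second prints True.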


Java

Java supports only finite-length lookbehind, so you can use \d{1,2} as in the original pattern. The following snippet demonstrates this:

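    // requires java.util.regex.Pattern and java.util.regex.Matcher
    String text =
        "12/34/56 date\n" +
        "1/23/45 another date\n";

    // (?m) makes ^ match at the start of every line
    Pattern p = Pattern.compile("(?m)(?<=^\\d{1,2}/)\\d{1,2}");
    Matcher m = p.matcher(text);
    while (m.find()) {
        System.out.println(m.group());
    } // "34", "23"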

Note that (?m) is the embedded form of Pattern.MULTILINE, so that ^ matches the start of every line. Note also that since \ is an escape character in string literals, you must write "\\" to get one backslash in Java.
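For comparison, here is a rough Python counterpart of the multi-line case. Since Python's re module rejects the variable-width lookbehind, this sketch swaps in the capturing-group pattern and keeps the same (?m) inline flag. The variable names are chosen for illustration.

```python
import re

text = "12/34/56 date\n1/23/45 another date\n"

# (?m) is the inline form of re.MULTILINE: ^ matches at the start of each line.
# The raw string r'...' keeps the backslashes as written, so \d needs no doubling.
p = re.compile(r'(?m)^\d{1,2}/(\d{1,2})')

months = p.findall(text)
print(months)
```

This prints ['34', '23'], the same two values the Java snippet finds.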


C-Sharp

C# supports full regex syntax in lookbehind. The following snippet shows how you can use + repetition in a lookbehind:

var text = @"
1/23/45
12/34/56
123/45/67
1234/56/78
";

Regex r = new Regex(@"(?m)(?<=^\d+/)\d{1,2}");
foreach (Match m in r.Matches(text)) {
  Console.WriteLine(m);
} // "23", "34", "45", "56"

Note that unlike Java, in C# you can use an @-quoted string so that you don't have to escape \.

For completeness, here's how you'd use the capturing group option in C#:

Regex r = new Regex(@"(?m)^\d+/(\d{1,2})");
foreach (Match m in r.Matches(text)) {
  Console.WriteLine("Matched [" + m + "]; month = " + m.Groups[1]);
}

Given the previous text, this prints:

Matched [1/23]; month = 23
Matched [12/34]; month = 34
Matched [123/45]; month = 45
Matched [1234/56]; month = 56
