Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
297 views
in Technique[技术] by (71.8m points)

c# - using static Regex.IsMatch vs creating an instance of Regex

In C# should you have code like:

public static string importantRegex = "magic!";

public void F1(){
  //code
  if(Regex.IsMatch(importantRegex)){
    //codez in here.
  }
  //more code
}
public void main(){
  F1();
/*
  some stuff happens......
*/
  F1();
}

or should you persist an instance of a Regex containing the important pattern? What is the cost of using Regex.IsMatch? I imagine there is an NFA created in each Regex intance. From what I understand this NFA creation is non trivial.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

In a rare departure from my typical egotism, I'm kind of reversing myself on this answer.

My original answer, preserved below, was based on an examination of version 1.1 of the .NET framework. This is pretty shameful, since .NET 2.0 had been out for over three years at the time of my answer, and it contained changes to the Regex class that significantly affect the difference between the static and instance methods.

In .NET 2.0 (and 4.0), the static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern, RegexOptions.None, true).IsMatch(input);
}

The significant difference here is that little true as the third argument. That corresponds to a parameter named "useCache". When that is true, then the parsed tree is retrieved from cached on the second and subsequent use.

This caching eats up most—but not all—of the performance difference between the static and instance methods. In my tests, the static IsMatch method was still about 20% slower than the instance method, but that only amounted to about a half second increase when run 100 times over a set of 10,000 input strings (for a total of 1 million operations).

This 20% slowdown can still be significant in some scenarios. If you find yourself regexing hundreds of millions of strings, you'll probably want to take every step you can to make it more efficient. But I'd bet that 99% of the time, you're using a particular Regex no more than a handful of times, and the extra millisecond you lose to the static method won't be even close to noticeable.

Props to devgeezer, who pointed this out almost a year ago, although no one seemed to notice.

My old answer follows:


The static IsMatch function is defined as follows:

public static bool IsMatch(string input, string pattern){
    return new Regex(pattern).IsMatch(input);
}

And, yes, initialization of a Regex object is not trivial. You should use the static IsMatch (or any of the other static Regex functions) as a quick shortcut only for patterns that you will use only once. If you will reuse the pattern, it's worth it to reuse a Regex object, too.

As to whether or not you should specify RegexOptions.Compiled, as suggested by Jon Skeet, that's another story. The answer there is: it depends. For simple patterns or for patterns used only a handful of times, it may well be faster to use a non-compiled instance. You should definitely profile before deciding. The cost of compiling a regular expression object is quite large indeed, and may not be worth it.


Take, as an example, the following:

const int count = 10000;

string pattern = "^[a-z]+[0-9]+$";
string input   = "abc123";

Stopwatch sw = Stopwatch.StartNew();
for(int i = 0; i < count; i++)
    Regex.IsMatch(input, pattern);
Console.WriteLine("static took {0} seconds.", sw.Elapsed.TotalSeconds);

sw.Reset();
sw.Start();
Regex rx = new Regex(pattern);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("instance took {0} seconds.", sw.Elapsed.TotalSeconds);

sw.Reset();
sw.Start();
rx = new Regex(pattern, RegexOptions.Compiled);
for(int i = 0; i < count; i++)
    rx.IsMatch(input);
Console.WriteLine("compiled took {0} seconds.", sw.Elapsed.TotalSeconds);

At count = 10000, as listed, the second output is fastest. Increase count to 100000, and the compiled version wins.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...