Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


C# - Regex match takes a very long time to execute

I wrote a regular expression that parses a file path into different groups (DRIVE, DIR, FILE, EXTENSION).

^((?<DRIVE>[a-zA-Z]):\\)*((?<DIR>[a-zA-Z0-9_]+(([a-zA-Z0-9_\s_\-\.]*[a-zA-Z0-9_]+)|([a-zA-Z0-9_]+)))\\)*(?<FILE>([a-zA-Z0-9_]+(([a-zA-Z0-9_\s_\-\.]*[a-zA-Z0-9_]+)|([a-zA-Z0-9_]+))\.(?<EXTENSION>[a-zA-Z0-9]{1,6})$))

I wrote a test in C#. When the path I test is valid, the result comes back very quickly, which is what I expected.

string path = @"C:\Documents and Settings\jhr\My Documents\Visual Studio 2010\Projects\FileEncryptor\Dds.FileEncryptor\Dds.FileEncryptor.csproj";

=> OK

But when I test with a path that I know will not match, like this:

string path = @"C:\Documents and Settings\jhr\My Documents\Visual Studio 2010\Projects\FileEncryptor\Dds.FileEncryptor\Dds.FileEncryptor?!??????";

=> BUG

The test freezes when I call this line of code:

Match match = s_fileRegex.Match(path);

When I look in Process Explorer, I see the process QTAgent32.exe stuck at 100% CPU. What does this mean?


1 Reply


The problem you are experiencing is called catastrophic backtracking. Your regular expression can match the start of the string in a very large number of ways, and when the end of the string fails to match, the backtracking regular expression engine in .NET tries all of them before giving up, which is why the match is so slow.
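
To see the effect in isolation, here is a minimal sketch. It does not use your pattern; it uses the textbook pattern ^(a+)+$, whose nested quantifiers behave in a similar way. The class BacktrackingDemo and the helper TryIsMatch are made-up names, and the match timeout overload of Regex.IsMatch assumes .NET Framework 4.5 or later (it is not available in .NET 4.0). How long the failing case actually runs depends on the runtime version, so treat the timeout as a safety net rather than a measurement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BacktrackingDemo
{
    // Hypothetical helper: returns true/false when the match attempt finishes,
    // or null when the engine exceeds the timeout and gives up.
    public static bool? TryIsMatch(string input, string pattern, TimeSpan timeout)
    {
        try
        {
            return Regex.IsMatch(input, pattern, RegexOptions.None, timeout);
        }
        catch (RegexMatchTimeoutException)
        {
            return null;
        }
    }

    private static string Describe(bool? result)
    {
        return result.HasValue ? (result.Value ? "match" : "no match") : "timed out";
    }

    public static void Main()
    {
        // Nested quantifiers: a run of 'a's can be split between the inner
        // and outer + in a very large number of ways.
        string pattern = @"^(a+)+$";
        string good = new string('a', 40);
        string bad = good + "!"; // the trailing '!' makes every split fail

        TimeSpan timeout = TimeSpan.FromSeconds(1);
        Console.WriteLine("good input: " + Describe(TryIsMatch(good, pattern, timeout)));
        Console.WriteLine("bad input:  " + Describe(TryIsMatch(bad, pattern, timeout)));
    }
}
```

The valid input succeeds on the first attempt, just like your valid path. The invalid input is the one that can force the engine through all the alternative splits, just like your path ending in ?!??????.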

I think you are using * too frequently in your regular expression. * does not mean "concatenate" - it means "0 or more times". For example there should not be a * here:

((?<DRIVE>[a-zA-Z]):\\)*

There should be at most one drive specification. You should use ? instead here, or else no quantifier at all if you want the drive specification to be compulsory. Similarly there appear to be other places in your regular expression where the quantifier is incorrect.
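
As a rough illustration of that advice, here is one possible rewrite. It is a sketch, not a drop-in replacement: the class name PathParser is made up, the drive is made optional with ?, and each directory or file name is written as "one word character, optionally followed by anything allowed and a final word character" so that a name can only be matched in one way. One assumption to note: this version accepts one-character names, which your original pattern did not.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PathParser
{
    // Sketch only: a simplified variant of the pattern from the question.
    // - The drive uses ? (at most one) instead of *.
    // - No nested "+ followed by * followed by +" on the same characters.
    private const string Pattern =
        @"^(?:(?<DRIVE>[a-zA-Z]):\\)?" +
        @"(?:(?<DIR>[a-zA-Z0-9_](?:[a-zA-Z0-9_\s.-]*[a-zA-Z0-9_])?)\\)*" +
        @"(?<FILE>[a-zA-Z0-9_](?:[a-zA-Z0-9_\s.-]*[a-zA-Z0-9_])?\.(?<EXTENSION>[a-zA-Z0-9]{1,6}))$";

    public static readonly Regex FileRegex = new Regex(Pattern);

    public static void Main()
    {
        string valid = @"C:\Documents and Settings\jhr\My Documents\Visual Studio 2010\Projects\FileEncryptor\Dds.FileEncryptor\Dds.FileEncryptor.csproj";
        string invalid = @"C:\Documents and Settings\jhr\My Documents\Visual Studio 2010\Projects\FileEncryptor\Dds.FileEncryptor\Dds.FileEncryptor?!??????";

        Match match = FileRegex.Match(valid);
        Console.WriteLine(match.Success);
        Console.WriteLine(match.Groups["DRIVE"].Value);
        Console.WriteLine(match.Groups["FILE"].Value);
        Console.WriteLine(match.Groups["EXTENSION"].Value);

        // The invalid path is rejected without exploring a huge number of
        // alternative ways to split each directory name.
        Console.WriteLine(FileRegex.Match(invalid).Success);
    }
}
```

Because the DIR group repeats, match.Groups["DIR"].Value holds only the last directory; the full list is in match.Groups["DIR"].Captures.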

