Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C# - Apply a Regex on a Stream?

I'm looking for a fast and safe way to apply regular expressions to Streams.

The examples I found online all convert each buffer to a String and then apply the Regex to that string.

This approach has two problems:

  • Performance: converting buffers to strings and then garbage-collecting those strings wastes time and CPU, which could be avoided if there were a more native way to apply a Regex to Streams.
  • Pure Regex support: sometimes a pattern can match only when two buffers are combined (buffer 1 ends with the first part of the match, and buffer 2 starts with the second part). The convert-to-string approach cannot handle this kind of match natively. I have to supply extra information, such as the maximum length the pattern can match, so it cannot support the + and * quantifiers at all (unbounded match length) and never will.
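A minimal sketch of the second problem and of the bounded-overlap workaround mentioned above, assuming the pattern has a known maximum match length. The names `CountPerChunk`, `FindAll`, `maxMatchLength` and `chunkSize` are made up for illustration, and the pattern uses `\d{1,5}` rather than `\d+` precisely because the workaround needs a bound. It still builds strings, so it does nothing for the performance problem, and anchors, lookarounds or `\b` may behave differently at the start of a window.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class ChunkedRegexDemo
{
    // Naive approach: apply the regex to each buffer on its own.
    public static int CountPerChunk(Regex regex, params string[] chunks)
    {
        int count = 0;
        foreach (string chunk in chunks)
            count += regex.Matches(chunk).Count;
        return count;
    }

    // Workaround (hypothetical helper): keep the unconsumed tail of the previous
    // window, at most maxMatchLength - 1 chars, and prepend it to the next chunk.
    public static List<string> FindAll(TextReader reader, Regex regex,
                                       int maxMatchLength, int chunkSize)
    {
        var results = new List<string>();
        var buffer = new char[chunkSize];
        string window = "";
        bool eof = false;
        while (!eof)
        {
            int read = reader.Read(buffer, 0, buffer.Length);
            eof = read == 0;
            window += new string(buffer, 0, read);

            // A match starting at or after `safe` could still grow with data
            // that has not been read yet, so it is deferred to the next round.
            int safe = eof ? window.Length
                           : Math.Max(0, window.Length - (maxMatchLength - 1));
            int consumed = 0;
            Match m = regex.Match(window);
            while (m.Success && m.Index < safe)
            {
                results.Add(m.Value);
                consumed = m.Index + m.Length;
                m = m.NextMatch();
            }
            // Drop everything already decided; carry the rest forward.
            window = window.Substring(Math.Max(consumed, safe));
        }
        return results;
    }

    static void Main()
    {
        var regex = new Regex(@"\d{1,5}-\d{1,5}"); // longest possible match: 11 chars

        // The match "123-456" straddles the two buffers, so neither buffer matches.
        Console.WriteLine(CountPerChunk(regex, "id 123-", "456 end")); // prints 0

        // Same data read in 7-char chunks, with the overlap carried over.
        var found = FindAll(new StringReader("id 123-456 end"), regex, 11, 7);
        Console.WriteLine(string.Join(",", found)); // prints 123-456
    }
}
```

With `\d+-\d+` there is no value of `maxMatchLength` that is always large enough, which is exactly the limitation described in the second bullet.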

So the convert-to-string approach is not fast and does not fully support Regex.

Is there any way, or any library, to apply a Regex to Streams without converting to strings and with full Regex support?


1 Reply


Intel has recently open-sourced the Hyperscan library under a BSD license. It is a high-performance, non-backtracking, NFA-based regex engine.

Features: it can work on streams of input data, and it can match multiple patterns simultaneously. The latter differs from the (pattern1|pattern2|...) approach in that it actually matches the patterns concurrently.
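To illustrate why alternation is not the same thing as matching several patterns at once, here is a small .NET-only sketch. It does not use Hyperscan at all. `MatchEach` is a hypothetical helper that simply runs each pattern in its own pass, so it shows only the difference in which matches get reported, not the single-pass concurrency the library provides.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MultiPatternDemo
{
    // Runs every pattern independently and tags each match with its pattern index.
    public static List<(int PatternId, int Index, string Value)> MatchEach(
        string input, params string[] patterns)
    {
        var hits = new List<(int PatternId, int Index, string Value)>();
        for (int id = 0; id < patterns.Length; id++)
            foreach (Match m in Regex.Matches(input, patterns[id]))
                hits.Add((id, m.Index, m.Value));
        return hits;
    }

    static void Main()
    {
        const string input = "foobar";

        // Alternation picks one alternative per position: only "foo" is reported.
        foreach (Match m in Regex.Matches(input, "foo|foobar"))
            Console.WriteLine($"alternation: {m.Value}");

        // Separate patterns each report their own match: "foo" and "foobar".
        foreach (var hit in MatchEach(input, "foo", "foobar"))
            Console.WriteLine($"pattern {hit.PatternId}: {hit.Value}");
    }
}
```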

It also uses Intel instruction set extensions such as SSE4.2, AVX2 and BMI. A summary of the design and an explanation of how it works can be found here. It also has a great developer's reference guide with many explanations, as well as performance and usage considerations. There is also a short article about using it in the wild (in Russian).

