Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
1.0k views
in Technique[技术] by (71.8m points)

unix - How does grep run so fast?

I am really amazed by the functionality of GREP in shell, earlier I used to use substring method in java but now I use GREP for it and it executes in a matter of seconds, it is blazingly faster than java code that I used to write.(according to my experience I might be wrong though)

That being said I have not been able to figure out how it is happening? there is also not much available on the web.

Can anyone help me with this?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Assuming your question regards GNU grep specifically. Here's a note from the author, Mike Haertel:

GNU grep is fast because it AVOIDS LOOKING AT EVERY INPUT BYTE.

GNU grep is fast because it EXECUTES VERY FEW INSTRUCTIONS FOR EACH BYTE that it does look at.

GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

GNU grep also unrolls the inner loop of Boyer-Moore, and sets up the Boyer-Moore delta table entries in such a way that it doesn't need to do the loop exit test at every unrolled step. The result of this is that, in the limit, GNU grep averages fewer than 3 x86 instructions executed for each input byte it actually looks at (and it skips many bytes entirely).

GNU grep uses raw Unix input system calls and avoids copying data after reading it. Moreover, GNU grep AVOIDS BREAKING THE INPUT INTO LINES. Looking for newlines would slow grep down by a factor of several times, because to find the newlines it would have to look at every byte!

So instead of using line-oriented input, GNU grep reads raw data into a large buffer, searches the buffer using Boyer-Moore, and only when it finds a match does it go and look for the bounding newlines (Certain command line options like -n disable this optimization.)

This answer is a subset of the information taken from here.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...