Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
412 views
in Technique[技术] by (71.8m points)

linux - grep a large list against a large file

I am currently trying to grep a large list of ids (~5000) against an even larger csv file (3.000.000 lines).

I want all the csv lines, that contain an id from the id file.

My naive approach was:

cat the_ids.txt | while read line
do
  cat huge.csv | grep $line >> output_file
done

But this takes forever!

Are there more efficient approaches to this problem?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Try

grep -f the_ids.txt huge.csv

Additionally, since your patterns seem to be fixed strings, supplying the -F option might speed up grep.

   -F, --fixed-strings
          Interpret PATTERN as a  list  of  fixed  strings,  separated  by
          newlines,  any  of  which is to be matched.  (-F is specified by
          POSIX.)

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...