bash - Fast way of finding lines in one file that are not in another?

Question

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have two large files (sets of filenames), roughly 30,000 lines each. I am trying to find a fast way of finding lines in file1 that are not present in file2.

For example, if this is file1:

line1
line2
line3

And this is file2:

line1
line4
line5

Then my result/output should be:

line2
line3

This works:

grep -v -f file2 file1

But it is very, very slow when used on my large files.
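A likely reason for the slowness is that grep treats every line of file2 as a regular expression. Here is a minimal sketch of the same command with fixed-string, whole-line matching, which is usually much faster on pattern lists of this size. The temporary directory and the variable name `only_in_file1` are only for illustration; the sample data is the example from this question.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' line1 line2 line3 > "$tmpdir/file1"
printf '%s\n' line1 line4 line5 > "$tmpdir/file2"

# -F: treat each pattern as a fixed string, not a regular expression
# -x: a pattern must match the whole line (so "line1" does not match "line10")
# -v: print the lines that do NOT match any pattern
# -f: read the patterns from file2, one per line
only_in_file1=$(grep -F -x -v -f "$tmpdir/file2" "$tmpdir/file1")

printf '%s\n' "$only_in_file1"
```

Besides speed, `-x` also avoids false matches: without it, a line in file2 that is a substring of a line in file1 would wrongly suppress that line.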

I suspect there is a good way to do this using diff, but the output should be just the lines, nothing else, and I cannot seem to find a switch for that.

Can anyone help me find a fast way of doing this, using bash and basic Linux binaries?

EDIT: To follow up on my own question, this is the best way I have found so far using diff:

 diff file2 file1 | grep '^>' | sed 's/^> //'

Surely, there must be a better way?

1 Reply

深蓝 · Answer 1 · 2021-10-16T21:23:28+0000

The comm command (short for "common") may be useful. Its man page describes it as "comm - compare two sorted files line by line". Note that both input files must already be sorted.

# Find lines only in file1
comm -23 file1 file2

# Find lines only in file2
comm -13 file1 file2

# Find lines common to both files
comm -12 file1 file2

The man page is actually quite readable for this.
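Since comm expects sorted input, unsorted files need a sort step first. The sketch below shows one way to do that, plus an awk variant that skips sorting and keeps file1's original line order. The temporary files and the variable names `via_comm` and `via_awk` are assumptions made for the example; the sample lines are the question's data in shuffled order.

```shell
#!/usr/bin/env bash
tmpdir=$(mktemp -d)
# Deliberately unsorted versions of the sample files from the question.
printf '%s\n' line3 line1 line2 > "$tmpdir/file1"
printf '%s\n' line5 line1 line4 > "$tmpdir/file2"

# Sort both inputs with the same collation that comm will use.
LC_ALL=C sort "$tmpdir/file1" > "$tmpdir/file1.sorted"
LC_ALL=C sort "$tmpdir/file2" > "$tmpdir/file2.sorted"

# -23 suppresses column 2 (lines only in file2) and column 3 (common lines),
# leaving the lines that appear only in file1.
via_comm=$(LC_ALL=C comm -23 "$tmpdir/file1.sorted" "$tmpdir/file2.sorted")

# Alternative without sorting: store every line of file2 as an array key,
# then print each line of file1 that is not a key.
# Caveat: the NR == FNR idiom assumes file2 is not empty.
via_awk=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' \
    "$tmpdir/file2" "$tmpdir/file1")

printf 'comm:\n%s\nawk:\n%s\n' "$via_comm" "$via_awk"
```

In bash, the temporary sorted files can be replaced by process substitution: `comm -23 <(sort file1) <(sort file2)`. The comm result comes out in sorted order, while the awk result follows the order of file1 at the cost of holding file2 in memory.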
