I have a file which needs to be cleaned of some URLs. The URLs to remove are listed in one file, say fileA, and the data is in a CSV, fileB (these are huge files, 6-10 GB in size). I have tried the following grep command, but it does not work on newer versions of fileB:
grep -vwF -f patterns.txt fileB.csv > result.csv
The structure of fileA is a single list of URLs, like so:
URLs (header, single column)
bwin.hu
paradisepoker.li
and fileB:
type|||URL|||Date|||Domain
1|||https://www.google.com|||1524024000|||google.com
2|||www.bwin.hu|||1524024324|||bwin.hu
The delimiter for fileB is |||
I am open to all solutions including awk. Thanks.
Edit: the expected output is the CSV file retaining all rows that do not match the domain patterns in fileA:
type|||URL|||Date|||Domain
1|||https://www.google.com|||1524024000|||google.com
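For reference, here is a minimal awk sketch of the filtering described above, not a tested solution for the real files. It assumes the domain to compare is always the fourth |||-separated field of fileB, that fileA has a one-line header, and that fileA's domains fit in memory. The file names fileA, fileB.csv and result.csv are placeholders, and the carriage-return stripping is a guess at why newer files might fail to match, not something confirmed here.

```shell
# Build tiny sample files mirroring the layout shown in the question.
printf '%s\n' 'URLs' 'bwin.hu' 'paradisepoker.li' > fileA
printf '%s\n' \
    'type|||URL|||Date|||Domain' \
    '1|||https://www.google.com|||1524024000|||google.com' \
    '2|||www.bwin.hu|||1524024324|||bwin.hu' > fileB.csv

# -F takes a regex; each | is bracketed so it is treated literally.
awk -F'[|][|][|]' '
    { sub(/\r$/, "") }                              # assumption: strip a trailing CR if a file has Windows line endings
    NR == FNR { if (FNR > 1) blocked[$1]; next }    # first file (fileA): remember each domain, skipping the header
    FNR == 1 || !($4 in blocked)                    # second file (fileB): print the header and rows whose Domain is not blocked
' fileA fileB.csv > result.csv

cat result.csv
```

Unlike grep -wF, which looks for the pattern anywhere in the line, this compares the Domain column exactly, so a domain that merely appears inside a longer URL in another column does not cause a row to be dropped. On the sample data, result.csv should contain only the header and the google.com row.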