Suppose you have a file that contains IP addresses, one address in each line:
10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1
You need a shell script that counts for each IP address how many times it appears in the file. For the previous input you need the following output:
10.0.10.1 3
10.0.10.2 1
10.0.10.3 1
One way to do this is:
cat ip_addresses |uniq |while read ip
do
echo -n $ip" "
grep -c $ip ip_addresses
done
However it is really far from being efficient.
How would you solve this problem more efficiently using bash?
(One thing to add: I know it can be solved from perl or awk, I'm interested in a better solution in bash, not in those languages.)
ADDITIONAL INFO:
Suppose that the source file is 5GB and the machine running the algorithm has 4GB. So sort is not an efficient solution, neither is reading the file more than once.
I liked the hashtable-like solution - anybody can provide improvements to that solution?
ADDITIONAL INFO #2:
Some people asked why would I bother doing it in bash when it is way easier in e.g. perl. The reason is that on the machine I had to do this perl wasn't available for me. It was a custom built linux machine without most of the tools I'm used to. And I think it was an interesting problem.
So please, don't blame the question, just ignore it if you don't like it. :-)
See Question&Answers more detail:
os 与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…