awk (more specifically, the GNU variant, gawk) supports true multidimensional arrays, also called arrays of arrays. Their subscripts can be input values, including character strings like the ones in your example, so you can count the values the way you want with:
{
    values[$3] = 1     # this line records the values in column three
    counts[$1][$3]++   # and this line counts their frequency
}
The first line isn't strictly required, but it simplifies generating the output.
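Because counts is an array of arrays, the column-three values could also be recovered by walking every subarray. That is why values is optional: it only saves you from building that union yourself. The sketch below prints each pair so you can see what the counting rule builds. It assumes gawk 4.0 or later and whitespace-separated input. The shell function dump_counts and the sample lines are hypothetical:

```shell
# A minimal sketch, assuming gawk 4.0 or later (arrays of arrays are a
# gawk extension) and whitespace-separated input.
# dump_counts is a hypothetical helper name.
dump_counts() {
    gawk '
    { counts[$1][$3]++ }                 # counts[row][value] = frequency
    END {
        for (i in counts)                # outer level: column-one values
            for (v in counts[i])         # inner level: only values seen for row i
                print i, v, counts[i][v]
    }' "$@"
}

# Hypothetical sample input; for-in order is unspecified, so sort the output.
printf '%s\n' 'a.txt 10 PASS' 'a.txt 11 FAIL' 'b.txt 12 PASS' 'a.txt 13 PASS' |
    dump_counts | LC_ALL=C sort
# prints:
# a.txt FAIL 1
# a.txt PASS 2
# b.txt PASS 1
```

b.txt has no FAIL entry at all. When printing a table, the values array fills that gap by giving every row the same set of columns.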
The only remaining part is an END clause that outputs the tabulated results:
END {
    # Print column headings, padded to line up with the row labels
    printf "%-20s", "Col1"
    for (v in values) {
        printf " Count-%s", v
    }
    printf "\n"
    # Print tabulated results (a missing count prints as 0)
    for (i in counts) {
        printf "%-20s", i
        for (v in values) {
            printf " %d", counts[i][v]
        }
        printf "\n"
    }
}
Building the values array handles the case where the set of column-three values isn't known in advance (for example, when an error in your input produces an unexpected value).
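Putting the counting rule and the END clause together gives a runnable sketch. It assumes gawk 4.0 or later and whitespace-separated input. The shell function tabulate_gawk and the sample lines are hypothetical. The order of a plain for (x in array) loop is unspecified, so the sketch also sets gawk's PROCINFO["sorted_in"] to "@ind_str_asc". This makes rows and columns come out in ascending string order:

```shell
# A minimal sketch, assuming gawk 4.0 or later (arrays of arrays and
# PROCINFO["sorted_in"] are gawk extensions). Column 1 is the row label
# and column 3 is the value being counted. tabulate_gawk is hypothetical.
tabulate_gawk() {
    gawk '
    {
        values[$3] = 1          # remember every distinct column-three value
        counts[$1][$3]++        # count each (column 1, column 3) pair
    }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"   # traverse keys in sorted order
        printf "%-20s", "Col1"
        for (v in values)
            printf " Count-%s", v
        printf "\n"
        for (i in counts) {
            printf "%-20s", i
            for (v in values)
                printf " %d", counts[i][v]   # missing pairs print as 0
            printf "\n"
        }
    }' "$@"
}

# Hypothetical sample input, made up for illustration.
printf '%s\n' 'a.txt 10 PASS' 'a.txt 11 FAIL' 'b.txt 12 PASS' 'a.txt 13 PASS' |
    tabulate_gawk
```

Without the PROCINFO line the counts are the same, but rows and columns may appear in any order.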
If you're using a different awk implementation (such as the BSD awk that ships with macOS), arrays are one-dimensional. A subscript like counts[$1,$3] is a single string key, formed by joining the comma-separated indices with the SUBSEP character. You also need a separate array to remember the column-one values. This adds a little complexity, but the idea is the same:
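{
    files[$1] = 1      # remember each distinct column-one value
    values[$3] = 1     # remember each distinct column-three value
    counts[$1,$3]++    # key is $1 SUBSEP $3
}
END {
    # Print column headings, padded to line up with the row labels
    printf "%-20s", "Col1"
    for (v in values) {
        printf " Count-%s", v
    }
    printf "\n"
    # Print tabulated results (a missing count prints as 0)
    for (f in files) {
        printf "%-20s", f
        for (v in values) {
            printf " %d", counts[f,v]
        }
        printf "\n"
    }
}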
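A subscript like counts[$1,$3] is really the single string $1 SUBSEP $3, so you can also loop over counts directly and recover both parts with split(). The sketch below assumes a POSIX awk (mawk, BSD awk, or gawk should all work) and the same kind of three-column input. The function name count_pairs and the sample lines are hypothetical:

```shell
# A minimal sketch, assuming a POSIX awk and whitespace-separated input.
# counts[$1, $3] is stored under the single string key $1 SUBSEP $3,
# so split() with SUBSEP as the separator recovers the two parts.
# count_pairs is a hypothetical helper name.
count_pairs() {
    awk '
    { counts[$1, $3]++ }
    END {
        for (key in counts) {
            split(key, part, SUBSEP)     # part[1] = column 1, part[2] = column 3
            print part[1], part[2], counts[key]
        }
    }' "$@"
}

# Hypothetical sample input; for-in order is unspecified, so sort the output.
printf '%s\n' 'a.txt 10 PASS' 'a.txt 11 FAIL' 'b.txt 12 PASS' 'a.txt 13 PASS' |
    count_pairs | LC_ALL=C sort
# prints:
# a.txt FAIL 1
# a.txt PASS 2
# b.txt PASS 1
```

POSIX awk gives no control over for-in order, which is why the example pipes its output through sort.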