I have a question. My input file has two columns. The first column holds MGD plus a number (MGD5, MGD19) and the second column holds SOL plus a number (SOL2, SOL41). Each SOL value appears in exactly three lines, for example MGD1 SOL41, later MGD15 SOL41, and later MGD68 SOL41. I want two sums, "inner" and "outer", calculated in a specific way.
a) The first condition:
If all three lines have the same value in $1 and the same value in $2, I add 3 to "inner" and 0 to "outer", like:
MGD17 SOL72
MGD17 SOL72
MGD17 SOL72
b) The second condition: two lines have the same value in $1 and one line has a different $1 (with the same $2 in all three), so I add 1 to "inner" and 2 to "outer", like:
MGD17 SOL115
MGD51 SOL115
MGD51 SOL115
c) The third condition: all three lines differ in $1 and have the same $2, so I add 0 to "inner" and 3 to "outer", like:
MGD17 SOL4
MGD51 SOL4
MGD98 SOL4
Input example
MGD24 SOL6215
MGD25 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
MGD28 SOL7
MGD28 SOL6215
Expected output (inner in the first, outer in the second column)
4 5
Why this output?
here 3 inner, 0 outer
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
here 1 inner 2 outer
MGD26 SOL7
MGD26 SOL7
...
MGD28 SOL7
here 0 inner 3 outer
MGD24 SOL6215
MGD25 SOL6215
...
MGD28 SOL6215
I tried to write a script, which I will run on one hundred files. I am stuck on these conditions and don't know how to implement them in the code. I know that I should process the file twice and compare my values in the second pass.
#!/bin/bash
for index in {1..100} # I run this script on 100 files, which is why I use a for loop
do
awk 'NR==FNR {a[$1,$2]++; s[$1,$2]++; next}
# how to write these conditions?
END {print inner,outer}' eq9_$index.ndx{,} >> inner_outer_water_bridges_x2.txt
done
Do you have any idea?
This is the answer. I adapted the script to work on 100 files:
#!/bin/bash
for index in {1..100} # I run this script on 100 files, which is why I use a for loop
do
sort -k2,2 -k1,1 eq9_x3_$index.ndx |
uniq -c |
uniq -f2 -c |
awk '$1>1{outer+=$1} $1<3{inner+=5-2*$1} END{print inner, outer}' >> inner_outer_water_bridges_x3.txt
done
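Before looping over all 100 files, it can help to check the pipeline on the sample input from the question. The sketch below wraps the same four commands in a shell function; the name count_bridges is only an illustrative choice (it is not part of the original answer), and the function reads from standard input instead of a file.

```shell
#!/bin/bash
# count_bridges: hypothetical helper name, not from the original answer.
# Reads "MGD SOL" pairs on stdin and prints "inner outer".
count_bridges() {
    sort -k2,2 -k1,1 |   # group lines by SOL, then by MGD within each SOL
    uniq -c |            # count identical "MGD SOL" lines
    uniq -f2 -c |        # count distinct MGDs per SOL (skips count and MGD fields)
    awk '$1>1{outer+=$1} $1<3{inner+=5-2*$1} END{print inner, outer}'
}

# Sample input from the question; this should print: 4 5
printf '%s\n' \
    'MGD24 SOL6215' 'MGD25 SOL6215' 'MGD26 SOL7' 'MGD26 SOL7' \
    'MGD27 SOL93' 'MGD27 SOL93' 'MGD27 SOL93' 'MGD28 SOL7' 'MGD28 SOL6215' |
count_bridges
```

One thing to keep in mind: if no group in a file contributes to one of the sums, awk prints an empty field for it. Writing print inner+0, outer+0 in the END block would print 0 instead.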
Below is a full explanation of @karakfa's script.
Input data
MGD24 SOL6215
MGD25 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
MGD28 SOL7
MGD28 SOL6215
- First we sort our values. The primary key is the 2nd column and the secondary key is the 1st column.
sort -k2,2 -k1,1 file >> output.txt
so we get this when we run the script
MGD24 SOL6215
MGD25 SOL6215
MGD28 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD28 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
- Then we count identical lines: uniq -c writes the number of repetitions in the first column and keeps only one copy of each line
sort -k2,2 -k1,1 file |
uniq -c >> output.txt
Our output
1 MGD24 SOL6215
1 MGD25 SOL6215
1 MGD28 SOL6215
2 MGD26 SOL7
1 MGD28 SOL7
3 MGD27 SOL93
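- Then we count the lines that share the same SOL value: uniq -f2 -c skips the first two fields (the count and the MGD column) when comparing, writes the number of such lines in a new first column, and keeps only the first line for each SOL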
sort -k2,2 -k1,1 eq9_x3_1.ndx |
uniq -c |
uniq -f2 -c >> output.txt
our output
3 1 MGD24 SOL6215
2 2 MGD26 SOL7
1 3 MGD27 SOL93
- Then we calculate the sums. When the value in the first column is greater than 1, we add it to outer (in our data we add 3 from the first row and 2 from the second row, so outer = 3 + 2 = 5). Then we check the first column again: if the value is lower than 3, we add 5 - 2*$1 to inner. For the second row we get 5 - 2*2 = 1, and for the third row we get 5 - 2*1 = 3, so inner = 1 + 3 = 4.
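To make the arithmetic of the last step easier to follow, here is a small sketch that feeds the three-row table from above into an expanded version of the awk command and prints what each row contributes. The function name trace_sums and the variables i and o are my own illustrative names, not part of the original script; the two conditions and the 5 - 2*$1 formula are unchanged.

```shell
#!/bin/bash
# trace_sums: hypothetical helper name. Reads the output of `uniq -f2 -c`
# (column 1 = number of distinct MGDs for that SOL) and shows per-row sums.
trace_sums() {
    awk '{
        o = ($1 > 1) ? $1 : 0          # outer: add $1 when it is 2 or 3
        i = ($1 < 3) ? 5 - 2*$1 : 0    # inner: 3 when $1 is 1, 1 when $1 is 2
        printf "%s: inner +%d, outer +%d\n", $NF, i, o
        inner += i; outer += o
    }
    END { print "total:", inner, outer }'
}

# Intermediate table for the sample input
printf '%s\n' '3 1 MGD24 SOL6215' '2 2 MGD26 SOL7' '1 3 MGD27 SOL93' |
trace_sums
```

For the sample table this prints one line per SOL (inner +0, outer +3 for SOL6215; inner +1, outer +2 for SOL7; inner +3, outer +0 for SOL93) followed by total: 4 5, which matches conditions c), b) and a) from the question.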
question from:
https://stackoverflow.com/questions/65862342/counting-depends-on-values-in-the-column-in-awk