Counting depends on values in the column in awk

posted Mar 6, 2021 in Technique[技术] by 深蓝 (71.8m points)

I have a question. My input file has two columns. The first column holds MGD plus a number (for example MGD5, MGD19) and the second column holds SOL plus a number (for example SOL2, SOL41). Each SOL value occurs three times, so the file contains three lines with the same SOL: for example MGD1 SOL41, later MGD15 SOL41, and later MGD68 SOL41. I want two sums, "inner" and "outer", calculated in a specific way.

a) First condition: if all three lines have the same value in $1 and the same value in $2, I add 3 to "inner" and 0 to "outer", like:

MGD17  SOL72
MGD17  SOL72
MGD17  SOL72

b) Second condition: if two lines have the same value in $1, one line has a different $1, and all three share the same $2, I add 1 to "inner" and 2 to "outer", like:

MGD17  SOL115
MGD51  SOL115
MGD51  SOL115

c) Third condition: if all three values in $1 differ and $2 is the same, I add 0 to "inner" and 3 to "outer", like:

MGD17  SOL4
MGD51  SOL4
MGD98  SOL4
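
The three conditions boil down to how many distinct $1 values one SOL group contains. A minimal sketch of that mapping, assuming every group has exactly three lines; classify is a hypothetical helper name used only for illustration:

```shell
# classify: hypothetical helper. Takes the three $1 values of one SOL group
# and prints "inner outer" for that group.
classify() {
    # Number of distinct values among the arguments (tr strips wc padding).
    distinct=$(printf '%s\n' "$@" | sort -u | wc -l | tr -d ' ')
    case $distinct in
        1) echo "3 0" ;;   # condition a: all three the same
        2) echo "1 2" ;;   # condition b: two the same, one different
        3) echo "0 3" ;;   # condition c: all different
    esac
}

classify MGD17 MGD17 MGD17   # prints: 3 0
classify MGD17 MGD51 MGD51   # prints: 1 2
classify MGD17 MGD51 MGD98   # prints: 0 3
```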

Input example

MGD24 SOL6215
MGD25 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
MGD28 SOL7
MGD28 SOL6215

Expected output (inner in the first, outer in the second column)

4   5

Why this output? Here we have 3 inner, 0 outer:

MGD27 SOL93
MGD27 SOL93
MGD27 SOL93

here 1 inner 2 outer

MGD26 SOL7
MGD26 SOL7
...
MGD28 SOL7

here 0 inner 3 outer

MGD24 SOL6215
MGD25 SOL6215
...
MGD28 SOL6215

I tried to write a script, which I will run on one hundred files. I am stuck on these conditions: I don't know how to implement them in the code. I know that I should process the file twice and compare my values on the second pass.

#!/bin/bash
for index in {1..100} # I run this script on 100 files, which is why I use a for loop
do
    awk 'NR==FNR         {a[$1,$2]++; s[$1,$2]++; next}
       # how to write these conditions????
       END             {print inner,outer}' eq9_$index.ndx{,} >> inner_outer_water_bridges_x2.txt
done

Do you have any idea?
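
For comparison, the conditions can also be written in awk alone with a single pass over the file. This is only a sketch, assuming every SOL value occurs exactly three times; the variable names input and result and the array names seen and distinct are made up for illustration, and the sample data is the input example from above:

```shell
# Sample input from the question, kept in a variable so the sketch is self-contained.
input='MGD24 SOL6215
MGD25 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
MGD28 SOL7
MGD28 SOL6215'

result=$(printf '%s\n' "$input" | awk '
    # Count each distinct ($1,$2) pair once per SOL value.
    !seen[$1,$2]++ { distinct[$2]++ }
    END {
        for (sol in distinct) {
            d = distinct[sol]
            if (d == 1)      { inner += 3 }              # condition a
            else if (d == 2) { inner += 1; outer += 2 }  # condition b
            else             { outer += 3 }              # condition c
        }
        print inner + 0, outer + 0
    }')
echo "$result"   # prints: 4 5
```

Inside the for loop the same awk program could read eq9_$index.ndx directly instead of the sample variable.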

This is the answer. I adapted the script to work on 100 files:

#!/bin/bash
for index in {1..100} # I run this script on 100 files, which is why I use a for loop
do
    sort -k2,2 -k1,1 eq9_x3_$index.ndx | 
    uniq -c             | 
    uniq -f2 -c         | 
    awk '$1>1{outer+=$1} $1<3{inner+=5-2*$1} END{print inner, outer}' >> inner_outer_water_bridges_x3.txt
done
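
One edge case worth noting: awk prints an uninitialized variable as an empty string, so if no group ever adds to inner (or to outer), that column comes out empty. A minimal sketch of a guard, assuming a made-up three-line sample (the names sample and result are only for illustration), adds +0 to force numeric output:

```shell
# One SOL group whose three MGD values all differ, so nothing is ever added to inner.
sample='MGD1 SOL9
MGD2 SOL9
MGD3 SOL9'

result=$(printf '%s\n' "$sample" |
    sort -k2,2 -k1,1 | uniq -c | uniq -f2 -c |
    awk '$1>1{outer+=$1} $1<3{inner+=5-2*$1} END{print inner+0, outer+0}')
echo "$result"   # prints: 0 3
```

Without the +0, the same pipeline would print an empty first column followed by 3.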

I wrote a full explanation of @karakfa's script below.

Input data

MGD24 SOL6215
MGD25 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93
MGD28 SOL7
MGD28 SOL6215

First we sort our values. The primary key is in the 2nd column, secondary key is in the 1st column.

    sort -k2,2 -k1,1 file >> output.txt

Running it gives us:

MGD24 SOL6215
MGD25 SOL6215
MGD28 SOL6215
MGD26 SOL7
MGD26 SOL7
MGD28 SOL7
MGD27 SOL93
MGD27 SOL93
MGD27 SOL93

Then we count identical lines: uniq -c writes the number of repetitions in the first column and keeps only one copy of each line.

    sort -k2,2 -k1,1 file | 
    uniq -c  >> output.txt

Our output

      1 MGD24 SOL6215
      1 MGD25 SOL6215
      1 MGD28 SOL6215
      2 MGD26 SOL7
      1 MGD28 SOL7
      3 MGD27 SOL93

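Then we count repetitions of the SOL column: uniq -f2 -c skips the first two fields (the count and the MGD value) when comparing lines, writes the number of repetitions in a new first column, and keeps only the first line of each SOL group.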

    sort -k2,2 -k1,1 file | 
    uniq -c             | 
    uniq -f2 -c          >> output.txt

our output

      3       1 MGD24 SOL6215
      2       2 MGD26 SOL7
      1       3 MGD27 SOL93
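
To see the field skipping in isolation, here is a small sketch that feeds the previous step's output straight into uniq -f2 -c. The variable names counted and grouped are made up for illustration, and the trailing awk only squeezes the padding so the columns are easier to read:

```shell
# Output of the first "uniq -c" step, copied from above.
counted='      1 MGD24 SOL6215
      1 MGD25 SOL6215
      1 MGD28 SOL6215
      2 MGD26 SOL7
      1 MGD28 SOL7
      3 MGD27 SOL93'

# -f2 skips the count and the MGD column, so only the SOL column is compared.
grouped=$(printf '%s\n' "$counted" | uniq -f2 -c | awk '{$1=$1; print}')
echo "$grouped"
# prints:
# 3 1 MGD24 SOL6215
# 2 2 MGD26 SOL7
# 1 3 MGD27 SOL93
```

The new first column is therefore the number of distinct MGD values for each SOL.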

Then we calculate the values. When the first column is greater than 1, we add it to outer: in our data that is 3 from the first row and 2 from the second row, so outer = 3 + 2 = 5. Then we check the first column again: when it is lower than 3, we add 5 - 2*$1 to inner. For the second row that is 5 - 2*2 = 1, and for the third row 5 - 2*1 = 3, so inner = 1 + 3 = 4.
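
As a quick check that the two awk rules reproduce conditions a, b and c, this sketch tabulates them for every possible number d of distinct MGD values in a group (d and table are names chosen only for illustration):

```shell
# For d = 1, 2, 3 print: d, contribution to inner, contribution to outer.
table=$(awk 'BEGIN {
    for (d = 1; d <= 3; d++) {
        inner = (d < 3) ? 5 - 2*d : 0   # rule: $1<3 {inner += 5-2*$1}
        outer = (d > 1) ? d : 0         # rule: $1>1 {outer += $1}
        print d, inner, outer
    }
}')
echo "$table"
# prints:
# 1 3 0
# 2 1 2
# 3 0 3
```

Each row matches one condition: d = 1 is condition a, d = 2 is condition b, and d = 3 is condition c.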

question from:https://stackoverflow.com/questions/65862342/counting-depends-on-values-in-the-column-in-awk

1 Reply

深蓝 · Answer 1 · 2021-03-06T05:08:01+0000

$ sort -k2 -k1,1 file | 
  uniq -c             | 
  uniq -f2 -c         | 
  awk '$1>1{outer+=$1} $1<3{inner+=5-2*$1} END{print inner, outer}'


4 5

explanation is left as an exercise...
