Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

shell - Find the probability in 2nd column for a selection in 1st column

I have two columns as follows

ifile.dat
1   10
3   34
1   4
3   32
5   3
2   2
4   20
3   13
4   50
1   40
2   20
5   2

I would like to calculate the probability in 2nd column for some selection in 1st column.

ofile.dat
1-2   0.417 #Here 1-2 means all values in 1st column ranging from 1 to 2; 
            #0.417 is the probability of corresponding values in 2nd column 
            # i.e. count(10,4,2,40,20)/total = 5/12 
3-4   0.417 #count(34,32,20,13,50)/total = 5/12
5-6   0.167 #count(3,2)/total = 2/12

Similarly, if I choose a selection range of 3 numbers, then the desired output will be

ofile.dat
1-3  0.667
4-6  0.333
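
To make the definition concrete, the figures above can be checked one selection at a time with a minimal sketch like the following. It only counts the rows whose 1st column falls in the chosen selection and divides by the total number of rows. The helper name `range_prob` is hypothetical, and the heredoc simply recreates ifile.dat so the snippet runs on its own.

```shell
# Recreate the sample input from the question.
cat > ifile.dat <<'EOF'
1   10
3   34
1   4
3   32
5   3
2   2
4   20
3   13
4   50
1   40
2   20
5   2
EOF

# range_prob LO HI FILE (hypothetical helper): share of rows whose
# 1st column lies in [LO, HI], printed in the ofile.dat layout.
range_prob() {
  awk -v lo="$1" -v hi="$2" '
    $1 >= lo && $1 <= hi { n++ }                 # row belongs to the selection
    END { if (NR) printf("%d-%d %.3f\n", lo, hi, n / NR) }
  ' "$3"
}

range_prob 1 3 ifile.dat   # 8 of the 12 rows
range_prob 4 6 ifile.dat   # 4 of the 12 rows
```

This assumes the file has no blank lines, since every input line counts toward the total.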

RavinderSingh13 and James Brown have given nice scripts (see answer), but these do not work for values larger than 10 in the 1st column.

ifile2.txt
10   10
30   34
10   4
30   32
50   3
20   2
40   20
30   13
40   50
10   40
20   20
50   2

  Asked by Kay; translated from Stack Overflow

1 Reply

EDIT2: Considering OP's edited samples, could you please try the following.

I have tested it successfully with OP's 1st and latest edited samples, and it worked fine with both of them.

One more thing: this solution also handles the corner case where the last range is incomplete, i.e. the values in the 1st column end before a full range has been covered. The remaining elements are still printed.

For example, in OP's 1st sample range=2 but the max value is 5, so 5 will NOT be left out here.

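sort -n Input_file |
awk -v range="2" '
  # Remember each distinct 1st-column value in order of first appearance.
  !b[$1]++{
    c[++count]=$1
  }
  # Collect the 2nd-column values per key; till ends up as the largest key.
  {
    d[$1]=(d[$1]?d[$1] OFS:"")$2
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i++){
       # Add the number of entries stored for key i to the current range.
       num+=split(d[i],array," ")
       # Range is full: print it and start the next one.
       if(++j==range){
          start=start?start:1
          printf("%s-%s %.02f\n",start,i,num/tot_element)
          start=i+1
          j=num=""
          delete array
       }
       # Corner case: the last range is incomplete when the largest key is reached.
       if(j!="" && i==till){
          printf("%s-%s %.02f\n",start,i,num/tot_element)
       }
    }
  }
'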

Run on OP's latest sample (ifile2.txt) with range set to 10, the output will be as follows.

1-10 0.25
11-20 0.17
21-30 0.25
31-40 0.17
41-50 0.17
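
As an alternative sketch (not the script above), the range a row belongs to can also be computed arithmetically, which avoids both the sort and the loop over every integer up to the maximum. This assumes the 1st column holds positive integers; `bin_prob` is a hypothetical helper name, and the heredoc recreates ifile2.txt from the question. Unlike the script above, the last range is labelled with its full width (e.g. 5-6 rather than 5-5), and three decimals are printed to match OP's ofile.dat.

```shell
# Recreate the second sample input from the question.
cat > ifile2.txt <<'EOF'
10   10
30   34
10   4
30   32
50   3
20   2
40   20
30   13
40   50
10   40
20   20
50   2
EOF

# bin_prob RANGE FILE (hypothetical helper): probability per range of width RANGE.
bin_prob() {
  awk -v range="$1" '
    BEGIN { maxb = 0 }
    NF {
      b = int(($1 - 1) / range)    # zero-based index of the range for this row
      cnt[b]++
      total++
      if (b > maxb) maxb = b
    }
    END {
      if (!total) exit             # empty input: nothing to print
      for (b = 0; b <= maxb; b++)
        printf("%d-%d %.3f\n", b * range + 1, (b + 1) * range, cnt[b] / total)
    }
  ' "$2"
}

bin_prob 10 ifile2.txt
```

Ranges that contain no rows are still printed, with probability 0.000, so the output has no gaps between the first and the last range.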


EDIT: In case your Input_file doesn't have a 2nd column, then try the following.

sort -k1 Input_file |
awk -v range="1" '
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]?d[$1] OFS:"")$0
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i+=(range+1)){
       for(j=i;j<=i+range;j++){
          num=split(d[c[j]],array," ")
          total+=num
       }
       print i"-"i+range,tot_element?total/tot_element:0
       total=num=""
    }
  }
'


Could you please try the following, written and tested with the shown samples.

sort -k1 Input_file |
awk -v range="1" '
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]?d[$1] OFS:"")$2
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i+=(range+1)){
       for(j=i;j<=i+range;j++){
          num=split(d[c[j]],array," ")
          total+=num
       }
       print i"-"i+range,tot_element?total/tot_element:0
       total=num=""
    }
  }
'


In case you don't have to include any 0 value, then try the following.

sort -k1 Input_file |
awk -v range="1" '
  !b[$1]++{
    c[++count]=$1
  }
  {
    d[$1]=(d[$1]!=0?d[$1] OFS:"")$2
    tot_element++
    till=$1
  }
  END{
    for(i=1;i<=till;i+=(range+1)){
       for(j=i;j<=i+range;j++){
          num=split(d[c[j]],array," ")
          total+=num
       }
       print i"-"i+range,tot_element?total/tot_element:0
       total=num=""
    }
  }
'
