Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

unix - Count number of points in 2D bins

I have a file with two columns, X and Y, holding more than 10^5 non-gridded data points. All values are positive.

1     0.9
0.9   1.1
0.5   1.25
2.6   0.9
3.1   2.6
2.9   2.55
4.1   0.9
1.2   6
5.5   2.5
6     4
4     7.2
.     .
.     .

I would like to generate an X-Y grid with spacing binsize over a selected range of those points. I would also like a third column giving the number of original data points that fall in the square cell (binsize x binsize) around each grid vertex.

For example, with binsize=5 the output would be:

2.5    2.5   7 
2.5    7.5   2
7.5    2.5   2
.       .    .
.       .    .

I would like to pass the data range and the binsize to the AWK program as parameters.

I would appreciate your help very much.

EDIT:

The binsize sets the width of the intervals in which the XY data points are counted. The range input selects which x and y values are counted: for example, if I select x in [0,5] and y in [0,5], only the first seven XY points above are binned. My real dataset is very large.


1 Reply


I think a solution could look something like this:

awk -v binsize=0.5 -v xmin=0 -v xmax=3 -v ymin=2 -v ymax=4 '
BEGIN {
   # Calculate number of x-bins and number of y-bins
   nx=int((xmax-xmin)/binsize)
   ny=int((ymax-ymin)/binsize)
   # Pre-zero all bins else empty entries will not show up in output
   for(x=0;x<nx;x++){
      for(y=0;y<ny;y++){
         output[x,y]=0
      }
   }
}

{
   # pick up x and y
   x=$1; y=$2

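   # if this sample is within the x-range and y-range; the upper bounds are
   # exclusive because a point at xmax or ymax would land in bin nx or ny,
   # which is never printed
   if(x>=xmin && x<xmax && y>=ymin && y<ymax){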
      xindex=int((x-xmin)/binsize)
      yindex=int((y-ymin)/binsize)
      output[xindex,yindex]++;
      printf("DEBUG: x=%f, y=%f (line %d)\n",x,y,NR);
      printf("DEBUG: Incrementing bin [%d][%d]\n",xindex,yindex);
   }
}

END{
   # Print results
   for(x=0;x<nx;x++){
      for(y=0;y<ny;y++){
         printf("%d\t",output[x,y]);
      }
      printf("\n");
   }
} ' points.txt
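
To make the index arithmetic concrete, here is a small sketch that evaluates the same two expressions for a single point. The function name bin_index and its argument order are only illustrative and are not part of the script above.

```shell
# Hypothetical helper: bin_index X Y XMIN YMIN BINSIZE
# Prints the x-bin and y-bin index of one point, using the same
# int((value - min) / binsize) expressions as the main script.
bin_index() {
   awk -v x="$1" -v y="$2" -v xmin="$3" -v ymin="$4" -v binsize="$5" 'BEGIN {
      # int() truncates toward zero, e.g. 5.8 becomes 5
      printf("%d %d\n", int((x - xmin) / binsize), int((y - ymin) / binsize))
   }'
}

# The point (2.9, 2.55) with xmin=0, ymin=2 and binsize=0.5
bin_index 2.9 2.55 0 2 0.5
```

With those values this prints "5 1": (2.9 - 0) / 0.5 is 5.8 and (2.55 - 2) / 0.5 is about 1.1, and both are truncated.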

Using the following as input (saved as points.txt):

0.4   2.1
0.39  2.02
0.1   2.4
1     0.9
0.9   1.1
0.5   1.25
2.6   0.9
3.1   2.6
2.9   2.55

The script prints the DEBUG lines followed by the count matrix, with one row per x-bin and one column per y-bin:

DEBUG: x=0.400000, y=2.100000 (line 1)
DEBUG: Incrementing bin [0][0]
DEBUG: x=0.390000, y=2.020000 (line 2)
DEBUG: Incrementing bin [0][0]
DEBUG: x=0.100000, y=2.400000 (line 3)
DEBUG: Incrementing bin [0][0]
DEBUG: x=2.900000, y=2.550000 (line 9)
DEBUG: Incrementing bin [5][1]
3   0   0   0   
0   0   0   0   
0   0   0   0   
0   0   0   0   
0   0   0   0   
0   1   0   0
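
If you would rather have the three-column layout from the question (bin centre x, bin centre y, count), a variant along these lines might work. It is only a sketch: the function name bin_centers and its argument order are my own invention, I have assumed that the bin centre is min + (index + 0.5) * binsize, and the upper bounds are treated as exclusive so that every counted point maps to a printed bin.

```shell
# Hypothetical helper: bin_centers BINSIZE XMIN XMAX YMIN YMAX < points
bin_centers() {
   awk -v binsize="$1" -v xmin="$2" -v xmax="$3" -v ymin="$4" -v ymax="$5" '
   BEGIN {
      # Number of bins along each axis
      nx = int((xmax - xmin) / binsize)
      ny = int((ymax - ymin) / binsize)
   }
   {
      # Force numeric values, then count the point if it is in range
      x = $1 + 0; y = $2 + 0
      if (x >= xmin && x < xmax && y >= ymin && y < ymax) {
         count[int((x - xmin) / binsize), int((y - ymin) / binsize)]++
      }
   }
   END {
      # One line per bin: centre x, centre y, count (empty bins print 0)
      for (i = 0; i < nx; i++) {
         for (j = 0; j < ny; j++) {
            printf("%g\t%g\t%d\n", xmin + (i + 0.5) * binsize,
                   ymin + (j + 0.5) * binsize, count[i, j])
         }
      }
   }'
}

# The eleven sample points from the question, binsize=5, range [0,10) on both axes
bin_centers 5 0 10 0 10 <<'EOF'
1     0.9
0.9   1.1
0.5   1.25
2.6   0.9
3.1   2.6
2.9   2.55
4.1   0.9
1.2   6
5.5   2.5
6     4
4     7.2
EOF
```

For that sample the counts come out as 7 for the bin centred on (2.5, 2.5), 2 for (2.5, 7.5), 2 for (7.5, 2.5) and 0 for (7.5, 7.5), which agrees with the example in the question. For a real file, call it as bin_centers 5 0 10 0 10 < points.txt.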
