Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Using summarise() to count the number of times the min value is repeated

I have a data frame, reach, with an Order column and a Reachability column, and I want a summary table of several properties grouped by Cluster. The full table has 500 rows, but the first 10 are enough to show what I want to achieve.

# A tibble: 500 x 3
   Order Reachability Cluster
   <int>        <dbl>   <dbl>
 1     1       NA           1
 2     2        1.54        1
 3     3        1.54        1
 4     4        0.860       1
 5     5        0.821       1
 6     6        0.821       1
 7     7        0.821       1
 8     8        0.821       1
 9     9        0.821       1
10    10        0.821       1
# ... with 490 more rows

I build my summary table, which includes some positional information about reach, like this:

reach %>% dplyr::group_by(Cluster) %>% 
    summarise(first_value = first(na.omit(Reachability)),
              min_value = min(na.omit(Reachability)),
              last_value = last(na.omit(Reachability)),
              first_pos = first(Order),
              min_pos = Order[which.min(Reachability)],
              last_pos = last(Order))

# A tibble: 1 x 7
  Cluster first_value min_value last_value first_pos min_pos last_pos
    <dbl>       <dbl>     <dbl>      <dbl>     <int>   <int>   <int>
1       1       1.54      0.821      0.821       1       5      10

What I'm having trouble with is an expression inside summarise() that counts how many times "min_value" is repeated. In this case, for 0.821, the count should be 6. This is what I've tried, with no success:

... %>% 
summarise(...
          ...
          N_min = sum(Reachability == min(na.omit(Reachability))))

... %>% 
summarise(...
          ...
          N_min = count(min(na.omit(Reachability))))

Am I missing something? I really have no idea why my first option doesn't work. As I understand it, that sum, computed per group, should give me the number of TRUEs (or 1s) that meet my condition. Thanks!

Data:

reach <- structure(list(Order = 1:10, Reachability = c(NA, 1.53995982068778, 
1.53995982068778, 0.860332791733694, 0.820585921380499, 0.820585921380499, 
0.820585921380499, 0.820585921380499, 0.820585921380499, 0.820585921380499
), Cluster = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))
Question from: https://stackoverflow.com/questions/65624243/using-summarise-to-count-the-number-of-times-the-min-value-is-repeated


1 Reply


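Your first option should work in principle, but exact equality comparisons between floating-point numbers are unreliable (see "Why are these numbers not equal?"). Note also that Reachability contains an NA: comparing NA with == yields NA, so sum() returns NA unless it is called with na.rm = TRUE.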
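As a minimal base R sketch of the two pitfalls involved here (the vector x is made-up illustration data, not the poster's table): exact == on doubles can fail for values that look equal, and any NA in the comparison propagates through sum() unless na.rm = TRUE is set.

```r
# Pitfall 1: exact equality on doubles
a <- 0.1 + 0.2
exact_equal  <- a == 0.3                    # FALSE: binary rounding error
approx_equal <- isTRUE(all.equal(a, 0.3))   # TRUE: compares within a tolerance

# Pitfall 2: NA propagates through == and sum()
x <- c(NA, 1.5, 0.8, 0.8)                   # hypothetical values
n_with_na    <- sum(x == min(x, na.rm = TRUE))                # NA
n_without_na <- sum(x == min(x, na.rm = TRUE), na.rm = TRUE)  # 2
```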

Try rounding the numbers before comparing, and pass na.rm = TRUE to sum() so that the NA in Reachability is ignored.

summarise(
  ...
  N_min = sum(round(Reachability, 2) == round(min(Reachability, na.rm = TRUE), 2), na.rm = TRUE)
  ...
)
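An alternative to rounding, sketched here under the assumption that a fixed absolute tolerance suits the data, is to count the values that lie within a small distance of the group minimum. The name tol and its value 1e-8 are illustrative choices, not part of the original post; the data is the reach sample given in the question.

```r
library(dplyr)

# Sample data from the question (the 10 rows given in the dput output)
reach <- data.frame(
  Order = 1:10,
  Reachability = c(NA, 1.53995982068778, 1.53995982068778, 0.860332791733694,
                   rep(0.820585921380499, 6)),
  Cluster = rep(1, 10)
)

tol <- 1e-8  # assumed tolerance; adjust to the scale of the data

res <- reach %>%
  group_by(Cluster) %>%
  summarise(
    min_value = min(Reachability, na.rm = TRUE),
    # count values within tol of the group minimum, ignoring NA rows
    N_min = sum(abs(Reachability - min(Reachability, na.rm = TRUE)) < tol,
                na.rm = TRUE)
  )

res$N_min  # 6 for this sample
```

dplyr also provides near(), which compares two numeric vectors within a tolerance and could stand in for the abs() expression.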
