Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

dplyr - Combining filter, across, and starts_with to string search across columns in R

This is very similar to the answer given here, but I cannot figure out why starts_with does not work:

diamonds %>% 
    filter(across(clarity, ~ grepl('^S', .))) %>% 
    head

# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
4  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
5  0.3  Good      J     SI1      64      55   339  4.25  4.28  2.73
6  0.22 Premium   F     SI1      60.4    61   342  3.88  3.84  2.33

diamonds %>%
  filter(across(starts_with("c"), ~ grepl("^S", .))) %>%
  head

# A tibble: 0 x 10
# ... with 10 variables: carat <dbl>, cut <ord>, color <ord>, clarity <ord>, depth <dbl>, table <dbl>,
#   price <int>, x <dbl>, y <dbl>, z <dbl>
Question from: https://stackoverflow.com/questions/66052130/combining-filter-across-and-starts-with-to-string-search-across-columns-in-r


1 Reply

dplyr before 1.0.4

diamonds %>%
  filter(rowSums(across(starts_with("c"),~grepl("^S" ,.))) > 0) 
# A tibble: 22,259 x 10
#    carat cut       color clarity depth table price     x     y     z
#    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#  3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#  4  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#  5  0.3  Good      J     SI1      64      55   339  4.25  4.28  2.73
#  6  0.22 Premium   F     SI1      60.4    61   342  3.88  3.84  2.33
#  7  0.31 Ideal     J     SI2      62.2    54   344  4.35  4.37  2.71
#  8  0.2  Premium   E     SI2      60.2    62   345  3.79  3.75  2.27
#  9  0.3  Ideal     I     SI2      62      54   348  4.31  4.34  2.68
# 10  0.3  Good      J     SI1      63.4    54   351  4.23  4.29  2.7 
# # ... with 22,249 more rows

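To see why the original call returns nothing (or to confirm the fix), drop into browser() inside filter() and evaluate the across() call by hand: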

diamonds %>%
  filter({browser(); across(starts_with("c"),~grepl("^S" ,.)); })
# Called from: mask$eval_all_filter(dots, env_filter)
# debug at #1: across(starts_with("c"), ~grepl("^S", .))

across(starts_with("c"), ~ grepl("^S" , .))
# # A tibble: 53,940 x 4
#    carat cut   color clarity
#    <lgl> <lgl> <lgl> <lgl>  
#  1 FALSE FALSE FALSE TRUE   
#  2 FALSE FALSE FALSE TRUE   
#  3 FALSE FALSE FALSE FALSE  
#  4 FALSE FALSE FALSE FALSE  
#  5 FALSE FALSE FALSE TRUE   
#  6 FALSE FALSE FALSE FALSE  
#  7 FALSE FALSE FALSE FALSE  
#  8 FALSE FALSE FALSE TRUE   
#  9 FALSE FALSE FALSE FALSE  
# 10 FALSE FALSE FALSE FALSE  
# # ... with 53,930 more rows

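across() returns a frame of logicals, one column per selected column. When filter() is handed that frame directly, it keeps a row only if every column is TRUE, and carat, cut, and color never start with "S", so no row survives. To me, it seems apparent that one would want any row with at least one TRUE (or perhaps all, but I'll assume "any" for now). Since this is a frame of logicals, we can use rowSums, which counts each FALSE as 0 and each TRUE as 1, so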

head(rowSums(across(starts_with("c"), ~ grepl("^S" , .))) > 0)
# [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE

which is a single vector of logicals, one per row, which is what dplyr::filter ultimately wants/needs.
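
As a quick sanity check of the rowSums idea, here is a minimal sketch on a made-up frame (toy and its values are hypothetical, not taken from diamonds). It computes the logical frame once with transmute(), then derives both the "any" and the "all" variants from the row sums:

```r
library(dplyr)

# Hypothetical toy data: three columns start with "c", one does not.
toy <- tibble(
  cut     = c("Ideal", "Superb", "Good", "Fair"),
  color   = c("E", "S", "J", "D"),
  clarity = c("SI2", "SI1", "VS1", "IF"),
  price   = c(326L, 327L, 334L, 400L)
)

# One logical column per selected column, TRUE where the value starts with "S".
hits <- toy %>%
  transmute(across(starts_with("c"), ~ grepl("^S", .)))

# rowSums() counts the TRUEs in each row.
any_hit <- rowSums(hits) > 0            # at least one column matches
all_hit <- rowSums(hits) == ncol(hits)  # every selected column matches

any_rows <- filter(toy, any_hit)  # 2 rows (prices 326 and 327)
all_rows <- filter(toy, all_hit)  # 1 row (price 327)
```

The "all" variant is what a bare across() inside filter() effectively asks for, which is why it comes back empty whenever one of the selected columns can never match.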

dplyr since 1.0.4

See https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/

diamonds %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))
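
For comparison, a small sketch of if_any() next to its sibling if_all(), assuming dplyr 1.0.4 or later is installed. Both take the column selection and the function directly, the same way across() does. The toy frame and its values are hypothetical and only there to make the difference visible:

```r
library(dplyr)  # assumes dplyr >= 1.0.4, where if_any()/if_all() were added

# Hypothetical toy data: three columns start with "c", one does not.
toy <- tibble(
  cut     = c("Ideal", "Superb", "Good", "Fair"),
  color   = c("E", "S", "J", "D"),
  clarity = c("SI2", "SI1", "VS1", "IF"),
  price   = c(326L, 327L, 334L, 400L)
)

# Keep rows where at least one "c" column starts with "S".
any_rows <- toy %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))   # 2 rows

# Keep rows where every "c" column starts with "S".
all_rows <- toy %>%
  filter(if_all(starts_with("c"), ~ grepl("^S", .)))   # 1 row
```

On diamonds, the if_all() form would return zero rows, just like the filter(across(...)) call in the question, since the numeric carat column never matches "^S".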
