dplyr - Combining filter, across, and starts_with to string search across columns in R

posted Oct 6, 2021 in Technique by 深蓝 (71.8m points)

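This is very similar to the answer given here, but I cannot figure out why starts_with does not work. Naming the column directly returns the expected rows, while selecting columns with starts_with("c") returns none: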

diamonds %>% 
    filter(across(clarity, ~ grepl('^S', .))) %>% 
    head

# A tibble: 6 x 10
  carat cut       color clarity depth table price     x     y     z
  <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
4  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
5  0.3  Good      J     SI1      64      55   339  4.25  4.28  2.73
6  0.22 Premium   F     SI1      60.4    61   342  3.88  3.84  2.33

diamonds %>%
  filter(across(starts_with("c"),~grepl("^S" ,.))) %>% 
  head

# A tibble: 0 x 10
# ... with 10 variables: carat <dbl>, cut <ord>, color <ord>, clarity <ord>, depth <dbl>, table <dbl>,
#   price <int>, x <dbl>, y <dbl>, z <dbl>

Question from: https://stackoverflow.com/questions/66052130/combining-filter-across-and-starts-with-to-string-search-across-columns-in-r

1 Reply

深蓝 · Answer 1 · 2021-10-06T03:10:14+0000

dplyr before 1.0.4

diamonds %>%
  filter(rowSums(across(starts_with("c"),~grepl("^S" ,.))) > 0) 
# A tibble: 22,259 x 10
#    carat cut       color clarity depth table price     x     y     z
#    <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#  1  0.23 Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
#  2  0.21 Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
#  3  0.31 Good      J     SI2      63.3    58   335  4.34  4.35  2.75
#  4  0.26 Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
#  5  0.3  Good      J     SI1      64      55   339  4.25  4.28  2.73
#  6  0.22 Premium   F     SI1      60.4    61   342  3.88  3.84  2.33
#  7  0.31 Ideal     J     SI2      62.2    54   344  4.35  4.37  2.71
#  8  0.2  Premium   E     SI2      60.2    62   345  3.79  3.75  2.27
#  9  0.3  Ideal     I     SI2      62      54   348  4.31  4.34  2.68
# 10  0.3  Good      J     SI1      63.4    54   351  4.23  4.29  2.7 
# # ... with 22,249 more rows

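Why the original attempt returns no rows: across() produces one logical column per selected column, and filter() keeps only the rows where all of them are TRUE. The values in carat, cut, and color never start with "S", so no row passes.

How to figure this out or confirm it (step into a debugger inside filter() and evaluate the across() call by hand):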

diamonds %>%
  filter({browser(); across(starts_with("c"),~grepl("^S" ,.)); })
# Called from: mask$eval_all_filter(dots, env_filter)
# debug at #1: across(starts_with("c"), ~grepl("^S", .))

across(starts_with("c"), ~ grepl("^S" , .))
# # A tibble: 53,940 x 4
#    carat cut   color clarity
#    <lgl> <lgl> <lgl> <lgl>  
#  1 FALSE FALSE FALSE TRUE   
#  2 FALSE FALSE FALSE TRUE   
#  3 FALSE FALSE FALSE FALSE  
#  4 FALSE FALSE FALSE FALSE  
#  5 FALSE FALSE FALSE TRUE   
#  6 FALSE FALSE FALSE FALSE  
#  7 FALSE FALSE FALSE FALSE  
#  8 FALSE FALSE FALSE TRUE   
#  9 FALSE FALSE FALSE FALSE  
# 10 FALSE FALSE FALSE FALSE  
# # ... with 53,930 more rows

Presumably the goal is to keep any row with at least one TRUE (or perhaps only rows where all are TRUE, but I'll assume "any" for now). Since this is a frame of logicals, we can use rowSums, which counts each FALSE as 0 and each TRUE as 1, so

head(rowSums(across(starts_with("c"), ~ grepl("^S" , .))) > 0)
# [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE

which is a single vector of logicals, one per row, which is what dplyr::filter ultimately wants/needs.
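
For illustration, here is a minimal base-R sketch of the same rowSums idea on a toy logical frame. The data frame flags and its column names are made up for this example; it only stands in for what across() returns. It also shows how the "all" interpretation mentioned above could be written, assuming that is what you want.

```r
# Toy logical frame standing in for the output of across();
# `flags`, `a`, and `b` are hypothetical names used only here.
flags <- data.frame(
  a = c(FALSE, TRUE, FALSE),
  b = c(FALSE, TRUE, TRUE)
)

# "any": keep rows with at least one TRUE
any_true <- rowSums(flags) > 0
# "all": keep rows where every column is TRUE
all_true <- rowSums(flags) == ncol(flags)

any_true  # FALSE TRUE TRUE, one value per row
all_true  # FALSE TRUE FALSE
```

Either vector can be passed to filter() as-is, since it has exactly one logical per row.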

dplyr since 1.0.4

See https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/

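# if_any() takes the column selection and the function directly,
# just like across(); it is not wrapped around an across() call.
diamonds %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))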
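
As a small self-contained sketch of how if_any() and its counterpart if_all() differ, assuming dplyr 1.0.4 or later is installed, the example below uses a made-up tibble (toy, with hypothetical columns c1, c2, and other) rather than diamonds:

```r
library(dplyr)  # assumes dplyr >= 1.0.4, where if_any()/if_all() were added

# Hypothetical toy data; only c1 and c2 match starts_with("c").
toy <- tibble(
  c1    = c("Sx", "a",  "b"),
  c2    = c("Sy", "Sz", "c"),
  other = 1:3
)

# Keep rows where at least one selected column starts with "S"
any_rows <- toy %>% filter(if_any(starts_with("c"), ~ grepl("^S", .)))

# Keep rows where every selected column starts with "S"
all_rows <- toy %>% filter(if_all(starts_with("c"), ~ grepl("^S", .)))

nrow(any_rows)  # 2 (rows 1 and 2)
nrow(all_rows)  # 1 (row 1 only)
```

if_all() gives the same "every column must match" behaviour that produced zero rows in the question; if_any() is the counterpart that matches the rowSums(...) > 0 approach.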
