dplyr before 1.0.4
library(dplyr)
library(ggplot2)  # provides the diamonds data set
diamonds %>%
  filter(rowSums(across(starts_with("c"), ~ grepl("^S", .))) > 0)
# A tibble: 22,259 x 10
# carat cut color clarity depth table price x y z
# <dbl> <ord> <ord> <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
# 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
# 3 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
# 4 0.26 Very Good H SI1 61.9 55 337 4.07 4.11 2.53
# 5 0.3 Good J SI1 64 55 339 4.25 4.28 2.73
# 6 0.22 Premium F SI1 60.4 61 342 3.88 3.84 2.33
# 7 0.31 Ideal J SI2 62.2 54 344 4.35 4.37 2.71
# 8 0.2 Premium E SI2 60.2 62 345 3.79 3.75 2.27
# 9 0.3 Ideal I SI2 62 54 348 4.31 4.34 2.68
# 10 0.3 Good J SI1 63.4 54 351 4.23 4.29 2.7
# # ... with 22,249 more rows
How to figure this out or confirm it: put a browser() call inside filter() and look at what across() actually returns.
diamonds %>%
  filter({ browser(); across(starts_with("c"), ~ grepl("^S", .)) })
# Called from: mask$eval_all_filter(dots, env_filter)
# debug at #1: across(starts_with("c"), ~grepl("^S", .))
across(starts_with("c"), ~ grepl("^S" , .))
# # A tibble: 53,940 x 4
# carat cut color clarity
# <lgl> <lgl> <lgl> <lgl>
# 1 FALSE FALSE FALSE TRUE
# 2 FALSE FALSE FALSE TRUE
# 3 FALSE FALSE FALSE FALSE
# 4 FALSE FALSE FALSE FALSE
# 5 FALSE FALSE FALSE TRUE
# 6 FALSE FALSE FALSE FALSE
# 7 FALSE FALSE FALSE FALSE
# 8 FALSE FALSE FALSE TRUE
# 9 FALSE FALSE FALSE FALSE
# 10 FALSE FALSE FALSE FALSE
# # ... with 53,930 more rows
To me, it seems apparent that one would want any row with at least one TRUE (or perhaps all, but I'll assume "any" for now). Since across() returns a frame of logicals, we can use rowSums, which counts each FALSE as 0 and each TRUE as 1, so
head(rowSums(across(starts_with("c"), ~ grepl("^S" , .))) > 0)
# [1] TRUE TRUE FALSE FALSE TRUE FALSE
This is a single logical vector, one value per row, which is what dplyr::filter ultimately needs.
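The same reasoning can be checked on a tiny frame of logicals in base R, without dplyr. This is only a minimal sketch: `flags` is a made-up data frame standing in for the across() result, not something from the question. Comparing the row sums against 0 gives the "any" case, and comparing them against the number of columns gives the "all" case.

```r
# Hypothetical toy frame of logicals, standing in for the across() result.
flags <- data.frame(
  a = c(TRUE,  FALSE, TRUE),
  b = c(FALSE, FALSE, TRUE)
)

# rowSums() treats FALSE as 0 and TRUE as 1, so each value is the
# number of TRUEs in that row.
n_true <- rowSums(flags)

any_true <- n_true > 0             # at least one TRUE in the row
all_true <- n_true == ncol(flags)  # every column is TRUE in the row

print(unname(any_true))
print(unname(all_true))
```

Either vector can be handed to filter() directly, since both have one logical value per row.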
dplyr since 1.0.4
See https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/
diamonds %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))
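If "all" rather than "any" is what you want, the same release also added if_all(). Below is a small sketch of both on made-up data, assuming dplyr 1.0.4 or later is installed; the `toy` tibble and its values are hypothetical and only there to make the difference visible.

```r
library(dplyr)  # assumes dplyr >= 1.0.4, which introduced if_any() and if_all()

# Hypothetical toy data, not the diamonds set.
toy <- tibble(
  cut     = c("Ideal", "Good", "Super"),
  clarity = c("SI2", "VS1", "SI1"),
  price   = c(326L, 327L, 335L)
)

# Keep rows where at least one column starting with "c" begins with "S".
any_rows <- toy %>%
  filter(if_any(starts_with("c"), ~ grepl("^S", .)))

# Keep rows where every column starting with "c" begins with "S".
all_rows <- toy %>%
  filter(if_all(starts_with("c"), ~ grepl("^S", .)))

print(any_rows)
print(all_rows)
```

Note that if_any() and if_all() take the column selection and the function directly, the same way across() does, so there is no need to wrap across() inside them.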