r - data.table alternative for dplyr case_when

Question

posted Oct 24, 2021 in Technique by 深蓝

Some time ago, dplyr introduced a nice SQL-like alternative to ifelse, namely case_when.

Is there an equivalent in data.table that would allow you to specify different conditions within one [] statement, without loading additional packages?

Example:

library(dplyr)

df <- data.frame(a = c("a", "b", "a"), b = c("b", "a", "a"))

df <- df %>% mutate(
    new = case_when(
    a == "a" & b == "b" ~ "c",
    a == "b" & b == "a" ~ "d",
    TRUE ~ "e")
    )

  a b new
1 a b   c
2 b a   d
3 a a   e

It would certainly be very helpful and make code much more readable (one of the reasons why I keep using dplyr in these cases).

1 Reply

深蓝 · Answer 1 · 2021-10-23T17:45:37+0000

FYI, a more recent answer for those coming across this post after 2019: data.table versions 1.13.0 and later have the fcase function. Note that it is not a drop-in replacement for dplyr::case_when, since the syntax is different, but it is a "native" data.table way of doing the calculation.
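To make the syntax difference concrete, here is a minimal sketch of the question's example rewritten with fcase inside a single [] call. The object name dt is only illustrative; the `default` argument stands in for the `TRUE ~ "e"` fallback of case_when.

```r
library(data.table)

# Same data as in the question, built as a data.table (name `dt` is illustrative)
dt <- data.table(a = c("a", "b", "a"), b = c("b", "a", "a"))

# Conditions and values alternate as plain arguments (no `~`);
# `default` covers every row that no condition matched
dt[, new := fcase(
    a == "a" & b == "b", "c",
    a == "b" & b == "a", "d",
    default = "e"
)]

# dt$new is now c("c", "d", "e"), matching the case_when result in the question
print(dt)
```

Only data.table itself is loaded here, so this stays within the "no additional packages" constraint from the question.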

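# Lazy evaluation: the stop() value is never evaluated by fcase here,
# because every element is already matched by an earlier condition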
x = 1:10
data.table::fcase(
    x < 5L, 1L,
    x >= 5L, 3L,
    x == 5L, stop("provided value is an unexpected one!")
)
# [1] 1 1 1 1 3 3 3 3 3 3

# case_when evaluates the stop() on the right-hand side and fails
dplyr::case_when(
    x < 5L ~ 1L,
    x >= 5L ~ 3L,
    x == 5L ~ stop("provided value is an unexpected one!")
)
# Error in eval_tidy(pair$rhs, env = default_env) :
#  provided value is an unexpected one!

# Benchmark
x = sample(1:100, 3e7, replace = TRUE) # 114 MB
microbenchmark::microbenchmark(
dplyr::case_when(
  x < 10L ~ 0L,
  x < 20L ~ 10L,
  x < 30L ~ 20L,
  x < 40L ~ 30L,
  x < 50L ~ 40L,
  x < 60L ~ 50L,
  x > 60L ~ 60L
),
data.table::fcase(
  x < 10L, 0L,
  x < 20L, 10L,
  x < 30L, 20L,
  x < 40L, 30L,
  x < 50L, 40L,
  x < 60L, 50L,
  x > 60L, 60L
),
times = 5L,
unit = "s")
# Unit: seconds
#               expr   min    lq  mean   median    uq    max neval
# dplyr::case_when   11.57 11.71 12.22    11.82 12.00  14.02     5
# data.table::fcase   1.49  1.55  1.67     1.71  1.73   1.86     5
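
One detail of the benchmark conditions above: a value of exactly 60 satisfies neither `x < 60L` nor `x > 60L`, so it matches no condition. A small sketch, on a hypothetical three-element vector y, of what fcase returns in that situation and how `default` can fill the gap:

```r
library(data.table)

# Hypothetical input: 60 matches neither condition below
y <- c(59L, 60L, 61L)

# Without `default`, unmatched elements come back as NA
unmatched_na <- fcase(
    y < 60L, 50L,
    y > 60L, 60L
)

# With `default`, unmatched elements get the fallback value instead;
# the fallback must have the same type as the other values (integer here)
unmatched_filled <- fcase(
    y < 60L, 50L,
    y > 60L, 60L,
    default = -1L
)

print(unmatched_na)      # 50 NA 60
print(unmatched_filled)  # 50 -1 60
```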

Source: data.table NEWS for 1.13.0 (released 24 Jul 2020).
