I have a data frame like the one shown below:
```
identifier date       from       to         type  shift_back_max shift_forward_max
<chr>      <date>     <date>     <date>     <chr> <dbl>          <dbl>
11         2011-12-31 2011-01-01 2011-12-31 last  364            0
11         2009-07-11 2009-01-01 2009-12-31 last  191            173
11         NA         NA         NA         last  NA             NA
11         2013-05-21 2013-01-01 2013-12-31 last  140            224
11         2017-06-06 2017-01-01 2017-12-31 last  156            208
12         2014-04-03 2014-01-01 2014-12-31 NA    92             272
12         2016-08-04 2016-01-01 2016-12-31 NA    216            149
12         2014-03-05 2014-01-01 2014-12-31 NA    63             301
13         2011-02-07 2011-01-01 2011-12-31 NA    37             327
14         2014-04-04 2014-01-01 2014-12-31 first 93             271
14         2011-01-01 2011-01-01 2011-12-31 first 0              364
14         2016-06-21 2016-01-01 2016-12-31 first 172            193
16         NA         NA         NA         NA    NA             NA
17         NA         NA         NA         NA    NA             NA
18         NA         NA         NA         NA    NA             NA
19         NA         NA         NA         NA    NA             NA
```
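To make the question reproducible, the printed tibble can be rebuilt in base R. This is a sketch that assumes the column types from the header row (`<chr>`, `<date>`, `<dbl>`) and reuses the object name `data` from both scenarios; the helper variable `n_na` is only for readability.

```r
# Rebuild the table from the question as a base R data.frame.
# Types follow the tibble header: identifier and type are character,
# date/from/to are Date, and the two shift columns are double.
n_na <- 4  # identifiers 16-19 are rows where every other column is NA
data <- data.frame(
  identifier = c("11", "11", "11", "11", "11", "12", "12", "12",
                 "13", "14", "14", "14", "16", "17", "18", "19"),
  date = as.Date(c("2011-12-31", "2009-07-11", NA, "2013-05-21", "2017-06-06",
                   "2014-04-03", "2016-08-04", "2014-03-05", "2011-02-07",
                   "2014-04-04", "2011-01-01", "2016-06-21", rep(NA, n_na))),
  from = as.Date(c("2011-01-01", "2009-01-01", NA, "2013-01-01", "2017-01-01",
                   "2014-01-01", "2016-01-01", "2014-01-01", "2011-01-01",
                   "2014-01-01", "2011-01-01", "2016-01-01", rep(NA, n_na))),
  to = as.Date(c("2011-12-31", "2009-12-31", NA, "2013-12-31", "2017-12-31",
                 "2014-12-31", "2016-12-31", "2014-12-31", "2011-12-31",
                 "2014-12-31", "2011-12-31", "2016-12-31", rep(NA, n_na))),
  type = c(rep("last", 5), rep(NA, 4), rep("first", 3), rep(NA, n_na)),
  shift_back_max = c(364, 191, NA, 140, 156, 92, 216, 63, 37, 93, 0, 172,
                     rep(NA, n_na)),
  shift_forward_max = c(0, 173, NA, 224, 208, 272, 149, 301, 327, 271, 364, 193,
                        rep(NA, n_na)),
  stringsAsFactors = FALSE
)
```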
I am trying the following two scenarios:
Scenario 1 (wrapping across() in mutate()):
```r
data %>%
  group_by(identifier) %>%
  summarize(shift_back_max = -min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE),
            mutate(across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x))))
```
Scenario 2 (across() without mutate()):
```r
data %>%
  group_by(identifier) %>%
  summarize(shift_back_max = -min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE),
            across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))
```
Both scenarios produce the same output. So what is the purpose of wrapping across() in mutate()? Is it bad programming practice, or can it produce incorrect output in some specific case? I use across() to replace -Inf with -30 and Inf with 30. I have already applied scenario 2 to my data of several million records. Do I have to rerun it because the output might be incorrect, or is it just bad programming practice?
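The replacement rule can be checked on its own in base R. A minimal sketch: `min(..., na.rm = TRUE)` on a group whose values are all `NA` returns `Inf` (with a warning), the leading minus in the `shift_back_max` summary turns that into `-Inf`, and `30 * sign(.x)` maps the two infinities to -30 and 30. The helper name `replace_inf` is hypothetical.

```r
# min() with na.rm = TRUE on an all-NA vector has nothing left to compare,
# so it returns Inf; suppressWarnings() hides the accompanying warning.
all_na <- c(NA_real_, NA_real_)
back <- -suppressWarnings(min(all_na, na.rm = TRUE))  # -Inf, as in shift_back_max
fwd <- suppressWarnings(min(all_na, na.rm = TRUE))    # Inf, as in shift_forward_max

# Hypothetical helper implementing the question's rule: infinite values become
# 30 * sign(x) (so -Inf -> -30, Inf -> 30); finite values and NA pass through.
replace_inf <- function(x) ifelse(is.infinite(x), 30 * sign(x), x)

replace_inf(c(back, fwd, -140, 224, NA))
# returns c(-30, 30, -140, 224, NA)
```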
Which of the two scenarios is the correct one? Does that mean the other scenario can produce incorrect output? Can you help me, please?
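For comparison, a sketch of the more common arrangement, assuming dplyr 1.0 or later (the version that introduced `across()`): summarize first, then replace the infinities in a separate `mutate()` on the summarized table. The four-row tibble is a small subset of the rows in the question, chosen so that identifier 16 has only `NA` shifts.

```r
library(dplyr)

# Subset of the question's data: identifier 11 has some non-NA shifts,
# identifier 16 has only NA shifts.
data <- tibble(
  identifier = c("11", "11", "11", "16"),
  shift_back_max = c(364, 191, NA, NA),
  shift_forward_max = c(0, 173, NA, NA)
)

result <- data %>%
  group_by(identifier) %>%
  # Step 1: one summary row per identifier. For identifier 16, min() warns
  # "no non-missing arguments to min; returning Inf", and the leading minus
  # turns the shift_back_max result into -Inf.
  summarize(shift_back_max = -min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE)) %>%
  # Step 2: a separate mutate() on the summarized table maps -Inf/Inf to -30/30.
  mutate(across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))

result
```

Because the replacement runs after `summarize()` has finished, it does not depend on `across()` inside `summarize()` seeing the columns created earlier in the same call.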