dataframe - Use of mutate in Summarise function using R

Question

posted Oct 24, 2021 in Technique by 深蓝 (71.8m points)

I have a data frame like the one shown below:

   identifier date       from       to         type  shift_back_max shift_forward_max
   <chr>      <date>     <date>     <date>     <chr>          <dbl>             <dbl>
   11         2011-12-31 2011-01-01 2011-12-31 last             364                 0
   11         2009-07-11 2009-01-01 2009-12-31 last             191               173
   11         NA         NA         NA         last              NA                NA
   11         2013-05-21 2013-01-01 2013-12-31 last             140               224
   11         2017-06-06 2017-01-01 2017-12-31 last             156               208
   12         2014-04-03 2014-01-01 2014-12-31 NA                92               272
   12         2016-08-04 2016-01-01 2016-12-31 NA               216               149
   12         2014-03-05 2014-01-01 2014-12-31 NA                63               301
   13         2011-02-07 2011-01-01 2011-12-31 NA                37               327
   14         2014-04-04 2014-01-01 2014-12-31 first             93               271
   14         2011-01-01 2011-01-01 2011-12-31 first              0               364
   14         2016-06-21 2016-01-01 2016-12-31 first            172               193
   16         NA         NA         NA         NA                NA                NA
   17         NA         NA         NA         NA                NA                NA
   18         NA         NA         NA         NA                NA                NA
   19         NA         NA         NA         NA                NA                NA

I am trying the two scenarios below.

Scenario 1 (across() wrapped in mutate() inside summarize())

data %>%
   group_by(identifier) %>%
   summarize(shift_back_max = - min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE),
            mutate(across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x))))

Scenario 2 (across() directly inside summarize(), without mutate())

data %>%
   group_by(identifier) %>%
   summarize(shift_back_max = - min(shift_back_max, na.rm = TRUE),
            shift_forward_max = min(shift_forward_max, na.rm = TRUE),
            across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))

Both scenarios produce the same output. So what is the purpose of wrapping across() in mutate()? Is it merely bad programming practice, or can it produce incorrect output in some specific case? I use across() to replace -Inf with -30 and Inf with 30. I have already applied Scenario 2 to my data of several million records. Do I have to rerun it because the output might be incorrect, or is it just bad practice?

Which of the two scenarios is the correct one? Can the other one produce incorrect output? Any help would be appreciated.
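
For context, here is a minimal base-R sketch of where the infinite values come from and what the replacement formula does to them. For a group whose shift values are all NA, min(..., na.rm = TRUE) has nothing left to work on and returns Inf. The helper name replace_inf is hypothetical and only wraps the ifelse() expression from the scenarios above.

```r
# A group whose values are all NA, as for identifiers 16 to 19 in the question
x <- c(NA_real_, NA_real_)

# With na.rm = TRUE the input is empty, so min() returns Inf (and warns)
back    <- -suppressWarnings(min(x, na.rm = TRUE))  # -Inf
forward <-  suppressWarnings(min(x, na.rm = TRUE))  #  Inf

# Hypothetical helper: same formula as the lambda passed to across()
replace_inf <- function(v) ifelse(is.infinite(v), 30 * sign(v), v)

result <- replace_inf(c(back, forward, 191))
print(result)  # -30  30 191
```

Finite values pass through unchanged, so the formula only touches groups that had no usable data.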

1 Reply

深蓝 · Answer 1 · 2021-10-23T21:17:17+0000

I find the use of mutate inside summarize very confusing, and don't really know what to expect of it (I'm honestly surprised it even works). If I understand correctly, what you want to do is best expressed as (Scenario - 3):

data %>%
   group_by(identifier) %>%
   summarize(shift_back_max = - min(shift_back_max, na.rm = TRUE),
             shift_forward_max = min(shift_forward_max, na.rm = TRUE)) %>%
   ungroup() %>%
   mutate(across(starts_with("shift"), ~ ifelse(is.infinite(.x), 30 * sign(.x), .x)))

(meaning you first summarize by identifier, then you apply a treatment to the whole result)

You can compare the results of the different approaches with all.equal(). I'd expect all of them to give the same result, but Scenarios 1 and 2 are less clear to the reader than Scenario 3.
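
A minimal sketch of such a comparison, assuming dplyr is installed. The small tibble is a hypothetical subset of the question's data (only the columns the pipelines use), and replace_inf is a hypothetical helper holding the same ifelse() formula as the lambda in the scenarios. Warnings from min() on all-NA groups are suppressed for readability.

```r
library(dplyr)

# Hypothetical subset of the question's data
data <- tibble(
  identifier        = c("11", "11", "11", "12", "16"),
  shift_back_max    = c(364, 191, NA, 92, NA),
  shift_forward_max = c(0, 173, NA, 272, NA)
)

# Hypothetical helper: same formula as the lambda in the scenarios
replace_inf <- function(x) ifelse(is.infinite(x), 30 * sign(x), x)

# Scenario 2: across() inside summarize()
scenario_2 <- suppressWarnings(
  data %>%
    group_by(identifier) %>%
    summarize(shift_back_max = -min(shift_back_max, na.rm = TRUE),
              shift_forward_max = min(shift_forward_max, na.rm = TRUE),
              across(starts_with("shift"), replace_inf))
)

# Scenario 3: summarize first, then mutate the whole result
scenario_3 <- suppressWarnings(
  data %>%
    group_by(identifier) %>%
    summarize(shift_back_max = -min(shift_back_max, na.rm = TRUE),
              shift_forward_max = min(shift_forward_max, na.rm = TRUE)) %>%
    ungroup() %>%
    mutate(across(starts_with("shift"), replace_inf))
)

# TRUE if the two results match; otherwise a description of the differences
comparison <- all.equal(scenario_2, scenario_3)
print(comparison)
```

Running the same check on a sample of the real data is a cheap way to decide whether a rerun is needed.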
