Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Automatically generate new variable names using dplyr mutate

I would like to create variable names dynamically while using dplyr, although I'd be fine with a non-dplyr solution as well.

For example:

data(iris)
library(dplyr) 

iris <- iris %>%
  group_by(Species) %>%
  mutate(
    lag_Sepal.Length = lag(Sepal.Length),
    lag_Sepal.Width  = lag(Sepal.Width),
    lag_Petal.Length = lag(Petal.Length)
  ) %>%
  ungroup()

head(iris)

    Sepal.Length Sepal.Width Petal.Length Petal.Width Species lag_Sepal.Length lag_Sepal.Width
             (dbl)       (dbl)        (dbl)       (dbl)  (fctr)            (dbl)           (dbl)
    1          5.1         3.5          1.4         0.2  setosa               NA              NA
    2          4.9         3.0          1.4         0.2  setosa              5.1             3.5
    3          4.7         3.2          1.3         0.2  setosa              4.9             3.0
    4          4.6         3.1          1.5         0.2  setosa              4.7             3.2
    5          5.0         3.6          1.4         0.2  setosa              4.6             3.1
    6          5.4         3.9          1.7         0.4  setosa              5.0             3.6
    Variables not shown: lag_Petal.Length (dbl)

But instead of doing this three times, I want to create 100 of these "lag" variables, each named lag_<original variable name>. I'm trying to figure out how to do this without typing the new variable name 100 times, but I'm coming up short.

I've looked at two similar examples elsewhere on SO, but I'm not quite able to piece together the specific solution I need. Any help is appreciated!

Edit
Thanks to @BenFasoli for the inspiration. I took his answer and tweaked it a bit to get the solution I needed. I also drew on an RStudio blog post and another SO post. The "lag" in the variable name is trailing instead of leading, but I can live with that.

My final code is posted here in case it’s helpful to anyone else:

# funs() is deprecated as of dplyr 0.8.0; a named list gives the same "_lag" suffix.
lagged <- iris %>%
  group_by(Species) %>%
  mutate_at(
    vars(Sepal.Length:Petal.Length),
    list(lag = lag)) %>%
  ungroup()

head(lagged)

# A tibble: 6 x 8
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_lag Sepal.Width_lag
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>            <dbl>           <dbl>
1          5.1         3.5          1.4         0.2  setosa               NA              NA
2          4.9         3.0          1.4         0.2  setosa              5.1             3.5
3          4.7         3.2          1.3         0.2  setosa              4.9             3.0
4          4.6         3.1          1.5         0.2  setosa              4.7             3.2
5          5.0         3.6          1.4         0.2  setosa              4.6             3.1
6          5.4         3.9          1.7         0.4  setosa              5.0             3.6
# ... with 1 more variables: Petal.Length_lag <dbl>
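
On dplyr 1.0.0 or later, the same idea can probably be written with across(), whose .names argument lets the "lag_" prefix lead instead of trail. This is a minimal sketch, not the code from the original edit, and it assumes a fresh copy of the data (datasets::iris) rather than the modified iris from above:

```r
library(dplyr)

# Start from the untouched built-in data set (assumption: the earlier
# examples may have overwritten `iris` in the global environment).
lagged <- datasets::iris %>%
  group_by(Species) %>%
  # "{.col}" is replaced by each selected column's name, so the new
  # columns are called lag_Sepal.Length, lag_Sepal.Width, ...
  mutate(across(Sepal.Length:Petal.Length, dplyr::lag, .names = "lag_{.col}")) %>%
  ungroup()

names(lagged)
```

Widening the selection (for example to every numeric column) scales this to 100 variables without typing any of the new names.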


1 Reply


Here is a data.table approach. In this case I selected the numeric columns. The idea is to pick the column names and build the new column names in advance, then apply shift(), which does the job of dplyr's lag() and lead(), to each of the selected columns.

library(data.table)

# Create a data frame for this demo.
mydf <- iris

# Choose the columns to lag and build the new column names.
cols <- names(iris)[sapply(iris, is.numeric)]
anscols <- paste0("lag_", cols)

# Apply shift() to each of the chosen columns.
setDT(mydf)[, (anscols) := shift(.SD, 1, type = "lag"),
            .SDcols = cols]

     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species lag_Sepal.Length lag_Sepal.Width
 1:          5.1         3.5          1.4         0.2    setosa               NA              NA
 2:          4.9         3.0          1.4         0.2    setosa              5.1             3.5
 3:          4.7         3.2          1.3         0.2    setosa              4.9             3.0
 4:          4.6         3.1          1.5         0.2    setosa              4.7             3.2
 5:          5.0         3.6          1.4         0.2    setosa              4.6             3.1
 ---                                                                                             
146:          6.7         3.0          5.2         2.3 virginica              6.7             3.3
147:          6.3         2.5          5.0         1.9 virginica              6.7             3.0
148:          6.5         3.0          5.2         2.0 virginica              6.3             2.5
149:          6.2         3.4          5.4         2.3 virginica              6.5             3.0
150:          5.9         3.0          5.1         1.8 virginica              6.2             3.4
     lag_Petal.Length lag_Petal.Width
  1:               NA              NA
  2:              1.4             0.2
  3:              1.4             0.2
  4:              1.3             0.2
  5:              1.5             0.2
 ---                                 
146:              5.7             2.5
147:              5.2             2.3
148:              5.0             1.9
149:              5.2             2.0
150:              5.4             2.3
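
Note that the call above shifts over the whole table, so the first row of each species picks up the last value of the previous species. To match the group_by(Species) in the question, by = Species can be added. A minimal sketch, assuming a fresh copy of the data; dt is just an illustrative name:

```r
library(data.table)

# Work on a data.table copy of the built-in data set.
dt <- as.data.table(datasets::iris)

# Columns to lag and the matching new column names.
cols <- names(dt)[sapply(dt, is.numeric)]
anscols <- paste0("lag_", cols)

# Shift within each species, so lags never cross a group boundary.
dt[, (anscols) := shift(.SD, 1, type = "lag"), by = Species, .SDcols = cols]

# Rows 50 and 51 straddle the setosa/versicolor boundary.
dt[c(50, 51), .(Species, Sepal.Length, lag_Sepal.Length)]
```

With the grouping in place, row 51 (the first versicolor row) gets NA rather than the last setosa value.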
