Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


r - Handle a table with ID repetition

I am not an R beginner, but I am having a hard time solving this problem. I have a data frame (here is an example):

id name dateA 
1   A   150
1   A   160
2   B   110
2   B   1009
2   B   098
2   B   309
3   C   218
3   C   310
4   D   219

I would like to create 3 new columns (minA, maxA, repA):

minA = min of dateA for each id
maxA = max of dateA for each id
repA = number of rows (repetitions) for each id


id name dateA minA maxA repA
1   A   150
1   A   160
2   B   110
2   B   1009
2   B   098
2   B   309
3   C   218
3   C   310
4   D   219

Thank you for your help. I hope this is clear enough.


1 Reply


You could try

library(data.table)  # v1.9.5+
# Convert df1 to a data.table by reference, then add the three columns per 'id'
setDT(df1)[, c('minA', 'maxA', 'repA') := list(min(dateA), max(dateA), .N),
           by = id]
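
If data.table is not available, the same three columns can be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is an alternative to the data.table call above, not a translation of it; the data frame is rebuilt here from the example in the question so the snippet runs on its own.

```r
# Example data copied from the question
df1 <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name  = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L),
  stringsAsFactors = FALSE
)

# ave() computes FUN separately for each 'id' and repeats the result
# on every row belonging to that 'id'
df1$minA <- ave(df1$dateA, df1$id, FUN = min)
df1$maxA <- ave(df1$dateA, df1$id, FUN = max)
df1$repA <- ave(df1$dateA, df1$id, FUN = length)

print(df1)
```

Unlike setDT() with :=, this works on a copy-on-modify data.frame, so it may be slower on large tables.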

Update

For the updated dataset (df2, which adds a 'typeP' column), we create the columns 'minA', 'maxA' and 'repA' as before, i.e. by assigning (:=) min(dateA), max(dateA) and .N grouped by 'id'. We then set the key to 'id' (setkey(.., id)) and join with the result of reshaping from 'long' to 'wide' format (dcast(df2, ..)), which counts the occurrences of each 'typeP' value per 'id'.

  setkey(setDT(df2)[, c('minA', 'maxA', 'repA') := list(min(dateA),
        max(dateA), .N) , by= id], id)[
          dcast(df2, id~typeP, value.var='typeP', length)]
  #    id name dateA typeP minA maxA repA P1 P2 P3
  #1:  1    A   150    P1  150  160    2  2  0  0
  #2:  1    A   160    P1  150  160    2  2  0  0
  #3:  2    B   110    P2   98 1009    4  1  3  0
  #4:  2    B  1009    P2   98 1009    4  1  3  0
  #5:  2    B    98    P1   98 1009    4  1  3  0
  #6:  2    B   309    P2   98 1009    4  1  3  0
  #7:  3    C   218    P2  218  310    2  0  1  1
  #8:  3    C   310    P3  218  310    2  0  1  1
  #9:  4    D   219    P1  219  219    1  1  0  0
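
The nested call above can be hard to read, so here is a sketch of the same idea split into three steps. It assumes a recent data.table (where dcast() accepts a data.table directly) and uses merge() in place of the keyed join; the names dt, wide and res are illustrative only.

```r
library(data.table)

# Same rows as df2 in the 'data' section, built directly as a data.table
dt <- data.table(
  id    = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name  = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L),
  typeP = c("P1", "P1", "P2", "P2", "P1", "P2", "P2", "P3", "P1")
)

# Step 1: per-id summaries, added by reference
dt[, c("minA", "maxA", "repA") := list(min(dateA), max(dateA), .N), by = id]

# Step 2: one row per id, one count column per distinct typeP value
wide <- dcast(dt, id ~ typeP, value.var = "typeP", fun.aggregate = length)

# Step 3: attach the counts to every original row
res <- merge(dt, wide, by = "id")
print(res)
```

Keeping the wide table as a separate object also makes it easy to inspect the per-id counts before joining.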

data

df1 <- structure(list(
  id = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L)),
  .Names = c("id", "name", "dateA"), class = "data.frame",
  row.names = c(NA, -9L))

df2 <- structure(list(
  id = c(1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 4L),
  name = c("A", "A", "B", "B", "B", "B", "C", "C", "D"),
  dateA = c(150L, 160L, 110L, 1009L, 98L, 309L, 218L, 310L, 219L),
  typeP = c("P1", "P1", "P2", "P2", "P1", "P2", "P2", "P3", "P1")),
  .Names = c("id", "name", "dateA", "typeP"), class = "data.frame",
  row.names = c(NA, -9L))
