I would like to know whether there is a simpler way to do this operation.
Imagine I have a data.frame like this one:
set.seed(1)
ID <- rep(1:3,each=4)
XX <- round(runif(12),3)
TT <- rep(1:4, 3)
ZZ <- ave(XX*TT,ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)
ID TT XX ZZ
1 1 0.266 0.266
1 2 0.372 1.010
1 3 0.573 2.729
1 4 0.908 6.361
2 1 0.202 0.202
2 2 0.898 1.998
2 3 0.945 4.833
2 4 0.661 7.477
3 1 0.629 0.629
3 2 0.062 0.753
3 3 0.206 1.371
3 4 0.177 2.079
I would like to get, for each column, the increments (differences between two consecutive elements) within each ID group, keeping the first element as it is (as if it were preceded by a zero).
ID TT XX ZZ
1 1 0.266 0.266
1 2 0.106 0.744
1 3 0.201 1.719
1 4 0.335 3.632
2 1 0.202 0.202
2 2 0.696 1.796
2 3 0.047 2.835
2 4 -0.284 2.644
3 1 0.629 0.629
3 2 -0.567 0.124
3 3 0.144 0.618
3 4 -0.029 0.708
I've tried with
ave(DF[3:4],DF$ID,FUN=function(x) diff(c(0,x)))
but it doesn't work, it produces the error:
Error in r[i1] - r[-length(r):-(length(r) - lag + 1L)] :
non-numeric argument to binary operator
Isn't there an easy way to do it?
I've found that I can get the proper output with:
ave(DF[3:4],DF$ID,FUN=function(x)
sapply(x, FUN=function(y) diff(c(0,y))))
but that gets quite long and complex for such a simple operation.
I've found that I can also do it with data.table, but I would prefer to be able to do it with base R.
library(data.table)
setDT(DF)
# restrict to the value columns so that TT is not differenced too
DF[, lapply(.SD, function(x) diff(c(0, x))), keyby = ID, .SDcols = c("XX", "ZZ")]
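For comparison, here is a minimal base R sketch of the same idea. It assumes the error above arises because `ave()` passes each group to `FUN` as a data.frame rather than as a numeric vector, so it calls `ave()` on one column at a time through `lapply()`. The name `inc` is only illustrative.

```r
# Rebuild the example data (TT included, as in the printed table)
set.seed(1)
ID <- rep(1:3, each = 4)
XX <- round(runif(12), 3)
TT <- rep(1:4, 3)
ZZ <- ave(XX * TT, ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)

# ave() works on a single vector, so apply it column by column
inc <- DF
inc[c("XX", "ZZ")] <- lapply(DF[c("XX", "ZZ")], function(col)
  ave(col, DF$ID, FUN = function(x) diff(c(0, x))))
inc
```

Taking the cumulative sum of `inc$XX` within each ID should give back the original `XX`, which is a quick way to check the result.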
I also don't know how to insert a new row (filled with zeroes) at the beginning of each group, or wherever some condition holds.
ID XX ZZ
1 0 0
1 0.266 0.266
1 0.372 1.010
1 0.573 2.729
1 0.908 6.361
2 0 0
2 0.202 0.202
2 0.898 1.998
2 0.945 4.833
2 0.661 7.477
3 0 0
3 0.629 0.629
3 0.062 0.753
3 0.206 1.371
3 0.177 2.079
I've tried with:
ave(DF[3:4],DF$ID,FUN=function(x) sapply(x, FUN=function(y) (c(0,y))))
but it gives the warning:
data length [10] is not a sub-multiple or multiple of the number of
rows [4]
I guess the general way to do it would be to work with the row indexes.
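One possible sketch of the index-based idea in base R, not necessarily the most efficient: bind one zero row per group to the table and then reorder the rows by ID. It relies on `order()` leaving tied rows in their original order, so the zero rows (bound first) land at the top of each group. The names `zeros` and `out`, and giving the new rows `TT = 0`, are assumptions for illustration.

```r
# Example data, as in the question
set.seed(1)
ID <- rep(1:3, each = 4)
XX <- round(runif(12), 3)
TT <- rep(1:4, 3)
ZZ <- ave(XX * TT, ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)

# One all-zero row per group (TT = 0 is an assumed placeholder value)
zeros <- data.frame(ID = unique(DF$ID), TT = 0, XX = 0, ZZ = 0)

# Put the zero rows first, then reorder by ID only; ties keep their
# original order, so each zero row ends up at the start of its group
out <- rbind(zeros, DF)
out <- out[order(out$ID), ]
rownames(out) <- NULL
out
```

For a condition other than "start of group", the same pattern should work if `zeros` is built from the rows that satisfy the condition and the ordering key is adjusted accordingly.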
PS: I've updated the post.
To keep it simple I had removed the TT column, but I later noticed that it is important.
My solution assumes that the table is ordered by TT, but sometimes that is not the case.
What I really want is:
XX1
XX2-XX1
XX3-XX2
XX4-XX3
where the subscripts come not from the row position in the table but from TT.
I don't know whether it is more efficient to do it by first sorting the rows by TT or by building some kind of key with paste().
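A minimal sketch of the sort-first option, assuming it is acceptable for the result to come back ordered by ID and TT. The `shuffled` table is a hypothetical unordered version of the example data, and `sorted` is an illustrative name.

```r
# Example data, as in the question
set.seed(1)
ID <- rep(1:3, each = 4)
XX <- round(runif(12), 3)
TT <- rep(1:4, 3)
ZZ <- ave(XX * TT, ID, FUN = cumsum)
DF <- data.frame(ID, TT, XX, ZZ)

# Hypothetical unordered version of the table
shuffled <- DF[sample(nrow(DF)), ]

# Sort rows by group and then by TT, so position within a group follows TT
sorted <- shuffled[order(shuffled$ID, shuffled$TT), ]

# Column-wise increments within each ID, with the first value kept as is
sorted[c("XX", "ZZ")] <- lapply(sorted[c("XX", "ZZ")], function(col)
  ave(col, sorted$ID, FUN = function(x) diff(c(0, x))))
rownames(sorted) <- NULL
sorted
```

Sorting once with `order()` keeps the per-group step identical to the ordered case; whether it is faster than a key-based approach would need to be measured on the real data.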