Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - How to subtract first entry from last entry in grouped data

I would appreciate some help with the following task. From the data frame below (C), for each id I would like to subtract the first entry in column d_2 from the last entry, and then store the results in another data frame containing the same ids. I can then merge this with my initial data frame. Please note that the subtraction has to be in this order (last entry minus first entry for each id).

Here is the code:

id <- c("A1", "A1", "B10","B10", "B500", "B500", "C100", "C100", "C100", "D40", "D40", "G100", "G100")

d_1 <- c( rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3), rep(1.90, 2), rep(1.59, 2))

set.seed(2)

d_2 <- round(runif(13, -1, 1), 2)

C <- data.frame(id, d_1, d_2)

id   d_1   d_2
A1   1.15 -0.63
A1   1.15  0.40
B10  1.44  0.15
B10  1.44 -0.66
B500 1.34  0.89
B500 1.34  0.89
C100 1.50 -0.74
C100 1.50  0.67
C100 1.50 -0.06
D40  1.90  0.10
D40  1.90  0.11
G100 1.59 -0.52
G100 1.59  0.52

Desired result:

id2 <- c("A1", "B10", "B500", "C100", "D40", "G100")

difference <- c(1.03, -0.81, 0, 0.68, 0.01, 1.04)

diff_df <- data.frame(id2, difference)

id2    difference
A1        1.03
B10      -0.81
B500      0.00
C100      0.68
D40       0.01
G100      1.04

I attempted this by using ddply to obtain the first and last entries, but I'm struggling to write the function argument in the second call (below) so that it gives the desired outcome.

C_1 <- ddply(C, .(id), function(x) x[c(1, nrow(x)), ])

ddply(C_1, .(patient), function )

To be honest, I'm not very familiar with ddply (from the plyr package); I got the code above from another post on Stack Exchange.

My original data is a groupedData object (from the nlme package), and I believe another way of approaching this is to use gapply, but again I'm struggling with the third argument (usually a function).

grouped_C <- groupedData(d_1 ~ d_2 | id, data = C, FUN = mean, labels = list( x = "", y = ""), units = list(""))

x1 <- gapply(grouped_C, "d_2", first_entry)

x2 <- gapply(grouped_C, "d_2", last_entry)

where first_entry and last_entry are functions that return the first and last entries. I can then get the difference with x2 - x1. However, I'm not sure what to use for first_entry and last_entry in the code above (perhaps something to do with head or tail?).

Any help would be much appreciated.

1 Reply

This can be done easily with dplyr. The last and first functions are very helpful for this task.

library(dplyr)               # load dplyr (install it once beforehand with install.packages("dplyr"))

diff_df <- C %>%             # start from your data.frame C and store the final result in diff_df; the %>% operator chains the steps, so C does not have to be referenced again in each one
  group_by(id) %>%           # group the rows of C by id
  summarize(difference = last(d_2) - first(d_2))     # one summary row per id: the last entry of d_2 minus the first entry of d_2 within that group

#    id difference             #this is the result stored in diff_df
#1   A1       1.03
#2  B10      -0.81
#3 B500       0.00
#4 C100       0.68
#5  D40       0.01
#6 G100       1.04
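
The question mentions merging the differences back into the initial data frame. A minimal sketch of that step with base R's merge, assuming the result has the columns id and difference shown above; the name C_merged is made up for illustration, and d_2 is typed in from the printed table rather than regenerated with set.seed:

```r
# Data copied from the question (d_2 taken from the printed table)
id  <- c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
         "D40", "D40", "G100", "G100")
d_1 <- c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3),
         rep(1.90, 2), rep(1.59, 2))
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
         0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_1, d_2)

# Per-id differences, as printed in the result above
diff_df <- data.frame(
  id         = c("A1", "B10", "B500", "C100", "D40", "G100"),
  difference = c(1.03, -0.81, 0.00, 0.68, 0.01, 1.04)
)

# Join on id: every row of C receives the difference of its own group
# (C_merged is a hypothetical name, not from the original post)
C_merged <- merge(C, diff_df, by = "id")
```

C_merged keeps all 13 rows of C and repeats each id's difference on every row of that id.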

Edit note: updated post with %>% instead of %.% which is deprecated.
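
If you would rather avoid extra packages, the head/tail idea raised in the question also seems to work in base R. This is only a sketch, not the approach used above: the helper last_minus_first is a made-up name, d_2 is typed in from the printed table, and rounding to 2 decimals is my assumption, added to match the precision of the data.

```r
# Data copied from the question (d_2 taken from the printed table)
id  <- c("A1", "A1", "B10", "B10", "B500", "B500", "C100", "C100", "C100",
         "D40", "D40", "G100", "G100")
d_1 <- c(rep(1.15, 2), rep(1.44, 2), rep(1.34, 2), rep(1.50, 3),
         rep(1.90, 2), rep(1.59, 2))
d_2 <- c(-0.63, 0.40, 0.15, -0.66, 0.89, 0.89, -0.74, 0.67, -0.06,
         0.10, 0.11, -0.52, 0.52)
C <- data.frame(id, d_1, d_2)

# Hypothetical helper: last entry minus first entry of a vector,
# rounded to 2 decimals to hide floating-point noise
last_minus_first <- function(x) round(tail(x, 1) - head(x, 1), 2)

# Apply the helper to d_2 within each id; the result has one row per id
diff_df <- aggregate(d_2 ~ id, data = C, FUN = last_minus_first)

# Rename the summarised column to match the desired output
names(diff_df)[names(diff_df) == "d_2"] <- "difference"
```

Here tail(x, 1) and head(x, 1) play the roles of last_entry and first_entry from the question, and the order of subtraction (last minus first) is fixed inside the helper.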

