I need to speed up the nested loop below. Scores are linked to item IDs and recorded by date. For each item with multiple scores, I need to relate each pair of consecutive scores to the time elapsed between them. On toy data like that below it works fine, but on real data with tens of thousands of rows it becomes too slow to be useful. Is there a faster way to do the same thing?
library(lubridate)  # provides time_length(), used in the loop below
# create some simulated data
# build the data frame directly: filling a matrix with date strings would
# coerce every column, including ID and Score, to character
test <- data.frame(
  ID    = c(1, 2, 1, 3, 2, 3),
  Score = c(70, 92, 62, 90, 85, 82),
  Date  = as.Date(c("2019-01-01", "2019-01-01", "2020-01-01",
                    "2019-01-01", "2020-01-01", "2020-01-01"))
)
# create a dataframe to hold all the post-loop data
df <- data.frame(matrix(ncol = 4, nrow = 0))
col_names <- c("ID", "Years", "BeginScore", "EndScore")
# get all the unique item IDs
ids <- unique(test$ID)
# loop through each unique item id
for (i in seq_along(ids))
{
  # get all the instances of that single item
  item <- test[test$ID == ids[i], ]
  # an item with a single score has no pair to compare, so skip it
  if (nrow(item) < 2) next
  # create a data frame to hold the data, one row per consecutive pair
  scores <- data.frame(matrix(1:((nrow(item) - 1) * 4), byrow = TRUE, nrow = nrow(item) - 1))
  colnames(scores) <- col_names
  # create an index, starting at the last row (the real data is ordered by date)
  index <- nrow(item)
  # loop through the instances of the single item and fill in row j
  for (j in seq_len(nrow(item) - 1))
  {
    scores$Years[j] <- time_length(item[index, 3] - item[index - 1, 3], "years")
    scores$BeginScore[j] <- item[index - 1, 2]
    scores$EndScore[j] <- item[index, 2]
    scores$ID[j] <- item[index, 1]
    index <- index - 1
  }
  # bind the single item to the collected data and then loop to the next unique item
  df <- rbind(df, scores)
}
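For reference, here is a rough sketch of one vectorized alternative in base R. It is not a tested drop-in replacement. It assumes the goal is one output row per pair of consecutive scores within an ID, and it approximates a year as 365.25 days, which may differ slightly from what time_length() returns. The name `pairs` is made up for illustration. The idea is to sort once by ID and Date, then compare each row with the row after it, so there is no per-item loop and no repeated rbind().

```r
# Toy data matching the question, built directly as a data frame
test <- data.frame(
  ID    = c(1, 2, 1, 3, 2, 3),
  Score = c(70, 92, 62, 90, 85, 82),
  Date  = as.Date(c("2019-01-01", "2019-01-01", "2020-01-01",
                    "2019-01-01", "2020-01-01", "2020-01-01"))
)

# Sort once so that rows of the same ID are adjacent and in date order
test <- test[order(test$ID, test$Date), ]
n <- nrow(test)

# Row k and row k + 1 form a pair only when they share an ID
same_id <- test$ID[-n] == test$ID[-1]

# Days between each row and the next one (assumption: 1 year = 365.25 days)
days <- as.numeric(difftime(test$Date[-1], test$Date[-n], units = "days"))

# 'pairs' is a hypothetical name for the result
pairs <- data.frame(
  ID         = test$ID[-1][same_id],
  Years      = days[same_id] / 365.25,
  BeginScore = test$Score[-n][same_id],
  EndScore   = test$Score[-1][same_id]
)

print(pairs)
```

Rows come out ordered by ID and then by date, which may not match the row order the loop produces. IDs with a single score produce no rows.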
question from:
https://stackoverflow.com/questions/65877428/need-to-speed-up-r-loop