Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


R - Adding missing dates to a data frame

I have a data frame which looks like this:

    times                      values
1   2013-07-06 20:00:00        0.02
2   2013-07-07 20:00:00        0.03
3   2013-07-09 20:00:00        0.13
4   2013-07-10 20:00:00        0.12
5   2013-07-11 20:00:00        0.03
6   2013-07-14 20:00:00        0.06
7   2013-07-15 20:00:00        0.08
8   2013-07-16 20:00:00        0.07
9   2013-07-17 20:00:00        0.08

There are a few dates missing from the data, and I would like to insert them and to carry over the value from the previous day into these new rows, i.e. obtain this:

    times                      values
1   2013-07-06 20:00:00        0.02
2   2013-07-07 20:00:00        0.03
3   2013-07-08 20:00:00        0.03
4   2013-07-09 20:00:00        0.13
5   2013-07-10 20:00:00        0.12
6   2013-07-11 20:00:00        0.03
7   2013-07-12 20:00:00        0.03
8   2013-07-13 20:00:00        0.03
9   2013-07-14 20:00:00        0.06
10  2013-07-15 20:00:00        0.08
11  2013-07-16 20:00:00        0.07
12  2013-07-17 20:00:00        0.08
...

I have been trying to use a vector of all the dates:

dates <- as.Date(1:length(df),origin = df$times[1])

I am stuck and can't find a way to do it without a horrible for loop that I keep getting lost in. Thank you for your help.



1 Reply


Some test data (I am using `Date`; yours seems to be a different type, but this does not affect the algorithm):

data = data.frame(dates = as.Date(c("2011-12-15", "2011-12-17", "2011-12-19")), 
                  values = as.double(1:3))

# Generate **all** timestamps at which you want to have your result. 
# I use `seq`, but you may use any other method of generating those timestamps. 

alldates = seq(min(data$dates), max(data$dates), 1)

# Filter out timestamps that are already present in your `data.frame`:
dates0 = alldates[!(alldates %in% data$dates)]
# Construct a `data.frame` of the missing dates, with `NA` values, to append
# (`rep` keeps this valid even when no dates are missing):
data0 = data.frame(dates = dates0, values = rep(NA_real_, length(dates0)))

# Append this `data.frame` and re-sort by date:
data = rbind(data, data0)
data = data[order(data$dates),]

# Forward-fill the values.
# I would recommend moving this code into a separate `ffill` function
# (it has proved to be very useful in general):
current = NA_real_
data$values = sapply(data$values, function(x) {
  if (!is.na(x)) current <<- x  # remember the latest observed value
  current                       # an NA entry takes the last observed value
})
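
As a rough sketch of the separate `ffill` function recommended above, here is one vectorised way to write it, applied to the data from the question. The helper body, the `merge` step (used here in place of `rbind` plus `order`), the `"UTC"` time zone and the names `all_times` and `full` are my own assumptions for illustration, not part of the original answer.

```r
# Hypothetical helper: carry the last non-NA value forward.
# NAs that come before the first observed value stay NA.
ffill <- function(x) {
  seen <- !is.na(x)
  idx <- cumsum(seen)  # how many observed values have occurred so far
  # x[NA_integer_] is a single NA of the same type as x; it is what
  # positions with idx == 0 (nothing observed yet) receive.
  c(x[NA_integer_], x[seen])[idx + 1]
}

# The data from the question, stored as POSIXct (time zone assumed to be UTC).
df <- data.frame(
  times = as.POSIXct(c("2013-07-06 20:00:00", "2013-07-07 20:00:00",
                       "2013-07-09 20:00:00", "2013-07-10 20:00:00",
                       "2013-07-11 20:00:00", "2013-07-14 20:00:00",
                       "2013-07-15 20:00:00", "2013-07-16 20:00:00",
                       "2013-07-17 20:00:00"), tz = "UTC"),
  values = c(0.02, 0.03, 0.13, 0.12, 0.03, 0.06, 0.08, 0.07, 0.08)
)

# One timestamp per day across the observed range.
all_times <- seq(min(df$times), max(df$times), by = "day")

# Left join onto the complete grid: days missing from df get NA values,
# and merge() returns the rows sorted by the key column.
full <- merge(data.frame(times = all_times), df, by = "times", all.x = TRUE)

# Fill each gap with the value from the previous day.
full$values <- ffill(full$values)
```

With these inputs `full` should have one row per day from 2013-07-06 to 2013-07-17, with the inserted days repeating the previous day's value. If the timestamps do not all fall at the same time of day, the daily grid from `seq` will not line up with them and the join would need a date-only key instead.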


...