Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
847 views
in Technique[技术] by (71.8m points)

r - Averaging daily data into weekly data

I am wondering if there is a way to average daily data into weekly data. The dataframe that I call CADaily looks like this:

      > CADaily[1:10, ]
          Climate_Division       Date      Rain
      885                1 1948-07-01 0.8750000
      892                1 1948-07-02 2.9166667
      894                1 1948-07-03 0.7916667
      895                1 1948-07-04 0.4305556
      898                1 1948-07-05 0.8262061
      901                1 1948-07-06 0.5972222
      904                1 1948-07-17 0.04166667
      905                1 1948-07-18 0.08333333
      907                1 1948-07-20 0.04166667
      909                1 1948-07-22 0.12500000
      910                1 1948-07-21 NA

My objective is similar to the aggregate function to find the average of the daily rain into weekly rain values base on the Date (of course) and the Climate_Division (ranges from 1 to 7). I was searching online and I came across a code that I was able to use but wasn't quite to par to my objective:

      apply.weekly(xts(CADaily[,-2], order.by= CADaily[,2]), FUN = mean)

This does what I would like it to do, however my column Climate_Division is also averaged. I would simply like to average Rain only and order it based on the Climate_Division followed by Date. Is there a way that I can possibly do it as follows:

      aggregate(CADaily, by =list(CADaily$Climate_Division, CADaily$Date), FUN = mean, na.rm = TRUE)

where Date is in some form of weeks? Or is there another way?


EDIT:

Dear All,

Thank you for your help. Perhaps using aggregate wasn't the best way to go about this as I originally thought. In terms of output, I wanted to obtain the weekly average of rain throughout the years of the data (1948 - 1995). In other words, I wanted to get a nice format that I can input into a time series that has the form of the date of the end of the week. The output that I am looking for (keeping in mind that there may exist NA values) is:

      Climate_Division     Date          Rain
      1                    1948-07-03    1.527778
      1                    1948-07-10    0.6179946
      1                    1948-07-17    0.04166667
      1                    1948-07-24    0.08333333
      ...
      1                    1995-12-23    0.24513245
      1                    1995-12-30    0.12450545

Or is there a better way of expressing weekly data that is represented by dates?

Thank you for your help.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Updated answer

Based on the OP's update on the request, I modified the code to aggregate the data over the date of a defined day of each week (Saturday). This time I only use functions available in base R. It ignores NAs (if there are only NAs for a given End_of_Week-Climate_Division you get NaN, not a number).

# Data with another Climate division as example (same daily values and dates)
CADaily <-
structure(list(Climate_Division = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Date = structure(c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 10L), .Label = c("01/07/1948", "02/07/1948", "03/07/1948", 
"04/07/1948", "05/07/1948", "06/07/1948", "17/07/1948", "18/07/1948", 
"20/07/1948", "22/07/1948"), class = "factor"), Rain = c(0.875, 
2.9166667, 0.7916667, 0.4305556, 0.8262061, 0.5972222, 0.04166667, 
0.08333333, 0.04166667, 0.125, 0.875, 2.9166667, 0.7916667, 0.4305556, 
0.8262061, 0.5972222, 0.04166667, 0.08333333, 0.04166667, 0.125
), week = c(27, 27, 27, 27, 27, 27, 29, 29, 29, 30, 27, 27, 27, 
27, 27, 27, 29, 29, 29, 30)), .Names = c("Climate_Division", 
"Date", "Rain", "week"), row.names = c(NA, 20L), class = "data.frame")

# Coerce to Date class
CADaily$Date <- as.Date(x=CADaily$Date, format='%d/%m/%Y')

# Extract day of the week (Saturday = 6)
CADaily$Week_Day <- as.numeric(format(CADaily$Date, format='%w'))

# Adjust end-of-week date (first saturday from the original Date)
CADaily$End_of_Week <- CADaily$Date + (6 - CADaily$Week_Day)

# Aggregate over week and climate division
aggregate(Rain~End_of_Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)

# Output
#   End_of_Week Climate_Division       Rain
# 1  1948-07-03                1 1.52777780
# 2  1948-07-10                1 0.61799463
# 3  1948-07-17                1 0.04166667
# 4  1948-07-24                1 0.08333333
# 5  1948-07-03                2 1.52777780
# 6  1948-07-10                2 0.61799463
# 7  1948-07-17                2 0.04166667
# 8  1948-07-24                2 0.08333333

Additional operations

Also, using this code you can obtain results from additional aggregation functions, assuming the result is an atomic vector of the same length for every week-division pair.

# Aggregate over week and climate division, and show the total number of
# observations per week, the number of observations which represent missing
# values, the average, and the standard deviation.
aggregate(Rain~End_of_Week+Climate_Division, data=CADaily,
          FUN=function(x) c(n=length(x),
                            NAs=sum(is.na(x)),
                            Average=mean(x, na.rm=TRUE),
                            SD=sd(x, na.rm=TRUE)))

# Output. You get NA for the standard deviation if there is only one observation.
#   End_of_Week Climate_Division     Rain.n   Rain.NAs Rain.Average    Rain.SD
# 1  1948-07-03                1 3.00000000 0.00000000   1.52777780 1.20353454
# 2  1948-07-10                1 3.00000000 0.00000000   0.61799463 0.19864151
# 3  1948-07-17                1 1.00000000 0.00000000   0.04166667         NA
# 4  1948-07-24                1 3.00000000 0.00000000   0.08333333 0.04166667
# 5  1948-07-03                2 3.00000000 0.00000000   1.52777780 1.20353454
# 6  1948-07-10                2 3.00000000 0.00000000   0.61799463 0.19864151
# 7  1948-07-17                2 1.00000000 0.00000000   0.04166667         NA
# 8  1948-07-24                2 3.00000000 0.00000000   0.08333333 0.04166667



Original answer

Try with the lubridate package. Load it, and then aggregate (kept for the record as part of the original answer, which reflected the OP's request to aggregate by week).

# Load lubridate package
library(package=lubridate)

# Set Weeks number. Date already of class `Date`
CADaily$Week <- week(CADaily$Date)

# Aggregate over week number and climate division
aggregate(Rain~Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)

# Output
#   Week Climate_Division       Rain
# 1   27                1 1.07288622
# 2   29                1 0.05555556
# 3   30                1 0.12500000
# 4   27                2 1.07288622
# 5   29                2 0.05555556
# 6   30                2 0.12500000

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...