Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
747 views
in Technique[技术] by (71.8m points)

r - Add mean columns to dataframe by category with NA's

I'm trying to add a column of the means to my data-frame with some code I can run over many different columns even if they include NA's (which I think excludes many possibilities).

The best I could do is:

TestData <- data.frame(geo=c(rep("AT",4),rep("DE",4)),time=c(rep(c(1990:1993),2)),value=c(NA,4,20,6,NA,NA,5,3))

mean <- aggregate(value~geo, TestData, mean)

which calculates the correct means by category (geo). How can I rightjoin them to the dataframe so that the mean is not just one observation but shows up at every time point? I'm thinking about ddply but can't get it to work properly. The data frame I'm looking for is:

    geo time    value   mean   (or optionally, no problem for me)
1   AT  1990    NA     10      NA
2   AT  1991    4      10      10
3   AT  1992    20     10      10
4   AT  1993    6      10      10
5   DE  1990    NA     4       NA
6   DE  1991    NA     4       NA
7   DE  1992    5      4       4
8   DE  1993    3      4       4

Any help would be appreciated!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Try:

 testData1 <-  within(TestData, {
                    Mean <- ave(value, geo, FUN=function(x) mean(x, na.rm=TRUE))
                    Mean[is.na(value)] <- NA}) #If you don't want `NA` values don't use this step


  testData1
  # geo time value Mean
 #1  AT 1990    NA   NA
 #2  AT 1991     4   10
 #3  AT 1992    20   10
 #4  AT 1993     6   10
 #5  DE 1990    NA   NA
 #6  DE 1991    NA   NA
 #7  DE 1992     5    4
 #8  DE 1993     3    4

If you want to find mean of multiple columns that starts with name value:

For example:

 TestData1 <- TestData
 TestData1$value2 <- c(4, NA, 25, NA, NA, 10,5, 2)


 library(dplyr)

 res <- left_join(TestData1,
              TestData1 %>% 
                        group_by(geo) %>%
                        mutate_each(funs(mean=mean(., na.rm=TRUE)), starts_with("value")), 
                          by=c("geo", "time"))


 colnames(res) <- gsub("\.y$", ".mean", colnames(res))
 res
 #  geo time value.x value2.x value.mean value2.mean
 #1  AT 1990      NA        4         10   14.500000
 #2  AT 1991       4       NA         10   14.500000
 #3  AT 1992      20       25         10   14.500000
 #4  AT 1993       6       NA         10   14.500000
 #5  DE 1990      NA       NA          4    5.666667
 #6  DE 1991      NA       10          4    5.666667
 #7  DE 1992       5        5          4    5.666667
 #8  DE 1993       3        2          4    5.666667

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...