Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
238 views
in Technique[技术] by (71.8m points)

Calculate Mean of Multiply Columns with Condition in R

I want to calculate mean of several variables but with condition, if 2 of those columns have NA, mean will be NA, if less than 2, find mean

df <- data.frame(ID = c(1:10),X1 = c(rep(1,5),rep(2,5)),X2 = c(1:10),X3 =   c(1,NA,2,NA,NA,1,NA,2,NA,NA),X4 = c(rep(NA,10)),X5=c(rep(1,5),rep(NA,5)),
             Y1 = c(rep(1,5),rep(2,5)),Y2 = c(1:10),Y3 = c(1,NA,2,NA,NA,1,NA,2,NA,NA),Y4 = c(rep(NA,10)),Y5=c(rep(1,5),rep(NA,5)))

MeanX = round(apply(df[,c(2:6)],1, mean,na.rm = TRUE),2)
MeanY = round(apply(df[,c(7:11)],1,mean,na.rm = TRUE),2)

This is output it's incorrect

   ID X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5 MeanX MeanY
1   1  1  1  1 NA  1  1  1  1 NA  1  1.00  1.00
2   2  1  2 NA NA  1  1  2 NA NA  1  1.33  1.33*
3   3  1  3  2 NA  1  1  3  2 NA  1  1.75  1.75
4   4  1  4 NA NA  1  1  4 NA NA  1  2.00  2.00*
5   5  1  5 NA NA  1  1  5 NA NA  1  2.33  2.33*
6   6  2  6  1 NA NA  2  6  1 NA NA  3.00  3.00*
7   7  2  7 NA NA NA  2  7 NA NA NA  4.50  4.50 *
8   8  2  8  2 NA NA  2  8  2 NA NA  4.00  4.00 *
9   9  2  9 NA NA NA  2  9 NA NA NA  5.50  5.50 *
10 10  2 10 NA NA NA  2 10 NA NA NA  6.00  6.00 * This is supposed NA,bc there are 3 columns have NA

Because I have a large dataset, for each group sometimes I have to set 6 out of 20,sometimes 1 out of 10, so I can calculate mean, how I can set condition for this case.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Here is a VERY quick (have to run) and dirty solution with data.table. But I believe it can be cleaned and built upon to make something that is neat and works well.

# Load data.table
require(data.table)
setDT(df)

# Format all columns as as numeric, 
# otherwise mean is not meaningful (see what I did there?)
x.cols <- paste("X", 1:5, sep = "")
y.cols <- paste("Y", 1:5, sep = "")
setDT(df)[, (x.cols) := lapply(.SD, as.integer), .SDcols = x.cols]
setDT(df)[, (y.cols) := lapply(.SD, as.integer), .SDcols = y.cols]

# meanX first mean, and then NA
df[, meanX := mean(c(X1, X2, X3, X4, X5), na.rm = TRUE), by =ID]
df[df[, sum(is.na(c(X1, X2, X3, X4, X5))) > 2, by = ID]$V1, meanX := NA]

# meanY first mean, and then NA
df[, meanY := mean(c(Y1, Y2, Y3, Y4, Y5), na.rm = TRUE), by =ID]
df[df[, sum(is.na(c(Y1, Y2, Y3, Y4, Y5))) > 2, by = ID]$V1, meanY := NA]

# Result
df

    ID X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5    meanX    meanY
 1:  1  1  1  1 NA  1  1  1  1 NA  1 1.000000 1.000000
 2:  2  1  2 NA NA  1  1  2 NA NA  1 1.333333 1.333333
 3:  3  1  3  2 NA  1  1  3  2 NA  1 1.750000 1.750000
 4:  4  1  4 NA NA  1  1  4 NA NA  1 2.000000 2.000000
 5:  5  1  5 NA NA  1  1  5 NA NA  1 2.333333 2.333333
 6:  6  2  6  1 NA NA  2  6  1 NA NA 3.000000 3.000000
 7:  7  2  7 NA NA NA  2  7 NA NA NA       NA       NA
 8:  8  2  8  2 NA NA  2  8  2 NA NA 4.000000 4.000000
 9:  9  2  9 NA NA NA  2  9 NA NA NA       NA       NA
10: 10  2 10 NA NA NA  2 10 NA NA NA       NA       NA

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...