r - Blend of na.omit and na.pass using aggregate?

Question

Welcome To Ask or Share your Answers For Others

r - Blend of na.omit and na.pass using aggregate?

posted Oct 17, 2021 in Technique[技术] by 深蓝 (71.8m points)

r - Blend of na.omit and na.pass using aggregate?

I have a data set containing product prototype test data. Not all tests were run on all lots, and not all tests were executed with the same sample sizes. To illustrate, consider this case:

> test <- data.frame(name = rep(c("A", "B", "C"), each = 4),
  var1 = rep(c(1:3, NA), 3),
  var2 = 1:12,
  var3 = c(rep(NA, 4), 1:8))

> test
   name var1 var2 var3
1     A    1    1   NA
2     A    2    2   NA
3     A    3    3   NA
4     A   NA    4   NA
5     B    1    5    1
6     B    2    6    2
7     B    3    7    3
8     B   NA    8    4
9     C    1    9    5
10    C    2   10    6
11    C    3   11    7
12    C   NA   12    8

In the past, I've only had to deal with cases of mis-matched repetitions, which has been easy with aggregate(cbind(var1, var2) ~ name, test, FUN = mean, na.action = na.omit) (or the default setting). I'll get averages for each lot over three values for var1 and over four values for var2.

Unfortunately, this will leave me with a dataset completely missing lot A in this case:

 aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.omit)
  name var1 var2 var3
1    B    2    6    2
2    C    2   10    6

If I use na.pass, however, I also don't get what I want:

 aggregate(cbind(var1, var2, var3) ~ name, test, FUN = mean, na.action = na.pass)
  name var1 var2 var3
1    A   NA  2.5   NA
2    B   NA  6.5  2.5
3    C   NA 10.5  6.5

Now I lose the good data I had in var1 since it contained instances of NA.

What I'd like is:

NA as the output of mean() if all unique combinations of varN ~ name are NAs
Output of mean() if there are one or more actual values for varN ~ name

I'm guessing this is pretty simple, but I just don't know how. Do I need to use ddply for something like this? If so... the reason I tend to avoid it is that I end up writing really long equivalents to aggregate() like so:

ddply(test, .(name), summarise,
  var1 = mean(var1, na.rm = T),
  var2 = mean(var2, na.rm = T),
  var3 = mean(var3, na.rm = T))

Yeah... so the result of that apparently does what I want. I'll leave the question anyway in case there's 1) a way to do this with aggregate() or 2) shorter syntax for ddply.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-17T03:05:36+0000

Pass both na.action=na.pass and na.rm=TRUE to aggregate. The former tells aggregate not to delete rows where NAs exist; and the latter tells mean to ignore them.

aggregate(cbind(var1, var2, var3) ~ name, test, mean,
          na.action=na.pass, na.rm=TRUE)

Categories

r - Blend of na.omit and na.pass using aggregate?

r - Blend of na.omit and na.pass using aggregate?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags