Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

r - Aggregate a dataframe on a given column and display another column

I have a dataframe in R of the following form:

> head(data)
  Group Score Info
1     1     1    a
2     1     2    b
3     1     3    c
4     2     4    d
5     2     3    e
6     2     1    f

I would like to aggregate it by Group, taking the maximum of the Score column in each group:

> aggregate(data$Score, list(data$Group), max)
  Group.1         x
1       1         3
2       2         4

But I would also like to display the Info value that goes with the maximum Score in each group. I have no idea how to do this. My desired output would be:

  Group.1         x        y
1       1         3        c
2       2         4        d

Any hint?

1 Reply

A base R solution is to combine the output of aggregate() with a merge() step. I find the formula interface to aggregate() a little more useful than the standard interface, partly because the names on the output are nicer, so I'll use it here. (The data frame is called dat below; it is data in the question.)

The aggregate() step is

maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

and the merge() step is simply

merge(maxs, dat)

This gives us the desired output:

R> maxs <- aggregate(Score ~ Group, data = dat, FUN = max)
R> merge(maxs, dat)
  Group Score Info
1     1     3    c
2     2     4    d
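
For anyone who wants to run the two steps end to end, here is a minimal self-contained sketch. The data frame is rebuilt by hand from the head(data) listing in the question, so it assumes those six rows are the whole data set; the name result is used only for illustration.

```r
# Rebuild the example data from the question
# (assumption: the six rows shown by head() are the full data set)
dat <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Step 1: maximum Score within each Group
maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

# Step 2: join back to the original rows; merge() matches on the
# columns the two data frames share (Group and Score)
result <- merge(maxs, dat)
print(result)
```

The printed result should have one row per group, carrying the Info value from the row that holds that group's maximum Score.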

You could, of course, stick this into a one-liner (the intermediary step was more for exposition):

merge(aggregate(Score ~ Group, data = dat, FUN = max), dat)

The main reason I used the formula interface is that it returns a data frame whose column names are already right for the merge step: they are the names of the columns in the original data set dat. The output of aggregate() needs those names so that merge() knows which columns in the original and aggregated data frames match.
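
To see why the names matter, it may help to look at what merge() joins on when no by argument is given: for data frames the default is the set of column names the two inputs have in common. The sketch below rebuilds the example data by hand (assuming the six rows from the question are the full data set); shared and explicit are illustrative names, not part of the original answer.

```r
# Example data rebuilt from the question (assumed to be complete)
dat <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

maxs <- aggregate(Score ~ Group, data = dat, FUN = max)

# Column names common to both data frames; this is what merge()
# uses as the join key by default
shared <- intersect(names(maxs), names(dat))
print(shared)

# Passing the key explicitly should give the same result as the default
explicit <- merge(maxs, dat, by = shared)
print(explicit)
```

Because the formula interface keeps the names Group and Score, both columns end up in the join key, which is what restricts the result to the rows holding each group's maximum.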

The standard interface gives odd names, whichever way you call it:

R> aggregate(dat$Score, list(dat$Group), max)
  Group.1 x
1       1 3
2       2 4
R> with(dat, aggregate(Score, list(Group), max))
  Group.1 x
1       1 3
2       2 4

We can use merge() on those outputs, but we need to do more work telling R which columns match up.
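
As a rough illustration of that extra work, the sketch below merges the standard-interface output by naming the matching columns on each side with the by.x and by.y arguments of merge(). The data frame is again rebuilt from the question's listing (assumed complete), and agg and res are illustrative names.

```r
# Example data rebuilt from the question (assumed to be complete)
dat <- data.frame(
  Group = c(1, 1, 1, 2, 2, 2),
  Score = c(1, 2, 3, 4, 3, 1),
  Info  = c("a", "b", "c", "d", "e", "f"),
  stringsAsFactors = FALSE
)

# Standard interface: the output columns are named Group.1 and x
agg <- aggregate(dat$Score, list(dat$Group), max)

# Tell merge() which columns correspond:
# Group.1 <-> Group and x <-> Score
res <- merge(agg, dat,
             by.x = c("Group.1", "x"),
             by.y = c("Group", "Score"))
print(res)
```

The key columns in the result keep the names from the first argument (Group.1 and x), which happens to be close to the layout asked for in the question. Another option is to rename the columns first, for example names(agg) <- c("Group", "Score"), and then call merge(agg, dat) as before.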

