Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


loops - Using R's plyr package to reorder groups within a dataframe

I have a data reorganization task that I think could be handled by R's plyr package. I have a data frame with numeric data organized in groups. Within each group, I need the data sorted from largest to smallest.

The data looks like this (the code to generate it is below):

  group     value
2     b 0.1408790
6     b 1.1450040   # 2nd b is larger than the 1st
1     c 5.7433568
3     c 2.2109819
4     d 0.5384659
5     d 4.5382979

What I would like is this:

group     value
    b 1.1450040  # 1st b is now the largest
    b 0.1408790
    c 5.7433568
    c 2.2109819
    d 4.5382979
    d 0.5384659

So what I need plyr to do is go through each group, apply something like order() to the numeric data, reorder the rows accordingly, save the reordered subset, and put it all back together at the end.

I can do this "by hand" with a list and some loops, but it takes a very long time. Can plyr do this in a couple of lines?
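For what it's worth, this particular layout may not need a per-group loop at all: sorting the whole data frame by group and then by decreasing value in a single order() call gives the same result. The sketch below uses base R rather than plyr, and rebuilds the example rows printed above as a fixed data frame so the result is reproducible. It assumes value is numeric, because negating the secondary key is what reverses its sort direction.

```r
# Fixed copy of the example rows printed above (row names 1..6 in generation order)
df <- data.frame(
  group = c("c", "b", "c", "d", "d", "b"),
  value = c(5.7433568, 0.1408790, 2.2109819, 0.5384659, 4.5382979, 1.1450040),
  stringsAsFactors = FALSE
)

# Primary key: group, ascending. Secondary key: -value, i.e. value descending.
# Negating the key only works because value is numeric.
sorted <- df[order(df$group, -df$value), ]
print(sorted)
```

If the secondary key were not numeric, order() also accepts a logical vector for decreasing when method = "radix" is used, e.g. order(df$group, df$value, decreasing = c(FALSE, TRUE), method = "radix").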

Example data

df.sz <- 6                        # number of rows
groups <- c("a", "b", "c", "d")   # possible group labels
df <- data.frame(group = sample(groups, df.sz, replace = TRUE),
                 value = runif(df.sz, 0, 10),
                 stringsAsFactors = FALSE)
df <- df[order(df$group), ]       # order by group letter

The inefficient approach using loops:

My current approach is to split the data frame df into a list by group, apply order() to each element of the list, and overwrite each list element with its reordered version. I then use a loop to reassemble the data frame. (As a learning exercise, I'd also be interested in how to make this code more efficient. In particular, what is the most efficient way, using base R functions, to turn a list into a data frame?)

Vector of the unique groups in the dataframe

groups.u <- unique(df$group)

Create a list with one named slot per group (each slot is overwritten below):

my.list <- as.list(groups.u); names(my.list) <- groups.u

Break up df by $group into list

for(i in seq_along(groups.u)){
  i.working <- which(df$group == groups.u[i]) 
  my.list[[i]] <- df[i.working, ]
}
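The splitting loop can probably be replaced by base R's split(), which returns a named list of data frames, one per distinct group, with the original row names kept. A minimal sketch using a fixed copy of the example rows; note that split() orders the list by sorted group value (it converts group to a factor), whereas the loop above follows the order of unique(), which coincides here only because df was already sorted by group.

```r
# Fixed copy of the example rows printed in the question
df <- data.frame(
  group = c("c", "b", "c", "d", "d", "b"),
  value = c(5.7433568, 0.1408790, 2.2109819, 0.5384659, 4.5382979, 1.1450040),
  stringsAsFactors = FALSE
)

# One list element per group; each element keeps its original row names
my.list <- split(df, df$group)
print(names(my.list))
print(my.list$b)
```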

Sort elements within list using order

for(i in seq_along(my.list)){
  order.x <- order(my.list[[i]]$value,na.last = TRUE, decreasing = TRUE)
  my.list[[i]] <- my.list[[i]][order.x, ] 
}
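The sorting loop applies the same function to every list element, which is what lapply() does; it also keeps the list names. A sketch that builds the list with split() so it runs on its own; sort.desc is a hypothetical helper name introduced only for this example.

```r
# Fixed copy of the example rows printed in the question
df <- data.frame(
  group = c("c", "b", "c", "d", "d", "b"),
  value = c(5.7433568, 0.1408790, 2.2109819, 0.5384659, 4.5382979, 1.1450040),
  stringsAsFactors = FALSE
)
my.list <- split(df, df$group)

# Hypothetical helper: reorder one group's rows by value, largest first, NAs last
sort.desc <- function(d) d[order(d$value, na.last = TRUE, decreasing = TRUE), ]

# Apply it to every element; names "b", "c", "d" are preserved
my.list <- lapply(my.list, sort.desc)
print(my.list)
```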

Finally, rebuild df from the list. First, make a one-row "seed" data frame for the loop:

new.df <- my.list[[1]][1, ]  # one-row data frame with the right columns
new.df[1, ] <- NA            # blank the seed row; it is removed below
for(i in seq_along(my.list)){
  new.df <- rbind(new.df,my.list[[i]])
}

Remove seed

new.df <- new.df[-1,]
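On the base-R question of turning a list into a data frame, the usual idiom is do.call(rbind, my.list): it passes every list element to a single rbind() call instead of growing new.df piece by piece (each rbind() inside the loop copies everything accumulated so far), and it needs no seed row. A self-contained sketch; the combined row names are reset at the end because rbind() derives them from the list names and the old row names.

```r
# Fixed copy of the example rows printed in the question
df <- data.frame(
  group = c("c", "b", "c", "d", "d", "b"),
  value = c(5.7433568, 0.1408790, 2.2109819, 0.5384659, 4.5382979, 1.1450040),
  stringsAsFactors = FALSE
)
# Split by group and sort each piece by value, largest first
my.list <- lapply(split(df, df$group),
                  function(d) d[order(d$value, decreasing = TRUE), ])

# One rbind() call over all list elements; no seed row to add or remove
new.df <- do.call(rbind, my.list)
rownames(new.df) <- NULL  # reset row names to 1..n
print(new.df)
```

Taken together, split(), lapply() and do.call(rbind, ...) are the base-R form of the split-apply-combine pattern that plyr automates.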


1 Reply


You could use dplyr, the newer successor to plyr that focuses on data frames:

library(dplyr)
arrange(df, group, desc(value))


...