Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

Collapsing one list of dataframes and combining with another list of dataframes in R

Long-time lurker on the forum; this is my first post. I appreciate your patience in advance: I have limited formal training in computer science and am a biologist by day.

My question is about how to process two lists in R, each containing multiple dataframes. Please find example data below.

set.seed(1)
set1 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                 SYMBOL = paste(c(sample(LETTERS, 10))),
                 SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
set2 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
               SYMBOL = paste(c(sample(LETTERS, 10))),
               SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
set3 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                  SYMBOL = paste(c(sample(LETTERS, 10))),
                  SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
set4 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                 SYMBOL = paste(c(sample(LETTERS, 10))),
                 SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
files <- list(set1, set2, set3, set4)
names(files) <- paste("Set", 1:4, sep = "")
reports <- list(data.frame(SETS = c("Set1", "Set3"),
                        STATISTIC = runif(2)),
             data.frame(SETS = c("Set2", "Set4"),
                        STATISTIC = runif(2)))
names(reports) <- c("Report1", "Report2")

files is a list containing many dataframes of metadata from an analysis.

> files$Set1
     NAME SYMBOL SIGNIFICANT
1   row_1      Y          no
2   row_2      D          no
3   row_3      G          no
4   row_4      A         yes
5   row_5      B         yes
6   row_6      K         yes
7   row_7      N         yes
8   row_8      R         yes
9   row_9      W         yes
10 row_10      J         yes

reports is also a list containing 2 dataframes with primary outputs from a two-way analysis and associated statistics.

> reports$Report1
  SETS STATISTIC
1 Set1 0.4100841
2 Set3 0.8108702

Note that the names of the dataframes within the files list correspond to the values in the SETS column (column 1) of the dataframes within the reports list.

I wish to collapse the files metadata in a particular way. For every row of a dataframe such as files$Set1 where SIGNIFICANT == 'yes', I would like to append the corresponding SYMBOL to a comma-delimited string. Then I would like to attach that string to the matching Set row within reports. Thus, my desired output would be as follows:

> head(reports$Report1)
  SETS STATISTIC              SYMBOL
1 Set1 0.4100841 A, B, K, N, R, W, J
2 Set3 0.8108702          F, S, J, V

and likewise for Report2.

This is easy enough to do manually for this example, but in my actual project, length(files) is 600.

I am attempting to do this with a for loop but keep running into errors. Here is my current iteration:

output <- data.frame()
for(i in 1:length(files)){
  for(j in 1:nrow(files[[i]])){
    if(files[j, 3] == "Yes"){
      output[i, 1]=i;
      output[i, 2]=paste0(i[,2], collapse = ", ")
    }
  }
}

And my current error:

Error in i[[j, 3]] : incorrect number of subscripts
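
For reference, here is a minimal sketch of a loop that does run. The attempt above has three likely problems: files[j, 3] indexes the list itself with two subscripts instead of a dataframe taken out with files[[i]], i is a plain integer and so cannot be subscripted as i[, 2], and the comparison uses "Yes" while the data contain lowercase "yes". The toy_files list and its contents are hypothetical stand-ins, not the data from the question.

```r
# Hypothetical toy list standing in for `files`.
toy_files <- list(
  SetA = data.frame(SYMBOL = c("A", "B", "C"),
                    SIGNIFICANT = c("yes", "no", "yes"),
                    stringsAsFactors = FALSE),
  SetB = data.frame(SYMBOL = c("X", "Y"),
                    SIGNIFICANT = c("no", "yes"),
                    stringsAsFactors = FALSE)
)

# Preallocate one output row per dataframe instead of growing an empty data.frame.
output <- data.frame(SETS = names(toy_files),
                     SYMBOL = NA_character_,
                     stringsAsFactors = FALSE)

for (i in seq_along(toy_files)) {
  df <- toy_files[[i]]                        # [[i]] extracts the i-th dataframe
  hits <- df$SYMBOL[df$SIGNIFICANT == "yes"]  # vectorised filter, lowercase "yes"
  output$SYMBOL[i] <- paste(hits, collapse = ", ")
}

print(output)
```

The inner loop over rows is not needed, because the logical comparison already works on the whole SIGNIFICANT column at once.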

I have been working with R for ~4 years now, and if I know one thing, it's that people avoid loops like the plague more often than not. I know some variation of apply, lapply, etc. is likely going to make life easy. Despite that, after consulting the R literature and this forum, I am stumped.

Would appreciate some advice on this one. Thanks everybody!



1 Reply


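You can use sapply to iterate over the files list. From each dataframe, keep only the SYMBOL values where SIGNIFICANT == 'yes' and collapse them into one string with toString. stack then converts the resulting named vector into a two-column dataframe.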

data <- stack(sapply(files,function(x) toString(x$SYMBOL[x$SIGNIFICANT=='yes'])))

data
#               values  ind
#1 A, B, K, N, R, W, J Set1
#2                B, F Set2
#3          F, S, J, V Set3
#4    W, Z, H, Q, D, M Set4
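
To see what the one-liner does, it can help to run each step separately. The sketch below uses a small hypothetical list (toy_files, not the data from the question) so that the intermediate values are easy to check by eye.

```r
# Hypothetical toy list standing in for `files`.
toy_files <- list(
  SetA = data.frame(SYMBOL = c("A", "B", "C"),
                    SIGNIFICANT = c("yes", "no", "yes"),
                    stringsAsFactors = FALSE),
  SetB = data.frame(SYMBOL = c("X", "Y"),
                    SIGNIFICANT = c("no", "no"),
                    stringsAsFactors = FALSE)
)

# Step 1: for a single dataframe, keep the symbols flagged "yes".
keep <- toy_files$SetA$SYMBOL[toy_files$SetA$SIGNIFICANT == "yes"]

# Step 2: toString() joins a vector into one string separated by ", ".
one_string <- toString(keep)

# Step 3: sapply() repeats steps 1 and 2 for every list element and
# returns a character vector named after the list elements.
collapsed <- sapply(toy_files,
                    function(x) toString(x$SYMBOL[x$SIGNIFICANT == "yes"]))

# Step 4: stack() turns the named vector into a dataframe with
# columns `values` (the strings) and `ind` (the names, as a factor).
toy_data <- stack(collapsed)
print(toy_data)
```

Note that a dataframe with no 'yes' rows (SetB here) yields an empty string rather than NA, so it still gets a row in the stacked result.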

You can then merge data with each dataframe in reports.

result <- lapply(reports, function(x) merge(x,data, by.x = 'SETS', by.y = 'ind'))
result

#$Report1
#  SETS STATISTIC              values
#1 Set1 0.4100841 A, B, K, N, R, W, J
#2 Set3 0.8108702          F, S, J, V

#$Report2
#  SETS STATISTIC           values
#1 Set2 0.6049333             B, F
#2 Set4 0.6547239 W, Z, H, Q, D, M
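
The merged column is called values, whereas the desired output in the question calls it SYMBOL. In addition, merge by default drops report rows that have no match. A possible variation, sketched here with hypothetical toy inputs (toy_reports and toy_data are not from the question), renames the columns before merging and passes all.x = TRUE so unmatched report rows are kept with NA.

```r
# Hypothetical stand-ins for `reports` and the stacked `data`.
toy_reports <- list(
  Report1 = data.frame(SETS = c("Set3", "Set1", "Set9"),
                       STATISTIC = c(0.2, 0.1, 0.3),
                       stringsAsFactors = FALSE)
)
toy_data <- data.frame(values = c("A, B", "F, S"),
                       ind = factor(c("Set1", "Set3")),
                       stringsAsFactors = FALSE)

# Rename up front so the merged result carries SYMBOL and a shared key name.
names(toy_data) <- c("SYMBOL", "SETS")
toy_data$SETS <- as.character(toy_data$SETS)  # match the key type in the reports

# all.x = TRUE keeps report rows without a match (their SYMBOL is NA).
toy_result <- lapply(toy_reports,
                     function(x) merge(x, toy_data, by = "SETS", all.x = TRUE))
print(toy_result$Report1)
```

merge also sorts the result by the key column, so the row order may differ from the original report. If the original order matters, sort = FALSE or match-based indexing are options to consider.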
