Long-time lurker on the forum; this will be my first post. I appreciate your patience in advance, as I have limited formal training in computer science and am definitely a biologist by day.
My question is regarding how to handle processing two lists with multiple dataframes each in R. Please find example data below.
set.seed(1)
set1 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                   SYMBOL = paste(c(sample(LETTERS, 10))),
                   SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
set2 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                   SYMBOL = paste(c(sample(LETTERS, 10))),
                   SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
set3 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                   SYMBOL = paste(c(sample(LETTERS, 10))),
                   SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
set4 <- data.frame(NAME = paste("row_", 1:10, sep = ""),
                   SYMBOL = paste(c(sample(LETTERS, 10))),
                   SIGNIFICANT = sample(c("yes", "no"), 10, replace = TRUE))
files <- list(set1, set2, set3, set4)
names(files) <- paste("Set", 1:4, sep = "")
reports <- list(data.frame(SETS = c("Set1", "Set3"),
                           STATISTIC = runif(2)),
                data.frame(SETS = c("Set2", "Set4"),
                           STATISTIC = runif(2)))
names(reports) <- c("Report1", "Report2")
files is a list containing many dataframes of metadata from an analysis.
> files$Set1
     NAME SYMBOL SIGNIFICANT
1   row_1      Y          no
2   row_2      D          no
3   row_3      G          no
4   row_4      A         yes
5   row_5      B         yes
6   row_6      K         yes
7   row_7      N         yes
8   row_8      R         yes
9   row_9      W         yes
10 row_10      J         yes
reports is also a list, containing 2 dataframes with primary outputs from a two-way analysis and associated statistics.
> reports$Report1
SETS STATISTIC
1 Set1 0.4100841
2 Set3 0.8108702
Note that the names of the dataframes within the files list correspond with the SETS column (column 1) of the dataframes within the reports list.
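That correspondence can be checked directly. The snippet below is only a sketch on tiny made-up data: files_demo, reports_demo, and all_sets are hypothetical names used for illustration, and it assumes every report has a SETS column.

```r
# Hypothetical stand-ins that mirror the structure of files and reports
files_demo <- list(Set1 = data.frame(x = 1), Set2 = data.frame(x = 2))
reports_demo <- list(
  Report1 = data.frame(SETS = c("Set1", "Set2"),
                       STATISTIC = c(0.1, 0.2),
                       stringsAsFactors = FALSE)
)

# Gather every set name mentioned in any report
all_sets <- unlist(
  lapply(reports_demo, function(report) as.character(report$SETS)),
  use.names = FALSE
)

# TRUE when every set named in a report is also a name in the files list
sets_match <- all(all_sets %in% names(files_demo))
sets_match
```

If this returns FALSE on real data, some SETS entries have no matching dataframe and any lookup by name would give NA for those rows.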
I wish to collapse the files metadata in a particular way. Where files$Set1$SIGNIFICANT == 'yes', I would like to append the corresponding SYMBOL to a comma-delimited string. Then, I would like to append that string to the corresponding Set within reports. Thus, my desired output would be as follows:
> head(reports$Report1)
SETS STATISTIC SYMBOL
1 Set1 0.4100841 A, B, K, N, R, W, J
2 Set3 0.8108702 F, S, J, V
and likewise for Report2.
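To pin down the transformation described above, here is a rough sketch on tiny made-up data. It is not a tested solution for the real project: files_toy, reports_toy, collapse_symbols, and symbol_strings are hypothetical names, and it assumes SIGNIFICANT holds exactly "yes" or "no" and that every SETS entry matches a name in the files list.

```r
# Hypothetical toy data with the same columns as the real lists
files_toy <- list(
  Set1 = data.frame(SYMBOL = c("A", "B", "C"),
                    SIGNIFICANT = c("yes", "no", "yes"),
                    stringsAsFactors = FALSE),
  Set2 = data.frame(SYMBOL = c("D", "E", "F"),
                    SIGNIFICANT = c("no", "no", "yes"),
                    stringsAsFactors = FALSE)
)
reports_toy <- list(
  Report1 = data.frame(SETS = c("Set1", "Set2"),
                       STATISTIC = c(0.1, 0.2),
                       stringsAsFactors = FALSE)
)

# Collapse the significant symbols of one dataframe into a single string
collapse_symbols <- function(df) {
  paste(df$SYMBOL[df$SIGNIFICANT == "yes"], collapse = ", ")
}

# One string per set; vapply keeps the list names on the result
symbol_strings <- vapply(files_toy, collapse_symbols, character(1))

# For each report, look up each row's set by name and attach its string
reports_toy <- lapply(reports_toy, function(report) {
  report$SYMBOL <- unname(symbol_strings[as.character(report$SETS)])
  report
})

reports_toy$Report1
```

The lookup is by name rather than by position, so the order of the dataframes in the files list does not matter.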
This is easy enough to do manually for this example, but in my actual project, length(files) is 600.
I am attempting to do this with a for loop but keep running into errors. Here is my current iteration:
output <- data.frame()
for(i in 1:length(files)){
  for(j in 1:nrow(files[[i]])){
    if(files[j, 3] == "Yes"){
      output[i, 1]=i;
      output[i, 2]=paste0(i[,2], collapse = ", ")
    }
  }
}
And my current error:
Error in i[[j, 3]] : incorrect number of subscripts
I have been working with R for ~4 years now, and if I know one thing, it's that people avoid loops like the plague more often than not. I know some variation of apply, lapply, etc. is likely going to make life easy. Despite that, after consulting the R literature and this forum, I am stumped.
Would appreciate some advice on this one. Thanks everybody!