I have already loaded 20 CSV files with:
tbl = list.files(pattern = "\\.csv$")  # pattern is a regex: file names ending in .csv
for (i in seq_along(tbl)) assign(tbl[i], read.csv(tbl[i]))
or
list_of_data = lapply(tbl, read.csv)
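A minimal sketch of keeping all files in one named list instead of 20 separate variables, assuming every file is a comma-separated CSV with a header row. The demo writes two small hypothetical files to a temporary folder so it runs anywhere; `dir`, the sample values, and the `Source` column are illustrative names, not from the original data.

```r
# Hypothetical demo data: two small CSV files in a temporary folder.
# Replace `dir` with the folder that holds your real files.
dir <- file.path(tempdir(), "csv_demo")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Accession = c("AT3G26450.1", "AT5G44520.2"), Score = c(10, 20)),
          file.path(dir, "F1.csv"), row.names = FALSE)
write.csv(data.frame(Accession = "AT3G26450.2", Score = 5, Coverage = 0.8),
          file.path(dir, "F10_noS3.csv"), row.names = FALSE)

# `pattern` is a regular expression, so anchor it to the ".csv" extension
files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a single list, named after the file it came from
list_of_data <- lapply(files, read.csv, stringsAsFactors = FALSE)
names(list_of_data) <- basename(files)

# Hypothetical "Source" column: records the originating file for every row,
# so no information is lost once the tables are stacked together
list_of_data <- Map(function(df, src) { df$Source <- src; df },
                    list_of_data, names(list_of_data))
```

Keeping the tables in a list makes it possible to process all of them with `lapply()` or `do.call()` rather than referring to each variable by name.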
This is what it looks like:
> head(tbl)
[1] "F1.csv" "F10_noS3.csv" "F11.csv" "F12.csv" "F12_noS7_S8.csv"
[6] "F13.csv"
I need to combine all of these files into one; let's call it a master file. As a first step, I'd like to build a single table containing all of the names.
Every one of these CSV files has a column called "Accession". I would like to make a table of all the names (accessions) from all of the files. Of course, many accessions are repeated across different files. I would like to keep all of the data corresponding to each accession.
Some problems:
- Some of the names are identical, and I don't want them duplicated.
- Some of the names are ALMOST the same: they differ only by a dot and a number at the end.
- The number of columns differs between the CSV files.
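Before combining anything, a quick diagnostic can show exactly how the column sets differ. This is a minimal sketch using three small hypothetical data frames in place of the real files; `common_cols`, `all_cols`, and `missing_by_file` are illustrative names.

```r
# Hypothetical stand-ins for the loaded CSV files
list_of_data <- list(
  F1.csv  = data.frame(Accession = "AT3G26450.1", Score = 10),
  F11.csv = data.frame(Accession = "AT5G44520.2", Score = 20, Coverage = 0.8),
  F12.csv = data.frame(Accession = "AT4G24770.1", Coverage = 0.5)
)

col_sets <- lapply(list_of_data, names)

# Columns present in every file vs. columns present in at least one file
common_cols <- Reduce(intersect, col_sets)
all_cols    <- Reduce(union, col_sets)

# For each file, the columns it lacks (these are what make rbind() fail)
missing_by_file <- lapply(list_of_data, function(df) setdiff(all_cols, names(df)))
```

If `common_cols` is just `"Accession"`, a plain `rbind()` cannot work and the missing columns have to be filled in first.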
Here is a screenshot showing what the data looks like:
http://imageshack.com/a/img811/7103/29hg.jpg
Let me show you how it looks:
AT3G26450.1 <--
AT5G44520.2
AT4G24770.1
AT2G37220.2
AT3G02520.1
AT5G05270.1
AT1G32060.1
AT3G52380.1
AT2G43910.2
AT2G19760.1
AT3G26450.2 <--
<-- = same sample, different names. These should be treated as one, so the dot and the number after it can simply be ignored.
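A minimal sketch of ignoring the suffix, assuming it is always a dot followed only by digits at the very end of the name. `clean_accession()` is a hypothetical helper; the IDs are the ones listed above.

```r
# Strip a trailing ".<digits>" suffix, e.g. "AT3G26450.2" -> "AT3G26450"
clean_accession <- function(x) sub("\\.[0-9]+$", "", x)

ids <- c("AT3G26450.1", "AT5G44520.2", "AT4G24770.1", "AT2G37220.2",
         "AT3G02520.1", "AT5G05270.1", "AT1G32060.1", "AT3G52380.1",
         "AT2G43910.2", "AT2G19760.1", "AT3G26450.2")

cleaned <- clean_accession(ids)
unique(cleaned)  # AT3G26450 now appears once: 10 unique IDs from 11
```

Anchoring the pattern with `$` removes only the final version suffix, so names that contain no such suffix pass through unchanged.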
Is this possible?
I couldn't post the output of dput(head(...))
because even that is too large.
I tried the following code:
all_data = do.call(rbind, list_of_data)
Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match
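Once the tables were combined, I planned to strip the suffix and remove duplicates with:
library(stringr)  # str_extract() comes from the stringr package
all_data$CleanedAccession = str_extract(all_data$Accession, "^[[:alnum:]]+")
all_data = subset(all_data, !duplicated(CleanedAccession))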
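The `rbind` error comes from the tables having different columns. One base-R workaround, sketched here under the assumption that a missing column should simply become `NA`, is to pad every table to the full set of column names before stacking. `bind_fill()` and the sample data frames are hypothetical; the packaged equivalents are `dplyr::bind_rows()` and `data.table::rbindlist(fill = TRUE)`.

```r
# Stack data frames with different column sets, filling gaps with NA
bind_fill <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(df) {
    for (col in setdiff(all_cols, names(df))) {
      df[[col]] <- rep(NA, nrow(df))  # add each absent column as all-NA
    }
    df[all_cols]                      # same column order in every piece
  })
  do.call(rbind, padded)
}

# Hypothetical example tables with mismatched columns
df1 <- data.frame(Accession = c("AT3G26450.1", "AT5G44520.2"), Score = c(10, 20))
df2 <- data.frame(Accession = "AT3G26450.2", Score = 5, Coverage = 0.8)

all_data <- bind_fill(list(df1, df2))
all_data$CleanedAccession <- sub("\\.[0-9]+$", "", all_data$Accession)

# Keep the first row per cleaned accession, as in the attempt above
unique_data <- all_data[!duplicated(all_data$CleanedAccession), ]
```

Note that `duplicated()` keeps only the first row per accession and drops the rest. If every row's data should be kept, skip that step and group the combined table by `CleanedAccession` instead (for example with `split(all_data, all_data$CleanedAccession)`).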
I have been trying to do this for almost two weeks without success, so any help would be much appreciated.