Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
292 views
in Technique[技术] by (71.8m points)

r - Fast reading and combining several files using data.table (with fread)

I have several different txt files with the same structure. Now I want to read them into R using fread, and then union them into a bigger dataset.

## First put all file names into a list 
library(data.table)
all.files <- list.files(path = "C:/Users",pattern = ".txt")

## Read data using fread
readdata <- function(fn){
    dt_temp <- fread(fn, sep=",")
    keycols <- c("ID", "date")
    setkeyv(dt_temp,keycols)  # Notice there's a "v" after setkey with multiple keys
    return(dt_temp)

}
# then using 
mylist <- lapply(all.files, readdata)
mydata <- do.call('rbind',mylist)

The code works fine, but the speed is not satisfactory. Each txt file has 1M observations and 12 fields.

If I use the fread to read a single file, it's fast. But using apply, then speed is extremely slow, and obviously take much time than reading files one by one. I wonder where went wrong here, is there any improvements for the speed gain?

I tried the llply in plyr package, there're not much speed gains.

Also, is there any syntax in data.table to achieve vertical join like rbind and union in sql?

Thanks.

question from:https://stackoverflow.com/questions/21156271/fast-reading-and-combining-several-files-using-data-table-with-fread

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Use rbindlist() which is designed to rbind a list of data.table's together...

mylist <- lapply(all.files, readdata)
mydata <- rbindlist( mylist )

And as @Roland says, do not set the key in each iteration of your function!

So in summary, this is best :

l <- lapply(all.files, fread, sep=",")
dt <- rbindlist( l )
setkey( dt , ID, date )

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...