Read the odd and even rows separately with sed, then column-bind the two fread results. This gets you to the "expected output", and it is pretty fast too: around 2 seconds with unzipped input.
# get the data
# wget ftp://ftp.wwpdb.org/pub/pdb/derived_data/pdb_seqres.txt.gz
library(data.table)
# unzip on the fly
started.at = proc.time()
d <- cbind(
  fread(cmd = "zcat pdb_seqres.txt.gz | sed -n 'p;n'", sep = "|"),
  fread(cmd = "zcat pdb_seqres.txt.gz | sed -n 'n;p'"))
cat("Finished in", timetaken(started.at), "\n")
# Finished in 4.585s elapsed (1.788s cpu)
# read unzipped input
started.at = proc.time()
d <- cbind(
  fread(cmd = "sed -n 'p;n' pdb_seqres.txt", sep = "|"),
  fread(cmd = "sed -n 'n;p' pdb_seqres.txt"))
cat("Finished in", timetaken(started.at), "\n")
# Finished in 1.796s elapsed (1.111s cpu)
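To see what the two sed expressions select, here is a minimal shell sketch on a made-up four-line sample. The record names and sequences are hypothetical, not taken from pdb_seqres.txt; only the alternating header/sequence layout is assumed.

```shell
# Hypothetical sample: header lines alternate with sequence lines.
sample=$(printf '%s\n' '>id1 mol:protein' 'AAAA' '>id2 mol:na' 'CCCC')

# 'p;n' prints the current line, then reads past the next one: lines 1, 3, ...
odd=$(printf '%s\n' "$sample" | sed -n 'p;n')

# 'n;p' reads past the current line, then prints the next one: lines 2, 4, ...
even=$(printf '%s\n' "$sample" | sed -n 'n;p')

printf 'odd rows:\n%s\neven rows:\n%s\n' "$odd" "$even"
```

With -n, sed prints nothing unless told to, so each expression emits exactly every other line. That is why the two fread calls return tables with the same number of rows, which cbind needs.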
In theory, the command below should also work: it column-binds the two streams with bash paste before fread reads them. It keeps giving me errors about tempfile permissions, but it might work on your setup.
fread(cmd = "paste -d'|' <(sed -n 'p;n' pdb_seqres.txt) <(sed -n 'n;p' pdb_seqres.txt)",
      sep = "|")
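One possible factor is that the `<(...)` process substitution is a bash feature, and the command may be run by a plain sh on some systems. That is an assumption about the setup, not a confirmed diagnosis of the error above. The sketch below runs the same paste under an explicit bash, and also shows a variant that needs no process substitution: `paste - -` reads standard input alternately for each `-`, so it joins consecutive line pairs. The sample file contents are hypothetical.

```shell
# Hypothetical sample file in the header/sequence layout.
tmp=$(mktemp)
printf '%s\n' '>id1 mol:protein' 'AAAA' '>id2 mol:na' 'CCCC' > "$tmp"

# Same command as above, but run under bash explicitly so <(...) is available.
joined=$(bash -c "paste -d'|' <(sed -n 'p;n' '$tmp') <(sed -n 'n;p' '$tmp')")

# Variant without process substitution: each '-' takes the next line of stdin.
joined_posix=$(paste -d'|' - - < "$tmp")

printf '%s\n' "$joined"
rm -f "$tmp"
```

Both variants produce one `header|sequence` line per record. The second one reads the file only once, and its command string (for example `paste -d'|' - - < pdb_seqres.txt`) could be tried in fread(cmd = ..., sep = "|") where the bash version fails.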