Read the odd and even rows separately with sed, then fread each stream and column-bind the results. This gets you to the "expected output", and it is pretty fast too, around 2 seconds with unzipped input:
# get the data
# wget
library(data.table)

# unzip on the fly
started.at <- proc.time()
d <- cbind(
  fread(cmd = "zcat pdb_seqres.txt.gz | sed -n 'p;n'", sep = "|"),
  fread(cmd = "zcat pdb_seqres.txt.gz | sed -n 'n;p'"))
cat("Finished in", timetaken(started.at), "\n")
# Finished in 4.585s elapsed (1.788s cpu)
# read unzipped input
started.at <- proc.time()
d <- cbind(
  fread(cmd = "sed -n 'p;n' pdb_seqres.txt", sep = "|"),
  fread(cmd = "sed -n 'n;p' pdb_seqres.txt"))
cat("Finished in", timetaken(started.at), "\n")
# Finished in 1.796s elapsed (1.111s cpu)
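To see what the two sed programs select, here is a minimal sketch on a tiny stand-in file. The four sample records are made up for illustration and are not taken from pdb_seqres.txt; only the sed commands come from the code above.

```shell
# Build a tiny stand-in for pdb_seqres.txt (made-up records, assumption).
sample=$(mktemp)
printf '%s\n' '>id1 mol:protein' 'AAAA' '>id2 mol:protein' 'CCCC' > "$sample"

# 'p;n' prints the current line, then moves on to the next one
# without printing it: this keeps lines 1, 3, ...
odd=$(sed -n 'p;n' "$sample")

# 'n;p' moves on to the next line first, then prints it:
# this keeps lines 2, 4, ...
even=$(sed -n 'n;p' "$sample")

printf 'odd rows:\n%s\neven rows:\n%s\n' "$odd" "$even"
rm -f "$sample"
```

The odd rows are the header lines and the even rows are the sequences, which is why the two fread results line up row by row in cbind.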
In theory the following should also work: it does the column bind in the shell with paste, before fread reads anything. It keeps giving me errors about tempfile permissions, but it might work on your setup.
fread(cmd = "paste -d'|' <(sed -n 'p;n' pdb_seqres.txt) <(sed -n 'n;p' pdb_seqres.txt)",
sep = "|")
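If process substitution is what trips things up on your system, a different technique that may be worth trying is to let paste read line pairs from standard input. This is a swapped-in alternative, not the command above, and I have not confirmed it fixes the tempfile error. A minimal sketch on made-up sample records:

```shell
# Build a tiny stand-in for pdb_seqres.txt (made-up records, assumption).
sample=$(mktemp)
printf '%s\n' '>id1 mol:protein' 'AAAA' '>id2 mol:protein' 'CCCC' > "$sample"

# Each '-' tells paste to take one line from standard input in turn,
# so every pair of consecutive lines is joined with '|'.
joined=$(paste -d'|' - - < "$sample")

printf '%s\n' "$joined"
rm -f "$sample"
```

The same command could then be handed to fread, for example fread(cmd = "paste -d'|' - - < pdb_seqres.txt", sep = "|"), though that call is untested here.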