Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

r - Reading aligned column data with fread

I came across a file like this:

COL1        COL2          COL3
weqw        asrg          qerhqetjw
weweg       ethweth       rqerhwrtjw
rhqerhqerhq qergqer       qerhqew5h
qerh        qergqer       wetjwryerj

I could not load it directly with fread, so I used sed to replace \s+ with a comma and then passed the result to fread, which solved it. But is there a built-in way of reading this kind of data with data.table?


1 Reply

fread does not (yet) have any capabilities for reading fixed-width files.

I, too, often come across files annoyingly stored like this. Feel free to add a feature request on the GitHub page.

It may not be so in your case, but your solution with sed would not work on a lot of the FWF files I come across, because there is no space between columns; e.g. you'll see a string like 00010 that actually comprises 3 fields.
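To make the difference concrete, here is a minimal Python sketch contrasting whitespace splitting with width-based slicing. The function names and the widths (2, 2, 1) chosen for 00010 are made up for illustration; the real widths have to come from the file's documentation.

```python
import re

def split_on_whitespace(line):
    # Equivalent in spirit to the sed approach: treat runs of whitespace as the separator.
    return re.split(r"\s+", line.strip())

def split_by_widths(line, widths):
    # Slice the line at fixed character offsets; widths are assumed to be known in advance.
    fields = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields

# Works when the columns happen to be separated by spaces:
print(split_on_whitespace("weqw        asrg          qerhqetjw"))
# -> ['weqw', 'asrg', 'qerhqetjw']

# Fails when fields are packed together with no separator:
print(split_on_whitespace("00010"))
# -> ['00010']

# Width-based slicing recovers the fields (hypothetical widths 2, 2, 1):
print(split_by_widths("00010", [2, 2, 1]))
# -> ['00', '01', '0']
```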

If that's the case, you'll need a field width dictionary, at which point you have several options:

  1. read.fwf within R
  2. Write a fwf->csv program (I use one I wrote in Python and it's pretty fast; I could share the code if you'd like). This is basically the beefed-up version of your initial approach, so that you never have to deal with the FWF again
  3. Open it in Excel / LibreOffice / etc.; there's a native FWF reader that tries (usually poorly) to guess the widths of the columns, which at least does half the work of specifying the column widths for you. Then you can save it as .csv or whatever from there.
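For the second option, a converter along these lines might look like the sketch below. This is not the program mentioned above, only a minimal illustration using the Python standard library; the function name fwf_to_csv and the widths 12, 14, 10 (read off the sample in the question) are assumptions.

```python
import csv
import io

def fwf_to_csv(src, dst, widths):
    """Read fixed-width lines from src and write them to dst as CSV.

    src and dst are file-like objects; widths lists the column widths in characters.
    """
    # Turn the widths into (start, end) character offsets once, up front.
    bounds = []
    start = 0
    for width in widths:
        bounds.append((start, start + width))
        start += width

    writer = csv.writer(dst, lineterminator="\n")
    for line in src:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Slice each field and drop the padding; csv.writer quotes fields that need it.
        writer.writerow([line[a:b].strip() for a, b in bounds])

# Usage on the first lines of the sample from the question (widths assumed: 12, 14, 10).
sample = io.StringIO(
    "COL1        COL2          COL3\n"
    "weqw        asrg          qerhqetjw\n"
)
out = io.StringIO()
fwf_to_csv(sample, out, [12, 14, 10])
print(out.getvalue())
```

With real files, src and dst would be opened with open(...), and the resulting .csv can then be passed to fread as usual.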

I personally stick with the second option most often. read.fwf is not optimized like fread, so it will probably be slow. And if you've got a lot (say 20+) of FWF files to read, the third option is pretty tedious.

But I agree it would be nice to have something like this built into fread.

