Is there a well-hidden way to read tokens from a file or file-like object without reading entire lines? The application I have immediately in mind (someone else's problem, not mine) is transposing a large matrix with a few very long rows, essentially performing an itertools.izip() on iterators that pick out the elements of a single column. The idea is to not have the entire file in memory during iteration.
The rows are space-delimited ASCII decimal numbers.
The problem would be simple with Java's Scanner class, but I don't see anything in the Python Standard Library that appears to tokenize without having the whole input in a string.
For the record, I know how to write this on my own. I'm just wondering if there's a standard tool that I missed. Something FOSS/libre that can be EasyInstalled is good, too, but I don't see anything on PyPI either.
The full problem was to take the sample input:
"123 3 234234 -35434 112312 54 -439 99 0 42\n" +
"13 456 -78 910 333 -44 5555 6 8"
...and produce the output (as a generator, without reading all of the very long rows into memory at once):
[123, 13], [3, 456], [234234, -78], ...etc
As I said, it's essentially itertools.izip(iterator1, iterator2), pointing iterator1 at the start of the file, and iterator2 just past the newline to read the second row.
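To make the idea concrete, here is a minimal hand-rolled sketch of that approach, not a standard-library tool. It is written for Python 3, where the built-in zip() is lazy and plays the role of itertools.izip(). The names row_offsets, iter_row_ints, transpose, and the open_stream callable are all hypothetical helpers invented for this illustration. It assumes the input can be reopened and seeked (a regular file or an in-memory buffer), is read in binary mode, and uses "\n" as the row terminator.

```python
import io


def row_offsets(stream, chunk_size=65536):
    """Return the byte offset at which each row starts, scanning in chunks."""
    offsets = [0]
    pos = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        start = 0
        while True:
            i = chunk.find(b"\n", start)
            if i == -1:
                break
            offsets.append(pos + i + 1)
            start = i + 1
        pos += len(chunk)
    # A trailing newline would otherwise produce an empty final row.
    if len(offsets) > 1 and offsets[-1] >= pos:
        offsets.pop()
    return offsets


def iter_row_ints(stream, offset, chunk_size=65536):
    """Yield the integers of the row starting at `offset`, one chunk at a time."""
    with stream as f:
        f.seek(offset)
        tail = b""  # partial token carried over from the previous chunk
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            nl = chunk.find(b"\n")
            if nl != -1:
                # End of this row: everything before the newline is complete.
                for tok in (tail + chunk[:nl]).split():
                    yield int(tok)
                return
            data = tail + chunk
            parts = data.split()
            # If the chunk ends mid-token, hold the last piece back.
            tail = b"" if data[-1:].isspace() else parts.pop()
            for tok in parts:
                yield int(tok)
        if tail:
            yield int(tail)


def transpose(open_stream, chunk_size=65536):
    """Lazily yield columns; `open_stream` returns a fresh binary stream."""
    with open_stream() as f:
        offsets = row_offsets(f, chunk_size)
    rows = [iter_row_ints(open_stream(), off, chunk_size) for off in offsets]
    return zip(*rows)


sample = (b"123 3 234234 -35434 112312 54 -439 99 0 42\n"
          b"13 456 -78 910 333 -44 5555 6 8")
# A tiny chunk size forces tokens to straddle chunk boundaries.
for column in transpose(lambda: io.BytesIO(sample), chunk_size=4):
    print(list(column))  # the first line printed is [123, 13]
```

For a real file, the call would look something like transpose(lambda: open("matrix.txt", "rb")), where the file name is only a placeholder. Two caveats with this sketch: it keeps one open handle per row, which is fine for "a few very long rows" but not for many rows, and zip() stops at the shortest row, so the sample above yields nine columns rather than ten.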