Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

0 votes · 1.0k views · asked in Technique by (71.8m points)

linux - Splitting gzipped logfiles without storing the ungzipped splits on disk

I have a recurring task of splitting a set of large (about 1-2 GiB each) gzipped Apache logfiles into several parts (say chunks of 500K lines). The final files should be gzipped again to limit the disk usage.

On Linux I would normally do:

zcat biglogfile.gz | split -l500000

The resulting files will be named xaa, xab, xac, and so on. So I then run:

gzip x*

The drawback of this method is that the huge uncompressed files are temporarily stored on disk as an intermediate step. Is there a way to avoid this intermediate disk usage?

Can I (in a way similar to what xargs does) have split pipe its output through a command like gzip, recompressing each chunk on the fly? Or am I looking in the wrong direction and is there a much better way to do this?

Thanks.


1 Reply

0 votes · answered by (71.8m points)

You can use the split --filter option, as explained in the manual, e.g.:

zcat biglogfile.gz | split -l500000 --filter='gzip > $FILE.gz'

Edit: I am not sure when the --filter option was introduced, but according to the comments it does not work in coreutils 8.4.
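To make the mechanics concrete, here is a minimal runnable sketch of the same technique. It generates a small sample file in place of biglogfile.gz and uses a 4-line chunk size instead of 500000; the part_ prefix and -d (numeric suffixes) flag are illustrative choices, not part of the original answer:

```shell
# Stand-in for biglogfile.gz: 10 numbered lines, gzipped.
seq 1 10 | gzip > biglogfile.gz

# Decompress, split every 4 lines, and recompress each chunk on the fly.
# split runs the --filter command once per chunk, with $FILE set to that
# chunk's output name (part_00, part_01, ...). The single quotes matter:
# $FILE must be expanded by the shell that split spawns, not by ours.
zcat biglogfile.gz | split -l4 -d --filter='gzip > $FILE.gz' - part_

# Each part_NN.gz now holds at most 4 lines; no plain-text chunk ever
# touched the disk.
zcat part_00.gz | wc -l    # 4
```

Because the filter receives each chunk on its stdin as split produces it, the uncompressed data only ever exists in the pipe, which is exactly what the question asks for.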

