Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


bash zcat head causes pipefail?

set -eu 
VAR=$(zcat file.gz  |  head -n 12)

works fine

set -eu   -o pipefail
VAR=$(zcat file.gz  |  head -n 12)

causes bash to exit with failure. How is this causing a pipefail?

Note that file.gz contains millions of lines (~ 750 MB, compressed).



1 Reply


Think about it, for a moment.

  1. You're telling the shell that your entire pipeline should be considered to have failed if any component failed.
  2. You're telling zcat to write its output to head.
  3. Then you're telling head to exit after reading 12 lines, out of a much-longer-than-12-line input stream.

Of course you have an error: zcat has its destination pipe closed early, and wasn't able to successfully write a decompressed version of your input file! It doesn't have any way of knowing that this was due to user intent rather than something erroneous happening.

If you were using zcat to write to a disk and it ran out of space, or to a network stream and there was a connection loss, it would be entirely correct and appropriate for it to exit with a status indicating a failure. This is simply another case of that rule.


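The specific error the operating system reports to zcat is EPIPE, which the write syscall returns under the following condition: an attempt is made to write to a pipe that is not open for reading by any process. The same write also raises SIGPIPE, and unless that signal is ignored or handled, its default action terminates the writer, which is why the shell typically reports zcat's exit status as 141 (128 + 13, where 13 is SIGPIPE's number on Linux).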

After head (the only reader of this pipe) has exited, it would be a bug for any write to the pipe's write end not to fail with EPIPE. It would likewise be a bug for zcat to silently ignore an error writing its output, since it could then produce a truncated output stream with no exit status reflecting the fact.
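To check which component reports the failure, one option is to inspect bash's PIPESTATUS array and compare the pipeline's overall status with and without pipefail. The following is a minimal sketch, not part of the original answer: it assumes seq, gzip and mktemp are available, generates its own throwaway input, and uses gzip -dc (which is what zcat runs on GNU systems) so it does not depend on your file.gz. The variable names are illustrative.

```shell
#!/usr/bin/env bash
# Throwaway input: one million lines, far more than a pipe buffer can hold,
# so the decompressor is still writing when head exits.
tmp=$(mktemp)
seq 1 1000000 | gzip > "$tmp"

# Per-component statuses; PIPESTATUS must be copied right after the pipeline.
gzip -dc "$tmp" | head -n 12 > /dev/null
statuses=("${PIPESTATUS[@]}")
producer_status=${statuses[0]}   # decompressor: nonzero (typically 141)
head_status=${statuses[1]}       # head: 0, it did exactly what was asked

# Overall status of the same pipeline, each variant in its own subshell.
plain_status=0
( gzip -dc "$tmp" | head -n 12 > /dev/null ) || plain_status=$?
pipefail_status=0
( set -o pipefail; gzip -dc "$tmp" | head -n 12 > /dev/null ) || pipefail_status=$?

echo "producer=$producer_status head=$head_status"
echo "without pipefail=$plain_status with pipefail=$pipefail_status"
rm -f "$tmp"
```

Without pipefail the pipeline's status is just head's status, so set -e never sees the decompressor's failure; with pipefail the nonzero status is propagated and set -e aborts the script.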


If you don't want to change any of your shell options, by the way, one workaround you might consider is using process substitution:

var=$(head -n 12 < <(zcat file.gz))

In this case, zcat is not a pipeline component, and its exit status is not considered for purposes of determining success. (You might test whether $var is 12 lines long, if you want to come up with an independent success/fail determination).
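A minimal sketch of that independent check, under the same shell options as in the question. It is an illustration only: the input is a generated temporary file rather than your file.gz, gzip -dc stands in for zcat, and the names tmp and line_count are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway input with far more than 12 lines.
tmp=$(mktemp)
seq 1 1000000 | gzip > "$tmp"

# The decompressor runs inside a process substitution, so its exit status
# is not part of any pipeline and pipefail does not see it.
var=$(head -n 12 < <(gzip -dc "$tmp"))

# Independent success check: count the lines actually captured.
# The here-string restores the trailing newline that $(...) stripped.
line_count=$(grep -c '' <<< "$var")
if [ "$line_count" -ne 12 ]; then
    echo "expected 12 lines, got $line_count" >&2
    exit 1
fi

echo "captured $line_count lines"
rm -f "$tmp"
```

The trade-off is that a genuine decompression error (a corrupt archive, for instance) is also hidden, which is why the line count is checked separately.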


A more comprehensive solution could be implemented by pulling in a Python interpreter, with its native gzip support. A native Python implementation (compatible with both Python 2 and 3.x), embedded in a shell script, might look something like:

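# Python source kept in a shell variable; the quoted 'EOF' prevents the shell
# from expanding anything inside the here-document.
zhead_py=$(cat <<'EOF'
import sys, gzip
# argv[1]: path to the gzip file; argv[2]: maximum number of lines to print
gzf = gzip.GzipFile(sys.argv[1], 'rb')
# Write bytes: Python 3 exposes the binary stream as sys.stdout.buffer,
# while Python 2's sys.stdout already accepts byte strings.
outFile = sys.stdout.buffer if hasattr(sys.stdout, 'buffer') else sys.stdout
numLines = 0
maxLines = int(sys.argv[2])
for line in gzf:
    # Stop cleanly once the requested number of lines has been written.
    if numLines >= maxLines:
        sys.exit(0)
    outFile.write(line)
    numLines += 1
EOF
)
# Use python3 here instead if your system has no "python" command.
zhead() { python -c "$zhead_py" "$@"; }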

...which gets you a zhead that doesn't fail if it runs out of input data, but does pass through a failed exit status for genuine I/O failures or other unexpected events. (Usage is of the form zhead in.gz 5, to read 5 lines from in.gz).
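As a usage sketch, the script below exercises zhead under set -euo pipefail against two generated files, one shorter and one much longer than the requested line count. It repeats the Python body from above so that it runs on its own; calling python3 rather than python is an assumption about the local system, and the file and variable names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

zhead_py=$(cat <<'EOF'
import sys, gzip
gzf = gzip.GzipFile(sys.argv[1], 'rb')
outFile = sys.stdout.buffer if hasattr(sys.stdout, 'buffer') else sys.stdout
numLines = 0
maxLines = int(sys.argv[2])
for line in gzf:
    if numLines >= maxLines:
        sys.exit(0)
    outFile.write(line)
    numLines += 1
EOF
)
# python3 is assumed to be installed; the answer above calls plain "python".
zhead() { python3 -c "$zhead_py" "$@"; }

# Two throwaway inputs: 3 lines and one million lines.
short_gz=$(mktemp)
long_gz=$(mktemp)
printf 'a\nb\nc\n' | gzip > "$short_gz"
seq 1 1000000 | gzip > "$long_gz"

# Asking for more lines than exist is not an error ...
few=$(zhead "$short_gz" 5)
# ... and stopping early on a long file is not an error either.
first12=$(zhead "$long_gz" 12)

few_count=$(grep -c '' <<< "$few")
first12_count=$(grep -c '' <<< "$first12")
echo "short file: $few_count lines, long file: $first12_count lines"
rm -f "$short_gz" "$long_gz"
```

Because no pipe sits between the decompressor and the line limit, nothing is ever written to a closed reader, so neither call trips set -e.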

