Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
567 views
in Technique[技术] by (71.8m points)

c - Understanding undefined behavior for a binary stream using fseek(file, 0, SEEK_END) with a file

The C spec has an interesting footnote (#268 C11dr §7.21.3 9)

"Setting the file position indicator to end-of-file, as with fseek(file, 0, SEEK_END), has undefined behavior for a binary stream (because of possible trailing null characters) or for any stream with state-dependent encoding that does not assuredly end in the initial shift state."

Does this ever apply to binary streams reading a file? (as from a physical device)

IMO, a binary file on a disk is just a sea of bytes. It seems to me that a binary file could not have state-dependent encoding as it is a binary file. I'm fuzzy on the concept of "binary wide-oriented streams" and if that even could apply to disk I/O.

I see that calling fseek(file, 0, SEEK_END) on a serial stream like a com port or maybe stdin may not get to the true end as the end is yet to be determined. Thus the narrowing of the question to physical files.


[edit] Answer: A concern with older (maybe up to late 1980s). Presently in 2014, Windows, POSIT-specific and non-exotic others: not a problem.

@Shafik Yaghmour provides a good reference in Using fseek and ftell to determine the size of a file has a vulnerability?. There @Jerry Coffin discusses CP/M as binary files not always having a precise length. (128-byte records per wiki).

Thanks to @Keith Thompson answer for the meat of the answer.

Together this explains the specs's "(because of possible trailing null characters)" comment.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Binary files are going to be sequences of 8-bit bytes, with an exact specified size, on any system you're likely to use. But not all systems store files that way, and the C standard is carefully designed to allow portability to systems with unusual characteristics.

For example, a conforming C implementation might run on an operating system that stores files as sequences of 512-byte blocks, with no indication of how many bytes of the final block are significant. On such a system, when a binary file is created, the OS might pad the remainder of the final block with zero bytes. When you read from such a file, the padding bytes might either appear in the input (even though they were never explicitly written to the file), or they might be ignored (even though the program that created the file might have written them explicitly).

If you're reading from a non-seekable stream (for example keyboard input), then fseek(file, 0, SEEK_END) won't just give you a bad result, it will indicate failure by returning a non-zero result. (On POSIX-compliant systems, it returns -1 and sets errno; ISO C doesn't require that.)

On most systems, fseek(file, 0, SEEK_END) on a binary file will either seek to the actual end of the file (a position determined by exactly how many bytes were written to the file), or return a clear failure indication. If you're using POSIX-specific features anyway, you can safely assume this behavior; you can probably make the same assumption for Windows and a number of other systems. If you want your code to be 100% portable to exotic systems, you shouldn't assume that binary files won't be padded with extra zero bytes.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...