Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Is it possible to confuse EOF with a normal byte value when using fgetc?

We often use fgetc like this:

int c;
while ((c = fgetc(file)) != EOF)
{
    // do stuff
}

Theoretically, if a byte in the file has the value of EOF, this code is buggy - it will break the loop early and fail to process the whole file. Is this situation possible?

As far as I understand, fgetc internally casts a byte read from the file to unsigned char and then to int, and returns it. This will work if the range of int is greater than that of unsigned char.

What happens if it's not (presumably that would mean sizeof(int) == 1)?

  • Will fgetc sometimes return legitimate data from the file that is equal to EOF?
  • Will it alter the data it read from the file to avoid the single value EOF?
  • Will fgetc be an unimplemented function?
  • Will EOF be of another type, like long?

I could make my code fool-proof by an extra check:

int c;
for (;;)
{
    c = fgetc(file);
    if (feof(file))
        break;
    // do stuff
}

Is it necessary if I want maximum portability?


1 Reply

The C specification says that int must be able to hold values from -32767 to 32767 at a minimum. Any platform with a smaller int is nonstandard.

The C specification also says that EOF is a negative int constant and that fgetc returns "an unsigned char converted to an int" in the event of a successful read. Since unsigned char can't have a negative value, the value of EOF can be distinguished from anything read from the stream.*

*See below for a loophole case in which this fails to hold.


Relevant standard text (from C99):

  • §5.2.4.2.1 Sizes of integer types <limits.h>:

    [The] implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

    [...]

    • minimum value for an object of type int

      INT_MIN -32767

    • maximum value for an object of type int

      INT_MAX +32767

  • §7.19.1 <stdio.h> - Introduction

    EOF ... expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream

  • §7.19.7.1 The fgetc function

    If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined)

If UCHAR_MAX ≤ INT_MAX, there is no problem: all unsigned char values will be converted to non-negative integers, so they will be distinct from EOF.
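The no-problem case can be checked empirically. The following is a minimal sketch, assuming a C11 compiler (for static_assert) and a platform where tmpfile() succeeds; the helper name round_trip_all_bytes is hypothetical. It writes every unsigned char value to a temporary file and confirms that fgetc hands each one back as a non-negative int before it finally returns EOF.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumption: we are on a platform where every unsigned char fits in int. */
static_assert(UCHAR_MAX <= INT_MAX,
              "unsigned char values may not fit in int; EOF could collide");

/*
 * Hypothetical helper: writes each unsigned char value once to a temporary
 * file, reads the file back with fgetc, and returns the number of bytes seen
 * before EOF. Returns -1 on any I/O failure or out-of-range value.
 */
long round_trip_all_bytes(void)
{
    FILE *f = tmpfile();              /* binary update mode, removed on close */
    if (f == NULL)
        return -1;

    for (long v = 0; v <= UCHAR_MAX; v++) {
        if (fputc((int)v, f) == EOF) {
            fclose(f);
            return -1;
        }
    }
    rewind(f);

    long count = 0;
    int c;                            /* int, not char, so EOF stays distinct */
    while ((c = fgetc(f)) != EOF) {
        if (c < 0 || c > UCHAR_MAX) { /* would contradict the standard text */
            fclose(f);
            return -1;
        }
        count++;
    }

    int failed = ferror(f);           /* EOF may also signal a read error */
    fclose(f);
    return failed ? -1 : count;
}
```

When the temporary file can be created, the helper returns UCHAR_MAX + 1 (256 with 8-bit bytes), which shows that no byte value, including 0xFF, ended the loop early.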

Now, there is a funny sort of loophole here: if UCHAR_MAX > INT_MAX, the implementation is legally allowed to convert values greater than INT_MAX to negative integers (per §6.3.1.3, when a value cannot be represented in the signed destination type, either the result is implementation-defined or an implementation-defined signal is raised), making it possible for a character read from a stream to be converted to EOF.

Systems with CHAR_BIT > 8 do exist (e.g. the TI C4x DSP, which apparently uses 32-bit bytes), although I'm not sure if they are broken with respect to EOF and stream functions.
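For code that must survive even the loophole case, one option is to treat an EOF return value as a hint and confirm it with feof() and ferror(). The sketch below assumes the stream's end-of-file and error indicators are clear on entry; count_chars_portably is a hypothetical name, not a library function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: counts the characters remaining in a stream.
 * A return value of EOF from fgetc is accepted as end of input only
 * when feof() confirms it. Returns -1 if a read error occurred.
 */
long count_chars_portably(FILE *stream)
{
    assert(stream != NULL);

    long count = 0;
    for (;;) {
        int c = fgetc(stream);
        if (c == EOF) {
            if (ferror(stream))
                return -1;            /* genuine read error */
            if (feof(stream))
                break;                /* genuine end of file */
            /*
             * Neither indicator is set: c is real data that happens to
             * compare equal to EOF. This can only occur on platforms
             * where UCHAR_MAX > INT_MAX.
             */
        }
        count++;                      /* "do stuff" with c goes here */
    }
    return count;
}
```

Unlike a loop that tests only feof(), this version also stops on a read error instead of spinning forever. On platforms where UCHAR_MAX ≤ INT_MAX the extra checks only run once, at the end of input, so the usual `while ((c = fgetc(file)) != EOF)` idiom remains sufficient there.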

