Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c - Why is the behaviour of subtracting characters implementation specific?

This statement:

if('z' - 'a' == 25)

is not guaranteed to evaluate the same way on every implementation; the result is compiler-dependent. Nor is it guaranteed to evaluate the same way as this preprocessor directive:

#if 'z' - 'a' == 25

even if both the preprocessor and compiler are run on the same machine. Why is that?

Question from: https://stackoverflow.com/questions/46890093/why-is-the-behaviour-of-subtracting-characters-implementation-specific

He who fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss gazes back...

The question is based on a direct quote from the standard, N1570 §6.10.1p3 and p4 plus footnote 168:

... the controlling constant expression is evaluated according to the rules of 6.6. ... This includes interpreting character constants, which may involve converting escape sequences into execution character set members. Whether the numeric value for these character constants matches the value obtained when an identical character constant occurs in an expression (other than within a #if or #elif directive) is implementation-defined.[168]

[footnote 168] Thus, the constant expression in the following #if directive and if statement is not guaranteed to evaluate to the same value in these two contexts.

#if 'z' - 'a' == 25
if ('z' - 'a' == 25)

So, yes, it really isn't guaranteed.

To understand why it isn't guaranteed, first you need to know that the C standard doesn't require the character constants 'a' and 'z' to have the numeric values assigned to those characters by ASCII. Most C implementations nowadays use ASCII or a superset, but there is another encoding called EBCDIC that is still in use, mainly on IBM mainframes (and there are still a lot of those out there). In EBCDIC, not only do 'a' and 'z' have different values from ASCII, the alphabet isn't even a contiguous sequence: the lowercase letters come in three runs (a-i, j-r, s-z) with gaps between them. That's why the expression 'z' - 'a' == 25 might not evaluate true in the first place.

You also need to know that the C standard tries to maintain a distinction between the text encoding used for source code (the "source character set") and the text encoding that the program will use at runtime (the "execution character set"). This is so you can, at least in principle, take a program whose source is encoded in ASCII and run it unmodified on a computer that uses EBCDIC, just by cross-compiling appropriately; you don't have to convert the source text to EBCDIC first.

Now, the compiler has to understand both character sets if they're different, but historically, the C preprocessor (translation phases 1 through 4) and the "compiler proper" (phases 5 through 7) were two separate programs, and #if expressions are the only place where the preprocessor would have to know about the execution character set. So, by making it implementation-defined whether the "execution character set" used by the preprocessor matches that used by the compiler proper, the standard licenses the preprocessor to do all its work in the source character set, making life a little bit easier back in 1989.

Having said all that, I would be very surprised to find a modern compiler that didn't make both expressions evaluate to the same value, even when the execution and source character sets are grossly incompatible. Modern compilers tend to have integrated preprocessors (phases 1 through 7 are all carried out by the same program), and even where they don't, making a separate preprocessor use the same execution character set as the compiler proper is a trivial engineering task nowadays.

...