Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
146 views
in Technique[技术] by (71.8m points)

Why is executing Java code in comments with certain Unicode characters allowed?

The following code produces the output "Hello World!" (no really, try it).

public static void main(String... args) {

   // The comment below is not a typo.
   // u000d System.out.println("Hello World!");
}

The reason for this is that the Java compiler parses the Unicode character u000d as a new line and gets transformed into:

public static void main(String... args) {

   // The comment below is not a typo.
   //
   System.out.println("Hello World!");
}

Thus resulting into a comment being "executed".

Since this can be used to "hide" malicious code or whatever an evil programmer can conceive, why is it allowed in comments?

Why is this allowed by the Java specification?

Question&Answers:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Unicode decoding takes place before any other lexical translation. The key benefit of this is that it makes it trivial to go back and forth between ASCII and any other encoding. You don't even need to figure out where comments begin and end!

As stated in JLS Section 3.3 this allows any ASCII based tool to process the source files:

[...] The Java programming language specifies a standard way of transforming a program written in Unicode into ASCII that changes a program into a form that can be processed by ASCII-based tools. [...]

This gives a fundamental guarantee for platform independence (independence of supported character sets) which has always been a key goal for the Java platform.

Being able to write any Unicode character anywhere in the file is a neat feature, and especially important in comments, when documenting code in non-latin languages. The fact that it can interfere with the semantics in such subtle ways is just an (unfortunate) side-effect.

There are many gotchas on this theme and Java Puzzlers by Joshua Bloch and Neal Gafter included the following variant:

Is this a legal Java program? If so, what does it print?

u0070u0075u0062u006cu0069u0063u0020u0020u0020u0020
u0063u006cu0061u0073u0073u0020u0055u0067u006cu0079
u007bu0070u0075u0062u006cu0069u0063u0020u0020u0020
u0020u0020u0020u0020u0073u0074u0061u0074u0069u0063
u0076u006fu0069u0064u0020u006du0061u0069u006eu0028
u0053u0074u0072u0069u006eu0067u005bu005du0020u0020
u0020u0020u0020u0020u0061u0072u0067u0073u0029u007b
u0053u0079u0073u0074u0065u006du002eu006fu0075u0074
u002eu0070u0072u0069u006eu0074u006cu006eu0028u0020
u0022u0048u0065u006cu006cu006fu0020u0077u0022u002b
u0022u006fu0072u006cu0064u0022u0029u003bu007du007d

(This program turns out to be a plain "Hello World" program.)

In the solution to the puzzler, they point out the following:

More seriously, this puzzle serves to reinforce the lessons of the previous three: Unicode escapes are essential when you need to insert characters that can’t be represented in any other way into your program. Avoid them in all other cases.


Source: Java: Executing code in comments?!


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...