Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


data structures - Why is the parameter to the String.indexOf method an int in Java?

I am wondering why the parameter to the indexOf method is an int when the description says it takes a char.

public int indexOf(int ch)

Returns the index within this string of the first occurrence of the specified **character**

http://download.oracle.com/javase/1,5.0/docs/api/java/lang/String.html#indexOf%28int%29

Also, both of these calls compile fine:
char c = 'p';
str.indexOf(2147483647);
str.indexOf(c);

a] Basically, what confuses me is that an int in Java is 32 bits, while Unicode characters are 16 bits.

b] Why not use chars themselves rather than an int? Is this a performance optimization? Are chars harder to represent than ints? How?

I assume there is a simple reason for this, which makes me want to know it even more!

Thanks!


1 Reply


The real reason is that indexOf(int) expects a Unicode code point, not a 16-bit UTF-16 "character". Unicode code points are up to 21 bits long.

(The UTF-16 representation of a longer code point is actually two 16-bit "character" values. These values are known as leading and trailing surrogates: 0xD800 to 0xDBFF, and 0xDC00 to 0xDFFF respectively; see Unicode FAQ - UTF-8, UTF-16, UTF-32 & BOM for the gory details.)
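To make the surrogate ranges concrete, here is a minimal sketch that prints the UTF-16 code units Java uses for a code point. The class name, the helper utf16Units, and the choice of U+1F600 as the sample code point are my own for illustration; Character.toChars is the standard library call doing the work.

```java
class SurrogateDemo {
    // Illustrative helper: the UTF-16 code units of a code point, as hex.
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A code point below 65536 fits in one 16-bit unit.
        System.out.println(utf16Units('p'));     // 0070
        // U+1F600 needs a leading and a trailing surrogate.
        System.out.println(utf16Units(0x1F600)); // D83D DE00
    }
}
```

The first unit falls in the leading range and the second in the trailing range quoted above.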

If you give indexOf(int) a code point > 65535 it will search for the pair of UTF-16 characters that encode the codepoint.
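A small sketch of that behaviour, assuming a sample string I made up that contains U+1F600 (a code point above 65535). The class and method names are illustrative only. The last call uses the 2147483647 value from the question, which is not a valid code point.

```java
class IndexOfCodePointDemo {
    static final int GRINNING_FACE = 0x1F600; // sample code point above 65535

    // Illustrative sample: "ab", then the surrogate pair, then "cd".
    static String sample() {
        return "ab" + new String(Character.toChars(GRINNING_FACE)) + "cd";
    }

    public static void main(String[] args) {
        String s = sample();
        // The code point occupies two chars, so the length is 6, not 5.
        System.out.println(s.length());               // 6
        // indexOf reports where the surrogate pair starts.
        System.out.println(s.indexOf(GRINNING_FACE)); // 2
        // Indexes after the pair are shifted by the extra char.
        System.out.println(s.indexOf('c'));           // 4
        // Compiles, but no such code point exists, so nothing is found.
        System.out.println(s.indexOf(2147483647));    // -1
    }
}
```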

This is stated by the javadoc (albeit not very clearly), and an examination of the code indicates that this is indeed how the method is implemented.


Why not just use 16-bit characters?

That's pretty obvious. If they did that, there wouldn't be an easy way to locate code points greater than 65535 in Strings. That would be a major problem for people who develop internationalized applications where text may contain such code points. (A lot of supposedly internationalized applications make the incorrect assumption that a char represents a code point. Often it doesn't matter, but increasingly often it does.)
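To illustrate what would go wrong, here is a rough sketch comparing the real indexOf(int) with a hypothetical char-only search. The helper indexOfTruncated is not a JDK method. It only mimics what a 16-bit parameter would force on callers, namely truncating the code point.

```java
class CharVersusCodePoint {
    static final int GRINNING_FACE = 0x1F600; // sample code point above 65535

    // Hypothetical char-only search: the code point is cut down to 16 bits.
    static int indexOfTruncated(String s, int codePoint) {
        return s.indexOf((char) codePoint);
    }

    public static void main(String[] args) {
        String s = "hi" + new String(Character.toChars(GRINNING_FACE));
        // Four chars, but only three code points.
        System.out.println(s.length());                        // 4
        System.out.println(s.codePointCount(0, s.length()));   // 3
        // The int overload finds the code point.
        System.out.println(s.indexOf(GRINNING_FACE));          // 2
        // The truncated value 0xF600 is a different character entirely.
        System.out.println(indexOfTruncated(s, GRINNING_FACE)); // -1
    }
}
```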

But it shouldn't make any difference to you. The method will still work if your Strings consist of only 16 bit codes ... or, for that matter, of only ASCII codes.
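As for why str.indexOf(c) compiles with a char argument: Java implicitly widens char to int, so the same method is called. A minimal sketch, where the string "apple" and the helper find are my own choices for illustration:

```java
class WideningDemo {
    // Illustrative helper: c is implicitly widened from char to int here.
    static int find(String s, char c) {
        return s.indexOf(c);
    }

    public static void main(String[] args) {
        String str = "apple";
        char c = 'p';
        System.out.println(find(str, c));     // 1
        // 'p' is U+0070, which is 112, so this is the same search.
        System.out.println(str.indexOf(112)); // 1
        // A character that is absent gives -1.
        System.out.println(str.indexOf('z')); // -1
    }
}
```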

