Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

How exactly does String.split() method in Java work when regex is provided?

I'm preparing for the OCPJP exam and I ran into the following example:

class Test {
   public static void main(String args[]) {
      String test = "I am preparing for OCPJP";
      // "\\S" is the Java string literal for the regex \S (any non-whitespace character)
      String[] tokens = test.split("\\S");
      System.out.println(tokens.length);
   }
}

This code prints 16. I was expecting something like no_of_characters + 1. Can someone explain to me what the split() method actually does in this case? I just don't get it...


1 Reply


It splits on every \S, which in a regex represents a single non-whitespace character. (In Java source the backslash has to be escaped, so the string literal is "\\S".)

So let's try to split "x x" on non-whitespace (\S). Since this regex matches exactly one character at a time, let's iterate over the characters and mark the places where the string is split (we will use a pipe | for that).

  • is 'x' non-whitespace? YES, so let's mark it | x
  • is ' ' non-whitespace? NO, so we leave it as is
  • is last 'x' non-whitespace? YES, so let's mark it | |

So as a result we need to split our string at the start and at the end, which initially gives us the result array

["", " ", ""]
   ^    ^ - here we split

But since trailing empty strings are removed, the result would be

[""," "]     <- result
        ,""] <- removed trailing empty string

so split returns the array ["", " "], which contains only two elements.
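
The "x x" walk-through above can be checked with a minimal sketch like the one below. The class and method names (SplitTrailingDemo, splitOnNonWhitespace) are made up for illustration; only String.split itself comes from the JDK.

```java
class SplitTrailingDemo {
    // Hypothetical helper: splits on every non-whitespace character (\S).
    // With the one-argument split, trailing empty strings are discarded.
    static String[] splitOnNonWhitespace(String input) {
        return input.split("\\S");
    }

    public static void main(String[] args) {
        String[] parts = splitOnNonWhitespace("x x");
        System.out.println(parts.length); // 2
        // Print each element in brackets so that "" and " " are visible.
        for (String part : parts) {
            System.out.println("[" + part + "]");
        }
    }
}
```

The leading empty string stays because only trailing empty strings are dropped.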

BTW, to turn off the removal of trailing empty strings you need to use split(regex, limit) with a negative limit, like split("\\S", -1).
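
As a rough sketch of the difference the limit makes, the snippet below runs both variants on the same "x x" input. SplitLimitDemo and its two helper methods are hypothetical names used only for this example.

```java
class SplitLimitDemo {
    // Hypothetical helper: default behaviour, same as split(regex, 0).
    static String[] splitDroppingTrailing(String input) {
        return input.split("\\S");
    }

    // Hypothetical helper: a negative limit keeps trailing empty strings.
    static String[] splitKeepingTrailing(String input) {
        return input.split("\\S", -1);
    }

    public static void main(String[] args) {
        System.out.println(splitDroppingTrailing("x x").length); // 2
        System.out.println(splitKeepingTrailing("x x").length);  // 3
    }
}
```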


Now let's get back to your example. With your data you are splitting at each of the marked characters:

I am preparing for OCPJP
| || ||||||||| ||| |||||

which means

 ""|" "|""|" "|""|""|""|""|""|""|""|""|" "|""|""|" "|""|""|""|""|""

So this represents this array

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  

but since trailing empty strings "" are removed (if their existence was caused by the split; more info at: Confusing output from String.split)

[""," ",""," ","","","","","","","",""," ","",""," ","","","","",""]  
                                                     ^^ ^^ ^^ ^^ ^^

you get a result array that contains only this part:

[""," ",""," ","","","","","","","",""," ","",""," "]  

which is exactly 16 elements.
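
To double-check the counts for the original string, here is a small sketch. TokenCountDemo and its methods are illustrative names, not part of the question's code. It compares the default split with the limit -1 variant, which matches the no_of_characters + 1 guess if only non-whitespace characters are counted.

```java
class TokenCountDemo {
    // Hypothetical helper: number of elements with trailing empty strings removed.
    static int defaultCount(String input) {
        return input.split("\\S").length;
    }

    // Hypothetical helper: number of elements with trailing empty strings kept.
    static int fullCount(String input) {
        return input.split("\\S", -1).length;
    }

    // Hypothetical helper: counts the non-whitespace characters, i.e. the split points.
    static int nonWhitespaceChars(String input) {
        return input.replaceAll("\\s", "").length();
    }

    public static void main(String[] args) {
        String test = "I am preparing for OCPJP";
        System.out.println(nonWhitespaceChars(test)); // 20
        System.out.println(fullCount(test));          // 21
        System.out.println(defaultCount(test));       // 16
    }
}
```

The difference of 5 corresponds to the trailing empty strings produced by the five letters of the last word, "OCPJP".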

