Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
484 views
in Technique[技术] by (71.8m points)

regex - Java's Scanner vs String.split() vs StringTokenizer; which should I use?

I am currently using split() to scan through a file where each line has number of strings delimited by '~'. I read somewhere that Scanner could do a better job with a long file, performance-wise, so I thought about checking it out.

My question is: Would I have to create two instances of Scanner? That is, one to read a line and another one based on the line to get tokens for a delimiter? If I have to do so, I doubt if I would get any advantage from using it. Maybe I am missing something here?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Did some metrics around these in a single threaded model and here are the results I got.

~~~~~~~~~~~~~~~~~~Time Metrics~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Tokenizer  |   String.Split()   |    while+SubString  |    Scanner    |    ScannerWithCompiledPattern    ~
~   4.0 ms   |      5.1 ms        |        1.2 ms       |     0.5 ms    |                0.1 ms            ~
~   4.4 ms   |      4.8 ms        |        1.1 ms       |     0.1 ms    |                0.1 ms            ~
~   3.5 ms   |      4.7 ms        |        1.2 ms       |     0.1 ms    |                0.1 ms            ~
~   3.5 ms   |      4.7 ms        |        1.1 ms       |     0.1 ms    |                0.1 ms            ~
~   3.5 ms   |      4.7 ms        |        1.1 ms       |     0.1 ms    |                0.1 ms            ~
____________________________________________________________________________________________________________

The out come is that Scanner gives the best performance, Now the same needs to be evaluated on a multithreaded mode ! One of my senior's say that the Tokenizer gives a CPU spike and String.split does not.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...