Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

java - Tokenizing a String but ignoring delimiters within quotes

I want the following String

!cmd 45 90 "An argument" Another AndAnother "Another one in quotes"

to become an array of the following

{ "!cmd", "45", "90", "An argument", "Another", "AndAnother", "Another one in quotes" }

I tried

new StringTokenizer(cmd, "\"")

but this returns "Another" and "AndAnother" together as a single token, "Another AndAnother", which is not the desired effect.
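To make the problem concrete, here is a small sketch of what StringTokenizer returns when the double quote is the only delimiter (the QuoteDelimiterDemo class and tokenizeOnQuotes method are hypothetical names). Each quote simply separates one run of text from the next, so the unquoted words between two quoted sections come back glued together, spaces included:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class QuoteDelimiterDemo {
    // Splits on the double-quote character only, as in the attempt above.
    static List<String> tokenizeOnQuotes(String cmd) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(cmd, "\"");
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String cmd = "!cmd 45 90 \"An argument\" Another AndAnother \"Another one in quotes\"";
        // Brackets make the leading/trailing spaces of each token visible.
        for (String t : tokenizeOnQuotes(cmd)) {
            System.out.println("[" + t + "]");
        }
    }
}
```

With the input above this prints four tokens: [!cmd 45 90 ], [An argument], [ Another AndAnother ] and [Another one in quotes].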

Thanks.

EDIT: I have changed the example yet again; this time I believe it explains the situation best, although it is no different from the second example.


It's much easier to use a java.util.regex.Matcher and call find() repeatedly than to use any kind of split in this kind of scenario.

That is, instead of defining the pattern for the delimiter between the tokens, you define the pattern for the tokens themselves.

Here's an example:

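    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class QuotedTokens {
        public static void main(String[] args) {
            String text = "1 2 \"333 4\" 55 6    \"77\" 8 999";
            // 1 2 "333 4" 55 6    "77" 8 999

            String regex = "\"([^\"]*)\"|(\\S+)";

            Matcher m = Pattern.compile(regex).matcher(text);
            while (m.find()) {
                if (m.group(1) != null) {
                    // Alternate 1 matched: a quoted token, without its quotes.
                    System.out.println("Quoted [" + m.group(1) + "]");
                } else {
                    // Alternate 2 matched: a plain run of non-whitespace.
                    System.out.println("Plain [" + m.group(2) + "]");
                }
            }
        }
    }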

The above prints (as seen on ideone.com):

    Plain [1]
    Plain [2]
    Quoted [333 4]
    Plain [55]
    Plain [6]
    Quoted [77]
    Plain [8]
    Plain [999]
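Applied to the question's own input, the same pattern can fill the requested array instead of printing. This is a minimal sketch; the CommandTokenizer class and tokenize method are hypothetical names, and precompiling the Pattern into a constant is just a convenience for repeated calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommandTokenizer {
    // Group 1: contents of a "quoted" token; group 2: a plain run of non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static String[] tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            tokens.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String cmd = "!cmd 45 90 \"An argument\" Another AndAnother \"Another one in quotes\"";
        for (String token : tokenize(cmd)) {
            System.out.println(token);
        }
    }
}
```

For the question's input this yields { "!cmd", "45", "90", "An argument", "Another", "AndAnother", "Another one in quotes" }. An empty pair of quotes produces an empty-string token, since group 1 then matches zero characters.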

The pattern is essentially:

    "([^"]*)"|(\S+)
     \_____/  \___/
        1       2

There are 2 alternates:

  • The first alternate matches the opening double quote, a sequence of anything but double quote (captured in group 1), then the closing double quote
  • The second alternate matches any sequence of non-whitespace characters, captured in group 2
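  • The order of the alternates matters in this pattern: \S+ also matches a token that begins with a double quote, so the quoted alternate must be tried first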
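To see why the order matters, here is a sketch (AlternateOrderDemo is a hypothetical name) that runs both orderings over the sample text and collects each whole match. Java's regex engine tries the alternates left to right at each starting position and keeps the first one that succeeds, so with (\S+) first, the token "333 wins at the opening quote:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternateOrderDemo {
    // Returns each whole match (group 0) so the two patterns can be compared directly.
    static List<String> matches(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "1 2 \"333 4\" 55 6    \"77\" 8 999";
        // Quoted alternate first: the quoted segment is matched as one token.
        System.out.println(matches("\"([^\"]*)\"|(\\S+)", text));
        // Plain alternate first: \S+ succeeds at the opening quote and splits the segment.
        System.out.println(matches("(\\S+)|\"([^\"]*)\"", text));
    }
}
```

The first line printed is [1, 2, "333 4", 55, 6, "77", 8, 999]; the second is [1, 2, "333, 4", 55, 6, "77", 8, 999]. "77" survives either way only because it contains no space.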

Note that this does not handle escaped double quotes (for example \") within quoted segments. If you need that, the pattern becomes more complicated, but the same Matcher-based find() loop still works.
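One possible extension, assuming a backslash escapes the next character inside a quoted segment (so \" is a literal quote and \\ a literal backslash). That escape convention is an assumption, not something the question specifies, and the EscapedQuoteTokenizer name is hypothetical. The quoted alternate becomes "((?:[^"\\]|\\.)*)", and the escaping backslashes are removed after matching:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapedQuoteTokenizer {
    // Assumption: inside quotes, a backslash escapes the next character.
    // Group 1: raw quoted contents (escapes still present); group 2: plain token.
    private static final Pattern TOKEN =
            Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"|(\\S+)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group(1) != null) {
                // Drop each escaping backslash and keep the character after it.
                tokens.add(m.group(1).replaceAll("\\\\(.)", "$1"));
            } else {
                tokens.add(m.group(2));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Java source for the input: say "He said \"hi\"" done
        String text = "say \"He said \\\"hi\\\"\" done";
        for (String token : tokenize(text)) {
            System.out.println("[" + token + "]");
        }
    }
}
```

For that input this prints [say], [He said "hi"] and [done]. An unterminated quote falls through to the second alternate and comes back as a plain token that starts with the quote character.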


Appendix

Note that StringTokenizer is a legacy class; its Javadoc discourages its use in new code and recommends String.split or the java.util.regex package instead. java.util.Scanner is another option, and java.util.regex.Matcher gives the most flexibility.
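Since java.util.Scanner is mentioned as an alternative, here is a sketch that feeds the same token pattern to Scanner.findAll(Pattern), which ignores the scanner's delimiters and returns a Stream<MatchResult>. It needs Java 9 or later, and the ScannerTokenizer name is hypothetical:

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerTokenizer {
    // Same two alternates as the Matcher version: quoted contents, or plain non-whitespace.
    private static final Pattern TOKEN = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static String[] tokenize(String input) {
        // The stream is fully consumed by toArray before try-with-resources closes the Scanner.
        try (Scanner sc = new Scanner(input)) {
            return sc.findAll(TOKEN)
                     .map(r -> r.group(1) != null ? r.group(1) : r.group(2))
                     .toArray(String[]::new);
        }
    }

    public static void main(String[] args) {
        String cmd = "!cmd 45 90 \"An argument\" Another AndAnother \"Another one in quotes\"";
        System.out.println(String.join(" | ", tokenize(cmd)));
    }
}
```

For the question's input this prints !cmd | 45 | 90 | An argument | Another | AndAnother | Another one in quotes. Also on Java 9+, Matcher.results() offers the same stream-based style without a Scanner.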

...