Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

java - A regex to match a comma that isn't surrounded by quotes

I'm using Clojure, so this is in the context of Java regexes.

Here is an example string:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}

The important bits are the commas after each string. I'd like to be able to replace them with newline characters with Java's replaceAll method. A regex that will match any comma that is not surrounded by quotes will do.

If I'm not coming across well, please ask and I'll happily clarify anything.

edit: sorry for the confusion in the title. I haven't been awake very long.

String: {:a "ab, cd efg",} <-- In this example, the comma at the end would be matched, but the one inside the quotes would not.

String: {:a 3, :b 3,} <-- Every single comma matches.

String: {:a "abcd,efg" :b "abcedg,e"} <-- None of the commas match.


1 Reply

The regex:

,\s*(?=([^"]*"[^"]*")*[^"]*$)

Matches:

{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}
                ^                  ^

and:

{:a "ab, cd efg",}
                ^

and does not match a comma in:

{:a "abcd,efg" :b "abcedg,e"}
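
As a rough sketch of how this could be used for the replacement asked about in the question: the class and method names below are made up for illustration, and calling `String.replaceAll` with the same pattern string should behave the same way. Note that inside a Java string literal the backslash and the double quotes need escaping.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class UnquotedCommaDemo {
    // A comma plus any trailing whitespace, but only when an even number
    // of double quotes remains between it and the end of the input.
    static final Pattern UNQUOTED_COMMA =
            Pattern.compile(",\\s*(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Replaces every comma outside double quotes with a newline.
    static String commasToNewlines(String input) {
        return UNQUOTED_COMMA.matcher(input).replaceAll("\n");
    }

    public static void main(String[] args) {
        System.out.println(commasToNewlines(
                "{:a \"ab,cd, efg\", :b \"ab,def, egf,\", :c \"Conjecture\"}"));
        System.out.println(commasToNewlines("{:a 3, :b 3,}"));
        System.out.println(commasToNewlines("{:a \"abcd,efg\" :b \"abcedg,e\"}"));
    }
}
```

Because the lookahead rescans to the end of the input for every comma, this is probably fine for short maps like the ones above but may get slow on very long strings.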

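But when escaped quotes can appear, like so:

{:a "ab,\" cd efg",} // only the last comma should match

then this regex won't work: the lookahead counts the escaped quote as an ordinary one, so the comma inside the string matches as well.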

A brief explanation of the regex:

,            # match the character ','
\s*          # match a whitespace character: [ \t\n\x0B\f\r] and repeat it zero or more times
(?=          # start positive look ahead
  (          #   start capture group 1
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
    [^"]*    #     match any character other than '"' and repeat it zero or more times
    "        #     match the character '"'
  )*         #   end capture group 1 and repeat it zero or more times
  [^"]*      #   match any character other than '"' and repeat it zero or more times
  $          #   match the end of the input
)            # end positive look ahead

In other words: match any comma that has zero, or an even number of quotes ahead of it (until the end of the string).
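
For the escaped-quote case mentioned earlier, one alternative is to drop the regex and walk the string once while tracking whether the current position is inside quotes. The following is only a sketch of that different technique, not something from the answer itself. It assumes escapes are backslash-style and occur only inside quoted strings, and the class name is made up for illustration.

```java
// Hypothetical helper class: a single-pass scanner instead of a regex.
final class EscapeAwareCommas {
    // Replaces commas outside double quotes with newlines, treating a
    // backslash inside quotes as escaping the character that follows it.
    static String commasToNewlines(String input) {
        StringBuilder out = new StringBuilder(input.length());
        boolean inQuotes = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < input.length()) {
                // Copy the escape pair (for example \") without toggling state.
                out.append(c).append(input.charAt(i + 1));
                i += 2;
            } else if (c == '"') {
                inQuotes = !inQuotes;
                out.append(c);
                i++;
            } else if (c == ',' && !inQuotes) {
                out.append('\n');
                i++;
                // Skip whitespace after the comma, roughly mirroring \s* in the regex.
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

This visits each character once, so it avoids the repeated lookahead scans, at the cost of more code than a one-line `replaceAll` call.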

