EDIT: Rewritten to cover more edge cases.
This can be done, but it's a bit complicated.
result = subject.match(/(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g);
will return
, he said.
, she replied.
, he reminded her.
,
from this string (line breaks added and enclosing quotes removed for clarity):
"Hello", he said. "What's up, \"doc\"?", she replied.
'I need a 12" crash cymbal', he reminded her.
"2\" by 4 inches", 'Back\"\'slashes \\ are OK!'
Explanation: (sort of, it's a bit mindboggling)
Breaking up the regex:
(?:
 (?=       # Assert even number of (relevant) single quotes, looking ahead:
  (?:
   (?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*
   '
   (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*
   '
  )*
  (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*
  $
 )
 (?=       # Assert even number of (relevant) double quotes, looking ahead:
  (?:
   (?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*
   "
   (?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*
   "
  )*
  (?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*
  $
 )
 (?:\\.|[^\\'"]) # Match text between quoted sections
)+
First, you can see that there are two similar parts. Both these lookahead assertions ensure that there is an even number of single/double quotes in the string ahead, disregarding escaped quotes and quotes of the opposite kind. I'll show it with the single quotes part:
(?=            # Assert that the following can be matched:
 (?:           # Match this group:
  (?:          # Match either:
   \\.         # an escaped character
  |            # or
   "(?:\\.|[^"\\])*"  # a double-quoted string
  |            # or
   [^\\'"]     # any character except backslashes or quotes
  )*           # any number of times.
  '            # Then match a single quote
  (?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*'   # Repeat once to ensure even number,
               # (but don't allow single quotes within nested double-quoted strings)
 )*            # Repeat any number of times including zero
 (?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*  # Then match the same until...
 $             # ... end of string.
)              # End of lookahead assertion.
The double quotes part works the same.
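Because the two lookaheads differ only in which quote plays which role, the pattern can also be assembled from one template. This is just a sketch: `evenQuotesAhead` and its parameters `q` and `o` are hypothetical names of mine, not part of the original regex, but the string it builds should be character-for-character the same pattern.

```javascript
// Hypothetical helper: builds the lookahead asserting an even number of
// unescaped `q` quotes ahead, skipping strings delimited by the other quote `o`.
function evenQuotesAhead(q, o) {
  const oString = String.raw`${o}(?:\\.|[^${o}\\])*${o}`;        // o-quoted string, may contain q
  const oStringNoQ = String.raw`${o}(?:\\.|[^${o}${q}\\])*${o}`; // o-quoted string with no q inside
  const before = String.raw`(?:\\.|${oString}|[^\\'"])*`;        // text up to an opening q
  const between = String.raw`(?:\\.|${oStringNoQ}|[^\\${q}])*`;  // text up to the closing q
  const tail = String.raw`(?:\\.|${oString}|[^\\${q}])*`;        // whatever is left
  return `(?=(?:${before}${q}${between}${q})*${tail}$)`;
}

// Both assertions, followed by the part that actually consumes a character.
const outsideQuotes = new RegExp(
  String.raw`(?:${evenQuotesAhead("'", '"')}${evenQuotesAhead('"', "'")}(?:\\.|[^\\'"]))+`,
  "g"
);

console.log('"Hello", he said.'.match(outsideQuotes)); // [ ', he said.' ]
```

Building the source this way makes the symmetry visible, at the cost of an extra layer of string escaping, which is why `String.raw` is used throughout.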
Then, at each position in the string where these two assertions succeed, the next part of the regex actually tries to match something:
(?:            # Match either
 \\.           # an escaped character
|              # or
 [^\\'"]       # any character except backslash, single or double quote
)              # End of non-capturing group
The whole thing is repeated once or more, as many times as possible. The /g modifier makes sure we get all matches in the string.
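A small sketch of both points, using a made-up subject (not from the question) that has an escaped quote outside any string. The escaped quote is swallowed by the `\\.` branch, and dropping the g flag leaves only the first match.

```javascript
const outsideQuotes = /(?:(?=(?:(?:\\.|"(?:\\.|[^"\\])*"|[^\\'"])*'(?:\\.|"(?:\\.|[^"'\\])*"|[^\\'])*')*(?:\\.|"(?:\\.|[^"\\])*"|[^\\'])*$)(?=(?:(?:\\.|'(?:\\.|[^'\\])*'|[^\\'"])*"(?:\\.|'(?:\\.|[^'"\\])*'|[^\\"])*")*(?:\\.|'(?:\\.|[^'\\])*'|[^\\"])*$)(?:\\.|[^\\'"]))+/g;

// Same pattern without the g flag, for comparison.
const firstOnly = new RegExp(outsideQuotes.source);

// Hypothetical subject: an escaped quote, then a real quoted string.
const subject = String.raw`a \" b "c" d`;

const all = subject.match(outsideQuotes); // every run of text outside quotes
const first = subject.match(firstOnly);   // only the first run, with its index

console.log(all);                   // two matches: before and after "c"
console.log(first[0], first.index); // the first run, starting at index 0
```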