Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

regex - How to match "anything up until this sequence of characters" in a regular expression?

Take this regular expression: /^[^abc]/. This will match any single character at the beginning of a string, except a, b, or c.

If you add a * after it (/^[^abc]*/), the regular expression keeps adding each subsequent character to the match until it meets an a, a b, or a c.

For example, with the source string "qwerty qwerty whatever abc hello", the expression will match up to "qwerty qwerty wh".
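
The behaviour described so far can be checked with Python's standard re module. Python is only an assumed host language here, since the question names none. re.match anchors at the start of the string, so the explicit ^ is redundant there, but it is kept to mirror the patterns above.

```python
import re

source = "qwerty qwerty whatever abc hello"

# /^[^abc]/ : exactly one character at the start that is not a, b or c
single = re.match(r"^[^abc]", source).group()

# /^[^abc]*/ : keep consuming until the first a, b or c is reached
run = re.match(r"^[^abc]*", source).group()

print(repr(single))  # 'q'
print(repr(run))     # 'qwerty qwerty wh'
```

The negated character class stops at the "a" inside "whatever" because it tests one character at a time; it has no notion of the three-character sequence "abc".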

But what if I wanted the matching string to be "qwerty qwerty whatever " (including the trailing space)?

...In other words, how can I match everything up to (but not including) the exact sequence "abc"?

Answer

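You didn't specify which flavor of regex you're using, but this will work in any of the popular flavors that support lazy quantifiers and lookahead, such as Perl, PCRE, Python, Java, JavaScript and .NET. (POSIX basic and extended regular expressions support neither, and Go's RE2-based regexp package does not support lookahead.)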

/.+?(?=abc)/
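
In Python (again an assumed host language), the pattern can be tried with re.search, which scans for the first position where the whole pattern matches:

```python
import re

source = "qwerty qwerty whatever abc hello"

# Lazily consume characters until the next position is followed by "abc";
# the lookahead checks for "abc" without including it in the match.
match = re.search(r".+?(?=abc)", source)
print(repr(match.group()))  # 'qwerty qwerty whatever '

# If "abc" never occurs, there is no match at all and re.search returns None.
missing = re.search(r".+?(?=abc)", "no delimiter here")
print(missing)  # None
```

Because a failed search returns None, real code should check the result before calling .group().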

How it works

The .+? part is the lazy (non-greedy) version of .+ (one or more of any character; by default . does not match a newline). With .+, the engine first consumes as much as it can, typically the rest of the line. Then, if the regex contains something after it, the engine backtracks one character at a time, trying to match the remaining part. This is greedy behavior: match as much as possible while still letting the whole pattern succeed.

With .+?, instead of consuming everything and then backtracking, the engine starts with the minimum (one character), tries the rest of the regex, and, if that fails, takes one more character and tries again, until the subsequent part matches. This is lazy (non-greedy) behavior: match as few characters as possible while still letting the whole pattern succeed.

/.+X/  ~ "abcXabcXabcX"        /.+/  ~ "abcXabcXabcX"
          ^^^^^^^^^^^^                  ^^^^^^^^^^^^

/.+?X/ ~ "abcXabcXabcX"        /.+?/ ~ "abcXabcXabcX"
          ^^^^                          ^
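
The four cases in the diagram above can be reproduced with Python's re module (an assumed host language). re.match tries the pattern only at the start of the string, which is where each highlighted match begins.

```python
import re

text = "abcXabcXabcX"

# Greedy: .+ first runs to the end of the string, then gives characters
# back only as far as the rest of the pattern requires.
greedy_x = re.match(r".+X", text).group()
greedy_all = re.match(r".+", text).group()

# Lazy: .+? starts with a single character and grows only while the
# rest of the pattern fails to match.
lazy_x = re.match(r".+?X", text).group()
lazy_one = re.match(r".+?", text).group()

print(greedy_x, greedy_all)  # abcXabcXabcX abcXabcXabcX
print(lazy_x, lazy_one)      # abcX a
```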

Following that we have (?=...), a positive lookahead: a zero-width assertion and one of the lookaround constructs. It checks whether its contents match at the current position, but the characters it examines are not consumed or included in the match (zero width). It only reports whether that check succeeds or fails (an assertion).
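
The "zero width" part can be seen directly in Python (assumed host language): a lookahead on its own matches an empty string, while writing abc without the lookahead pulls those characters into the match.

```python
import re

source = "qwerty qwerty whatever abc hello"

# The lookahead alone succeeds at the index where "abc" starts,
# but the matched text is empty because nothing is consumed.
position = re.search(r"(?=abc)", source)
print(position.span())         # (23, 23)
print(repr(position.group()))  # ''

# Without the lookahead, "abc" is consumed and becomes part of the match.
consumed = re.search(r".+?abc", source).group()
print(repr(consumed))          # 'qwerty qwerty whatever abc'
```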

Thus, in other words, the regex /.+?(?=abc)/ means:

Match as few characters as possible (at least one) until an "abc" follows, without including the "abc" in the match.
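
If the stop sequence comes from user input, or the text spans several lines, two adjustments may help. The sketch below wraps the pattern in a hypothetical helper, up_to (not part of any library), that escapes the sequence with re.escape so characters like "." are taken literally, and optionally passes re.DOTALL so that . also matches newlines. Both are standard re features; the helper itself is only an illustration.

```python
import re
from typing import Optional


def up_to(text: str, sequence: str, dotall: bool = False) -> Optional[str]:
    """Hypothetical helper: return the text matched by /.+?(?=sequence)/,
    i.e. the characters before the first usable `sequence`, or None."""
    # re.escape turns regex metacharacters in `sequence` into literals.
    pattern = r".+?(?=" + re.escape(sequence) + r")"
    flags = re.DOTALL if dotall else 0
    match = re.search(pattern, text, flags)
    return match.group() if match else None


multiline = "line one\nline two abc"
print(repr(up_to(multiline, "abc")))               # 'line two '
print(repr(up_to(multiline, "abc", dotall=True)))  # 'line one\nline two '
print(repr(up_to("v1.2", ".")))                    # 'v1'
```

Without re.DOTALL, . cannot cross the newline, so the search only succeeds once it starts on the second line.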


...