The Elasticsearch regex flavor (Lucene) has no direct equivalent of the \b word boundary. A leading \b behaves like (^|[^A-Za-z0-9_]) when the word starts with a word char, and a trailing \b behaves like ($|[^A-Za-z0-9_]) when the word ends with a word char.
So we need to make sure that word is preceded and followed by either a non-word char or the start/end of the string. Since the regex is anchored by default, the pattern must match the entire string, so all we need to do to make [^A-Za-z0-9_] optional at the start/end of the string is put .* beside it and wrap both in an optional group:
(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?
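As a rough sketch of how this pattern might be placed in a regexp query body, the Python snippet below builds the JSON for an arbitrary word. The helper names (whole_word_pattern, regexp_query) and the field name "message" are hypothetical, and the sketch assumes the word contains only word chars, so no escaping of Lucene regex operators is attempted. Note that a regexp query is matched against indexed terms, so the pattern only sees the whole string if the field is not split into tokens (a keyword field, for example).

```python
import json

# Character class standing in for "any char that is not a word char".
NON_WORD = "[^A-Za-z0-9_]"


def whole_word_pattern(word: str) -> str:
    """Build the anchored whole-word pattern for `word`.

    Assumption: `word` consists of word chars only, so it needs no escaping.
    """
    return f"(.*{NON_WORD})?{word}({NON_WORD}.*)?"


def regexp_query(field: str, word: str) -> dict:
    """Wrap the pattern in a regexp query body (field name is caller-supplied)."""
    return {"query": {"regexp": {field: {"value": whole_word_pattern(word)}}}}


# "message" is a hypothetical field name used only for illustration.
body = regexp_query("message", "word")
print(json.dumps(body, indent=2))
```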
Details
(.*[^A-Za-z0-9_])? - either the start of the string, or any 0+ chars (other than line break chars; to allow them, use (.|\n)* instead of .*) followed by any char but a word char (in other words, the start of the string followed by 1 or 0 occurrences of the pattern inside the group)
word - the word to find
([^A-Za-z0-9_].*)? - an optional sequence of any char but a word char followed by any 0+ chars, and then the end of the string (implicit in Lucene regex).
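The behaviour described above can be checked locally with a small sketch. It uses Python's re module rather than Lucene, so it is only an approximation: re.fullmatch is used here to emulate the implicit anchoring at both ends, and the sample strings and the function name matches_whole_word are made up for illustration.

```python
import re

# Same pattern as above; it happens to be valid in Python's regex syntax too.
PATTERN = re.compile(r"(.*[^A-Za-z0-9_])?word([^A-Za-z0-9_].*)?")


def matches_whole_word(text: str) -> bool:
    """Return True if `text` contains "word" as a whole word.

    re.fullmatch requires the pattern to cover the entire string, which
    mimics the implicit start/end anchoring of a Lucene regex.
    """
    return PATTERN.fullmatch(text) is not None


# Illustrative inputs: the first four should match, the last three should not.
samples = ["word", "a word here", "word.", "(word)", "sword", "words", "my_word"]
for sample in samples:
    print(f"{sample!r}: {matches_whole_word(sample)}")
```

The last three samples fail because "word" is adjacent to another word char (a letter or an underscore), which is exactly the case a word boundary is meant to exclude.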