regex - What regular expression features are supported by Solr edismax?

Question

Posted Oct 24, 2021 in Technique by 深蓝

Regular expressions allow for the pattern-matching syntax shown below. I'm trying to implement a powerful search tool that supports as many of these features as possible, and I've been told that edismax is the most flexible tool for the job. Which of the pattern-matching expressions below can be handled by edismax? Can I do better than edismax? Can you suggest which filters and parser patches I might use to work toward this functionality? Am I dreaming if I think Solr can achieve acceptable performance (i.e. server-side processing time) on these kinds of searches?

Regular expression syntax and examples from MySQL:

^ match beginning of string. 'fofo' REGEXP '^fo' => true
$ match end of string. 'fo o' REGEXP '^fo o$' => true
* zero or more of the preceding item. 'Baaaan' REGEXP 'Ba*n' => true
? zero or one of the preceding item. 'Baan' REGEXP '^Ba?n' => false
+ one or more of the preceding item. 'Bn' REGEXP 'Ba+n' => false
| alternation (or). 'pi' REGEXP 'pi|apa' => true
()* grouping; a quantifier after the group repeats the whole sequence. 'pipi' REGEXP '^(pi)*$' => true
[a-dX], [^a-dX] character range/set and its negation. 'aXbc' REGEXP '[a-dXYZ]' => true
{n} or {m,n} repetition count. 'abcde' REGEXP 'a[bcd]{3}e' => true
[:character_class:] POSIX character class, used inside brackets. 'justalnums' REGEXP '[[:alnum:]]+' => true
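
MySQL's REGEXP is true when the pattern matches anywhere in the string, which is why the anchored examples above spell out ^ and $. Python's re.search has the same match-anywhere behaviour, so it can serve as a rough, non-authoritative check of the table. This is a sketch under assumptions: Python's re is not MySQL's regex engine, and it has no POSIX [[:alnum:]] class, so that row is approximated with the ASCII range [A-Za-z0-9]. The helper name mysql_like_regexp is hypothetical.

```python
import re

# (subject, pattern, expected) rows copied from the MySQL examples above.
# Python's re has no POSIX [[:alnum:]] class, so that row is approximated
# with an ASCII-only range (assumption).
EXAMPLES = [
    ("fofo", r"^fo", True),
    ("fo o", r"^fo o$", True),
    ("Baaaan", r"Ba*n", True),
    ("Baan", r"^Ba?n", False),
    ("Bn", r"Ba+n", False),
    ("pi", r"pi|apa", True),
    ("pipi", r"^(pi)*$", True),
    ("aXbc", r"[a-dXYZ]", True),
    ("abcde", r"a[bcd]{3}e", True),
    ("justalnums", r"[A-Za-z0-9]+", True),  # stands in for [[:alnum:]]+
]


def mysql_like_regexp(subject: str, pattern: str) -> bool:
    """Approximate MySQL `subject REGEXP pattern`: true if the pattern
    matches anywhere in the subject (search, not full match)."""
    return re.search(pattern, subject) is not None


if __name__ == "__main__":
    for subject, pattern, expected in EXAMPLES:
        result = mysql_like_regexp(subject, pattern)
        print(f"{subject!r} REGEXP {pattern!r} => {result} (expected {expected})")
```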

深蓝 · Answer 1 · 2021-10-23T19:14:39+0000

Lucene 4.0 will support regex queries directly in the standard query parser, using a special syntax. I verified that it works on a Solr instance I am running, built from the Subversion trunk in February.

Jira ticket LUCENE-2604 describes extending the standard query parser with a special regex syntax that uses forward slashes to delimit the pattern, similar to regex literals in JavaScript. Under the hood, the parser appears to build a RegexpQuery.
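
As a sketch of how such a query might be sent to Solr, the snippet below puts a slash-delimited regex into the q parameter and selects the standard parser with defType=lucene. The host, the core name mycore, the field and the helper name regex_query_url are hypothetical; only q, defType and fl are standard Solr request parameters. Escaping a literal / inside the pattern is deliberately not attempted.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical Solr location and core name (assumption): adjust to your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def regex_query_url(field: str, pattern: str) -> str:
    """Build a /select URL whose q parameter uses the slash-delimited
    regex syntax of the standard (lucene) query parser."""
    if "/" in pattern:
        # Escaping a literal "/" inside the regex is not handled in this sketch.
        raise ValueError("pattern must not contain '/'")
    params = {
        "q": f"{field}:/{pattern}/",  # e.g. body:/colou?r/
        "defType": "lucene",          # standard query parser
        "fl": "id",                   # only return document ids
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    url = regex_query_url("body", "colou?r")
    print(url)
    # Decoding the query string shows the regex survives URL encoding intact.
    print(parse_qs(urlsplit(url).query)["q"])  # ['body:/colou?r/']
```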

So a brief example:

body:/[0-9]{5}/

will match a five-digit zip code in the text corpus I have indexed. Oddly, though, body:/\d{5}/ did not work for me, and ^ failed as well.
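
One likely explanation for both surprises: a Lucene regex query is not run over the raw field text but over the indexed terms, and the pattern has to match an entire term, so anchoring is implicit and ^ is not needed (as far as I can tell it is not an anchor in Lucene's syntax at all). Likewise, in Lucene's regex syntax of that era a backslash simply escapes the next character, so \d{5} would look for the term ddddd. The sketch below imitates whole-term matching with Python's re.fullmatch over naively tokenized text. The tokenizer (lowercase, split on non-alphanumerics) and the helper names are assumptions standing in for the field's real analyzer, and Python's regex dialect is only a stand-in for Lucene's.

```python
import re
from typing import List


def naive_terms(text: str) -> List[str]:
    """Rough stand-in for an analyzer (assumption, not Solr's tokenizer):
    lowercase, then split on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def regex_term_hits(text: str, pattern: str) -> List[str]:
    """Return the terms a whole-term regex match would select.
    fullmatch() mirrors the implicit anchoring: the pattern must cover
    the entire term, never just part of it."""
    compiled = re.compile(pattern)
    return [t for t in naive_terms(text) if compiled.fullmatch(t)]


doc = "Ship to Beverly Hills, CA 90210 (order 1234567) by Friday."
print(regex_term_hits(doc, r"[0-9]{5}"))   # ['90210']; 1234567 is too long
print(regex_term_hits(doc, r"[0-9]{5,}"))  # ['90210', '1234567']
```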

The regex dialect is not Java's java.util.regex: RegexpQuery uses Lucene's own RegExp syntax, which is smaller and differs in places. I have only done a cursory examination, so one would have to look carefully at the RegexpQuery and RegExp code (or their Javadoc) to know exactly what works and what doesn't.
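
Given those differences, MySQL-style patterns such as the ones in the question would need rewriting before being sent as /.../ queries. The hypothetical helper to_whole_term_regex below sketches the main adjustments, assuming whole-term matching and no shorthand or POSIX classes: explicit anchors are dropped, unanchored ends are padded with .*, and a few classes are expanded into ASCII ranges. It covers only simple cases like those in the question's table (an anchor inside one branch of an alternation, or \d inside brackets, is not handled), and it is checked with Python's re.fullmatch rather than against Lucene. A pattern that begins with .* is likely to be slow as a Lucene regex query, since there is no fixed prefix to narrow the term dictionary, which bears on the performance question.

```python
import re

# Hypothetical mapping of shorthand/POSIX classes to explicit ASCII ranges.
CLASS_REPLACEMENTS = {
    r"\d": "[0-9]",
    "[[:digit:]]": "[0-9]",
    "[[:alpha:]]": "[A-Za-z]",
    "[[:alnum:]]": "[A-Za-z0-9]",
}


def to_whole_term_regex(mysql_pattern: str) -> str:
    """Rewrite a search-style (MySQL REGEXP) pattern into one that must
    match an entire term, as a whole-term regex engine expects."""
    pattern = mysql_pattern
    for old, new in CLASS_REPLACEMENTS.items():
        pattern = pattern.replace(old, new)

    anchored_start = pattern.startswith("^")
    if anchored_start:
        pattern = pattern[1:]
    # A trailing "$" is an anchor unless it is escaped as "\$".
    anchored_end = pattern.endswith("$") and not pattern.endswith("\\$")
    if anchored_end:
        pattern = pattern[:-1]

    # Group the body so a top-level "|" is not split apart by the padding.
    prefix = "" if anchored_start else ".*"
    suffix = "" if anchored_end else ".*"
    return f"{prefix}({pattern}){suffix}"
```

For example, to_whole_term_regex('^fo') returns '(fo).*' and to_whole_term_regex('pi|apa') returns '.*(pi|apa).*'.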
