Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

regex - Implementing Google search operators

Google currently uses keywords such as site: or is: in searches (the second example is from Gmail). I'm trying to develop a similar system and am wondering how best to go about identifying and handling those terms. For simplicity's sake, assume an OO language is being used (Ruby, Python, Java, C#, et al).

Currently, my plan is to have a separate class for each keyword. These classes have a precedence value and three methods:

  1. isRelevant(String searchPhrase): Returns true if the search phrase matches the class's filter.
  2. getResults(String searchPhrase): Returns a list of results based on the search phrase.
  3. reviseSearch(String searchPhrase): Returns a modified version of the search phrase. This will normally be removing the match to avoid it being processed again by a lower-precedence instance, but might also add text or clear the string entirely.

The calling method will then go through these keyword filters until the search phrase is empty or there are no more filters (in the latter case it will revert to its normal search behavior).
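
A rough Ruby sketch of that plan, for illustration only. The class name KeywordFilter, the helper run_search, and the method names relevant?, results and revise (Ruby-style stand-ins for isRelevant, getResults and reviseSearch) are all assumptions. The results method is a placeholder that echoes the matched value instead of querying anything.

```ruby
# Hypothetical filter class illustrating the plan above; all names are assumptions.
class KeywordFilter
  attr_reader :precedence

  def initialize(keyword, precedence)
    @keyword = keyword
    @precedence = precedence
    # Matches e.g. "site:example.com": keyword, colon, then a run of non-space characters.
    @pattern = /(?<!\S)#{Regexp.escape(keyword)}:(\S+)/
  end

  # True if the phrase contains this filter's keyword.
  def relevant?(phrase)
    @pattern.match?(phrase)
  end

  # Placeholder: a real filter would query a data store here.
  def results(phrase)
    phrase.scan(@pattern).flatten.map { |value| "#{@keyword} => #{value}" }
  end

  # Removes the matched operator so lower-precedence filters do not see it again.
  def revise(phrase)
    phrase.gsub(@pattern, '').squeeze(' ').strip
  end
end

# Runs the filters from highest to lowest precedence until the phrase is empty.
def run_search(phrase, filters)
  results = []
  filters.sort_by { |f| -f.precedence }.each do |filter|
    break if phrase.empty?
    next unless filter.relevant?(phrase)

    results.concat(filter.results(phrase))
    phrase = filter.revise(phrase)
  end
  [results, phrase] # whatever is left would go to the normal search
end

filters = [KeywordFilter.new('is', 1), KeywordFilter.new('site', 2)]
results, rest = run_search('ruby site:example.com is:unread', filters)
p results # ["site => example.com", "is => unread"]
p rest    # "ruby"
```

Each filter scans the phrase again, so the cost grows with the number of filters. That is unlikely to matter for short queries, but it is the main difference from tokenizing the phrase once.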

Thus, the question: Is this the most efficient way of doing this, or is there some more appropriate method? Some of the details still need to be figured out, but is this a step in the right direction?



1 Reply


Basics

sample string:

foo:(hello world) bar:(-{bad things}) email:[email protected] another:weird characters +=2{-52!%#^ final:end

split with regex:

/\s+(?=\w+:)/

return array:

[
  'foo:(hello world)',
  'bar:(-{bad things})',
  'email:[email protected]',
  'another:weird characters +=2{-52!%#^',
  'final:end'
]

regex explanation:

\s+     one or more whitespace characters
(?=     followed by (positive lookahead)
  \w+   one or more word characters
  :     literal `:' (colon character)
)
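
To try the split on its own, here is a small sketch. The query strings and the constant name SPLIT_AT_OPERATOR are made up for illustration. The second call shows a caveat: any whitespace followed by "word:" counts as a boundary, so a URL inside a value starts a new chunk.

```ruby
# Split at whitespace that is directly followed by "word:".
SPLIT_AT_OPERATOR = /\s+(?=\w+:)/

# Hypothetical query, not the sample string from above.
query = 'site:example.com intitle:(hello world) is:unread'
parts = query.split(SPLIT_AT_OPERATOR)
p parts
# ["site:example.com", "intitle:(hello world)", "is:unread"]

# Caveat: " http:" also looks like the start of an operator.
p 'note:see http://example.com'.split(SPLIT_AT_OPERATOR)
# ["note:see", "http://example.com"]
```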

usage:

Iterate through the array and split each element at the first : (colon), so that a value containing colons (a URL, for example) stays intact. The left-hand key can be used to select a function, and the right-hand value can be passed to it as the argument. That should put you on track for whatever you want to do from here.
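
One possible way to wire up that key-to-function step is a lookup table of lambdas, sketched below. The HANDLERS table, the handle method and the returned strings are all hypothetical. Chunks without a known operator fall through to a plain search, which matches the fallback described in the question.

```ruby
# Hypothetical handlers keyed by operator name; only listed operators can run.
HANDLERS = {
  'site' => ->(value) { "restrict to site #{value}" },
  'is'   => ->(value) { "filter by state #{value}" }
}.freeze

def handle(query)
  query.split(/\s+(?=\w+:)/).map do |chunk|
    key, value = chunk.split(':', 2) # split at the first colon only
    handler = HANDLERS[key]
    if handler && value
      handler.call(value)
    else
      "plain search for #{chunk}" # no known operator: fall back to normal search
    end
  end
end

out = handle('ruby regex site:http://example.com is:unread')
puts out
# plain search for ruby regex
# restrict to site http://example.com
# filter by state unread
```

Because the table is explicit, text typed by the user can only reach the handlers listed in it.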

Example Ruby usage

search.rb

# Search class
class Search

  def initialize(query)
    @query = query
  end

  def foo(input)
    "foo has #{input}"
  end

  def bar(input)
    "bar has #{input}"
  end

  def email(input)
    "email has #{input}"
  end

  def another(input)
    "another has #{input}"
  end

  def final(input)
    "final has #{input}"
  end

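  def exec
    # Split at whitespace that is followed by "word:".
    @query.split(/\s+(?=\w+:)/).each do |e|
      # A limit of 2 splits at the first colon only, so values may contain colons.
      method, arg = e.split(':', 2)
      # Dispatch only to the operators defined above; respond_to? alone would also
      # accept inherited public methods such as instance_eval.
      puts send(method, arg) if %w[foo bar email another final].include?(method)
    end
  end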

end

use search.rb

q = "foo:(hello world) bar:(-{bad things}) email:[email protected] another:weird characters +=2{-52!%#^ final:end"
s = Search.new(q)
s.exec

output

foo has (hello world)
bar has (-{bad things})
email has [email protected]
another has weird characters +=2{-52!%#^
final has end
