What I have is a few hundred PDFs. They have no proper structure and no particular fields; they are just a lot of text.
What I am trying to do:
Index the PDFs and search for some keywords against the index.
I want to find out whether a particular keyword appears in a PDF and, if it does, get the line where it appears.
For example, if I search for 'Google' in a PDF that contains that term, I would like to see 'Google is a great search engine', the line in the PDF where the term occurs.
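A minimal sketch of the line-level lookup described above, in plain Python using only the standard library. It assumes the text has already been extracted from each PDF (the `docs` dictionary, its filenames, and the helpers `build_line_index` and `search` are hypothetical), and it treats a keyword as a single word. It is neither SOLR nor Whoosh; it only illustrates indexing each line so that a hit can be reported together with the line it came from.

```python
import re
from collections import defaultdict


def build_line_index(docs):
    """Build an inverted index from lowercase words to (filename, line_number) pairs.

    `docs` maps a filename to text already extracted from that PDF
    (hypothetical input; the extraction step itself is not shown).
    Returns the index plus each document's list of lines for lookup.
    """
    index = defaultdict(set)
    lines = {}
    for name, text in docs.items():
        lines[name] = text.splitlines()
        for lineno, line in enumerate(lines[name], start=1):
            # Tokenize the line into lowercase words so searches are case-insensitive.
            for word in re.findall(r"\w+", line.lower()):
                index[word].add((name, lineno))
    return index, lines


def search(index, lines, keyword):
    """Return (filename, line_number, line_text) for each line containing `keyword`."""
    # .get() does not insert missing keys into the defaultdict.
    hits = sorted(index.get(keyword.lower(), set()))
    return [(name, lineno, lines[name][lineno - 1].strip()) for name, lineno in hits]


if __name__ == "__main__":
    docs = {
        "a.pdf": "Introduction\nGoogle is a great search engine\nMore text",
        "b.pdf": "Nothing relevant here",
    }
    index, lines = build_line_index(docs)
    print(search(index, lines, "Google"))
    # [('a.pdf', 2, 'Google is a great search engine')]
```

What counts as a "line" here depends on the tool used to extract the PDF text, since PDF files store positioned text rather than explicit lines.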
How I decided to do it:
Use either SOLR or Whoosh, though SOLR looks better because of its built-in PDF support. I prefer to code in Python, and I like Sunburnt, a Python wrapper for SOLR.
SOLR's example project comes with a schema file built around price comparison, so I am not sure whether SOLR can solve my problem.
What do you guys suggest? Any input is much appreciated.