Lots of things "wrong" with this.
As has been mentioned, the locking wasn't safe: you need to lock reads as well as writes.
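For completeness, "locking reads as well as writes" would look something like this minimal sketch (class and member names are illustrative, and it is not the approach I'd recommend, for the reasons below):

using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;

// Minimal sketch: every path that touches the writer or searcher takes the same lock.
public class LockedIndex
{
    private readonly object syncObj = new object();
    private readonly IndexWriter writer;
    private IndexSearcher searcher;

    public LockedIndex(IndexWriter writer, IndexSearcher searcher)
    {
        this.writer = writer;
        this.searcher = searcher;
    }

    public void Add(Document doc)
    {
        lock (syncObj)      // writers take the lock...
        {
            writer.AddDocument(doc);
        }
    }

    public TopDocs Search(Query query)
    {
        lock (syncObj)      // ...and so do readers, otherwise a search can run
        {                   // against a searcher that is being swapped/closed
            return searcher.Search(query, 10);
        }
    }
}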
More significantly, there are better ways of handling this in Lucene. First, IndexWriter is itself thread-safe, and it should be the owner of the Directory; it's generally "bad practice" to have different parts of the code opening/closing the directory. There is a style for NRT (Near Real Time) indexes which involves getting an IndexReader from the IndexWriter, rather than wrapping the Directory.
The style used in your example is only really "good" if the index is essentially read-only and perhaps regenerated in batch daily/weekly etc.
I have rewritten the example to show some of the approach. Obviously, as this is just test code, there will be nuances that will need refactoring/enhancing depending on the use case...
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public class Test
{
    private System.Threading.Timer timer;
    private Searcher searcher;
    private IndexWriter writer;
    private IndexReader reader;

    public Test()
    {
        // the writer owns the Directory; the reader/searcher are obtained from the writer (NRT style)
        writer = new IndexWriter(new RAMDirectory(), new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), true, IndexWriter.MaxFieldLength.LIMITED);
        reader = writer.GetReader();
        searcher = new IndexSearcher(reader);
        timer = new System.Threading.Timer(Timer_Elapsed, null, TimeSpan.Zero, TimeSpan.FromMinutes(3));
    }

    public void CreateDocument(string title, string content)
    {
        var doc = new Document();
        doc.Add(new Field("A", title, Field.Store.YES, Field.Index.NO));
        doc.Add(new Field("B", content, Field.Store.YES, Field.Index.ANALYZED));
        writer.AddDocument(doc);
    }

    public void ReplaceAll(Dictionary<string, string> es)
    {
        // pause the timer while the index is rebuilt
        timer.Change(Timeout.Infinite, Timeout.Infinite);

        writer.DeleteAll();
        foreach (var e in es)
        {
            CreateDocument(e.Value, e.Key);
        }

        // restart the timer
        timer.Change(TimeSpan.Zero, TimeSpan.FromMinutes(3));
    }

    public List<Document> Search(string queryString)
    {
        var documents = new List<Document>();
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_30, "B", new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
        Query query = parser.Parse(queryString);

        int hitsPerPage = 5;
        var collector = TopScoreDocCollector.Create(2 * hitsPerPage, true);
        searcher.Search(query, collector);
        ScoreDoc[] hits = collector.TopDocs().ScoreDocs;

        // never read past the number of docs the collector actually returned
        int hitCount = Math.Min(collector.TotalHits, hits.Length);
        for (int i = 0; i < hitCount; i++)
        {
            ScoreDoc scoreDoc = hits[i];
            int docId = scoreDoc.Doc;
            float docScore = scoreDoc.Score; // score available if you need it
            Document doc = searcher.Doc(docId);
            documents.Add(doc);
        }
        return documents;
    }

    private void Timer_Elapsed(object state)
    {
        // zero cost if nothing has been added since the last refresh
        if (reader.IsCurrent())
            return;

        reader = writer.GetReader();
        var newSearcher = new IndexSearcher(reader);
        Interlocked.Exchange(ref searcher, newSearcher);
        Debug.WriteLine("Searcher updated");
    }

    public Result ServeRequest(string searchTerm)
    {
        var documents = Search(searchTerm);
        //somelogic
        var result = new Result();
        return result;
    }
}
Note:
- the writer "owns" the directory
- if this was a file-based Directory then you would have Open and Close methods to create/dispose the writer (which deals with handling the lock file); a RAMDirectory can just be GC'd. See the sketch after these notes
- uses Interlocked.Exchange instead of lock, so there is zero cost when using the searcher member (here be dragons!)
- new docs are added directly to the writer
- IsCurrent() allows for zero cost if no new docs have been added. Depending on how frequently you are adding docs, you may not need the timer at all (just call Timer_Elapsed - renamed obviously - at the top of Search)
- don't use Optimize(); it's a hangover from previous versions and its use is highly discouraged (for perf and disk I/O reasons)
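For the file-based case, a minimal sketch of what those Open/Close methods might look like on the Test class above (the indexPath parameter is just an example):

public void Open(string indexPath)
{
    // FSDirectory.Open deals with platform-specific locking (the write.lock file)
    var directory = FSDirectory.Open(new System.IO.DirectoryInfo(indexPath));
    writer = new IndexWriter(directory, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), IndexWriter.MaxFieldLength.LIMITED);
    reader = writer.GetReader();
    searcher = new IndexSearcher(reader);
}

public void Close()
{
    // disposing the reader releases its files; disposing the writer
    // commits pending changes and releases the write.lock file
    reader.Dispose();
    writer.Dispose();
}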
Lastly, if you're using Lucene.Net v4.8 then you should use SearcherManager (as suggested in another answer), but use the ctor that takes the IndexWriter and keep it as a "singleton" (same scope as the writer). It will handle locking and getting new readers for you.
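Roughly, that shape looks like the following sketch against the 4.8 beta API (the field name and example query are illustrative):

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var writer = new IndexWriter(new RAMDirectory(),
    new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer));

// one SearcherManager per IndexWriter, same lifetime/scope as the writer
var searcherManager = new SearcherManager(writer, true /* applyAllDeletes */, null);

var doc = new Document();
doc.Add(new TextField("B", "some content", Field.Store.YES));
writer.AddDocument(doc);

// after adding docs, ask the manager to pick up the changes (cheap no-op if nothing changed)
searcherManager.MaybeRefresh();

// each search acquires a searcher and must release it again
var searcher = searcherManager.Acquire();
try
{
    var topDocs = searcher.Search(new TermQuery(new Term("B", "content")), 10);
}
finally
{
    searcherManager.Release(searcher);
}

// on shutdown
searcherManager.Dispose();
writer.Dispose();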