Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
231 views
in Technique[技术] by (71.8m points)

java - How to sort search results on multiple fields using a weighting function?

I have a Lucene index where every document has several fields which contain numeric values. Now I would like to sort the search result on a weighted sum of this field. For example:

field1=100
field2=002
field3=014

And the weighting function looks like:

f(d) = field1 * 0.5 + field2 * 1.4 + field3 * 1.8

The results should be ordered by f(d) where d represents the document. The sorting function should be non-static and could differ from search to search because the constant factors are influenced by the user who performs the search.

Has anyone an idea how to solve this or maybe an idea how to accomplish this goal in another way?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

You could try implementing a custom ScoreDocComparator. For example:

public class ScaledScoreDocComparator implements ScoreDocComparator {

    private int[][] values;
    private float[] scalars;

    public ScaledScoreDocComparator(IndexReader reader, String[] fields, float[] scalars) throws IOException {
        this.scalars = scalars;
        this.values = new int[fields.length][];
        for (int i = 0; i < values.length; i++) {
            this.values[i] = FieldCache.DEFAULT.getInts(reader, fields[i]);
        }
    }

    protected float score(ScoreDoc scoreDoc) {
        int doc = scoreDoc.doc;

        float score = 0;
        for (int i = 0; i < values.length; i++) {
            int value = values[i][doc];
            float scalar = scalars[i];
            score += (value * scalar);
        }
        return score;
    }

    @Override
    public int compare(ScoreDoc i, ScoreDoc j) {
        float iScore = score(i);
        float jScore = score(j);
        return Float.compare(iScore, jScore);
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> sortValue(ScoreDoc i) {
        float score = score(i);
        return Float.valueOf(score);
    }

}

Here is an example of ScaledScoreDocComparator in action. I believe it works in my test, but I encourage you to prove it against your data.

final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new ScaledScoreDocComparator(reader, fields, scalars);
            }
        }
    )
);

IndexSearcher indexSearcher = ...;
Query query = ...;
Filter filter = ...; // can be null
int nDocs = 100;

TopFieldDocs topFieldDocs = indexSearcher.search(query, filter, nDocs, sort);
ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;

Bonus!

It appears that the Lucene developers are deprecating the ScoreDocComparator interface (it's currently deprecated in the Subversion repository). Here is an example of the ScaledScoreDocComparator modified to adhere to ScoreDocComparator's successor, FieldComparator:

public class ScaledComparator extends FieldComparator {

    private String[] fields;
    private float[] scalars;
    private int[][] slotValues;
    private int[][] currentReaderValues;
    private int bottomSlot;

    public ScaledComparator(int numHits, String[] fields, float[] scalars) {
        this.fields = fields;
        this.scalars = scalars;

        this.slotValues = new int[this.fields.length][];
        for (int fieldIndex = 0; fieldIndex < this.fields.length; fieldIndex++) {
            this.slotValues[fieldIndex] = new int[numHits];
        }

        this.currentReaderValues = new int[this.fields.length][];
    }

    protected float score(int[][] values, int secondaryIndex) {
        float score = 0;

        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            int value = values[fieldIndex][secondaryIndex];
            float scalar = scalars[fieldIndex];
            score += (value * scalar);
        }

        return score;
    }

    protected float scoreSlot(int slot) {
        return score(slotValues, slot);
    }

    protected float scoreDoc(int doc) {
        return score(currentReaderValues, doc);
    }

    @Override
    public int compare(int slot1, int slot2) {
        float score1 = scoreSlot(slot1);
        float score2 = scoreSlot(slot2);
        return Float.compare(score1, score2);
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        float bottomScore = scoreSlot(bottomSlot);
        float docScore = scoreDoc(doc);
        return Float.compare(bottomScore, docScore);
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            slotValues[fieldIndex][slot] = currentReaderValues[fieldIndex][doc];
        }
    }

    @Override
    public void setBottom(int slot) {
        bottomSlot = slot;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase, int numSlotsFull) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            String field = fields[fieldIndex];
            currentReaderValues[fieldIndex] = FieldCache.DEFAULT.getInts(reader, field);
        }
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> value(int slot) {
        float score = scoreSlot(slot);
        return Float.valueOf(score);
    }

}

Using this new class is very similar to the original, except that the definition of the sort object is a bit different:

final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new FieldComparatorSource() {
            public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                return new ScaledComparator(numHits, fields, scalars);
            }
        }
    )
);

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...