Class PerFieldSimilarityWrapper

  • public abstract class PerFieldSimilarityWrapper
    extends Similarity
    Provides the ability to use a different Similarity for different fields.

    Subclasses should implement get(String) to return an appropriate Similarity (for example, using field-specific parameter values) for the field.

    In Lucene 6, pass a default similarity to the constructor; it is used for all non-field-specific methods (query normalization and coordination factors). From Lucene 7 on, this is no longer required.
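    The per-field dispatch described above can be illustrated with a minimal, dependency-free sketch. The types here (SketchSimilarity, SketchPerFieldWrapper, MapBackedWrapper) and the describe and with helpers are hypothetical stand-ins invented for illustration; they are not Lucene classes and carry no scoring logic. The only idea taken from the text is that a subclass implements get(String) and that field-specific calls are routed through it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a similarity: it only exposes a name.
interface SketchSimilarity {
    String name();
}

// Hypothetical wrapper mirroring the get(String) contract described above.
abstract class SketchPerFieldWrapper {
    // Subclasses choose the similarity for a given field name.
    abstract SketchSimilarity get(String field);

    // A field-specific call is routed through get(field).
    final String describe(String field) {
        return field + " -> " + get(field).name();
    }
}

// One possible subclass: an explicit field-to-similarity map with a fallback.
final class MapBackedWrapper extends SketchPerFieldWrapper {
    private final Map<String, SketchSimilarity> perField = new HashMap<>();
    private final SketchSimilarity fallback;

    MapBackedWrapper(SketchSimilarity fallback) {
        this.fallback = fallback;
    }

    // Registers a similarity for one field and returns this for chaining.
    MapBackedWrapper with(String field, SketchSimilarity sim) {
        perField.put(field, sim);
        return this;
    }

    @Override
    SketchSimilarity get(String field) {
        // Fields without an explicit entry use the fallback similarity.
        return perField.getOrDefault(field, fallback);
    }
}
```

    With this sketch, new MapBackedWrapper(() -> "default").with("title", () -> "title-tuned") resolves "title" to the tuned similarity and every other field to the fallback. A real subclass would return Similarity instances from get(String) instead.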

    • Field Detail

      • defaultSim

        protected final Similarity defaultSim
        Default similarity used for query norm and coordination factors.
    • Constructor Detail

      • PerFieldSimilarityWrapper

        public PerFieldSimilarityWrapper()
        Deprecated. Specify a default similarity for non-field-specific calculations instead.
        Backwards-compatibility constructor for the 6.x series that creates a per-field similarity in which all non-field-specific methods return a constant (1).

        From Lucene 7 on, this will become the default constructor again, because coordination factors and query normalization will be removed.

    • Method Detail

      • computeNorm

        public final long computeNorm(FieldInvertState state)
        Description copied from class: Similarity
        Computes the normalization value for a field, given the accumulated state of term processing for this field (see FieldInvertState).

        Matches in longer fields are less precise, so implementations of this method usually set smaller values when state.getLength() is large, and larger values when state.getLength() is small.

        Specified by:
        computeNorm in class Similarity
        Parameters:
        state - current processing state for this field
        Returns:
        computed norm value
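        The relationship between field length and norm can be sketched with the inverse-square-root form commonly used for TF-IDF length normalization. This is an illustration under assumptions: LengthNormSketch and lengthNorm are hypothetical names, the result is a plain float rather than the encoded long that computeNorm returns, and the wrapped Similarity may use a different formula.

```java
// Hypothetical helper, not part of Lucene.
final class LengthNormSketch {
    private LengthNormSketch() {}

    // Returns 1 / sqrt(fieldLength): the value shrinks as the field grows.
    static float lengthNorm(int fieldLength) {
        if (fieldLength <= 0) {
            throw new IllegalArgumentException("fieldLength must be positive");
        }
        return (float) (1.0 / Math.sqrt(fieldLength));
    }
}
```

        Under this sketch a 4-term field gets a larger value than a 100-term field, which matches the behaviour described above for short and long fields.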
      • computeWeight

        public final Similarity.SimWeight computeWeight(CollectionStatistics collectionStats,
                                                        TermStatistics... termStats)
        Description copied from class: Similarity
        Computes any collection-level weight (e.g. IDF, average document length) needed for scoring a query.
        Specified by:
        computeWeight in class Similarity
        Parameters:
        collectionStats - collection-level statistics, such as the number of tokens in the collection.
        termStats - term-level statistics, such as the document frequency of a term across the collection.
        Returns:
        SimWeight object with the information this Similarity needs to score a query.
      • coord

        public final float coord(int overlap,
                                 int maxOverlap)
        Description copied from class: Similarity
        Hook to integrate coordinate-level matching.

        By default this is disabled (returns 1), because with most modern models it only skews retrieval quality; some implementations, such as TFIDFSimilarity, override this.

        Specified by:
        coord in class Similarity
        Parameters:
        overlap - the number of query terms matched in the document
        maxOverlap - the total number of terms in the query
        Returns:
        a score factor based on term overlap with the query
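        As an illustration of what a non-trivial coordination factor can look like, the sketch below uses the fraction of query terms matched, which is the form associated with classic TF-IDF scoring. CoordSketch and its coord method are hypothetical names for this example; the value this wrapper returns depends on the default similarity it was given.

```java
// Hypothetical helper, not part of Lucene.
final class CoordSketch {
    private CoordSketch() {}

    // Fraction of the query's terms that matched in the document.
    static float coord(int overlap, int maxOverlap) {
        if (maxOverlap <= 0) {
            throw new IllegalArgumentException("maxOverlap must be positive");
        }
        return overlap / (float) maxOverlap;
    }
}
```

        A document matching 2 of 4 query terms would have its score scaled by 0.5 under this sketch, while a disabled implementation returns 1 regardless of overlap.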
      • queryNorm

        public final float queryNorm(float valueForNormalization)
        Description copied from class: Similarity
        Computes the normalization value for a query, given the sum of the normalized weights (Similarity.SimWeight.getValueForNormalization()) of each of the query terms. This value is passed back to the weight (Similarity.SimWeight.normalize(float, float)) of each query term, to provide a hook to attempt to make scores from different queries comparable.

        By default this is disabled (returns 1), but some implementations such as TFIDFSimilarity override this.

        Specified by:
        queryNorm in class Similarity
        Parameters:
        valueForNormalization - the sum of the term normalization values
        Returns:
        a normalization factor for query weights
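        For comparison with the disabled default, a query normalization in the style of classic TF-IDF scoring can be sketched as the inverse square root of the summed term values. QueryNormSketch is a hypothetical name, and treating valueForNormalization as a sum of squared weights is an assumption of this example rather than something the text above specifies.

```java
// Hypothetical helper, not part of Lucene.
final class QueryNormSketch {
    private QueryNormSketch() {}

    // Returns 1 / sqrt(valueForNormalization); the input is assumed to be
    // a positive sum of squared term weights.
    static float queryNorm(float valueForNormalization) {
        if (!(valueForNormalization > 0f)) {
            throw new IllegalArgumentException("valueForNormalization must be positive");
        }
        return (float) (1.0 / Math.sqrt(valueForNormalization));
    }
}
```

        Multiplying each term weight by this factor rescales the query's weight vector to unit length, which is the sense in which scores from different queries become more comparable.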