org.apache.lucene.search.suggest.analyzing

Class FuzzySuggester

  • All Implemented Interfaces:
    Accountable


    public final class FuzzySuggester
    extends AnalyzingSuggester
    Implements a fuzzy AnalyzingSuggester. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false for the transpositions parameter.
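
    To make the difference between the two similarity measures concrete, the following is a minimal sketch of optimal string alignment distance with a switch for transpositions. The class name EditDistanceSketch and its distance method are hypothetical and are not part of Lucene; this simple dynamic-programming version compares Java chars and only illustrates the metric, not how the suggester computes matches internally.

```java
// Hypothetical illustration class; not part of Lucene.
final class EditDistanceSketch {

    // Edit distance between a and b. With transpositions == true, swapping two
    // adjacent characters counts as one edit (optimal string alignment);
    // with false, this is classic Levenshtein distance.
    static int distance(String a, String b, boolean transpositions) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete all of a
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert all of b

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // deletion, insertion
                        d[i - 1][j - 1] + cost);                    // match or substitution
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);     // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("lucene", "luecne", true));  // prints 1
        System.out.println(distance("lucene", "luecne", false)); // prints 2
    }
}
```

    In this sketch the swapped "ce" / "ec" pair costs one edit when transpositions are enabled and two substitutions when they are not, which is why the transpositions setting can decide whether a typo falls within the allowed number of edits.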

    This suggester matches terms within at most 2 edits; higher distances are not supported. Note that the fuzzy distance is measured in "byte space" on the bytes returned by the TokenStream's TermToBytesRefAttribute, usually UTF-8. By default, the analyzed bytes must be at least 3 bytes long (DEFAULT_MIN_FUZZY_LENGTH) before any edits are considered. Furthermore, the first 1 byte (DEFAULT_NON_FUZZY_PREFIX) is not allowed to be edited, and up to 1 edit (DEFAULT_MAX_EDITS) is allowed. If the unicodeAware parameter in the constructor is set to true, maxEdits, minFuzzyLength, transpositions and nonFuzzyPrefix are measured in Unicode code points (actual letters) instead of bytes.
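
    The practical effect of measuring in bytes versus code points can be sketched with plain JDK calls. The class FuzzyUnitsSketch and its helper methods below are hypothetical and not part of Lucene; the sketch assumes UTF-8 encoded terms and uses a simple Levenshtein computation over either bytes or code points to show how the same pair of strings yields different lengths and distances.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of Lucene.
final class FuzzyUnitsSketch {

    // The string as unsigned UTF-8 byte values ("byte space").
    static int[] utf8Units(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] units = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) units[i] = bytes[i] & 0xFF;
        return units;
    }

    // The string as Unicode code points.
    static int[] codePointUnits(String s) {
        return s.codePoints().toArray();
    }

    // Classic Levenshtein distance over arbitrary integer units (two-row DP).
    static int levenshtein(int[] a, int[] b) {
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int j = 0; j <= b.length; j++) prev[j] = j;
        for (int i = 1; i <= a.length; i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length; j++) {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1, curr[j - 1] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse rows
        }
        return prev[b.length];
    }

    public static void main(String[] args) {
        String plain = "cafe";
        String accented = "caf\u00e9"; // "e" with acute accent, 2 bytes in UTF-8

        System.out.println(levenshtein(utf8Units(plain), utf8Units(accented)));           // prints 2
        System.out.println(levenshtein(codePointUnits(plain), codePointUnits(accented))); // prints 1

        String shortTerm = "n\u00e9";
        System.out.println(utf8Units(shortTerm).length);      // prints 3
        System.out.println(codePointUnits(shortTerm).length); // prints 2
    }
}
```

    Under these assumptions, replacing "e" with an accented "e" costs two edits in byte space but one edit in code points, and a two-letter term containing one accented letter is already 3 bytes long. This suggests why the unicodeAware setting matters for non-ASCII input when the edit and length limits are small.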

    NOTE: This suggester does not boost suggestions that required no edits over suggestions that did require edits. This is a known limitation.

    NOTE: Complex query analyzers can have a significant impact on lookup performance. It is recommended not to use query analyzers that drop or inject terms (for example, synonym filters), so that the complexity of the prefix intersection stays low and lookups remain fast. At index time, complex analyzers can safely be used.