org.apache.lucene.analysis.compound

Class HyphenationCompoundWordTokenFilterFactory

  • All Implemented Interfaces:
    ResourceLoaderAware


    public class HyphenationCompoundWordTokenFilterFactory
    extends TokenFilterFactory
    implements ResourceLoaderAware
    Factory for HyphenationCompoundWordTokenFilter.

    This factory accepts the following parameters:

    • hyphenator (mandatory): path to the FOP XML hyphenation pattern file. See http://offo.sourceforge.net/hyphenation/.
    • encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
    • dictionary (optional): file containing a dictionary of words. Defaults to no dictionary.
    • minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
    • minSubwordSize (optional): minimum length of subwords. Defaults to 2.
    • maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
    • onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.

     <fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.HyphenationCompoundWordTokenFilterFactory" hyphenator="hyphenator.xml" encoding="UTF-8"
             dictionary="dictionary.txt" minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
       </analyzer>
     </fieldType>
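
    The interplay of minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch can be illustrated with a small standalone sketch. It is not Lucene code and does not reproduce the filter's algorithm: as a simplifying assumption it scans every character position against an in-memory dictionary instead of using hyphenation points from a FOP pattern file, and it treats "longest match" as longest per start position. The class name SubwordSketch and the method decompose are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: a brute-force, dictionary-based stand-in
// for hyphenation-driven decomposition. Not part of Lucene.
class SubwordSketch {

    static List<String> decompose(String word, Set<String> dictionary,
                                  int minWordSize, int minSubwordSize,
                                  int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        out.add(word); // assumption of this sketch: the original token is always kept

        // Words shorter than minWordSize are passed through untouched.
        if (word.length() < minWordSize) {
            return out;
        }

        String lower = word.toLowerCase(Locale.ROOT);
        for (int start = 0; start + minSubwordSize <= lower.length(); start++) {
            String longest = null;
            // Candidate lengths are bounded by minSubwordSize and maxSubwordSize.
            for (int len = minSubwordSize;
                 len <= maxSubwordSize && start + len <= lower.length();
                 len++) {
                String candidate = lower.substring(start, start + len);
                if (!dictionary.contains(candidate)) {
                    continue;
                }
                if (onlyLongestMatch) {
                    longest = candidate; // lengths ascend, so the last hit is the longest
                } else {
                    out.add(candidate);
                }
            }
            if (longest != null) {
                out.add(longest);
            }
        }
        return out;
    }
}
```

    With a dictionary containing donau, dampf, dampfschiff and schiff and the default sizes (5, 2, 15), decomposing "Donaudampfschiff" in this sketch yields the original token plus donau, dampf, dampfschiff and schiff. Setting onlyLongestMatch to true drops dampf, because dampfschiff starts at the same position and is longer. Lowering maxSubwordSize to 6 removes dampfschiff instead.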
    See Also:
    HyphenationCompoundWordTokenFilter