org.apache.lucene.analysis.cn.smart

Class HMMChineseTokenizerFactory



  • public final class HMMChineseTokenizerFactory
    extends TokenizerFactory
    Factory for HMMChineseTokenizer.

    Note: this class currently emits tokens for punctuation. To remove them, either add a WordDelimiterFilter after the tokenizer (with concatenation off), or apply the SmartChinese stoplist through a StopFilterFactory via: words="org/apache/lucene/analysis/cn/smart/stopwords.txt"
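
    The stoplist option in the note can be sketched as a Solr schema fragment. This is a minimal illustration rather than a recommended configuration: the field type name text_smartcn is hypothetical, and the solr. class shorthand assumes a Solr schema with the SmartChinese analysis jar on the classpath.

```xml
<!-- Hypothetical field type; only the tokenizer and stop filter come from the note above. -->
<fieldType name="text_smartcn" class="solr.TextField">
  <analyzer>
    <!-- Segments Chinese text into words; also emits punctuation tokens. -->
    <tokenizer class="solr.HMMChineseTokenizerFactory"/>
    <!-- Drops the punctuation tokens using the stoplist bundled with SmartChinese. -->
    <filter class="solr.StopFilterFactory"
            words="org/apache/lucene/analysis/cn/smart/stopwords.txt"/>
  </analyzer>
</fieldType>
```

    The stop filter has to come after the tokenizer in the chain, since it operates on the tokens the tokenizer emits.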