org.apache.lucene.analysis.cn.smart

Class HMMChineseTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable


    public class HMMChineseTokenizer
    extends SegmentingTokenizerBase

    Tokenizer for Chinese or mixed Chinese-English text.

    This tokenizer uses probabilistic knowledge to find the optimal word segmentation for Simplified Chinese text. The text is first broken into sentences, then each sentence is segmented into words.
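
    The two-stage process described above (sentences first, then words) can be illustrated with a minimal, standalone sketch. It is not Lucene's implementation: the class name SegmentationSketch, the toy probability table, and the constants are hypothetical, and a simple unigram dynamic-programming search stands in for the real model. It only shows how probabilities can select one segmentation over another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; names and numbers are assumptions, not Lucene's API or data.
class SegmentationSketch {
    // Hypothetical toy word probabilities; a real model is derived from a large corpus.
    static final Map<String, Double> WORD_PROB = new HashMap<>();
    static {
        WORD_PROB.put("我", 0.05);
        WORD_PROB.put("是", 0.05);
        WORD_PROB.put("中", 0.01);
        WORD_PROB.put("国", 0.005);
        WORD_PROB.put("人", 0.02);
        WORD_PROB.put("中国", 0.02);
        WORD_PROB.put("中国人", 0.01);
    }
    static final double UNKNOWN_PROB = 1e-8; // fallback for unseen single characters
    static final int MAX_WORD_LEN = 4;       // longest candidate word considered

    // Stage 1: break text into sentences at sentence-ending punctuation.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            current.append(c);
            if ("。！？!?".indexOf(c) >= 0) {
                sentences.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            sentences.add(current.toString());
        }
        return sentences;
    }

    // Stage 2: pick the segmentation with the highest total log-probability.
    static List<String> segment(String sentence) {
        int n = sentence.length();
        double[] best = new double[n + 1]; // best[i]: best score for the first i chars
        int[] back = new int[n + 1];       // back[i]: start index of the last word
        Arrays.fill(best, Double.NEGATIVE_INFINITY);
        best[0] = 0.0;
        for (int end = 1; end <= n; end++) {
            for (int start = Math.max(0, end - MAX_WORD_LEN); start < end; start++) {
                String word = sentence.substring(start, end);
                Double p = WORD_PROB.get(word);
                if (p == null) {
                    if (word.length() > 1) {
                        continue; // unknown multi-character strings are not candidates
                    }
                    p = UNKNOWN_PROB;
                }
                double score = best[start] + Math.log(p);
                if (score > best[end]) {
                    best[end] = score;
                    back[end] = start;
                }
            }
        }
        // Walk the back-pointers from the end to recover the words in order.
        LinkedList<String> words = new LinkedList<>();
        for (int end = n; end > 0; end = back[end]) {
            words.addFirst(sentence.substring(back[end], end));
        }
        return words;
    }

    public static void main(String[] args) {
        for (String sentence : splitSentences("我是中国人。")) {
            System.out.println(segment(sentence));
        }
    }
}
```

    In this toy table the single word "中国人" scores higher than the split "中国" + "人", so the search keeps it as one token. The real tokenizer relies on its own trained data rather than a hand-written table like this one.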