org.apache.lucene.analysis.standard

Class StandardTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable


    public final class StandardTokenizer
    extends Tokenizer

    A grammar-based tokenizer constructed with JFlex.

    This class implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.

    Many applications have specific tokenizer needs. If this tokenizer does not suit your application, please consider copying this source code directory to your project and maintaining your own grammar-based tokenizer.
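
Before deciding whether the tokenizer suits an application, it can help to see what the Word Break rules do to sample input. The following is a minimal sketch, not part of the original documentation. It assumes Lucene 5.0 or later (where the no-argument constructor is available) and a Lucene artifact providing this class on the classpath. The class name StandardTokenizerSketch and the helper tokenize are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, for illustration only.
public class StandardTokenizerSketch {

    // Collects the terms StandardTokenizer emits for the given text.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Tokenizer implements Closeable, so try-with-resources calls close().
        try (StandardTokenizer tokenizer = new StandardTokenizer()) {
            tokenizer.setReader(new StringReader(text));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                    // required before the first incrementToken()
            while (tokenizer.incrementToken()) {
                tokens.add(term.toString());      // copy: the attribute is reused per token
            }
            tokenizer.end();                      // finalizes end-of-stream state
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Punctuation, spaces, and the hyphen are dropped; case is preserved.
        System.out.println(tokenize("Hello, brown-fox world"));
        // prints [Hello, brown, fox, world]
    }
}
```

The tokenizer only splits text; it does not lowercase or otherwise normalize terms. If the tokens produced for representative input are not what the application needs, that is the point at which maintaining a custom grammar becomes worth considering.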