org.apache.lucene.analysis.core

Class LetterTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable
  • Direct Known Subclasses:
    LowerCaseTokenizer


    public class LetterTokenizer
    extends CharTokenizer
    A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, where a letter is any character accepted by the java.lang.Character.isLetter() predicate.
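
    The splitting rule can be illustrated without Lucene on the classpath. The following is a minimal sketch of the "maximal run of letters" rule using only the JDK; it is not the library's implementation. The class LetterSplitSketch and its tokenize method are hypothetical names introduced for illustration, and iterating by code point (rather than by char) is an assumption made here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class LetterSplitSketch {

    // Collects maximal runs of adjacent letters, as judged by Character.isLetter(int).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                // Still inside a run of letters: extend the current token.
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [Don, t, panic, it, s, o, clock]
        System.out.println(tokenize("Don't panic: it's 42 o'clock!"));
    }
}
```

    Apostrophes, digits, and punctuation all count as non-letters here, so contractions are split and numbers are dropped entirely.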

    Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces.
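
    One way to see why unsegmented scripts are a problem: Character.isLetter() accepts CJK ideographs, so a run of Chinese text with no spaces contains no non-letter at which to split. The sketch below checks this with the JDK alone; LetterCheckSketch and allLetters are hypothetical names used only for illustration, and the sample strings are made up.

```java
// Hypothetical illustration class; not part of Lucene.
class LetterCheckSketch {

    // Returns true if every code point in the text is a letter,
    // meaning a letter-based splitter would find no boundary inside it.
    static boolean allLetters(String text) {
        return text.codePoints().allMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        // Four Chinese characters (U+6211 U+7231 U+5317 U+4EAC), several words, no spaces.
        String chinese = "\u6211\u7231\u5317\u4eac";
        System.out.println(allLetters(chinese));          // prints true
        System.out.println(allLetters("I love Beijing")); // prints false (spaces are not letters)
    }
}
```

    Under these assumptions, the whole Chinese string would come out as a single token, whereas the English sentence is split at its spaces.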