public final class NGramTokenFilter extends TokenFilter
If you were using this TokenFilter to perform partial highlighting, this won't work anymore since this filter doesn't update offsets. You should modify your analysis chain to use NGramTokenizer, and potentially override NGramTokenizer.isTokenChar(int) to perform pre-tokenization.
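To illustrate why tokenizer-level n-grams suit partial highlighting, the following plain-Java sketch splits text into runs of token characters and records the character offsets of every gram. It is only a model of the idea and does not use Lucene: the class OffsetGramSketch, the nested Gram type, and the local isTokenChar predicate (here assumed to accept letters and digits) are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these types are part of Lucene.
final class OffsetGramSketch {

    /** A gram plus the [start, end) character offsets it covers in the original text. */
    static final class Gram {
        final String text;
        final int start;
        final int end;

        Gram(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    // Assumed pre-tokenization rule: letters and digits belong to a token.
    static boolean isTokenChar(int c) {
        return Character.isLetterOrDigit(c);
    }

    /** Emits n-grams per run of token characters, each with its own offsets. */
    static List<Gram> grams(String text, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!isTokenChar(text.charAt(i))) {
                i++; // skip separators; grams never span them
                continue;
            }
            int runStart = i;
            while (i < text.length() && isTokenChar(text.charAt(i))) {
                i++;
            }
            int runEnd = i;
            // Works on UTF-16 chars, so supplementary code points are not handled.
            for (int s = runStart; s < runEnd; s++) {
                for (int len = minGram; len <= maxGram && s + len <= runEnd; len++) {
                    out.add(new Gram(text.substring(s, s + len), s, s + len));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("ab cd", 2, 2)); // prints [ab[0,2), cd[3,5)]
    }
}
```

Because each gram carries offsets into the original text, a highlighter can mark exactly the matched substring, which is what a filter that leaves offsets unchanged cannot provide.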
|Modifier and Type||Field and Description|
|static int||DEFAULT_MIN_NGRAM_SIZE|
|Constructor and Description|
|Creates NGramTokenFilter with default min and max n-grams.|
|Creates NGramTokenFilter with given min and max n-grams.|
|Modifier and Type||Method and Description|
|boolean||incrementToken() Advances the stream to the next token; returns false at end of stream.|
|void||reset() This method is called by a consumer before it begins consuming the stream.|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
public static final int DEFAULT_MIN_NGRAM_SIZE
public NGramTokenFilter(TokenStream input, int minGram, int maxGram)
input - TokenStream holding the input to be tokenized
minGram - the smallest n-gram to generate
maxGram - the largest n-gram to generate
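As a rough illustration of what minGram and maxGram control, the sketch below enumerates every substring of a single term whose length lies between the two bounds. It is plain Java rather than Lucene code; the class name NGramSketch and the emission order (by start position, then by length) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Lucene implementation.
final class NGramSketch {

    /** Returns all substrings of term with length between minGram and maxGram. */
    static List<String> ngrams(String term, int minGram, int maxGram) {
        if (minGram < 1 || minGram > maxGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> grams = new ArrayList<>();
        // Works on UTF-16 chars, so supplementary code points are not handled.
        for (int start = 0; start < term.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= term.length(); len++) {
                grams.add(term.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 2, 3)); // prints [ab, abc, bc, bcd, cd]
    }
}
```

A term shorter than minGram yields no grams at all in this sketch.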
public final boolean incrementToken() throws IOException
public void reset() throws IOException
Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.
The default implementation chains the call to the input TokenStream, so be sure to call super.reset() when overriding this method.
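The reset contract can be sketched with a miniature stand-in for a token stream. Everything below (ResetSketch, MiniStream, ListStream, MiniFilter, CountingFilter) is hypothetical and only mirrors the pattern described here: a stateful filter clears its own state in reset() and calls super.reset() so that the input is rewound too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical miniature classes; they imitate the pattern, not Lucene's API.
final class ResetSketch {

    /** Minimal stream: call reset(), then pull tokens until incrementToken() is false. */
    static abstract class MiniStream {
        String term;

        abstract boolean incrementToken();

        void reset() {
        }
    }

    /** Source stream backed by a fixed list of terms. */
    static final class ListStream extends MiniStream {
        private final List<String> terms;
        private int pos;

        ListStream(String... terms) {
            this.terms = Arrays.asList(terms);
        }

        @Override
        boolean incrementToken() {
            if (pos >= terms.size()) {
                return false;
            }
            term = terms.get(pos++);
            return true;
        }

        @Override
        void reset() {
            pos = 0;
        }
    }

    /** Filter base whose default reset() chains the call to its input. */
    static abstract class MiniFilter extends MiniStream {
        final MiniStream input;

        MiniFilter(MiniStream input) {
            this.input = input;
        }

        @Override
        void reset() {
            input.reset();
        }
    }

    /** Stateful filter that numbers tokens and therefore must clear its counter. */
    static final class CountingFilter extends MiniFilter {
        private int count;

        CountingFilter(MiniStream input) {
            super(input);
        }

        @Override
        boolean incrementToken() {
            if (!input.incrementToken()) {
                return false;
            }
            count++;
            term = count + ":" + input.term;
            return true;
        }

        @Override
        void reset() {
            super.reset(); // rewind the input first
            count = 0;     // then clear this filter's own state
        }
    }

    /** Consumer side: reset before consumption, then drain the stream. */
    static List<String> consume(MiniStream stream) {
        stream.reset();
        List<String> out = new ArrayList<>();
        while (stream.incrementToken()) {
            out.add(stream.term);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniStream stream = new CountingFilter(new ListStream("a", "b"));
        System.out.println(consume(stream)); // prints [1:a, 2:b]
        System.out.println(consume(stream)); // prints [1:a, 2:b] again after reset
    }
}
```

If CountingFilter.reset() skipped super.reset(), the second pass would find the input already exhausted and return no tokens.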