Overview

To search text efficiently and effectively, Solr (mostly Lucene, actually) splits it into tokens during indexing as well as during query (search). Those tokens can also be pre- and post-filtered for additional flexibility. This allows for things like case-insensitive search, matching misspelt product names, synonyms, and so on.

To achieve all this flexibility, Solr comes with quite a variety of ways to manipulate text. Understanding what filters and tokenizers are available and what they actually do is a major stumbling block for new Solr users. This page provides a comprehensive overview of all the classes that can be used in Solr, together with links to their Javadoc pages.

Most of the analyzers, tokenizers and filters are located in lucene-analyzers-common-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ ), so any entry without a location indicated can be found in that jar.

Note: all of this applies only to text fields whose fieldType uses the class solr.TextField. If your fieldType's class is solr.StrField, the field does not get analyzed at all (similar to using a plain KeywordTokenizerFactory).
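For example, a minimal sketch of the difference (the field type names here are illustrative placeholders, not names from any stock schema):

<!-- Not analyzed: the whole field value is indexed as a single term -->
<fieldType name="my_string" class="solr.StrField" sortMissingLast="true"/>

<!-- Analyzed: the value is split into tokens by the chain below -->
<fieldType name="my_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>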

Non-chainable analyzers

The analyzers below are standalone: they take in text and produce a sequence of tokens. The same analyzer is used during indexing and during search. Many of these come from Lucene itself. Only analyzers that can be used by Solr are listed here; Lucene has some other analyzers that cannot be used directly because they have non-standard initialization requirements. For example, a field type can reference one of these analyzers directly:

<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>

Analyzer in lucene-core-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
An Analyzer builds TokenStreams, which analyze text.

AnalyzerWrapper in lucene-core-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Extension to Analyzer suitable for Analyzers which wrap other Analyzers.

ShingleAnalyzerWrapper
A ShingleAnalyzerWrapper wraps a ShingleFilter around another Analyzer.

DutchAnalyzer
Analyzer for Dutch language.

KeywordAnalyzer
"Tokenizes" the entire stream as a single token.

MorfologikAnalyzer in lucene-analyzers-morfologik-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
org.apache.lucene.analysis.Analyzer using Morfologik library.

SimpleAnalyzer
An Analyzer that filters LetterTokenizer with LowerCaseFilter

SmartChineseAnalyzer in lucene-analyzers-smartcn-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
SmartChineseAnalyzer is an analyzer for Chinese or mixed Chinese-English text.

StopwordAnalyzerBase
Base class for Analyzers that need to make use of stopword sets.

ArabicAnalyzer
Analyzer for Arabic.

ArmenianAnalyzer
Analyzer for Armenian.

BasqueAnalyzer
Analyzer for Basque.

BrazilianAnalyzer
Analyzer for Brazilian Portuguese language.

BulgarianAnalyzer
Analyzer for Bulgarian.

CatalanAnalyzer
Analyzer for Catalan.

CJKAnalyzer
An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK with CJKBigramFilter, and filters stopwords with StopFilter

ClassicAnalyzer
Filters ClassicTokenizer with ClassicFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

CzechAnalyzer
Analyzer for Czech language.

DanishAnalyzer
Analyzer for Danish.

EnglishAnalyzer
Analyzer for English.

FinnishAnalyzer
Analyzer for Finnish.

FrenchAnalyzer
Analyzer for French language.

GalicianAnalyzer
Analyzer for Galician.

GermanAnalyzer
Analyzer for German language.

GreekAnalyzer
Analyzer for the Greek language.

HindiAnalyzer
Analyzer for Hindi.

HungarianAnalyzer
Analyzer for Hungarian.

IndonesianAnalyzer
Analyzer for Indonesian (Bahasa)

IrishAnalyzer
Analyzer for Irish.

ItalianAnalyzer
Analyzer for Italian.

JapaneseAnalyzer in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Analyzer for Japanese that uses morphological analysis.

LatvianAnalyzer
Analyzer for Latvian.

NorwegianAnalyzer
Analyzer for Norwegian.

PersianAnalyzer
Analyzer for Persian.

PolishAnalyzer in lucene-analyzers-stempel-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Analyzer for Polish.

PortugueseAnalyzer
Analyzer for Portuguese.

RomanianAnalyzer
Analyzer for Romanian.

RussianAnalyzer
Analyzer for Russian language.

SoraniAnalyzer
Analyzer for Sorani Kurdish.

SpanishAnalyzer
Analyzer for Spanish.

StandardAnalyzer
Filters StandardTokenizer with StandardFilter, LowerCaseFilter and StopFilter, using a list of English stop words.

StopAnalyzer
Filters LetterTokenizer with LowerCaseFilter and StopFilter.

SwedishAnalyzer
Analyzer for Swedish.

ThaiAnalyzer
Analyzer for Thai language.

TurkishAnalyzer
Analyzer for Turkish.

UAX29URLEmailAnalyzer
Filters org.apache.lucene.analysis.standard.UAX29URLEmailTokenizer with org.apache.lucene.analysis.standard.StandardFilter, org.apache.lucene.analysis.core.LowerCaseFilter and org.apache.lucene.analysis.core.StopFilter, using a list of English stop words.

WhitespaceAnalyzer
An Analyzer that uses WhitespaceTokenizer.

Chainable tokenizers and filters

A more flexible approach than a single all-encompassing analyzer is to chain and configure tokenizers and filters together to fit particular requirements. Solr allows up to three types of components in the chain, combined as in the sketch after this list:

Character filters
These are optional and operate on the original text (before tokenization). They can change the text in any way imaginable by adding, removing, or transforming characters. There can be none, one, or many of these filters, and they operate in the order defined.
Tokenizer
There can be only one of these, and its presence is compulsory. The tokenizer takes the text stream and splits it into a sequence of tokens with their positions. (Strictly speaking, the output is a graph, but most of the time we can think of it as a sequence.)
Token filters
These filters are also optional and work similarly to character filters, but on individual tokens. They can change tokens, remove them, or add new ones. Because they output tokens, they can naturally be chained as well.
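
As a sketch of how the three component types fit together (the field type name text_html_example is purely illustrative), a chain might strip HTML markup, tokenize on the standard grammar, and then lower-case and ASCII-fold the tokens:

<fieldType name="text_html_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- character filter: runs on the raw text before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <!-- tokenizer: exactly one, splits the text into tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- token filters: run on the tokens, in the order listed -->
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>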

Character filters

CharFilterFactory
Abstract parent class for analysis factories that create CharFilter instances.

HTMLStripCharFilterFactory
A CharFilter that wraps another Reader and attempts to strip out HTML constructs.

ICUNormalizer2CharFilterFactory (multi) in lucene-analyzers-icu-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Normalize token text with ICU's Normalizer2.

JapaneseIterationMarkCharFilterFactory (multi) in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

MappingCharFilterFactory (multi)
Simplistic CharFilter that applies the mappings contained in a NormalizeCharMap to the character stream, correcting the resulting changes to the offsets.

PatternReplaceCharFilterFactory
CharFilter that matches a regular expression against the text and replaces the matches with a specified replacement string.

PersianCharFilterFactory (multi)
CharFilter that replaces instances of Zero-width non-joiner with an ordinary space.

Tokenizers

TokenizerFactory
Abstract parent class for analysis factories that create Tokenizer instances.

ClassicTokenizerFactory
A grammar-based tokenizer constructed with JFlex

EdgeNGramTokenizerFactory
Creates new instances of EdgeNGramTokenizer.

HMMChineseTokenizerFactory in lucene-analyzers-smartcn-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Tokenizer for Chinese or mixed Chinese-English text.

ICUTokenizerFactory in lucene-analyzers-icu-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/)

JapaneseTokenizerFactory in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Tokenizer for Japanese that uses morphological analysis.

KeywordTokenizerFactory
Emits the entire input as a single token.

LetterTokenizerFactory
A LetterTokenizer is a tokenizer that divides text at non-letters.

LowerCaseTokenizerFactory (multi)
LowerCaseTokenizer performs the function of LetterTokenizer and LowerCaseFilter together.

NGramTokenizerFactory
Tokenizes the input into n-grams of the given size(s).

PathHierarchyTokenizerFactory
Tokenizer for path-like hierarchies.

PatternTokenizerFactory
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.

SmartChineseSentenceTokenizerFactory in lucene-analyzers-smartcn-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Factory for the SmartChineseAnalyzer SentenceTokenizer

StandardTokenizerFactory
A grammar-based tokenizer constructed with JFlex.

ThaiTokenizerFactory
Tokenizer that uses BreakIterator to tokenize Thai text.

UAX29URLEmailTokenizerFactory
This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.

UIMAAnnotationsTokenizerFactory in lucene-analyzers-uima-5.2.1.jar ( contrib/uima/lucene-libs/ )
org.apache.lucene.analysis.util.TokenizerFactory for UIMAAnnotationsTokenizer

UIMATypeAwareAnnotationsTokenizerFactory in lucene-analyzers-uima-5.2.1.jar ( contrib/uima/lucene-libs/ )
org.apache.lucene.analysis.util.TokenizerFactory for UIMATypeAwareAnnotationsTokenizer

WhitespaceTokenizerFactory
A WhitespaceTokenizer is a tokenizer that divides text at whitespace.

WikipediaTokenizerFactory
Extension of StandardTokenizer that is aware of Wikipedia syntax.

Token filters

TokenFilterFactory
Abstract parent class for analysis factories that create org.apache.lucene.analysis.TokenFilter instances.

ApostropheFilterFactory
Strips all characters after an apostrophe (including the apostrophe itself).

ArabicNormalizationFilterFactory (multi)
A TokenFilter that applies ArabicNormalizer to normalize the orthography.

ArabicStemFilterFactory
A TokenFilter that applies ArabicStemmer to stem Arabic words.

ASCIIFoldingFilterFactory (multi)
This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.

BaseManagedTokenFilterFactory in solr-core-5.2.1.jar ( dist/ )
Abstract based class for implementing TokenFilterFactory objects that are managed by the REST API.

ManagedStopFilterFactory in solr-core-5.2.1.jar ( dist/ )
TokenFilterFactory that uses the ManagedWordSetResource implementation for managing stop words using the REST API.

ManagedSynonymFilterFactory in solr-core-5.2.1.jar ( dist/ )
TokenFilterFactory and ManagedResource implementation for doing CRUD on synonyms using the REST API.

BeiderMorseFilterFactory in lucene-analyzers-phonetic-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
TokenFilter for Beider-Morse phonetic encoding.

BrazilianStemFilterFactory
A TokenFilter that applies BrazilianStemmer.

BulgarianStemFilterFactory
A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.

CapitalizationFilterFactory
A filter to apply normal capitalization rules to Tokens.

CJKBigramFilterFactory
Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.

CJKWidthFilterFactory (multi)
A TokenFilter that normalizes CJK width differences:

  • Folds fullwidth ASCII variants into the equivalent basic latin
  • Folds halfwidth Katakana variants into the equivalent kana

ClassicFilterFactory
Normalizes tokens extracted with ClassicTokenizer.

CodepointCountFilterFactory
Removes words that are too long or too short from the stream.

CommonGramsFilterFactory
Constructs a CommonGramsFilter.

CommonGramsQueryFilterFactory
Construct CommonGramsQueryFilter.

CzechStemFilterFactory
A TokenFilter that applies CzechStemmer to stem Czech words.

DaitchMokotoffSoundexFilterFactory in lucene-analyzers-phonetic-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Create tokens for phonetic matches based on Daitch–Mokotoff Soundex.

DelimitedPayloadTokenFilterFactory
Characters before the delimiter are the "token", those after are the payload.

DictionaryCompoundWordTokenFilterFactory
A org.apache.lucene.analysis.TokenFilter that decomposes compound words found in many Germanic languages.

DoubleMetaphoneFilterFactory in lucene-analyzers-phonetic-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Filter for DoubleMetaphone (supporting secondary codes)

EdgeNGramFilterFactory
Creates new instances of EdgeNGramTokenFilter.

ElisionFilterFactory (multi)
Removes elisions from a TokenStream.

EnglishMinimalStemFilterFactory
A TokenFilter that applies EnglishMinimalStemmer to stem English words.

EnglishPossessiveFilterFactory
TokenFilter that removes possessives (trailing 's) from words.

FinnishLightStemFilterFactory
A TokenFilter that applies FinnishLightStemmer to stem Finnish words.

FrenchLightStemFilterFactory
A TokenFilter that applies FrenchLightStemmer to stem French words.

FrenchMinimalStemFilterFactory
A TokenFilter that applies FrenchMinimalStemmer to stem French words.

GalicianMinimalStemFilterFactory
A TokenFilter that applies GalicianMinimalStemmer to stem Galician words.

GalicianStemFilterFactory
A TokenFilter that applies GalicianStemmer to stem Galician words.

GermanLightStemFilterFactory
A TokenFilter that applies GermanLightStemmer to stem German words.

GermanMinimalStemFilterFactory
A TokenFilter that applies GermanMinimalStemmer to stem German words.

GermanNormalizationFilterFactory (multi)
Normalizes German characters according to the heuristics of the German2 snowball algorithm.

GermanStemFilterFactory
A TokenFilter that stems German words.

GreekLowerCaseFilterFactory (multi)
Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma.

GreekStemFilterFactory
A TokenFilter that applies GreekStemmer to stem Greek words.

HindiNormalizationFilterFactory (multi)
A TokenFilter that applies HindiNormalizer to normalize the orthography.

HindiStemFilterFactory
A TokenFilter that applies HindiStemmer to stem Hindi words.

HungarianLightStemFilterFactory
A TokenFilter that applies HungarianLightStemmer to stem Hungarian words.

HunspellStemFilterFactory
TokenFilterFactory that creates instances of HunspellStemFilter.

HyphenatedWordsFilterFactory
When the plain text is extracted from documents, we will often have many words hyphenated and broken into two lines.

HyphenationCompoundWordTokenFilterFactory
A org.apache.lucene.analysis.TokenFilter that decomposes compound words found in many Germanic languages.

ICUFoldingFilterFactory (multi) in lucene-analyzers-icu-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.

ICUNormalizer2FilterFactory (multi) in lucene-analyzers-icu-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Normalize token text with ICU's com.ibm.icu.text.Normalizer2

ICUTransformFilterFactory (multi) in lucene-analyzers-icu-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
A TokenFilter that transforms text with ICU.

IndicNormalizationFilterFactory (multi)
A TokenFilter that applies IndicNormalizer to normalize text in Indian Languages.

IndonesianStemFilterFactory
A TokenFilter that applies IndonesianStemmer to stem Indonesian words.

IrishLowerCaseFilterFactory (multi)
Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair' should become 'n-athair')

ItalianLightStemFilterFactory
A TokenFilter that applies ItalianLightStemmer to stem Italian words.

JapaneseBaseFormFilterFactory in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Replaces term text with the BaseFormAttribute.

JapaneseKatakanaStemFilterFactory in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).

JapanesePartOfSpeechStopFilterFactory in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Removes tokens that match a set of part-of-speech tags.

JapaneseReadingFormFilterFactory in lucene-analyzers-kuromoji-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
A org.apache.lucene.analysis.TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form.

KeepWordFilterFactory
A TokenFilter that only keeps tokens with text contained in the required words.

KeywordMarkerFilterFactory
Marks terms as keywords via the KeywordAttribute.

KeywordRepeatFilterFactory
This TokenFilter emits each incoming token twice, once as a keyword and once as a non-keyword; in other words, once with KeywordAttribute#setKeyword(boolean) set to true and once set to false.

KStemFilterFactory
A high-performance kstem filter for english.

LatvianStemFilterFactory
A TokenFilter that applies LatvianStemmer to stem Latvian words.

LengthFilterFactory
Removes words that are too long or too short from the stream.

LimitTokenCountFilterFactory
This TokenFilter limits the number of tokens while indexing.

LimitTokenOffsetFilterFactory
Lets all tokens pass through until it sees one with a start offset <= a configured limit, which won't pass and ends the stream.

LimitTokenPositionFilterFactory
This TokenFilter limits its emitted tokens to those with positions that are not greater than the configured limit.

LowerCaseFilterFactory (multi)
Normalizes token text to lower case.

MorfologikFilterFactory in lucene-analyzers-morfologik-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Filter factory for MorfologikFilter.

NGramFilterFactory
Tokenizes the input into n-grams of the given size(s).

NorwegianLightStemFilterFactory
A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words.

NorwegianMinimalStemFilterFactory
A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words.

NumericPayloadTokenFilterFactory
Assigns a payload to a token based on the org.apache.lucene.analysis.Token#type()

PatternCaptureGroupFilterFactory
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture group in one or more patterns.

PatternReplaceFilterFactory
A TokenFilter which applies a Pattern to each token in the stream, replacing match occurrences with the specified replacement string.

PersianNormalizationFilterFactory (multi)
A TokenFilter that applies PersianNormalizer to normalize the orthography.

PhoneticFilterFactory in lucene-analyzers-phonetic-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Create tokens for phonetic matches.

PorterStemFilterFactory
Transforms the token stream as per the Porter stemming algorithm.

PortugueseLightStemFilterFactory
A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.

PortugueseMinimalStemFilterFactory
A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.

PortugueseStemFilterFactory
A TokenFilter that applies PortugueseStemmer to stem Portuguese words.

RemoveDuplicatesTokenFilterFactory
A TokenFilter which filters out Tokens at the same position and Term text as the previous token in the stream.

ReversedWildcardFilterFactory in solr-core-5.2.1.jar ( dist/ )
This class produces a special form of reversed tokens, suitable for better handling of leading wildcards.

ReverseStringFilterFactory
Reverse token string, for example "country" => "yrtnuoc".

RussianLightStemFilterFactory
A TokenFilter that applies RussianLightStemmer to stem Russian words.

ScandinavianFoldingFilterFactory
This filter folds Scandinavian characters åÅäæÄÆ->a and öÖøØ->o.

ScandinavianNormalizationFilterFactory
This filter normalizes use of the interchangeable Scandinavian characters æÆäÄöÖøØ and folded variants (aa, ao, ae, oe and oo) by transforming them to åÅæÆøØ.

SerbianNormalizationFilterFactory (multi)
Normalizes Serbian Cyrillic and Latin characters to "bald" Latin.

ShingleFilterFactory
A ShingleFilter constructs shingles (token n-grams) from a token stream.

SmartChineseWordTokenFilterFactory in lucene-analyzers-smartcn-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Factory for the SmartChineseAnalyzer WordTokenFilter

SnowballPorterFilterFactory
A filter that stems words using a Snowball-generated stemmer.

SoraniNormalizationFilterFactory (multi)
A TokenFilter that applies SoraniNormalizer to normalize the orthography.

SoraniStemFilterFactory
A TokenFilter that applies SoraniStemmer to stem Sorani words.

SpanishLightStemFilterFactory
A TokenFilter that applies SpanishLightStemmer to stem Spanish words.

StandardFilterFactory
Normalizes tokens extracted with StandardTokenizer.

StemmerOverrideFilterFactory
Provides the ability to override any KeywordAttribute aware stemmer with custom dictionary-based stemming.

StempelPolishStemFilterFactory in lucene-analyzers-stempel-5.2.1.jar ( contrib/analysis-extras/lucene-libs/ )
Transforms the token stream as per the stemming algorithm.

StopFilterFactory
Removes stop words from a token stream.

SuggestStopFilterFactory in lucene-suggest-5.2.1.jar ( server/solr-webapp/webapp/WEB-INF/lib/ )
Like StopFilter except it will not remove the last token if that token was not followed by some token separator.

SwedishLightStemFilterFactory
A TokenFilter that applies SwedishLightStemmer to stem Swedish words.

SynonymFilterFactory
Matches single or multi word synonyms in a token stream.

ThaiWordFilterFactory
TokenFilter that uses java.text.BreakIterator to break each Token that is Thai into separate Token(s) for each Thai word.

TokenOffsetPayloadTokenFilterFactory
Adds the OffsetAttribute#startOffset() and OffsetAttribute#endOffset() as the token's payload; the first 4 bytes are the start offset and the last 4 bytes are the end offset.

TrimFilterFactory
Trims leading and trailing whitespace from Tokens in the stream.

TruncateTokenFilterFactory
A token filter for truncating the terms into a specific length.

TurkishLowerCaseFilterFactory (multi)
Normalizes Turkish token text to lower case.

TypeAsPayloadTokenFilterFactory
Makes the org.apache.lucene.analysis.Token#type() a payload.

TypeTokenFilterFactory
Factory class for TypeTokenFilter.

UpperCaseFilterFactory (multi)
Normalizes token text to UPPER CASE.

WordDelimiterFilterFactory
Splits words into subwords and performs optional transformations on subword groups.

Analyzer chain types

In Solr, the text is analyzed twice: once when it gets indexed and once when it gets queried (searched).

It is possible to define the same chain for both of these phases:

<fieldType name="text_fa" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
	<charFilter class="solr.PersianCharFilterFactory"/>
	<tokenizer class="solr.StandardTokenizerFactory"/>
	<filter class="solr.LowerCaseFilterFactory"/>
	<filter class="solr.ArabicNormalizationFilterFactory"/>
	<filter class="solr.PersianNormalizationFilterFactory"/>
	<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fa.txt" />
  </analyzer>
</fieldType>

Alternatively, the index and query chains can be different:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
	<tokenizer class="solr.StandardTokenizerFactory"/>
	<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
	<filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
	<tokenizer class="solr.StandardTokenizerFactory"/>
	<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
	<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
	<filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Finally, there is a third - usually hidden - chain type, which is used for multiterm analysis (queries like term* and [term1 TO term2]). It is hidden because it is usually constructed automatically from the explicitly defined chain, using only the components that are multiterm-aware. They are marked with (multi) in the list above. The primary use case is to ensure that case-insensitive matches work as expected even when wildcards are used. You can read a more complete explanation in the Solr Wiki.

To define it explicitly, add an <analyzer type="multiterm"> section next to the index and query sections in the field type definition.
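
A minimal sketch of what that might look like (the field type name and the exact filters are illustrative, not a recommendation):

<fieldType name="text_multiterm_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- used for wildcard, prefix and range queries instead of the query chain -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>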

Short Names

Notice that most Analyzer, Tokenizer, and Filter factories can be referenced by a short name, such as:
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
Only non-core components require the full class name, including the package name.
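
For instance (the fully-qualified name below follows the Lucene 5.2.1 package layout; verify it against the Javadoc for your version):

<!-- core component, short name -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- contrib component, full class name including package -->
<tokenizer class="org.apache.lucene.analysis.icu.segmentation.ICUTokenizerFactory"/>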


Previous versions of this document

You can also find archive versions of this document for version 5.0.0, version 4.10.1, version 4.9.0, version 4.8.0, and version 4.7.0.
