From content to search: speed-dating Apache Solr (ApacheCON 2018)

Welcome to this collection of resources for making the Apache Solr search engine more comprehensible to beginner and intermediate users. While Solr is very easy to get started with, tuning it is, as with any search engine, fairly complex. This website tries to make that simpler by compiling information and creating tools that accelerate learning Solr. The currently available resources are linked in the menu bar above. More resources will be coming shortly.


There are currently three types of resources on the website. All are generated semi-automatically from the Solr source and distribution, so they are more complete than manually compiled lists.

Analyzers, Tokenizers, and Filters

Any processing of content in Solr is done through fields and their associated field types. Field type definitions consist of analyzer chains. Those chains can contain standalone analyzers. More commonly, however, they contain an optional sequence of Character Filters, followed by a compulsory Tokenizer, optionally followed by a sequence of Token Filters. There can be separate definitions for the index and query chains. Together with copyField instructions, this gives Solr a great deal of flexibility in how text is processed and searched.
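The order of stages in such a chain can be illustrated with a small conceptual sketch. This is plain Python, not Solr or Lucene code: the AnalyzerChain class, the helper functions, and the stop word list are all hypothetical names made up for illustration. In Solr itself the chain is declared in the schema rather than written as code.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type aliases for the three kinds of stages.
CharFilter = Callable[[str], str]                # raw text -> raw text
Tokenizer = Callable[[str], List[str]]           # raw text -> tokens
TokenFilter = Callable[[List[str]], List[str]]   # tokens -> tokens

STOP_WORDS = {"the", "a", "an"}  # illustrative list only


def strip_tags(text: str) -> str:
    """Character filter: replace anything that looks like a tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the text on runs of whitespace."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens: List[str]) -> List[str]:
    """Token filter: drop tokens found in STOP_WORDS."""
    return [t for t in tokens if t not in STOP_WORDS]


@dataclass
class AnalyzerChain:
    """Optional char filters, one compulsory tokenizer, optional token filters."""
    tokenizer: Tokenizer
    char_filters: List[CharFilter] = field(default_factory=list)
    token_filters: List[TokenFilter] = field(default_factory=list)

    def analyze(self, text: str) -> List[str]:
        for char_filter in self.char_filters:
            text = char_filter(text)
        tokens = self.tokenizer(text)
        for token_filter in self.token_filters:
            tokens = token_filter(tokens)
        return tokens


# Separate chains for index time and query time, as a field type may define.
index_chain = AnalyzerChain(whitespace_tokenizer, [strip_tags],
                            [lowercase, remove_stop_words])
query_chain = AnalyzerChain(whitespace_tokenizer, [], [lowercase])

print(index_chain.analyze("<b>The</b> Quick fox"))
print(query_chain.analyze("The FOX"))
```

Because the two chains here differ (only the index chain removes stop words), the same text can produce different tokens at index and query time. That kind of mismatch is one of the things worth checking when tuning a field type.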

Update Request Processors

Analyzer chains process fields as they are indexed and searched. However, sometimes a submitted document needs to be processed before it reaches the indexing stage. This makes it possible to do things like automatically adding ID fields, implementing schemaless mode, or counting the number of values in a multi-valued field to speed up searches. This is done using custom Update Request Processor (URP) chains that are configured in solrconfig.xml. One URP chain can contain many individual URP Factories, allowing Solr to pre-process documents uniformly even if they originate from different clients outside Solr.
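The idea of a chain of processors applied to every incoming document can be sketched as follows. This is a conceptual illustration in plain Python, not Solr's API: the function names, the dict-based document, and the tags and tags_count field names are all assumptions made for this example. Real URP chains are built from factory classes declared in solrconfig.xml.

```python
import uuid
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Processor = Callable[[Document], Document]


def add_missing_id(doc: Document) -> Document:
    """Give the document an 'id' field if the client did not supply one."""
    doc = dict(doc)  # work on a copy so the caller's document is untouched
    doc.setdefault("id", str(uuid.uuid4()))
    return doc


def count_values(source: str, target: str) -> Processor:
    """Build a processor that stores the number of values of `source` in `target`."""
    def process(doc: Document) -> Document:
        doc = dict(doc)
        values = doc.get(source, [])
        if not isinstance(values, list):
            values = [values]  # treat a single value as a one-element list
        doc[target] = len(values)
        return doc
    return process


def run_chain(doc: Document, chain: List[Processor]) -> Document:
    """Pass the document through every processor, in order."""
    for processor in chain:
        doc = processor(doc)
    return doc


# Hypothetical chain: every document gets an id and a value count.
chain = [add_missing_id, count_values("tags", "tags_count")]

result = run_chain({"title": "Solr", "tags": ["search", "lucene"]}, chain)
print(result["tags_count"], "id" in result)
```

Since the chain runs on the server side, every client gets the same pre-processing without having to implement it itself.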

Searchable Lucene and Solr Javadocs

Lucene and Solr Javadocs are available online. However, they are split into many packages, which often makes looking up a class somewhat difficult. And sometimes valuable configuration information is only available on the component's Javadoc page. To make this easier, this website provides combined Lucene and Solr Javadocs that are searchable using Solr-backed autocomplete. It also uses an alternative page layout (iframes, instead of the usual frames) to let you bookmark individual classes more easily. The Analyzer and URP pages cross-link to the relevant Javadoc entries for all their components. In turn, the Javadoc cross-links to the source files for the listed version in the official Lucene/Solr GitHub repository. This way, it is possible to go from looking at a list of components, to checking an individual component's detailed documentation, to reviewing its source in just a couple of clicks.

Recent changes

November 2018
Removed pop-up survey and Twitter embeds (annoying and slow)
Updated to a more recent presentation on the home page
March 2017
Moved to new static site generator
Deleted archived versions of information, as nobody looked at them
Deleted least useful pages for now
Updated home page to provide a bit more info on the website content
February 2017
Updated all lists and Javadoc to Solr 6.4.0