org.apache.lucene.analysis.miscellaneous

Class CapitalizationFilterFactory



  • public class CapitalizationFilterFactory
    extends TokenFilterFactory
    Factory for CapitalizationFilter.

    The factory takes parameters:

    • "onlyFirstWord" - true or false. If true, only the first word of a token is capitalized; if false, every word is capitalized.
    • "keep" - a whitespace-separated list of words that should be kept as they are.
    • "keepIgnoreCase" - true or false. If true, the keep list is treated as case-insensitive.
    • "forceFirstLetter" - force the first letter to be capitalized even if the word is in the keep list.
    • "okPrefix" - do not change a word's capitalization if the word begins with an entry in this list. For example, if "McK" is on the okPrefix list, the word "McKinley" should not be changed to "Mckinley".
    • "minWordLength" - the minimum length a word must have for capitalization to be applied. If minWordLength is 3, "and" becomes "And" but "or" stays "or".
    • "maxWordCount" - if the token contains more than maxWordCount words, the capitalization is assumed to be correct.
     <fieldType name="text_cptlztn" class="solr.TextField" positionIncrementGap="100">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.CapitalizationFilterFactory" onlyFirstWord="true"
               keep="java solr lucene" keepIgnoreCase="false"
               okPrefix="McK McD McA"/>   
       </analyzer>
     </fieldType>
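    The "keep", "minWordLength" and "okPrefix" rules described above can be illustrated with a small plain-Java sketch. This is a simplified model written only from the parameter descriptions, not the Lucene implementation: the class CapitalizationSketch and the method capitalizeWord are hypothetical names, and the order in which the rules are checked is an assumption.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a simplified model of the documented rules,
// NOT the actual CapitalizationFilter code. All names are hypothetical.
final class CapitalizationSketch {

    // Applies the "keep", "minWordLength" and "okPrefix" rules to one word.
    // The order of the checks is an assumption made for this sketch.
    static String capitalizeWord(String word, Set<String> keep,
                                 List<String> okPrefix, int minWordLength) {
        if (word.isEmpty() || keep.contains(word)) {
            return word; // words on the keep list pass through unchanged
        }
        if (word.length() < minWordLength) {
            return word; // too short for capitalization to be applied
        }
        for (String prefix : okPrefix) {
            if (word.startsWith(prefix)) {
                return word; // existing capitalization is trusted
            }
        }
        // Default case: upper-case the first letter, lower-case the rest.
        return word.substring(0, 1).toUpperCase(Locale.ROOT)
                + word.substring(1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Set<String> keep = Set.of("java", "solr", "lucene");
        List<String> okPrefix = List.of("McK", "McD", "McA");

        System.out.println(capitalizeWord("and", keep, okPrefix, 3));      // And
        System.out.println(capitalizeWord("or", keep, okPrefix, 3));       // or
        System.out.println(capitalizeWord("McKinley", keep, okPrefix, 3)); // McKinley
        System.out.println(capitalizeWord("solr", keep, okPrefix, 3));     // solr
    }
}
```

    The keep list and prefixes mirror the values in the fieldType example above; the sketch deliberately leaves out "onlyFirstWord", "keepIgnoreCase", "forceFirstLetter" and "maxWordCount".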
    Since:
    Solr 1.3