Class TikaEntityProcessor

  • public class TikaEntityProcessor
    extends EntityProcessorBase

    An implementation of EntityProcessor that reads data from rich documents using Apache Tika.

    To index latitude/longitude data that might be extracted from a file's metadata, identify the geo field for this information with the spatialMetadataField attribute.

    Since: solr 3.1
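The spatialMetadataField behaviour can be pictured with the following minimal sketch. It assumes that Tika reports coordinates under the metadata keys `geo:lat` and `geo:long`, and that the processor joins them into a single `"lat,lon"` string, the form Solr's point-based spatial field types accept. Neither assumption is stated on this page. The class and method names are hypothetical, and a plain `Map` stands in for Tika's `Metadata` object.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: copy geo metadata into one spatial field of an import row. */
class SpatialFieldSketch {
    // Assumed Tika metadata keys for the coordinates (not taken from this page).
    static final String LAT_KEY = "geo:lat";
    static final String LON_KEY = "geo:long";

    /**
     * Puts "lat,lon" under spatialMetadataField when both coordinates are present.
     * If the attribute is unset or either coordinate is missing, the row is left unchanged.
     */
    static void addSpatialField(Map<String, String> metadata,
                                String spatialMetadataField,
                                Map<String, Object> row) {
        if (spatialMetadataField == null) {
            return; // attribute not configured on the entity
        }
        String lat = metadata.get(LAT_KEY);
        String lon = metadata.get(LON_KEY);
        if (lat != null && lon != null) {
            row.put(spatialMetadataField, lat + "," + lon);
        }
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(LAT_KEY, "48.8584");
        metadata.put(LON_KEY, "2.2945");
        Map<String, Object> row = new HashMap<>();
        addSpatialField(metadata, "location", row);
        System.out.println(row); // {location=48.8584,2.2945}
    }
}
```

Skipping the field when only one coordinate is present avoids indexing a malformed point; whether the real processor does the same should be checked against its source.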
    • Method Detail

      • init

        public void init(Context context)
        Description copied from class: EntityProcessor
        This method is called when processing of an entity starts. When processing comes back to the entity, it is called again, so the processor can reset any state at that point. For a rootmost entity, this is called only once per ingestion. For sub-entities, it is called multiple times, once for each row of the parent entity.
        Overrides: init in class EntityProcessorBase
        Parameters: context - the current context
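The call pattern described for init can be simulated without Solr. In this sketch a hypothetical `CountingProcessor` stands in for an entity processor and a `String` list stands in for the parent entity's rows (for example, a Tika entity nested under an entity that lists files). It only counts calls and is not DataImportHandler's actual driver loop.

```java
import java.util.List;

/** Hypothetical simulation of when init(Context) runs for a root entity and a sub-entity. */
class InitLifecycleSketch {
    /** Stand-in for an entity processor that only records how often init was called. */
    static class CountingProcessor {
        int initCalls = 0;

        void init() {
            initCalls++; // a real processor would reset its per-pass state here
        }
    }

    /** Drives one ingestion: the root is initialised once, the child once per parent row. */
    static void runImport(CountingProcessor root, CountingProcessor child, List<String> parentRows) {
        root.init(); // rootmost entity: once per ingestion
        for (String parentRow : parentRows) {
            child.init(); // sub-entity: re-initialised for each parent row
            // ...the child's nextRow() would then be called until it returns null
        }
    }

    public static void main(String[] args) {
        CountingProcessor fileList = new CountingProcessor();
        CountingProcessor tika = new CountingProcessor();
        runImport(fileList, tika, List.of("a.pdf", "b.docx", "c.odt"));
        System.out.println(fileList.initCalls + " " + tika.initCalls); // 1 3
    }
}
```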
      • firstInit

        protected void firstInit(Context context)
        Description copied from class: EntityProcessorBase
        First-time init call; do one-time operations here. A subclass that overrides this method must call it (super.firstInit(context)) from the override; otherwise nextRow() throws a NullPointerException when it accesses zipper.
        Overrides: firstInit in class EntityProcessorBase
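Why the super call matters can be shown with a small stand-alone model. This sketch assumes the base class guards firstInit with a flag inside init and sets up state that nextRow() later reads; the names `BaseProcessorSketch`, `sharedState`, `GoodProcessor` and `BrokenProcessor` are hypothetical, `sharedState` merely plays the role the description gives to zipper, and a `String` stands in for `Context`.

```java
/** Hypothetical base class modelling the documented contract: firstInit runs once, from init. */
abstract class BaseProcessorSketch {
    private boolean isFirstInit = true;
    protected StringBuilder sharedState; // stands in for state that nextRow() relies on

    public void init(String context) {
        if (isFirstInit) {
            firstInit(context);
        }
    }

    /** One-time setup; overriding subclasses must call super.firstInit(context). */
    protected void firstInit(String context) {
        sharedState = new StringBuilder(context);
        isFirstInit = false;
    }

    public String nextRow() {
        return sharedState.toString(); // NullPointerException if the base firstInit never ran
    }
}

/** Performs its own one-time setup and correctly chains to the base class. */
class GoodProcessor extends BaseProcessorSketch {
    int setupCount = 0;

    @Override
    protected void firstInit(String context) {
        super.firstInit(context); // required: initialises sharedState
        setupCount++;             // subclass-specific one-time work
    }
}

/** Forgets the super call, so nextRow() fails. */
class BrokenProcessor extends BaseProcessorSketch {
    @Override
    protected void firstInit(String context) {
        // missing super.firstInit(context)
    }
}

class FirstInitDemo {
    public static void main(String[] args) {
        GoodProcessor good = new GoodProcessor();
        good.init("ctx");
        good.init("ctx"); // firstInit is not repeated on later init calls
        System.out.println(good.setupCount + " " + good.nextRow()); // 1 ctx
        try {
            new BrokenProcessor().nextRow();
        } catch (NullPointerException e) {
            System.out.println("NPE without super.firstInit");
        }
    }
}
```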
      • nextRow

        public Map<String,Object> nextRow()
        Description copied from class: EntityProcessorBase
        For a simple implementation, this is the only method that the subclass should implement. It is intended to stream rows one by one. Return null to signal the end of rows.
        Overrides: nextRow in class EntityProcessorBase
        Returns: a row where the key is the name of the field and the value can be any Object or a Collection of objects; null signals the end of rows
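The nextRow contract (hand out rows one at a time, then return null) can be illustrated with a sketch that emits exactly one row per document and resets on each init. The one-row-per-document behaviour, the `text` field name, and the class name are assumptions for illustration; a `String` stands in for whatever a Tika parser would extract.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical processor following the nextRow() contract: rows one by one, then null. */
class SingleRowProcessorSketch {
    private String documentText; // stand-in for text a parser would extract
    private boolean done;

    /** Called once per document (e.g. once per parent row); resets the end-of-rows flag. */
    void init(String extractedText) {
        this.documentText = extractedText;
        this.done = false;
    }

    /** Returns one row keyed by field name, then null to signal the end of rows. */
    Map<String, Object> nextRow() {
        if (done) {
            return null;
        }
        done = true;
        Map<String, Object> row = new HashMap<>();
        row.put("text", documentText); // hypothetical field name
        return row;
    }

    public static void main(String[] args) {
        SingleRowProcessorSketch p = new SingleRowProcessorSketch();
        p.init("Hello from a PDF");
        Map<String, Object> row;
        while ((row = p.nextRow()) != null) { // the caller keeps pulling until null
            System.out.println(row.get("text"));
        }
    }
}
```

Because init clears the flag, the same instance can serve a sub-entity that is re-initialised for every parent row.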