org.apache.solr.handler.dataimport

Class LineEntityProcessor



  • public class LineEntityProcessor
    extends EntityProcessorBase

    An EntityProcessor instance which can stream lines of text read from a datasource. Options allow lines to be explicitly skipped or included in the index.

    Attribute summary

    • url is the required location of the input file. If this value is relative, it is assumed to be relative to baseLoc.
    • acceptLineRegex is an optional attribute that if present discards any line which does not match the regExp.
    • skipLineRegex is an optional attribute that is applied after any acceptLineRegex and discards any line which matches this regExp.
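
    The order in which the two filters apply can be sketched as below. This is an illustrative sketch, not the class's actual source: the class name LineFilterSketch and the method keep are hypothetical, and it assumes a pattern "matches" when it is found anywhere in the line (Matcher.find()) rather than requiring the whole line to match.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating accept-then-skip filtering of lines.
public class LineFilterSketch {
    // Either pattern may be null, mirroring the two optional attributes.
    private final Pattern accept;
    private final Pattern skip;

    public LineFilterSketch(String acceptLineRegex, String skipLineRegex) {
        this.accept = acceptLineRegex == null ? null : Pattern.compile(acceptLineRegex);
        this.skip = skipLineRegex == null ? null : Pattern.compile(skipLineRegex);
    }

    // Returns true if the line survives both filters.
    public boolean keep(String line) {
        // Assumption: partial match semantics via find().
        if (accept != null && !accept.matcher(line).find()) {
            return false; // an accept pattern is set and the line does not match it
        }
        if (skip != null && skip.matcher(line).find()) {
            return false; // a skip pattern is set and the line matches it
        }
        return true;
    }
}
```

    With new LineFilterSketch("\\.xml$", "^#"), the line "docs/a.xml" would be kept, while "#docs/b.xml" (matches the skip pattern) and "docs/c.txt" (fails the accept pattern) would be discarded.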

    Although envisioned for reading lines from a file or url, LineEntityProcessor may also be useful for dealing with change lists, where each line contains filenames which can be used by subsequent entities to parse content from those files.

    Refer to http://wiki.apache.org/solr/DataImportHandler for more details.

    This API is experimental and may change in the future.

    Since:
    solr 1.4
    See Also:
    Pattern
    • Field Detail

      • URL

        public static final String URL
        Holds the name of the entity attribute that will be parsed to obtain the filename containing the changelist.
        See Also:
        Constant Field Values
      • ACCEPT_LINE_REGEX

        public static final String ACCEPT_LINE_REGEX
        Holds the name of the entity attribute that will be parsed to obtain the pattern used to check whether a line should be returned.
        See Also:
        Constant Field Values
      • SKIP_LINE_REGEX

        public static final String SKIP_LINE_REGEX
        Holds the name of the entity attribute that will be parsed to obtain the pattern used to check whether a line should be ignored.
        See Also:
        Constant Field Values
    • Method Detail

      • init

        public void init(Context context)
        Parses each of the entity attributes.
        Overrides:
        init in class EntityProcessorBase
        Parameters:
        context - The current context
      • nextRow

        public Map<String,Object> nextRow()
        Reads lines from the url until it finds a line that matches the optional acceptLineRegex and does not match the optional skipLineRegex.
        Overrides:
        nextRow in class EntityProcessorBase
        Returns:
        A row containing a minimum of one field "rawLine", or null to signal end of file. The rawLine is the line as returned by readLine() from the url. However, transformers can be used to create as many other fields as required.
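
        The row contract described above (one "rawLine" field per accepted line, null at end of input) might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual implementation: the class RawLineRowSketch is hypothetical, the line filter is passed in as a generic Predicate, and IOException is simply propagated, which the real method signature does not do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for the row-producing loop; not the Solr class itself.
public class RawLineRowSketch {
    private final BufferedReader reader;
    private final Predicate<String> keep; // assumed to encode the accept/skip rules

    public RawLineRowSketch(BufferedReader reader, Predicate<String> keep) {
        this.reader = reader;
        this.keep = keep;
    }

    // Returns the next accepted line as a row, or null once the input is exhausted.
    public Map<String, Object> nextRow() throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            if (keep.test(line)) {
                Map<String, Object> row = new HashMap<>();
                row.put("rawLine", line); // the line exactly as readLine() returned it
                return row;
            }
            // Rejected lines are dropped and the loop reads the next one.
        }
        return null; // end of file
    }
}
```

        A caller would invoke nextRow() repeatedly until it returns null; any further fields would be added to each row by transformers, which this sketch does not model.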