org.apache.lucene.analysis.uima

Class BaseUIMATokenizer

    • Field Detail

      • ae

        protected AnalysisEngine ae
      • cas

        protected CAS cas
    • Method Detail

      • analyzeInput

        protected void analyzeInput()
                             throws ResourceInitializationException,
                                    AnalysisEngineProcessException,
                                    IOException
        Analyzes the tokenizer input using the given analysis engine.

        After this call, cas is filled with the extracted metadata (UIMA annotations and feature structures).

        Throws:
        IOException - If there is a low-level I/O error.
        ResourceInitializationException
        AnalysisEngineProcessException
      • initializeIterator

        protected abstract void initializeIterator()
                                            throws IOException
        Initializes the FSIterator that is used to build tokens on each call to incrementToken().
        Throws:
        IOException - If there is a low-level I/O error.
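
        The way analyzeInput(), initializeIterator() and incrementToken() fit together can be sketched without the Lucene or UIMA dependencies. The model below is JDK-only and purely illustrative: SketchBaseTokenizer, WhitespaceSketchTokenizer and term() are hypothetical names, a List<String> stands in for the CAS, a java.util.Iterator stands in for the FSIterator, and whitespace splitting stands in for a real analysis engine. It assumes that a subclass builds its iterator lazily on the first incrementToken() call, which is one plausible use of this hook and not necessarily how Lucene's own subclasses are written.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for BaseUIMATokenizer; this is NOT the Lucene class.
abstract class SketchBaseTokenizer {
    protected final Reader input;
    // Stand-in for the CAS: holds whatever the "analysis" extracted.
    protected List<String> cas;

    SketchBaseTokenizer(Reader input) {
        this.input = input;
    }

    // Stand-in for analyzeInput(): reads the whole input and fills cas.
    protected void analyzeInput() throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[256];
        int read;
        while ((read = input.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        cas = new ArrayList<>();
        for (String piece : text.toString().trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                cas.add(piece);
            }
        }
    }

    // Subclasses decide how to iterate over the analysis results.
    protected abstract void initializeIterator() throws IOException;
}

class WhitespaceSketchTokenizer extends SketchBaseTokenizer {
    private Iterator<String> iterator; // stand-in for the FSIterator
    private String currentTerm;

    WhitespaceSketchTokenizer(Reader input) {
        super(input);
    }

    @Override
    protected void initializeIterator() throws IOException {
        analyzeInput();            // fill cas first
        iterator = cas.iterator(); // then iterate over what was found
    }

    // Builds one token per call; the iterator is created on first use.
    boolean incrementToken() throws IOException {
        if (iterator == null) {
            initializeIterator();
        }
        if (!iterator.hasNext()) {
            return false;
        }
        currentTerm = iterator.next();
        return true;
    }

    String term() {
        return currentTerm;
    }

    public static void main(String[] args) throws IOException {
        WhitespaceSketchTokenizer tokenizer =
                new WhitespaceSketchTokenizer(new StringReader("hello uima world"));
        while (tokenizer.incrementToken()) {
            System.out.println(tokenizer.term());
        }
    }
}
```

        In this sketch the analysis runs once per input, and each later incrementToken() call only advances the iterator. A real subclass would typically wrap the UIMA exceptions thrown by analyzeInput() in an IOException, since initializeIterator() declares only IOException.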
      • reset

        public void reset()
                   throws IOException
        Description copied from class: TokenStream
        This method is called by a consumer before it begins consumption using TokenStream.incrementToken().

        Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh.

        If you override this method, always call super.reset(), otherwise some internal state will not be correctly reset (e.g., Tokenizer will throw IllegalStateException on further usage).

        Overrides:
        reset in class Tokenizer
        Throws:
        IOException
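
        The super.reset() requirement described above can be illustrated with a small JDK-only model. All class and method names here (SketchStream, ReusableSketchStream, BrokenSketchStream, checkReady(), next()) are hypothetical stand-ins, not Lucene types; the base class simply mimics the documented behavior of throwing IllegalStateException when its internal state was never reset. Clearing the iterator in reset() is an assumption about what a stateful subclass would typically do.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a base stream class that tracks reset state.
class SketchStream {
    private boolean ready = false;

    public void reset() {
        ready = true;
    }

    public void close() {
        ready = false;
    }

    // Mimics the documented failure mode when reset() was skipped.
    protected final void checkReady() {
        if (!ready) {
            throw new IllegalStateException("reset() must be called before use");
        }
    }
}

// Overrides reset() correctly: base-class state first, then its own state.
class ReusableSketchStream extends SketchStream {
    private final List<String> terms;
    private Iterator<String> iterator;

    ReusableSketchStream(String... terms) {
        this.terms = Arrays.asList(terms);
    }

    @Override
    public void reset() {
        super.reset();   // lets the base class reset its internal state
        iterator = null; // forces the iterator to be rebuilt on next use
    }

    // Returns the next term, or null when the stream is exhausted.
    public String next() {
        checkReady();
        if (iterator == null) {
            iterator = terms.iterator();
        }
        return iterator.hasNext() ? iterator.next() : null;
    }
}

// Overrides reset() but forgets super.reset(), so the base class never becomes ready.
class BrokenSketchStream extends SketchStream {
    @Override
    public void reset() {
        // missing super.reset()
    }

    public String next() {
        checkReady();
        return "never reached";
    }
}
```

        With these stand-ins, a ReusableSketchStream can be consumed, closed, reset and consumed again from the start, while calling next() on a BrokenSketchStream fails with IllegalStateException even after reset() has been called.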