org.apache.lucene.search.highlight

Class Highlighter

    • Method Detail

      • getBestFragment

        public final String getBestFragment(TokenStream tokenStream,
                                            String text)
                                     throws IOException,
                                            InvalidTokenOffsetsException
        Highlights chosen terms in a text, extracting the most relevant section. The document text is analysed in chunks to record hit statistics across the document. After the statistics are accumulated, the fragment with the highest score is returned.
        Parameters:
        tokenStream - a stream of tokens identified in the text parameter, including offset information. This is typically produced by an analyzer re-parsing a document's text. TokenStreams could be retrieved more efficiently if original text position data were stored in the Lucene index, but that support was not available as of Lucene 1.4 rc2.
        text - text to highlight terms in
        Returns:
        highlighted text fragment, or null if no terms are found
        Throws:
        InvalidTokenOffsetsException - thrown if any token's endOffset exceeds the provided text's length
        IOException
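
The chunk-score-select idea described for getBestFragment can be sketched in plain Java. This is a simplified illustration, not Lucene's implementation: the class BestFragmentSketch, the method bestFragment, and the wordsPerFragment parameter are hypothetical, fragments are fixed-size runs of whitespace-separated words, and the score is a plain count of matching terms. Hits are wrapped in <B> tags only as an illustrative choice.

```java
import java.util.Locale;
import java.util.Set;

public class BestFragmentSketch {

    // Hypothetical helper (not part of Lucene): returns the highest-scoring
    // fragment with hits wrapped in <B>...</B>, or null if no term is found.
    public static String bestFragment(String text, Set<String> terms, int wordsPerFragment) {
        if (wordsPerFragment <= 0) {
            throw new IllegalArgumentException("wordsPerFragment must be positive");
        }
        String[] words = text.trim().split("\\s+");
        String best = null;
        int bestScore = 0;
        // Walk the text one fragment at a time, recording a hit count per fragment.
        for (int start = 0; start < words.length; start += wordsPerFragment) {
            int end = Math.min(words.length, start + wordsPerFragment);
            StringBuilder fragment = new StringBuilder();
            int score = 0;
            for (int i = start; i < end; i++) {
                // Assumption: lower-casing and stripping non-word characters
                // stands in for a real analyzer.
                String normalized = words[i].toLowerCase(Locale.ROOT).replaceAll("\\W", "");
                boolean hit = terms.contains(normalized);
                if (hit) {
                    score++;
                }
                if (i > start) {
                    fragment.append(' ');
                }
                fragment.append(hit ? "<B>" + words[i] + "</B>" : words[i]);
            }
            // Keep the first fragment with the strictly highest score.
            if (score > bestScore) {
                bestScore = score;
                best = fragment.toString();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. "
                + "The highlighter marks search terms in search results.";
        // prints: <B>search</B> terms in <B>search</B>
        System.out.println(bestFragment(text, Set.of("search"), 4));
    }
}
```

As in the method described above, the sketch returns null when no fragment contains a query term, so callers need a null check before using the result.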
      • getBestFragments

        public final String[] getBestFragments(TokenStream tokenStream,
                                               String text,
                                               int maxNumFragments)
                                        throws IOException,
                                               InvalidTokenOffsetsException
        Highlights chosen terms in a text, extracting the most relevant sections. The document text is analysed in chunks to record hit statistics across the document. After the statistics are accumulated, the fragments with the highest scores are returned as an array of strings in order of score. Contiguous fragments are merged into one, in their original order, to improve readability.
        Parameters:
        tokenStream - a stream of tokens identified in the text parameter, including offset information
        text - text to highlight terms in
        maxNumFragments - the maximum number of fragments.
        Returns:
        highlighted text fragments (between 0 and maxNumFragments fragments)
        Throws:
        InvalidTokenOffsetsException - thrown if any token's endOffset exceeds the provided text's length
        IOException
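
The selection and merging behaviour described for getBestFragments can be sketched in plain Java, starting from fragments that have already been scored. This is a simplified illustration under stated assumptions, not Lucene's implementation: the class TopFragmentsSketch, the Frag type, and the parallel-array inputs are hypothetical, and giving a merged fragment the higher of the two scores is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TopFragmentsSketch {

    // Hypothetical value type: a fragment's position in the document, text, and score.
    static final class Frag {
        final int index;
        final String text;
        final int score;

        Frag(int index, String text, int score) {
            this.index = index;
            this.text = text;
            this.score = score;
        }
    }

    // fragments[i] is the i-th chunk of the document and scores[i] its score.
    public static String[] bestFragments(String[] fragments, int[] scores, int maxNumFragments) {
        // 1. Keep only scoring fragments and select the top maxNumFragments by score.
        List<Frag> selected = new ArrayList<>();
        for (int i = 0; i < fragments.length; i++) {
            if (scores[i] > 0) {
                selected.add(new Frag(i, fragments[i], scores[i]));
            }
        }
        selected.sort(Comparator.comparingInt((Frag f) -> f.score).reversed());
        int limit = Math.max(0, Math.min(maxNumFragments, selected.size()));
        selected = new ArrayList<>(selected.subList(0, limit));

        // 2. Restore document order and merge fragments that were adjacent in the text.
        selected.sort(Comparator.comparingInt((Frag f) -> f.index));
        List<Frag> merged = new ArrayList<>();
        int lastIndex = -2;
        for (Frag f : selected) {
            if (!merged.isEmpty() && f.index == lastIndex + 1) {
                Frag prev = merged.remove(merged.size() - 1);
                // Assumption: the merged fragment keeps the higher score.
                merged.add(new Frag(prev.index, prev.text + " " + f.text,
                        Math.max(prev.score, f.score)));
            } else {
                merged.add(f);
            }
            lastIndex = f.index;
        }

        // 3. Return the merged fragments in descending score order.
        merged.sort(Comparator.comparingInt((Frag f) -> f.score).reversed());
        String[] result = new String[merged.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = merged.get(i).text;
        }
        return result;
    }

    public static void main(String[] args) {
        String[] fragments = {"a", "b", "c", "d", "e"};
        int[] scores = {0, 3, 1, 0, 2};
        // "b" and "c" are adjacent and merge; "e" stays separate.
        System.out.println(Arrays.toString(bestFragments(fragments, scores, 3)));
    }
}
```

Because merging happens after selection, the result can hold fewer than maxNumFragments entries, which matches the documented range of 0 to maxNumFragments.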
      • getBestFragments

        public final String getBestFragments(TokenStream tokenStream,
                                             String text,
                                             int maxNumFragments,
                                             String separator)
                                      throws IOException,
                                             InvalidTokenOffsetsException
        Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). The document text is analysed in chunks to record hit statistics across the document. After the statistics are accumulated, the fragments with the highest scores are returned, in order, as a single string delimited by the separator.
        Parameters:
        tokenStream - a stream of tokens identified in the text parameter, including offset information
        text - text to highlight terms in
        maxNumFragments - the maximum number of fragments.
        separator - the separator used to intersperse the document fragments (typically "...")
        Returns:
        highlighted text
        Throws:
        InvalidTokenOffsetsException - thrown if any token's endOffset exceeds the provided text's length
        IOException
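
The separator variant can be thought of as joining an array of highlighted fragments into one string. A minimal sketch in plain Java follows. The class JoinFragmentsSketch and the method joinFragments are hypothetical and only illustrate the concatenation step; that an empty fragment array yields an empty string is a property of this sketch, not a documented guarantee of the method above.

```java
public class JoinFragmentsSketch {

    // Hypothetical helper (not part of Lucene): places the separator
    // between consecutive fragments, never before the first or after the last.
    public static String joinFragments(String[] fragments, String separator) {
        return String.join(separator, fragments);
    }

    public static void main(String[] args) {
        String[] fragments = {"a <B>search</B> library", "marks <B>search</B> terms"};
        // prints: a <B>search</B> library...marks <B>search</B> terms
        System.out.println(joinFragments(fragments, "..."));
    }
}
```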