Class DocTermOrds

  • All Implemented Interfaces: Accountable

    Deprecated. This will be removed in Lucene 7.0.

    public class DocTermOrds
    extends Object
    implements Accountable
    This class enables fast access to multiple term ords for a specified field across all docIDs. Like FieldCache, it uninverts the index and holds a packed data structure in RAM to enable fast access. Unlike FieldCache, it can handle multi-valued fields, and it does not hold the term bytes in RAM. Instead, you must obtain a TermsEnum from the getOrdTermsEnum(org.apache.lucene.index.LeafReader) method and then seek by ord to get the term's bytes.

    While term ords are normally of type long, in this API they are int, because the internal representation cannot address more than Integer.MAX_VALUE unique terms. This class is also typically used on fields with relatively few unique terms compared to the number of documents. In addition, there is an internal limit (16 MB) on how many bytes each chunk of documents may consume; if you trip this limit, you'll hit an IllegalStateException.

    Deleted documents are skipped during uninversion, and if you look them up you'll get 0 ords (an empty ord set). The returned per-document ords do not retain their original order in the document. Instead, they are returned in sorted order (by ord, i.e. by the term's BytesRef comparator). They are also de-duplicated: if a document has the same term more than once in this field, you'll only get that ord back once.

    This class creates its own term index internally, which allows it to create a wrapped TermsEnum that supports ords. The getOrdTermsEnum(org.apache.lucene.index.LeafReader) method then provides this wrapped enum. The RAM consumption of this class can be high!
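    To make the per-document ord semantics above concrete, here is a minimal, self-contained toy model in plain Java. It is NOT Lucene's implementation or API: the class ToyDocTermOrds and its methods ords and lookupOrd are hypothetical names chosen for illustration. It assumes the field's terms are given as strings per document, assigns ords in sorted term order, skips deleted documents, and returns each document's ords sorted and de-duplicated. Unlike DocTermOrds, it keeps the term strings in memory; the ordToTerm array stands in for seeking the wrapped TermsEnum by ord. Packed storage, the 16 MB chunk limit, and RAM accounting are not modeled.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.TreeSet;

    /** Hypothetical toy model of uninverting a multi-valued field into per-doc term ords. */
    public class ToyDocTermOrds {
        private final String[] ordToTerm; // ord -> term, in sorted term order
        private final int[][] docOrds;    // docID -> sorted, de-duplicated ords

        /**
         * @param docs     docs.get(docID) = terms indexed in the field for that doc (may repeat)
         * @param liveDocs liveDocs[docID] == false marks a deleted document
         */
        public ToyDocTermOrds(List<List<String>> docs, boolean[] liveDocs) {
            // Term dictionary: every unique term in the field, sorted. Like an index's
            // term dictionary, it may still contain terms that occur only in deleted docs.
            // Assumption: String's natural order matches BytesRef's unsigned-byte order
            // only for ASCII terms.
            TreeSet<String> dict = new TreeSet<>();
            for (List<String> terms : docs) {
                dict.addAll(terms);
            }
            ordToTerm = dict.toArray(new String[0]);

            docOrds = new int[docs.size()][];
            for (int doc = 0; doc < docs.size(); doc++) {
                if (!liveDocs[doc]) {
                    docOrds[doc] = new int[0]; // deleted docs are skipped: 0 ords
                    continue;
                }
                // Map each term to its ord; the TreeSet sorts and de-duplicates the ords.
                TreeSet<Integer> ords = new TreeSet<>();
                for (String term : docs.get(doc)) {
                    ords.add(Arrays.binarySearch(ordToTerm, term));
                }
                docOrds[doc] = ords.stream().mapToInt(Integer::intValue).toArray();
            }
        }

        /** Sorted, de-duplicated ords for a document (a copy). */
        public int[] ords(int docID) {
            return docOrds[docID].clone();
        }

        /** Resolves an ord back to its term, analogous to seeking a TermsEnum by ord. */
        public String lookupOrd(int ord) {
            return ordToTerm[ord];
        }

        public static void main(String[] args) {
            List<List<String>> docs = List.of(
                List.of("red", "blue", "red"), // duplicate "red" collapses to one ord
                List.of("green"),
                List.of("blue", "yellow"));    // this doc is marked deleted below
            boolean[] live = {true, true, false};
            ToyDocTermOrds toy = new ToyDocTermOrds(docs, live);
            for (int doc = 0; doc < docs.size(); doc++) {
                List<String> terms = new ArrayList<>();
                for (int ord : toy.ords(doc)) {
                    terms.add(toy.lookupOrd(ord));
                }
                System.out.println(doc + " -> " + Arrays.toString(toy.ords(doc)) + " " + terms);
            }
            // Prints:
            // 0 -> [0, 2] [blue, red]
            // 1 -> [1] [green]
            // 2 -> [] []
        }
    }
    ```

    Document 0 indexes "red" twice but gets its ord only once, and its ords come back in term order rather than document order; the deleted document 2 yields no ords even though its terms still occupy slots in the term dictionary.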