org.apache.lucene.index

Class SegmentInfos

  • All Implemented Interfaces:
    Cloneable, Iterable<SegmentCommitInfo>


    public final class SegmentInfos
    extends Object
    implements Cloneable, Iterable<SegmentCommitInfo>
    A collection of SegmentCommitInfo objects, with methods for operating on those segments in relation to the file system.

    The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one (when older segments_N files are present it's because they temporarily cannot be deleted, or a custom IndexDeletionPolicy is in use). This file lists each segment by name and has details about the codec and generation of deletes.
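    As a rough illustration of the rule that the largest generation wins, the sketch below scans a directory listing for segments_N names and keeps the one with the highest generation. It assumes that N is the generation written in base 36 (Character.MAX_RADIX), which is what current Lucene versions do. ActiveCommit, generationOf and activeCommit are hypothetical names for this example only; Lucene itself offers SegmentInfos.getLastCommitGeneration for this job.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Lucene: picks the active commit file from a listing.
// Assumption: the N in segments_N is the generation written in base 36 (Character.MAX_RADIX).
final class ActiveCommit {
    static final String PREFIX = "segments_";

    // Returns the generation encoded in a segments_N name, or -1 if the name is not a commit file.
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            return -1;
        }
        try {
            long gen = Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
            return gen >= 0 ? gen : -1;
        } catch (NumberFormatException e) {
            return -1; // e.g. "segments_" with no generation suffix
        }
    }

    // The file with the largest generation is the active one; older ones may still be present.
    static Optional<String> activeCommit(List<String> files) {
        String best = null;
        long bestGen = -1;
        for (String f : files) {
            long gen = generationOf(f);
            if (gen > bestGen) {
                bestGen = gen;
                best = f;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<String> files = List.of("_0.cfs", "segments_9", "segments_a", "write.lock");
        // "a" in base 36 is 10, so segments_a (generation 10) beats segments_9.
        System.out.println(activeCommit(files).orElse("none")); // prints segments_a
    }
}
```

    Parsing in base 36 matters here: a plain lexicographic sort of file names would order segments_10 (generation 36) before segments_9.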

    Files:

    • segments_N: Header, LuceneVersion, Version, NameCounter, SegCount, MinSegmentLuceneVersion, <SegName, HasSegID, SegID, SegCodec, DelGen, DeletionCount, FieldInfosGen, DocValuesGen, UpdatesFiles>^SegCount, CommitUserData, Footer (the group in angle brackets is repeated SegCount times, once per segment)
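    To make the repetition and the conditional fields concrete, the following sketch expands the segments_N layout into the ordered list of field names for a given set of segments. It encodes only rules stated in this section's data-type and field descriptions: the per-segment group repeats SegCount times, MinSegmentLuceneVersion is present only when there is at least one segment, and SegID is skipped when HasSegID is 0. SegmentsLayout and SegmentEntry are hypothetical names, and no bytes are read or written.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the segments_N field order; it does not parse real index files.
final class SegmentsLayout {
    // Only the two facts this sketch needs about each segment.
    record SegmentEntry(String name, boolean hasId) {}

    static List<String> fieldOrder(List<SegmentEntry> segments) {
        List<String> out = new ArrayList<>(
            List.of("Header", "LuceneVersion", "Version", "NameCounter", "SegCount"));
        if (!segments.isEmpty()) {
            out.add("MinSegmentLuceneVersion"); // written only if there is at least one segment
        }
        for (SegmentEntry s : segments) { // the <...> group, once per segment
            out.add("SegName(" + s.name() + ")");
            out.add("HasSegID");
            if (s.hasId()) {
                out.add("SegID"); // omitted when HasSegID is 0
            }
            out.addAll(List.of("SegCodec", "DelGen", "DeletionCount",
                               "FieldInfosGen", "DocValuesGen", "UpdatesFiles"));
        }
        out.add("CommitUserData");
        out.add("Footer");
        return out;
    }
}
```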
    Data types:
    • Header --> IndexHeader
    • LuceneVersion --> The Lucene code Version used for this commit, written as three vInts: major, minor, bugfix
    • MinSegmentLuceneVersion --> Lucene code Version of the oldest segment, written as three vInts: major, minor, bugfix; this is written only if there is at least one segment
    • NameCounter, SegCount, DeletionCount --> Int32
    • Generation, Version, DelGen, Checksum, FieldInfosGen, DocValuesGen --> Int64
    • HasSegID --> Int8
    • SegID --> Int8^ID_LENGTH
    • SegName, SegCodec --> String
    • CommitUserData --> Map<String,String>
    • UpdatesFiles --> Map<Int32, Set<String>>
    • Footer --> CodecFooter
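    LuceneVersion and MinSegmentLuceneVersion are each stored as three vInts. The sketch below shows a variable-length int encoding of the kind described for Lucene's DataOutput.writeVInt: seven data bits per byte, lowest-order group first, with the high bit set on every byte except the last, so small numbers such as version components take a single byte. It is a standalone illustration on top of java.io.ByteArrayOutputStream; VIntSketch and its methods are hypothetical names, not Lucene classes.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical standalone vInt codec, illustrating the variable-length format.
final class VIntSketch {
    // Writes 7 bits at a time, low-order first; the high bit marks "more bytes follow".
    // Negative values work but always take five bytes.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7; // unsigned shift so negative values terminate
        }
        out.write(value);
    }

    // Reads one vInt starting at pos[0] and advances pos[0] past it.
    static int readVInt(byte[] buf, int[] pos) {
        int b = buf[pos[0]++];
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos[0]++];
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // A version triple as three consecutive vInts: major, minor, bugfix.
    static byte[] encodeVersion(int major, int minor, int bugfix) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, major);
        writeVInt(out, minor);
        writeVInt(out, bugfix);
        return out.toByteArray();
    }
}
```

    For example, encodeVersion(9, 10, 0) yields the three bytes 9, 10, 0, while a value such as 300 needs two bytes (0xAC, 0x02).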
    Field Descriptions:
    • Version counts how often the index has been changed by adding or deleting documents.
    • NameCounter is used to generate names for new segment files.
    • SegName is the name of the segment, and is used as the file name prefix for all of the files that compose the segment's index.
    • DelGen is the generation count of the deletes file. If this is -1, there are no deletes. Anything above zero means there are deletes stored by LiveDocsFormat.
    • DeletionCount records the number of deleted documents in this segment.
    • SegCodec is the name of the Codec that encoded this segment.
    • HasSegID is nonzero if the segment has an identifier. When it is 0, the identifier is null and no SegID is written; a null identifier only occurs for Lucene 4.x segments referenced in commits.
    • SegID is the identifier that uniquely identifies this segment.
    • CommitUserData stores an optional user-supplied opaque Map<String,String> that was passed to IndexWriter.setLiveCommitData(Iterable).
    • FieldInfosGen is the generation count of the fieldInfos file. If this is -1, there are no updates to the fieldInfos in that segment. Anything above zero means there are updates to fieldInfos stored by FieldInfosFormat.
    • DocValuesGen is the generation count of the updatable DocValues. If this is -1, there are no updates to DocValues in that segment. Anything above zero means there are updates to DocValues stored by DocValuesFormat.
    • UpdatesFiles stores, for each field (keyed by field number), the set of files that were updated in that segment.
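    DelGen, FieldInfosGen and DocValuesGen share one convention: -1 means no updates of that kind exist for the segment, while a value above zero means updates exist in files handled by LiveDocsFormat, FieldInfosFormat or DocValuesFormat respectively. The sketch below models one per-segment entry as a plain Java record and exposes that convention as boolean checks. SegmentGens and Entry are hypothetical names (this is not Lucene's SegmentCommitInfo), and treating the Int32 key of UpdatesFiles as a field number is an assumption of this sketch.

```java
import java.util.Map;
import java.util.Set;

final class SegmentGens {
    // Hypothetical holder for the values of one per-segment entry in segments_N.
    record Entry(String segName, long delGen, int deletionCount,
                 long fieldInfosGen, long docValuesGen,
                 Map<Integer, Set<String>> updatesFiles) {

        // -1 means "none"; the format describes values above zero as "updates exist".
        boolean hasDeletions() { return delGen > 0; }
        boolean hasFieldInfosUpdates() { return fieldInfosGen > 0; }
        boolean hasDocValuesUpdates() { return docValuesGen > 0; }

        // Assumption: the Int32 key of UpdatesFiles is a field number.
        Set<String> filesForField(int fieldNumber) {
            return updatesFiles.getOrDefault(fieldNumber, Set.of());
        }
    }
}
```

    A segment that was never modified after being written would carry -1 in all three generation fields and an empty UpdatesFiles map, so every check above returns false.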