org.apache.lucene.codecs

Class CodecUtil



  • public final class CodecUtil
    extends Object
    Utility class for reading and writing versioned headers.

    Writing codec headers is useful to ensure that a file is in the format you think it is.

    • Method Detail

      • writeHeader

        public static void writeHeader(DataOutput out,
                                       String codec,
                                       int version)
                                throws IOException
        Writes a codec header, which records both a string to identify the file and a version number. This header can be parsed and validated with checkHeader().

        CodecHeader --> Magic,CodecName,Version

        • Magic --> Uint32. This identifies the start of the header. It is always 1071082519.
        • CodecName --> String. This is a string to identify this file.
        • Version --> Uint32. Records the version of the file.

        Note that the length of a codec header depends only upon the name of the codec, so this length can be computed at any time with headerLength(String).

        Parameters:
        out - Output stream
        codec - String to identify this file. It should be simple ASCII, less than 128 characters in length.
        version - Version number
        Throws:
        IOException - If there is an I/O error writing to the underlying medium.
        IllegalArgumentException - If the codec name is not simple ASCII, or is more than 127 characters in length
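        The CodecHeader layout above can be illustrated with a minimal, standalone sketch. It does not call Lucene. CodecHeaderSketch and encodeHeader are hypothetical names, and the sketch assumes that Magic and Version are written big-endian and that an ASCII name shorter than 128 characters is stored as one length byte followed by its bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for the CodecHeader layout; not Lucene's implementation.
class CodecHeaderSketch {
    static final int MAGIC = 1071082519; // the documented Magic value

    // Length depends only on the name: 4 (Magic) + 1 (length byte) + name + 4 (Version).
    static int headerLength(String codec) {
        return 9 + codec.length();
    }

    // Assumption: the name is stored as one length byte followed by its ASCII bytes.
    static byte[] encodeHeader(String codec, int version) {
        if (codec.length() >= 128
                || !StandardCharsets.US_ASCII.newEncoder().canEncode(codec)) {
            throw new IllegalArgumentException(
                "codec must be simple ASCII, less than 128 characters: " + codec);
        }
        ByteBuffer buf = ByteBuffer.allocate(headerLength(codec)); // big-endian by default
        buf.putInt(MAGIC);                                    // Magic
        buf.put((byte) codec.length());                       // CodecName length prefix
        buf.put(codec.getBytes(StandardCharsets.US_ASCII));   // CodecName bytes
        buf.putInt(version);                                  // Version
        return buf.array();
    }
}
```

        Under these assumptions, a three-character codec name yields a 12-byte header, which is why the length can be computed from the name alone.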
      • writeIndexHeader

        public static void writeIndexHeader(DataOutput out,
                                            String codec,
                                            int version,
                                            byte[] id,
                                            String suffix)
                                     throws IOException
        Writes a codec header for an index file, which records a string to identify the format of the file, a version number, and data to identify the file instance (ID and auxiliary suffix such as generation).

        This header can be parsed and validated with checkIndexHeader().

        IndexHeader --> CodecHeader,ObjectID,ObjectSuffix

        Note that the length of an index header depends only upon the name of the codec and suffix, so this length can be computed at any time with indexHeaderLength(String,String).

        Parameters:
        out - Output stream
        codec - String to identify the format of this file. It should be simple ASCII, less than 128 characters in length.
        version - Version number
        id - Unique identifier for this particular file instance.
        suffix - Auxiliary suffix information for the file. It should be simple ASCII, less than 256 characters in length.
        Throws:
        IOException - If there is an I/O error writing to the underlying medium.
        IllegalArgumentException - If the codec name is not simple ASCII, or is more than 127 characters in length, or if id is invalid, or if the suffix is not simple ASCII, or more than 255 characters in length.
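        A rough sketch of how the IndexHeader length follows from the codec name and suffix is given here. It is standalone and does not call Lucene. IndexHeaderSketch is a hypothetical name, and the sketch assumes a 16-byte ObjectID and a suffix stored as one length byte followed by its ASCII bytes; neither detail is stated in the text above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for the IndexHeader layout; not Lucene's implementation.
class IndexHeaderSketch {
    static final int MAGIC = 1071082519; // the documented Magic value
    static final int ID_LENGTH = 16;     // assumption: object IDs are 16 bytes long

    static int headerLength(String codec) {
        return 9 + codec.length(); // Magic + length byte + name + Version
    }

    // CodecHeader + ObjectID + suffix length byte + suffix bytes.
    static int indexHeaderLength(String codec, String suffix) {
        return headerLength(codec) + ID_LENGTH + 1 + suffix.length();
    }

    static byte[] encodeIndexHeader(String codec, int version, byte[] id, String suffix) {
        if (id.length != ID_LENGTH) {
            throw new IllegalArgumentException("id must be " + ID_LENGTH + " bytes");
        }
        if (codec.length() >= 128 || suffix.length() >= 256
                || !StandardCharsets.US_ASCII.newEncoder().canEncode(codec + suffix)) {
            throw new IllegalArgumentException("codec and suffix must be short simple ASCII");
        }
        ByteBuffer buf = ByteBuffer.allocate(indexHeaderLength(codec, suffix));
        buf.putInt(MAGIC);                                   // CodecHeader: Magic
        buf.put((byte) codec.length());                      // CodecHeader: name length
        buf.put(codec.getBytes(StandardCharsets.US_ASCII));  // CodecHeader: name
        buf.putInt(version);                                 // CodecHeader: Version
        buf.put(id);                                         // ObjectID
        buf.put((byte) suffix.length());                     // ObjectSuffix length (unsigned byte)
        buf.put(suffix.getBytes(StandardCharsets.US_ASCII)); // ObjectSuffix bytes
        return buf.array();
    }
}
```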
      • checkHeader

        public static int checkHeader(DataInput in,
                                      String codec,
                                      int minVersion,
                                      int maxVersion)
                               throws IOException
        Reads and validates a header previously written with writeHeader(DataOutput, String, int).

        When reading a file, supply the expected codec and an expected version range (minVersion to maxVersion).

        Parameters:
        in - Input stream, positioned at the point where the header was previously written. Typically this is located at the beginning of the file.
        codec - The expected codec name.
        minVersion - The minimum supported expected version number.
        maxVersion - The maximum supported expected version number.
        Returns:
        The actual version found, when a valid header is found that matches codec, with an actual version where minVersion <= actual <= maxVersion. Otherwise an exception is thrown.
        Throws:
        CorruptIndexException - If the first four bytes are not CODEC_MAGIC, or if the actual codec found is not codec.
        IndexFormatTooOldException - If the actual version is less than minVersion.
        IndexFormatTooNewException - If the actual version is greater than maxVersion.
        IOException - If there is an I/O error reading from the underlying medium.
        See Also:
        writeHeader(DataOutput, String, int)
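        The validation order described above (magic, then codec name, then version range) might be sketched as follows. This is a standalone illustration, not Lucene's code. CheckHeaderSketch is a hypothetical name, IllegalStateException stands in for CorruptIndexException, IndexFormatTooOldException, and IndexFormatTooNewException, and the one-byte name length is an assumption that holds only for ASCII names shorter than 128 characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative stand-in for header validation; not Lucene's implementation.
class CheckHeaderSketch {
    static final int MAGIC = 1071082519; // the documented Magic value

    // Returns the actual version, or throws if any check fails.
    static int checkHeader(ByteBuffer in, String codec, int minVersion, int maxVersion) {
        int magic = in.getInt(); // big-endian by default
        if (magic != MAGIC) {
            throw new IllegalStateException("header mismatch: magic=" + magic);
        }
        byte[] name = new byte[in.get() & 0xFF]; // assumed single length byte
        in.get(name);
        String actualCodec = new String(name, StandardCharsets.US_ASCII);
        if (!actualCodec.equals(codec)) {
            throw new IllegalStateException(
                "codec mismatch: actual=" + actualCodec + ", expected=" + codec);
        }
        int actualVersion = in.getInt();
        if (actualVersion < minVersion) {
            throw new IllegalStateException("format too old: version=" + actualVersion);
        }
        if (actualVersion > maxVersion) {
            throw new IllegalStateException("format too new: version=" + actualVersion);
        }
        return actualVersion;
    }
}
```

        Returning the actual version lets a reader branch on it afterwards, for example to skip fields that only newer versions of the format contain.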
      • checkIndexHeader

        public static int checkIndexHeader(DataInput in,
                                           String codec,
                                           int minVersion,
                                           int maxVersion,
                                           byte[] expectedID,
                                           String expectedSuffix)
                                    throws IOException
        Reads and validates a header previously written with writeIndexHeader(DataOutput, String, int, byte[], String).

        When reading a file, supply the expected codec, expected version range (minVersion to maxVersion), and object ID and suffix.

        Parameters:
        in - Input stream, positioned at the point where the header was previously written. Typically this is located at the beginning of the file.
        codec - The expected codec name.
        minVersion - The minimum supported expected version number.
        maxVersion - The maximum supported expected version number.
        expectedID - The expected object identifier for this file.
        expectedSuffix - The expected auxiliary suffix for this file.
        Returns:
        The actual version found, when a valid header is found that matches codec, with an actual version where minVersion <= actual <= maxVersion, and matching expectedID and expectedSuffix. Otherwise an exception is thrown.
        Throws:
        CorruptIndexException - If the first four bytes are not CODEC_MAGIC, or if the actual codec found is not codec, or if the expectedID or expectedSuffix do not match.
        IndexFormatTooOldException - If the actual version is less than minVersion.
        IndexFormatTooNewException - If the actual version is greater than maxVersion.
        IOException - If there is an I/O error reading from the underlying medium.
        See Also:
        writeIndexHeader(DataOutput, String, int, byte[], String)
      • verifyAndCopyIndexHeader

        public static void verifyAndCopyIndexHeader(IndexInput in,
                                                    DataOutput out,
                                                    byte[] expectedID)
                                             throws IOException
        Expert: verifies the incoming IndexInput has an index header and that its segment ID matches the expected one, and then copies that index header into the provided DataOutput. This is useful when building compound files.
        Parameters:
        in - Input stream, positioned at the point where the index header was previously written. Typically this is located at the beginning of the file.
        out - Output stream, where the header will be copied to.
        expectedID - Expected segment ID
        Throws:
        CorruptIndexException - If the first four bytes are not CODEC_MAGIC, or if the expectedID does not match.
        IOException - If there is an I/O error reading from the underlying medium.
      • writeFooter

        public static void writeFooter(IndexOutput out)
                                throws IOException
        Writes a codec footer, which records both a checksum algorithm ID and a checksum. This footer can be parsed and validated with checkFooter().

        CodecFooter --> Magic,AlgorithmID,Checksum

        • Magic --> Uint32. This identifies the start of the footer. It is always -1071082520.
        • AlgorithmID --> Uint32. This indicates the checksum algorithm used. Currently this is always 0, for zlib-crc32.
        • Checksum --> Uint64. The actual checksum value for all previous bytes in the stream, including the bytes from Magic and AlgorithmID.
        Parameters:
        out - Output stream
        Throws:
        IOException - If there is an I/O error writing to the underlying medium.
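        To make the CodecFooter layout concrete, here is a minimal standalone sketch that appends a footer to a byte array and verifies it again. It does not call Lucene. CodecFooterSketch, appendFooter, and footerMatches are hypothetical names, and the sketch assumes big-endian fields and uses java.util.zip.CRC32 as the zlib-crc32 implementation.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative stand-in for the CodecFooter layout; not Lucene's implementation.
class CodecFooterSketch {
    static final int FOOTER_MAGIC = -1071082520; // the documented footer Magic value
    static final int ALGORITHM_ID = 0;           // zlib-crc32
    static final int FOOTER_LENGTH = 16;         // 4 (Magic) + 4 (AlgorithmID) + 8 (Checksum)

    static byte[] appendFooter(byte[] body) {
        ByteBuffer buf = ByteBuffer.allocate(body.length + FOOTER_LENGTH); // big-endian
        buf.put(body);
        buf.putInt(FOOTER_MAGIC);
        buf.putInt(ALGORITHM_ID);
        CRC32 crc = new CRC32();
        // The checksum covers every previous byte, including Magic and AlgorithmID.
        crc.update(buf.array(), 0, buf.position());
        buf.putLong(crc.getValue());
        return buf.array();
    }

    static boolean footerMatches(byte[] file) {
        if (file.length < FOOTER_LENGTH) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(file);
        int footerStart = file.length - FOOTER_LENGTH;
        if (buf.getInt(footerStart) != FOOTER_MAGIC
                || buf.getInt(footerStart + 4) != ALGORITHM_ID) {
            return false;
        }
        CRC32 crc = new CRC32();
        crc.update(file, 0, file.length - 8); // everything before the Checksum field
        return crc.getValue() == buf.getLong(file.length - 8);
    }
}
```

        Because the checksum includes the footer's own Magic and AlgorithmID, a single pass over the stream is enough to verify both the content and the footer fields.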
      • checkFooter

        public static void checkFooter(ChecksumIndexInput in,
                                       Throwable priorException)
                                throws IOException
        Validates the codec footer previously written by writeFooter(org.apache.lucene.store.IndexOutput), optionally passing an unexpected exception that has already occurred.

        When a priorException is provided, this method will add a suppressed exception indicating whether the checksum for the stream passes, fails, or cannot be computed, and rethrow it. Otherwise it behaves the same as checkFooter(ChecksumIndexInput).

        Example usage:

         try (ChecksumIndexInput input = ...) {
           Throwable priorE = null;
           try {
             // ... read a bunch of stuff ... 
           } catch (Throwable exception) {
             priorE = exception;
           } finally {
             CodecUtil.checkFooter(input, priorE);
           }
         }
         
        Throws:
        IOException