org.apache.lucene.util

Class BytesRefHash



  • public final class BytesRefHash
    extends Object
    BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and ids are guaranteed to increase with each newly added BytesRef.

    Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.
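    The following is a minimal, hypothetical sketch of the idea described above, written in plain Java without Lucene: byte arrays are deduplicated through an open-addressing table with linear probing, every new entry receives the next id (0, 1, 2, ...), and all entries are appended back to back into one byte[]. The class name ToyBytesHash, the per-entry limit of (1 << 15) - 2, and the -(id + 1) return value for duplicates are assumptions chosen for illustration; this is not Lucene's actual implementation.

```java
import java.util.Arrays;

/** Hypothetical toy model of a BytesRefHash-like map from byte[] to dense int ids. */
final class ToyBytesHash {
    static final int MAX_LENGTH = (1 << 15) - 2; // assumed limit, mirroring BYTE_BLOCK_SIZE-2

    private byte[] pool = new byte[64];  // all entries, stored back to back
    private int poolUpto = 0;            // next free position in pool
    private int[] starts = new int[8];   // id -> offset of its bytes in pool
    private int[] lengths = new int[8];  // id -> number of bytes
    private int[] table = new int[16];   // open-addressing slots holding ids; -1 means empty
    private int count = 0;

    ToyBytesHash() { Arrays.fill(table, -1); }

    int size() { return count; }

    /** Returns the new id, or -(id + 1) if the bytes were already present. */
    int add(byte[] bytes) {
        if (bytes.length > MAX_LENGTH) throw new IllegalArgumentException("too long: " + bytes.length);
        int slot = slotFor(bytes);
        if (table[slot] != -1) return -(table[slot] + 1);
        int id = count++;                        // each new entry gets the next id
        if (id == starts.length) {
            starts = Arrays.copyOf(starts, id * 2);
            lengths = Arrays.copyOf(lengths, id * 2);
        }
        if (poolUpto + bytes.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(pool.length * 2, poolUpto + bytes.length));
        }
        System.arraycopy(bytes, 0, pool, poolUpto, bytes.length);
        starts[id] = poolUpto;
        lengths[id] = bytes.length;
        poolUpto += bytes.length;
        table[slot] = id;
        if (count * 2 > table.length) rehash(); // keep the table at most about half full
        return id;
    }

    /** Returns the id of the bytes, or -1 if they were never added. */
    int find(byte[] bytes) { return table[slotFor(bytes)]; }

    /** Returns a copy of the bytes for id, where 0 <= id < size(). */
    byte[] get(int id) { return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]); }

    /** Linear probing: the slot holding these bytes, or the empty slot where they belong. */
    private int slotFor(byte[] bytes) {
        int mask = table.length - 1;
        int h = Arrays.hashCode(bytes);
        int slot = (h ^ (h >>> 16)) & mask;
        int id;
        while ((id = table[slot]) != -1
                && !Arrays.equals(pool, starts[id], starts[id] + lengths[id], bytes, 0, bytes.length)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    /** Doubles the table and re-inserts every id; ids and stored bytes do not move. */
    private void rehash() {
        int[] old = table;
        table = new int[old.length * 2];
        Arrays.fill(table, -1);
        for (int id : old) {
            if (id != -1) table[slotFor(get(id))] = id;
        }
    }
}
```

    Because ids are handed out densely in insertion order, an id can double as an index into side arrays (here starts and lengths), which is what lets the stored bytes stay in one contiguous buffer.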

    • Method Detail

      • get

        public BytesRef get(int bytesID,
                            BytesRef ref)
        Populates and returns a BytesRef with the bytes for the given bytesID.

        Note: the given bytesID must be a non-negative integer less than the current size (size()).

        Parameters:
        bytesID - the id
        ref - the BytesRef to populate
        Returns:
        the given BytesRef instance populated with the bytes for the given bytesID
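        Because get fills a caller-supplied BytesRef instead of allocating one, a single instance can be reused across many lookups. The hypothetical sketch below models that pattern without Lucene: ToyRef stands in for BytesRef (bytes, offset, length), and ToyPool.get points the reference at the stored bytes rather than copying them. Treating the populated reference as a view into internal storage is an assumption of this sketch; it also rejects ids outside 0 <= bytesID < size().

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for BytesRef: a window (offset, length) onto a shared byte[]. */
final class ToyRef {
    byte[] bytes = new byte[0];
    int offset;
    int length;

    String utf8() { return new String(bytes, offset, length, StandardCharsets.UTF_8); }
}

/** Hypothetical store that keeps all entries in one byte[] and hands out views. */
final class ToyPool {
    private byte[] pool = new byte[0];
    private final List<int[]> entries = new ArrayList<>(); // id -> {start, length}

    int add(byte[] b) {
        int start = pool.length;
        pool = Arrays.copyOf(pool, start + b.length); // naive growth, fine for a sketch
        System.arraycopy(b, 0, pool, start, b.length);
        entries.add(new int[] {start, b.length});
        return entries.size() - 1;
    }

    int size() { return entries.size(); }

    /** Points ref at the bytes for bytesID and returns the same ref; nothing is copied. */
    ToyRef get(int bytesID, ToyRef ref) {
        if (bytesID < 0 || bytesID >= size()) {
            throw new IndexOutOfBoundsException("bytesID " + bytesID + " not in [0, " + size() + ")");
        }
        int[] e = entries.get(bytesID);
        ref.bytes = pool;
        ref.offset = e[0];
        ref.length = e[1];
        return ref;
    }

    public static void main(String[] args) {
        ToyPool p = new ToyPool();
        p.add("lucene".getBytes(StandardCharsets.UTF_8));
        p.add("index".getBytes(StandardCharsets.UTF_8));
        ToyRef scratch = new ToyRef(); // one instance reused for every lookup
        for (int id = 0; id < p.size(); id++) {
            System.out.println(id + " -> " + p.get(id, scratch).utf8());
        }
    }
}
```

        Since the same instance is overwritten on every call, a caller that needs to keep the bytes of an earlier lookup should copy them first.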
      • sort

        public int[] sort()
        Returns the array of ids sorted by the referenced byte values.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
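        Sorting orders ids by the bytes they reference rather than by insertion order. The sketch below shows that idea on a plain List<byte[]> indexed by id, comparing entries as unsigned bytes with Arrays.compareUnsigned (Java 9+). The helper name sortIdsByBytes and the choice of unsigned lexicographic order are assumptions of this sketch (unsigned order makes UTF-8 bytes sort in code point order); unlike the documented method, this sketch leaves its input untouched.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortByBytes {
    /** Hypothetical helper: returns ids 0..terms.size()-1 ordered by their bytes, compared unsigned. */
    static int[] sortIdsByBytes(List<byte[]> terms) {
        Integer[] ids = new Integer[terms.size()];
        for (int i = 0; i < ids.length; i++) ids[i] = i; // ids in insertion order
        Arrays.sort(ids, (a, b) -> Arrays.compareUnsigned(terms.get(a), terms.get(b)));
        int[] out = new int[ids.length];
        for (int i = 0; i < ids.length; i++) out[i] = ids[i];
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> terms = new ArrayList<>();
        for (String s : new String[] {"pear", "apple", "\u00e9", "fig"}) {
            terms.add(s.getBytes(StandardCharsets.UTF_8));
        }
        // "\u00e9" encodes as 0xC3 0xA9; compared unsigned, it sorts after the ASCII words
        System.out.println(Arrays.toString(sortIdsByBytes(terms))); // [1, 3, 0, 2]
    }
}
```

        With a signed comparison, the leading 0xC3 byte would be negative and "\u00e9" would incorrectly sort first.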

      • clear

        public void clear()
      • close

        public void close()
        Closes the BytesRefHash and releases all internally used memory.
      • find

        public int find(BytesRef bytes)
        Returns the id of the given BytesRef.
        Parameters:
        bytes - the bytes to look for
        Returns:
        the id of the given bytes, or -1 if there is no mapping for the given bytes.
      • addByPoolOffset

        public int addByPoolOffset(int offset)
        Adds an "arbitrary" int offset instead of a BytesRef term. The indexer uses this to hold the hash for term vectors: term vectors do not redundantly store the byte[] term themselves but instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.
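        The hypothetical sketch below illustrates the idea behind addByPoolOffset: a second hash stores only int offsets into bytes that another structure already owns, so the term bytes are written once. TermStore, OffsetHash, the method name addOffset, and the use of java.util.HashMap for the offset-to-id lookup are illustrative assumptions; Lucene's own classes and hashing differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical "postings" store: owns the term bytes and returns each term's start offset. */
final class TermStore {
    byte[] pool = new byte[0];

    int append(String term) {
        byte[] b = term.getBytes(StandardCharsets.UTF_8);
        int start = pool.length;
        pool = Arrays.copyOf(pool, start + b.length);
        System.arraycopy(b, 0, pool, start, b.length);
        return start;
    }
}

/** Hypothetical second hash: maps offsets (not bytes) to dense ids, never copying the terms. */
final class OffsetHash {
    private final Map<Integer, Integer> idByOffset = new HashMap<>();

    /** Returns a new id for the offset, or -(id + 1) if the offset was already added. */
    int addOffset(int offset) {
        Integer existing = idByOffset.get(offset);
        if (existing != null) return -(existing + 1);
        int id = idByOffset.size();
        idByOffset.put(offset, id);
        return id;
    }

    public static void main(String[] args) {
        TermStore postings = new TermStore();
        int apple = postings.append("apple");
        int banana = postings.append("banana");
        OffsetHash vectors = new OffsetHash();
        System.out.println(vectors.addOffset(banana)); // 0: first time this offset is seen
        System.out.println(vectors.addOffset(apple));  // 1
        System.out.println(vectors.addOffset(banana)); // -1, i.e. -(0 + 1): already present
    }
}
```

        Two equal offsets always denote the same stored term, so comparing offsets is enough and the second structure never has to compare bytes.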
      • reinit

        public void reinit()
        Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
      • byteStart

        public int byteStart(int bytesID)
        Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
        Parameters:
        bytesID - the id to look up
        Returns:
        the bytesStart offset into the internally used ByteBlockPool for the given id
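        A byteStart offset is only useful together with knowledge of how an entry is laid out in the pool. The sketch below assumes one possible layout, a one-byte length prefix for entries shorter than 128 bytes and a two-byte prefix otherwise, followed by the bytes themselves, and decodes an entry given the offset where it begins. The class EntryAt, its helpers, and this encoding are assumptions for illustration; consult the ByteBlockPool source before relying on any particular format.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical encoder/decoder for a length-prefixed entry that starts at a byteStart offset. */
final class EntryAt {
    /** Writes len (< 32768) as a 1- or 2-byte prefix at pos; returns the position after it. */
    static int writePrefix(byte[] pool, int pos, int len) {
        if (len < 128) {
            pool[pos] = (byte) len;
            return pos + 1;
        }
        pool[pos] = (byte) (0x80 | (len & 0x7f)); // low 7 bits; high bit marks a 2-byte prefix
        pool[pos + 1] = (byte) (len >>> 7);        // remaining 8 bits
        return pos + 2;
    }

    /** Returns a copy of the entry's bytes, given the offset where its prefix starts. */
    static byte[] read(byte[] pool, int byteStart) {
        int b = pool[byteStart];
        if ((b & 0x80) == 0) {
            return Arrays.copyOfRange(pool, byteStart + 1, byteStart + 1 + b);
        }
        int len = (b & 0x7f) | ((pool[byteStart + 1] & 0xff) << 7);
        return Arrays.copyOfRange(pool, byteStart + 2, byteStart + 2 + len);
    }

    public static void main(String[] args) {
        byte[] pool = new byte[512];
        byte[] shortTerm = "id".getBytes(StandardCharsets.UTF_8);
        byte[] longTerm = new byte[200];
        Arrays.fill(longTerm, (byte) 'x');

        int start0 = 0;                                  // byteStart of entry 0
        int pos = writePrefix(pool, start0, shortTerm.length);
        System.arraycopy(shortTerm, 0, pool, pos, shortTerm.length);
        int start1 = pos + shortTerm.length;             // byteStart of entry 1
        pos = writePrefix(pool, start1, longTerm.length);
        System.arraycopy(longTerm, 0, pool, pos, longTerm.length);

        System.out.println(start1);                                                 // 3
        System.out.println(new String(read(pool, start0), StandardCharsets.UTF_8)); // id
        System.out.println(read(pool, start1).length);                              // 200
    }
}
```

        Under this assumed layout, a two-byte prefix is also why an entry can be at most two bytes shorter than a block.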