org.apache.lucene.store

Class MMapDirectory

  • All Implemented Interfaces:
    Closeable, AutoCloseable


    public class MMapDirectory
    extends FSDirectory
    File-based Directory implementation that uses mmap for reading, and FSDirectory.FSIndexOutput for writing.

    NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space, e.g. by using a 64 bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the address space. On 32 bit platforms also consult MMapDirectory(Path, LockFactory, int) if you have problems with mmap failing because of fragmented address space. If you get an OutOfMemoryException, it is recommended to reduce the chunk size until mapping succeeds.
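
The following is a minimal JDK-only sketch of the idea behind chunked mapping: a file is mapped as several smaller regions rather than one large one, so each mapping needs a smaller contiguous range of address space. It is not Lucene's implementation; the class name ChunkedMapDemo and its helper methods are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
final class ChunkedMapDemo {

    /** Maps the whole file as consecutive read-only regions of at most chunkSize bytes. */
    static MappedByteBuffer[] mapInChunks(FileChannel ch, int chunkSize) throws IOException {
        long length = ch.size();
        int count = (int) ((length + chunkSize - 1) / chunkSize);
        MappedByteBuffer[] chunks = new MappedByteBuffer[count];
        for (int i = 0; i < count; i++) {
            long start = (long) i * chunkSize;
            long size = Math.min(chunkSize, length - start);
            chunks[i] = ch.map(FileChannel.MapMode.READ_ONLY, start, size);
        }
        return chunks;
    }

    /** Writes data to a temporary file, maps it in chunks, and reads the byte at pos. */
    static byte readByteAt(byte[] data, int chunkSize, long pos) {
        Path file = null;
        try {
            file = Files.createTempFile("chunked-map-demo", ".bin");
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer[] chunks = mapInChunks(ch, chunkSize);
                // Select the chunk first, then the offset inside that chunk.
                return chunks[(int) (pos / chunkSize)].get((int) (pos % chunkSize));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // A mapped file may not be deletable right away on Windows,
            // so deletion is deferred to JVM exit.
            if (file != null) {
                file.toFile().deleteOnExit();
            }
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        System.out.println(readByteAt(data, 16, 37));
    }
}
```

A smaller chunk size means more mappings and one extra lookup per access, which is the trade-off between speed and mapping reliability that the maxChunkSize parameter controls.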

    Due to this bug in Sun's JRE, MMapDirectory's IndexInput.close() is unable to close the underlying OS file handle. Only when GC finally collects the underlying objects, which could be quite some time later, will the file handle be closed.

    This causes additional transient disk usage: on Windows, attempts to delete or overwrite the files will result in an exception; on other platforms, which typically have "delete on last close" semantics, such operations will succeed, but the bytes still consume space on disk until the file handle is closed. For many applications this limitation is not a problem (e.g. if you have plenty of disk space, and you don't rely on overwriting files on Windows) but it's still an important limitation to be aware of.

    This class supplies the workaround mentioned in the bug report (see setUseUnmap(boolean)), which may fail on non-Oracle/OpenJDK JVMs. It forcefully unmaps the buffer on close by using an undocumented internal cleanup functionality. If UNMAP_SUPPORTED is true, the workaround will be automatically enabled (with no guarantees; if you discover any problems, you can disable it).

    NOTE: Accessing this class either directly or indirectly from a thread while it's interrupted can close the underlying channel immediately if at the same time the thread is blocked on IO. The channel will remain closed and subsequent access to MMapDirectory will throw a ClosedChannelException. If your application uses either Thread.interrupt() or Future.cancel(boolean) you should use the legacy RAFDirectory from the Lucene misc module instead of MMapDirectory.
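
The channel-closing behavior described above comes from the JDK's interruptible channels rather than from Lucene itself. The sketch below reproduces it with a plain FileChannel: a read attempted by a thread whose interrupt status is set closes the channel and throws ClosedByInterruptException. The class name InterruptClosesChannelDemo is hypothetical, and no Lucene classes are involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
final class InterruptClosesChannelDemo {

    /** Returns whether the channel is still open after a read by an interrupted thread. */
    static boolean channelOpenAfterInterruptedRead() {
        Path file = null;
        try {
            file = Files.createTempFile("interrupt-demo", ".bin");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Set the interrupt status before the blocking I/O call.
                Thread.currentThread().interrupt();
                try {
                    ch.read(ByteBuffer.allocate(4), 0);
                } catch (ClosedByInterruptException expected) {
                    // The JDK closed the channel as a side effect of the interrupt.
                } finally {
                    // Clear the interrupt status so later code is unaffected.
                    Thread.interrupted();
                }
                return ch.isOpen();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                file.toFile().delete();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("open after interrupted read: " + channelOpenAfterInterruptedRead());
    }
}
```

Once closed this way, the channel cannot be reopened, which is why every later access fails until a new channel (or directory instance) is created.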

    See Also:
    Blog post about MMapDirectory
    • Constructor Detail

      • MMapDirectory

        public MMapDirectory(Path path,
                             LockFactory lockFactory)
                      throws IOException
        Create a new MMapDirectory for the named location. The directory is created at the named location if it does not yet exist.
        Parameters:
        path - the path of the directory
        lockFactory - the lock factory to use
        Throws:
        IOException - if there is a low-level I/O error
      • MMapDirectory

        public MMapDirectory(Path path,
                             int maxChunkSize)
                      throws IOException
        Create a new MMapDirectory for the named location and FSLockFactory.getDefault(). The directory is created at the named location if it does not yet exist.
        Parameters:
        path - the path of the directory
        maxChunkSize - maximum chunk size (default is 1 GiBytes for 64 bit JVMs and 256 MiBytes for 32 bit JVMs) used for memory mapping.
        Throws:
        IOException - if there is a low-level I/O error
      • MMapDirectory

        public MMapDirectory(Path path,
                             LockFactory lockFactory,
                             int maxChunkSize)
                      throws IOException
        Create a new MMapDirectory for the named location, specifying the maximum chunk size used for memory mapping. The directory is created at the named location if it does not yet exist.

        Especially on 32 bit platforms, the address space can be very fragmented, so large index files cannot be mapped. Using a lower chunk size makes the directory implementation a little bit slower (as the correct chunk may have to be resolved on many seeks) but makes it more likely that mmap succeeds. On 64 bit Java platforms, this parameter should always be 1 << 30, as the address space is big enough.

        Please note: The chunk size is always rounded down to a power of 2.
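
To make the rounding concrete, here is a small sketch of how a requested chunk size can be rounded down to a power of 2, and how a power-of-2 chunk size lets a file position be split into a chunk index and an offset with a shift and a mask. This is an illustration under assumed names (ChunkMath and its methods are hypothetical) and is not taken from Lucene's source.

```java
// Hypothetical illustration class; not part of Lucene.
final class ChunkMath {

    /** Rounds a positive chunk size down to the nearest power of 2. */
    static int roundDownToPowerOfTwo(int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        return Integer.highestOneBit(maxChunkSize);
    }

    /** Index of the chunk containing pos; chunkSize must be a power of 2. */
    static int chunkIndex(long pos, int chunkSize) {
        int power = Integer.numberOfTrailingZeros(chunkSize);
        return (int) (pos >>> power);
    }

    /** Offset of pos inside its chunk; chunkSize must be a power of 2. */
    static int offsetInChunk(long pos, int chunkSize) {
        return (int) (pos & (chunkSize - 1L));
    }

    public static void main(String[] args) {
        int chunk = roundDownToPowerOfTwo(1_000_000);   // 524288 (2^19)
        long pos = 1_500_000L;
        System.out.println(chunk);                       // prints 524288
        System.out.println(chunkIndex(pos, chunk));      // prints 2
        System.out.println(offsetInChunk(pos, chunk));   // prints 451424
    }
}
```

Note that a value such as 1_000_000 therefore yields an effective chunk size of 524288 bytes, roughly half of what was requested.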

        Parameters:
        path - the path of the directory
        lockFactory - the lock factory to use, or null for the default (NativeFSLockFactory)
        maxChunkSize - maximum chunk size (default is 1 GiBytes for 64 bit JVMs and 256 MiBytes for 32 bit JVMs) used for memory mapping.
        Throws:
        IOException - if there is a low-level I/O error
    • Method Detail

      • setUseUnmap

        public void setUseUnmap(boolean useUnmapHack)
        This method enables the workaround mentioned in the bug report, which unmaps the buffers from the address space after IndexInput is closed. This hack may fail on non-Oracle/OpenJDK JVMs. It forcefully unmaps the buffer on close by using an undocumented internal cleanup functionality.

        NOTE: Enabling this is completely unsupported by Java and may lead to JVM crashes if IndexInput is closed while another thread is still accessing it (SIGSEGV).

        To enable the hack, the following requirements need to be fulfilled: The JVM in use must be Oracle Java / OpenJDK 8 (preliminary support for Java 9 EA build 150+ was added with Lucene 6.4). In addition, the following permissions need to be granted to lucene-core.jar in your policy file:

        • permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
        • permission java.lang.RuntimePermission "accessClassInPackage.sun.misc";
        Throws:
        IllegalArgumentException - if UNMAP_SUPPORTED is false and the workaround cannot be enabled. The exception message also contains an explanation why the hack cannot be enabled (e.g., missing permissions).
      • setPreload

        public void setPreload(boolean preload)
        Set to true to request that mapped pages be loaded into physical memory on init. The behavior is best-effort and operating system dependent.
        See Also:
        MappedByteBuffer.load()
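
As a rough illustration of what preloading means at the JDK level, the sketch below maps a temporary file and calls MappedByteBuffer.load() before reading it. It assumes nothing about how MMapDirectory applies the setting internally; the class name PreloadDemo and its method are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Lucene.
final class PreloadDemo {

    /** Writes data to a temporary file, maps it, preloads it, and returns the mapped bytes. */
    static byte[] mapAndLoad(byte[] data) {
        Path file = null;
        try {
            file = Files.createTempFile("preload-demo", ".bin");
            Files.write(file, data);
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                // Best-effort request to bring the mapped pages into physical memory.
                buf.load();
                byte[] out = new byte[buf.remaining()];
                buf.get(out);
                return out;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // A mapped file may not be deletable right away on Windows,
            // so deletion is deferred to JVM exit.
            if (file != null) {
                file.toFile().deleteOnExit();
            }
        }
    }

    public static void main(String[] args) {
        byte[] loaded = mapAndLoad(new byte[] {10, 20, 30});
        System.out.println(loaded.length);
    }
}
```

Because load() is only a hint to the operating system, the pages are not guaranteed to stay resident afterwards, and preloading a large index can noticeably lengthen the time it takes to open.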