Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

java - Zipping a huge folder by using a ZipFileSystem results in OutOfMemoryError

The java.nio.file package has a beautiful way of handling zip files: it treats them as file systems, so the contents of a zip file can be handled like ordinary files. Zipping a whole folder can therefore be achieved by simply using Files.copy to copy all the files into the zip file. Since subfolders have to be copied as well, we need a visitor:

private static class CopyFileVisitor extends SimpleFileVisitor<Path> {
    private final Path targetPath;
    private Path sourcePath = null;

    public CopyFileVisitor(Path targetPath) {
        this.targetPath = targetPath;
    }

    @Override
    public FileVisitResult preVisitDirectory(final Path dir,
            final BasicFileAttributes attrs) throws IOException {
        if (sourcePath == null) {
            // The first directory visited is the root of the source tree
            sourcePath = dir;
        } else {
            // Resolve via toString() because source and target may belong
            // to different file systems (default vs. zip)
            Files.createDirectories(targetPath.resolve(sourcePath
                    .relativize(dir).toString()));
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(final Path file,
            final BasicFileAttributes attrs) throws IOException {
        Files.copy(file,
                targetPath.resolve(sourcePath.relativize(file).toString()),
                StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
    }
}

This is a simple "copy directory recursively" visitor. With the ZipFileSystem, however, we can also use it to copy a directory into a zip file, like this:

public static void zipFolder(Path zipFile, Path sourceDir) throws IOException
{
    // Initialize the Zip Filesystem and get its root
    Map<String, String> env = new HashMap<>();
    env.put("create", "true");
    URI uri = URI.create("jar:" + zipFile.toUri());

    // try-with-resources closes the file system; the zip file is only
    // written out completely once the file system is closed
    try (FileSystem fileSystem = FileSystems.newFileSystem(uri, env)) {
        Iterable<Path> roots = fileSystem.getRootDirectories();
        Path root = roots.iterator().next();

        // Simply copy the directory into the root of the zip file system
        Files.walkFileTree(sourceDir, new CopyFileVisitor(root));
    }
}

This is what I call an elegant way of zipping a whole folder. However, when I use this method on a huge folder (around 3 GB), I get an OutOfMemoryError (heap space). With a conventional zip handling library, this error is not raised. It therefore seems that the ZipFileSystem handles the copy very inefficiently: too much of the data to be written is kept in memory, so the OutOfMemoryError occurs.

Why is this the case? Is using ZipFileSystem generally considered inefficient (in terms of memory consumption) or am I doing something wrong here?
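
For comparison, here is a minimal sketch of what a conventional streaming approach might look like. The question does not name the library used, so this assumes java.util.zip.ZipOutputStream from the JDK; the class name StreamingZipper is hypothetical. Each file is streamed through a small buffer into the archive, so heap usage should not grow with the size of the folder.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not taken from the original post.
final class StreamingZipper {

    static void zipFolder(Path zipFile, Path sourceDir) throws IOException {
        // Collect the regular files first, so the walk is finished
        // before the archive itself is created.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names always use forward slashes.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(name));
                // Streams the file content into the current entry.
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }
}
```

Note that this sketch only adds regular files, so empty directories do not end up in the archive.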


1 Reply


I looked at ZipFileSystem.java and I believe I found the source of the memory consumption. By default, the implementation uses a ByteArrayOutputStream as the buffer when compressing files, so the data to be written is held on the heap and is limited by the amount of memory assigned to the JVM.

There's an (undocumented) property, "useTempFile", that we can put into the env map passed to FileSystems.newFileSystem to make the implementation use temporary files instead. It works like this:

Map<String, Object> env = new HashMap<>();
env.put("create", "true");
env.put("useTempFile", Boolean.TRUE);

More details here: http://www.docjar.com/html/api/com/sun/nio/zipfs/ZipFileSystem.java.html (the interesting lines are 96, 1358 and 1362).
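
Putting this together with the code from the question, a complete version could look roughly like the sketch below. The class name ZipWithTempFiles is hypothetical, the visitor is inlined as an anonymous class, and the file system is opened in try-with-resources so that the zip file is actually written out when it is closed. Since "useTempFile" is undocumented, treat its behaviour as an assumption that may differ between JDK versions.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

// Hypothetical wrapper class, not taken from the original post.
final class ZipWithTempFiles {

    static void zipFolder(Path zipFile, Path sourceDir) throws IOException {
        Map<String, Object> env = new HashMap<>();
        env.put("create", "true");
        // Undocumented property: buffer entries in temp files, not on the heap
        env.put("useTempFile", Boolean.TRUE);
        URI uri = URI.create("jar:" + zipFile.toUri());

        // Closing the file system flushes the archive to disk.
        try (FileSystem zipFs = FileSystems.newFileSystem(uri, env)) {
            Path root = zipFs.getPath("/");
            Files.walkFileTree(sourceDir, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult preVisitDirectory(Path dir,
                        BasicFileAttributes attrs) throws IOException {
                    // The source root maps to the zip root, which already exists.
                    if (!dir.equals(sourceDir)) {
                        Files.createDirectories(
                                root.resolve(sourceDir.relativize(dir).toString()));
                    }
                    return FileVisitResult.CONTINUE;
                }

                @Override
                public FileVisitResult visitFile(Path file,
                        BasicFileAttributes attrs) throws IOException {
                    Files.copy(file,
                            root.resolve(sourceDir.relativize(file).toString()),
                            StandardCopyOption.REPLACE_EXISTING);
                    return FileVisitResult.CONTINUE;
                }
            });
        }
    }
}
```

The target zip file should not exist yet (or should already be a valid zip), because "create" only creates the archive when the file is missing.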

