Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
925 views
in Technique[技术] by (71.8m points)

performance - Why does the java DirectoryStream perform so slow?

I've done some testing with Streams in special with DirectoryStreams of the nio-package. I simply try to get a list of all files in a directory sorted by last modified date and size.

The JavaDoc of old File.listFiles() stated a Note to the method in Files:

Note that the Files class defines the newDirectoryStream method to open a directory and iterate over the names of the files in the directory. This may use less resources when working with very large directories.

I run the code down below a lot of times (first three times below):

First-run:

Run time of Arrays.sort: 1516
Run time of Stream.sorted as Array: 2912
Run time of Stream.sorted as List: 2875

Second-run:

Run time of Arrays.sort: 1557
Run time of Stream.sorted as Array: 2978
Run time of Stream.sorted as List: 2937

Third-run:

Run time of Arrays.sort: 1563
Run time of Stream.sorted as Array: 2919
Run time of Stream.sorted as List: 2896

My question is: Why do the streams perform so bad?

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

  // This sorts from old to young and from big to small
  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
    int sorter = 0;
    try {
      FileTime lm1 = Files.getLastModifiedTime(o1);
      FileTime lm2 = Files.getLastModifiedTime(o2);
      if (lm2.compareTo(lm1) == 0) {
        Long s1 = Files.size(o1);
        Long s2 = Files.size(o2);
        sorter = s2.compareTo(s1);
      } else {
        sorter = lm1.compareTo(lm2);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return sorter;
  };

  public String[] getSortedFileListAsArray(Path dir) throws IOException {
    Stream<Path> stream = Files.list(dir);
    return stream.sorted(timeSizeComparator).
            map(Path::getFileName).map(Path::toString).toArray(String[]::new);
  }

  public List<String> getSortedFileListAsList(Path dir) throws IOException {
    Stream<Path> stream = Files.list(dir);
    return stream.sorted(timeSizeComparator).
            map(Path::getFileName).map(Path::toString).collect(Collectors.
            toList());
  }

  public String[] sortByDateAndSize(File[] fileList) {
    Arrays.sort(fileList, (File o1, File o2) -> {
      int r = Long.compare(o1.lastModified(), o2.lastModified());
      if (r != 0) {
        return r;
      }
      return Long.compare(o1.length(), o2.length());
    });
    String[] fileNames = new String[fileList.length];
    for (int i = 0; i < fileNames.length; i++) {
      fileNames[i] = fileList[i].getName();
    }
    return fileNames;
  }

  public static void main(String[] args) throws IOException {
    // File (io package)
    File f = new File("C:\Windows\system32");
    // Path (nio package)
    Path dir = Paths.get("C:\Windows\system32");

    FileSorter fs = new FileSorter();

    long before = System.currentTimeMillis();
    String[] names = fs.sortByDateAndSize(f.listFiles());
    long after = System.currentTimeMillis();
    System.out.println("Run time of Arrays.sort: " + ((after - before)));

    long before2 = System.currentTimeMillis();
    String[] names2 = fs.getSortedFileListAsArray(dir);
    long after2 = System.currentTimeMillis();
    System.out.
            println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

    long before3 = System.currentTimeMillis();
    List<String> names3 = fs.getSortedFileListAsList(dir);
    long after3 = System.currentTimeMillis();
    System.out.
            println("Run time of Stream.sorted as List: " + ((after3 - before3)));
  }
}

Update

After applying the code from Peter I have this results:

Run time of Arrays.sort: 1615
Run time of Stream.sorted as Array: 3116
Run time of Stream.sorted as List: 3059
Run time of Stream.sorted as List with caching: 378

Update 2

After doing some research on the solution of Peter, I can say, that reading file attributes with for ex. Files.getLastModified must be a heavy crunch. Changing only that part in Comparator to:

  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
    File f1 = o1.toFile();
    File f2 = o2.toFile();
    long lm1 = f1.lastModified();
    long lm2 = f2.lastModified();
    int cmp = Long.compare(lm2, lm1);
    if (cmp == 0) {
      cmp = Long.compare(f2.length(), f1.length());
    }
    return cmp;
  };

Gets the even better result on my computer:

Run time of Arrays.sort: 1968
Run time of Stream.sorted as Array: 1999
Run time of Stream.sorted as List: 1975
Run time of Stream.sorted as List with caching: 488

But as you can see, caching the object is the much best way. And as jtahlborn mentioned, it is a kind of stable sort.

Update 3 (best solution I've found)

After a bit more research, I've seen, that the methods Files.lastModified and Files.size, both do a huge job on a same thing: Attributes. So I made three versions of the PathInfo class to test:

  1. Peters version as described down below
  2. An old style File version, where I do a Path.toFile() once in the constructor and get all values from that file with f.lastModified and f.length
  3. An version of Peters solution, but now I read an Attribute object with Files.readAttributes(path,BasicFileAttributes.class) and done things on this.

Putting it all in a loop for doing it 100 times each, I came up with these results:

After doing all hundred times
Mean performance of Peters solution: 432.26
Mean performance of old File solution: 343.11
Mean performance of read attribute object once solution: 255.66

Code in constructor of PathInfo for the best solution:

public PathInfo(Path path) {
  try {
    // read the whole attributes once
    BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
    fileName = path.getFileName().toString();
    modified = bfa.lastModifiedTime().toMillis();
    size = bfa.size();
  } catch (IOException ex) {
    throw new UncheckedIOException(ex);
  }
}

My result: Never read attributes twice and caching in an Object is bursting performance.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Files.list() is a O(N) operation whereas sorting is O(N log N). It is far more likely that the operations inside the sorting which matter. Given the comparisons don't do the same thing, this is the most likely explanation. There is a lot of files with the same modification date under C:/Windows/System32 meaning the size would be checked quite often.

To show that most of the time is not spent in FIles.list(dir) Stream, I have optimise the comparison so the data about a file is only obtained once per file.

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

    // This sorts from old to young and from big to small
    Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
        int sorter = 0;
        try {
            FileTime lm1 = Files.getLastModifiedTime(o1);
            FileTime lm2 = Files.getLastModifiedTime(o2);
            if (lm2.compareTo(lm1) == 0) {
                Long s1 = Files.size(o1);
                Long s2 = Files.size(o2);
                sorter = s2.compareTo(s1);
            } else {
                sorter = lm1.compareTo(lm2);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sorter;
    };

    public String[] getSortedFileListAsArray(Path dir) throws IOException {
        Stream<Path> stream = Files.list(dir);
        return stream.sorted(timeSizeComparator).
                map(Path::getFileName).map(Path::toString).toArray(String[]::new);
    }

    public List<String> getSortedFileListAsList(Path dir) throws IOException {
        Stream<Path> stream = Files.list(dir);
        return stream.sorted(timeSizeComparator).
                map(Path::getFileName).map(Path::toString).collect(Collectors.
                toList());
    }

    public String[] sortByDateAndSize(File[] fileList) {
        Arrays.sort(fileList, (File o1, File o2) -> {
            int r = Long.compare(o1.lastModified(), o2.lastModified());
            if (r != 0) {
                return r;
            }
            return Long.compare(o1.length(), o2.length());
        });
        String[] fileNames = new String[fileList.length];
        for (int i = 0; i < fileNames.length; i++) {
            fileNames[i] = fileList[i].getName();
        }
        return fileNames;
    }

    public List<String> getSortedFile(Path dir) throws IOException {
        return Files.list(dir).map(PathInfo::new).sorted().map(p -> p.getFileName()).collect(Collectors.toList());
    }

    static class PathInfo implements Comparable<PathInfo> {
        private final String fileName;
        private final long modified;
        private final long size;

        public PathInfo(Path path) {
            try {
                fileName = path.getFileName().toString();
                modified = Files.getLastModifiedTime(path).toMillis();
                size = Files.size(path);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }

        @Override
        public int compareTo(PathInfo o) {
            int cmp = Long.compare(modified, o.modified);
            if (cmp == 0)
                cmp = Long.compare(size, o.size);
            return cmp;
        }

        public String getFileName() {
            return fileName;
        }
    }

    public static void main(String[] args) throws IOException {
        // File (io package)
        File f = new File("C:\Windows\system32");
        // Path (nio package)
        Path dir = Paths.get("C:\Windows\system32");

        FileSorter fs = new FileSorter();

        long before = System.currentTimeMillis();
        String[] names = fs.sortByDateAndSize(f.listFiles());
        long after = System.currentTimeMillis();
        System.out.println("Run time of Arrays.sort: " + ((after - before)));

        long before2 = System.currentTimeMillis();
        String[] names2 = fs.getSortedFileListAsArray(dir);
        long after2 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

        long before3 = System.currentTimeMillis();
        List<String> names3 = fs.getSortedFileListAsList(dir);
        long after3 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as List: " + ((after3 - before3)));
        long before4 = System.currentTimeMillis();
        List<String> names4 = fs.getSortedFile(dir);
        long after4 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as List with caching: " + ((after4 - before4)));
    }
}

This prints on my laptop.

Run time of Arrays.sort: 1980
Run time of Stream.sorted as Array: 1295
Run time of Stream.sorted as List: 1228
Run time of Stream.sorted as List with caching: 185

As you can see, about 85% of the time is spent obtaining the modification date and size of the files repeatedly.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...