Java – Reading a Large File Efficiently

Efficiently read text and binary files in Java

“The world is a book and those who do not travel read only one page.”
― Augustine of Hippo

1. Introduction

What’s the most efficient and easiest way to read a large file in Java? One obvious way is to read the whole file into memory at once. Let us examine the issues that arise when doing so.

2. Loading Whole File Into Memory

One way to load the whole file into a String is to use NIO. This can be accomplished in a single line as follows:

String str = new String(Files.readAllBytes(Paths.get(pathname)),
                        StandardCharsets.UTF_8);

There are several other ways to read a whole file into memory. Check this article for more details, including benchmarks.

The problem with the above approach is that, with a sufficiently large file, you end up with an OutOfMemoryError.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

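On my machine with 4 GB of RAM and 12 GB of swap, I cannot load a 300 MB file successfully using this method. The limit is the JVM's maximum heap size (the -Xmx setting), not the machine's physical memory, and this method needs heap space for both the byte array and the String decoded from it. So we need to look at alternative methods of processing a whole file.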
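A program can compare the file size against the heap ceiling before deciding to load everything at once. The following is only a sketch of that idea: the class name HeapCheck, the method fitsInHeap, and the safety factor of 4 are assumptions chosen for illustration, not values taken from the JDK.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class HeapCheck {
    // Hypothetical safety factor: the byte[] and the decoded String
    // coexist in memory, and the rest of the program needs heap too.
    static final long SAFETY_FACTOR = 4;

    /** Pure size comparison, kept separate so it is easy to reason about. */
    static boolean fitsInHeap(long fileSize, long maxHeap) {
        return fileSize <= maxHeap / SAFETY_FACTOR;
    }

    /** Returns true if loading the file whole looks safe under the heuristic above. */
    static boolean fitsInHeap(Path file) throws IOException {
        // maxMemory() reports the most heap the JVM will attempt to use.
        return fitsInHeap(Files.size(file), Runtime.getRuntime().maxMemory());
    }
}
```

A caller might use HeapCheck.fitsInHeap(Paths.get(pathname)) to choose between the one-line approach above and the chunked approaches in the following sections.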

3. Loading a Binary File in Chunks

The following code demonstrates how to load and process the bytes of a file, which may be a binary file, one chunk at a time.

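try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(pathname))) {
    byte[] bbuf = new byte[4096];   // one 4 KB chunk, reused on every pass
    int len;
    // read() returns the number of bytes placed in bbuf, or -1 at end of file
    while ((len = in.read(bbuf)) != -1) {
        // process data here: bbuf[0] thru bbuf[len - 1]
    }
}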
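To make the "process data here" step concrete, here is a minimal sketch that computes a CRC-32 checksum of a file while holding only one chunk in memory. The class name ChunkedChecksum and the choice of a checksum as the processing step are illustrative assumptions; any per-chunk computation would fit in the same place.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

class ChunkedChecksum {
    /** Computes the CRC-32 of a file, reading it in 4096-byte chunks. */
    static long crc32(Path file) throws IOException {
        CRC32 crc = new CRC32();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            byte[] bbuf = new byte[4096];
            int len;
            while ((len = in.read(bbuf)) != -1) {
                // Only the first len bytes are valid; the final chunk is usually shorter.
                crc.update(bbuf, 0, len);
            }
        }
        return crc.getValue();
    }
}
```

The important detail is passing len to the processing step: treating the whole buffer as valid would mix in stale bytes from the previous pass.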

4. Reading a Text File Line By Line

Processing a text file is easier when you need to do it line by line. There are several methods for doing so. Here is one method using a BufferedReader:

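// Name the charset explicitly; FileReader falls back to the platform default before Java 18.
try (BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(pathname), StandardCharsets.UTF_8))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here.
    }
}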
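As one possible illustration of the processing step, the sketch below finds the longest line in a file while keeping only two lines in memory at any time. The class name LongestLine is hypothetical, and the code assumes the file is UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class LongestLine {
    /** Returns the longest line of a UTF-8 text file, or "" if the file is empty. */
    static String find(Path file) throws IOException {
        String longest = "";
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                // readLine() strips the line terminator; only the current
                // line and the best candidate so far are kept in memory.
                if (line.length() > longest.length()) {
                    longest = line;
                }
            }
        }
        return longest;
    }
}
```

Memory use here depends on the length of the longest line rather than the size of the file, which is what makes the line-by-line approach suitable for large inputs.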

5. Using a Scanner

The Scanner class provides another convenient way to read a file line by line, using the hasNextLine() and nextLine() methods.

try (Scanner scanner = new Scanner(new File(pathname))) {
    while (scanner.hasNextLine()) {
        String line = scanner.nextLine();
        // process line here.
    }
}

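If you need to read line by line, I recommend the method above using BufferedReader. The Scanner method is much slower because Scanner locates each line with regular-expression matching, whereas BufferedReader simply looks for line terminators.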
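To check the speed difference on your own files, a rough timing harness along the following lines may help. It is a sketch, not a rigorous benchmark: there is no JIT warm-up, the second read benefits from the operating system's file cache, and the class and method names are made up for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class LineReadTiming {
    static long countWithBufferedReader(Path file) throws IOException {
        long n = 0;
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (in.readLine() != null) {
                n++;
            }
        }
        return n;
    }

    static long countWithScanner(Path file) throws IOException {
        long n = 0;
        try (Scanner scanner = new Scanner(file, "UTF-8")) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                n++;
            }
        }
        return n;
    }

    /** Prints wall-clock times for both readers; run it several times before trusting the numbers. */
    static void compare(Path file) throws IOException {
        long t0 = System.nanoTime();
        long a = countWithBufferedReader(file);
        long t1 = System.nanoTime();
        long b = countWithScanner(file);
        long t2 = System.nanoTime();
        System.out.printf("BufferedReader: %d lines in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("Scanner:        %d lines in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Calling LineReadTiming.compare(Paths.get(pathname)) on a file of a few hundred megabytes should make any gap visible; the exact numbers will vary with the JVM, the disk, and the data.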

6. With Java 8 Streams

Java 8 provides the streams facility, which is useful in a wide variety of cases. Here we can use the Files.lines() method to create a stream of lines from a file, apply any filters, and do any processing we want. The stream keeps the file open until it is closed, so it belongs in a try-with-resources block. In the following example, we select lines that contain the string abc and collect the results into a List.

List<String> alist;
try (Stream<String> lines = Files.lines(Paths.get(pathname))) {
    alist = lines.filter(line -> line.contains("abc"))
                 .collect(Collectors.toList());
}
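Collecting into a List keeps every matching line in memory, which can itself become a problem when a large file has many matches. When only an aggregate is needed, a terminal operation such as count() avoids that. The sketch below assumes a UTF-8 file; the class name MatchCount and its parameters are illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class MatchCount {
    /** Counts the lines containing needle without materializing them in a List. */
    static long count(Path file, String needle) throws IOException {
        // Files.lines reads lazily and keeps the file open until the stream is closed.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> line.contains(needle)).count();
        }
    }
}
```

Because the stream is lazy, lines are read, tested, and discarded one at a time, so memory use stays flat regardless of file size.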

Review

We discussed some methods for loading and processing files efficiently. If the file is small enough, you can simply load the whole file into memory. For large files, you need to process the file in chunks. A binary file can be processed in chunks of, say, 4 KB. A text file can be processed line by line.