How do I Create a Java String from the Contents of a File?

1. Introduction

Here we present a few ways to read the whole contents of a file into a String in Java.

2. Using java.nio.file in Java 7+

It is quite simple to load the whole file into a String using the java.nio.file package:

String str = new String(Files.readAllBytes(Paths.get(pathname)),
                        StandardCharsets.UTF_8);

Here is how the code works. First, read all the bytes into a byte buffer using the Files and Paths classes from the java.nio.file package.

byte[] buf = Files.readAllBytes(Paths.get(pathname));

Convert the byte buffer into a String by specifying the character set.

String str = new String(buf, StandardCharsets.UTF_8);
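For readers who want to try the two steps end to end, here is a minimal, self-contained sketch. The class name NioReadExample, the method name readFile and the temporary sample file are made up for illustration; only the Files.readAllBytes call and the String constructor come from the text above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class NioReadExample {
    // Reads the whole file into memory, then decodes the bytes as UTF-8.
    static String readFile(String pathname) throws IOException {
        byte[] buf = Files.readAllBytes(Paths.get(pathname));
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created only for this demonstration.
        Path tmp = Files.createTempFile("nio-example", ".txt");
        try {
            Files.write(tmp, "h\u00e9llo\nw\u00f6rld\n".getBytes(StandardCharsets.UTF_8));
            System.out.print(readFile(tmp.toString()));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Since the whole file is held in a byte array before decoding, this approach is best kept for files that comfortably fit in memory.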

3. Scan for end-of-input

Another way to read the whole file into a String is to use the Scanner class. Create a scanner with the file as input, set the appropriate delimiter and read the next token.

Note: the delimiter used in the code below, \A, is the beginning-of-input anchor, which will not match anywhere other than the beginning of input. As a result, the first token returned by the scanner is the entire contents of the file.

Scanner scanner = null;
try {
    scanner = new Scanner(new File(pathname), "UTF-8");
    return scanner.useDelimiter("\\A").next();
} finally {
    if ( scanner != null ) scanner.close();
}
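One edge case is worth knowing about: an empty file produces no token at all, so next() throws a NoSuchElementException. The following is a minimal sketch of one way to guard against that, written with Java 7 try-with-resources. The class name ScannerReadExample and the choice to return an empty String for an empty file are assumptions made for this example.

```java
import java.io.File;
import java.io.IOException;
import java.util.Scanner;

class ScannerReadExample {
    static String readFile(String pathname) throws IOException {
        // try-with-resources closes the scanner (and the underlying file).
        try (Scanner scanner = new Scanner(new File(pathname), "UTF-8")) {
            // \A matches only at the beginning of input, so the first
            // token spans the whole file.
            scanner.useDelimiter("\\A");
            // An empty file has no token; return "" instead of letting
            // next() throw NoSuchElementException.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }
}
```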

4. Memory Mapped File Reading

This method maps the file contents directly into memory using the MappedByteBuffer class. Memory mapping the contents directly might lead one to expect enhanced performance. However, this advantage is only available if the buffer is used directly. In our case, since we are creating a String from the contents of the file, the bytes still have to be decoded and copied, so the speed advantage of the memory mapped buffer is probably not visible.

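static private String readFile3(String pathname)
    throws java.io.IOException
{
    // try-with-resources closes the file even if mapping or decoding fails
    try (RandomAccessFile file = new RandomAccessFile(pathname, "r")) {
        FileChannel channel = file.getChannel();
        // Map the whole file read-only, using the channel's own size
        MappedByteBuffer buffer = channel.map(MapMode.READ_ONLY,
                                              0,
                                              channel.size());
        // Decode the mapped bytes as UTF-8 and copy them into a String
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }
}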

ByteBuffer provides a method asCharBuffer(), which returns a “view” of the byte buffer as a character buffer. However, there is no way to specify the encoding for converting bytes to characters with this method: it simply treats each pair of bytes as one 16-bit char in the buffer's byte order. The correct way to convert a ByteBuffer to a CharBuffer is to use Charset.decode(ByteBuffer) with the appropriate Charset instance.
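To make the difference concrete, the small sketch below runs the same UTF-8 bytes through both routes. The class and method names (DecodeExample, decodeUtf8, viewAsChars) are invented for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class DecodeExample {
    // Decodes the remaining bytes of the buffer as UTF-8 text.
    static String decodeUtf8(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer).toString();
    }

    // Views the remaining bytes as 16-bit chars; no charset is involved.
    static String viewAsChars(ByteBuffer buffer) {
        return buffer.asCharBuffer().toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "Java".getBytes(StandardCharsets.UTF_8);

        System.out.println(decodeUtf8(ByteBuffer.wrap(bytes)));  // prints "Java"

        // The four bytes 4a 61 76 61 are paired up into two chars
        // (big-endian is the default byte order of a ByteBuffer).
        String viewed = viewAsChars(ByteBuffer.wrap(bytes));
        for (char c : viewed.toCharArray()) {
            System.out.printf("%04x ", (int) c);                 // prints "4a61 7661 "
        }
        System.out.println();
    }
}
```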

5. Simple Way Using java.io

Of course, there is always the “old” way (pre-Java 7) of reading a whole file into a String: reading the characters in a loop and appending them to a buffer.

static private String readFile4(String pathname)
    throws java.io.IOException
{
    FileReader in = null;
    try {
        in = new FileReader(pathname);
        char[] buf = new char[2048];
        int len;
        StringBuilder sbuf = new StringBuilder();
        // read() returns -1 once the end of the file is reached
        while ((len = in.read(buf, 0, buf.length)) != -1) {
            // append only the characters actually read in this pass
            sbuf.append(buf, 0, len);
        }
        return sbuf.toString();
    } finally {
        if ( in != null ) in.close();
    }
}
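One caveat with the code above: FileReader(String) decodes the file using the JVM's default charset, so the result can differ from the UTF-8 used by the other methods. A possible variant, sketched below under that assumption, wraps a FileInputStream in an InputStreamReader so the charset can be passed in. The class name ReaderLoopExample and the extra charset parameter are not part of the original code.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

class ReaderLoopExample {
    // Same read loop as above, but with the charset chosen by the caller.
    static String readFile(String pathname, Charset charset) throws IOException {
        Reader in = null;
        try {
            in = new InputStreamReader(new FileInputStream(pathname), charset);
            char[] buf = new char[2048];
            int len;
            StringBuilder sbuf = new StringBuilder();
            while ((len = in.read(buf, 0, buf.length)) != -1) {
                sbuf.append(buf, 0, len);
            }
            return sbuf.toString();
        } finally {
            // Closing the reader also closes the wrapped FileInputStream.
            if (in != null) in.close();
        }
    }
}
```

A call such as ReaderLoopExample.readFile(pathname, StandardCharsets.UTF_8) then behaves the same regardless of the platform default.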

6. Benchmarking Various Approaches

Since we have several methods of reading a whole file into a String, it is interesting to see how the methods stack up against one another in performance. To this end, we implemented a simple benchmarking method using System.currentTimeMillis(). The following shows the total time and the average time per call over 1000 runs of each method.

simple       235 ms for 1000 iters: 0.235000 ms/op
nio          213 ms for 1000 iters: 0.213000 ms/op
scanner      629 ms for 1000 iters: 0.629000 ms/op
mmap         285 ms for 1000 iters: 0.285000 ms/op

For the set of conditions under which the application ran, we can conclude that the NIO method is the fastest, followed by the Simple method. The slowest is the Scanner method, which is somewhat expected since a regular expression search is involved. A note of warning: do not use these benchmark numbers alone to pick the method to use, since they reflect only one set of conditions and a simple timing loop. Rather, use the method that best fits the rest of your code.
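For reference, a timing loop of the kind described above could look roughly like the sketch below. This is not the benchmark code that produced the numbers in this section; the class SimpleBenchmark, its methods and the temporary sample file are hypothetical, and the report format is only modelled on the output shown above.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.concurrent.Callable;

class SimpleBenchmark {
    // Runs the task iters times and returns the total elapsed time in ms.
    static long timeMillis(Callable<String> task, int iters) throws Exception {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iters; i++) {
            task.call();
        }
        return System.currentTimeMillis() - start;
    }

    // Formats one result line: total time plus average time per call.
    static String report(String name, long totalMs, int iters) {
        return String.format(Locale.ROOT, "%-12s %d ms for %d iters: %f ms/op",
                             name, totalMs, iters, (double) totalMs / iters);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample file, created only for this demonstration.
        Path tmp = Files.createTempFile("bench", ".txt");
        try {
            Files.write(tmp, "sample contents\n".getBytes(StandardCharsets.UTF_8));
            int iters = 1000;
            long total = timeMillis(
                () -> new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8),
                iters);
            System.out.println(report("nio", total, iters));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

System.currentTimeMillis() has millisecond resolution at best, which is why the loop times many iterations together rather than each call individually.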

7. Conclusion

You are now aware of various methods of reading the whole contents of a file into a String. Pick whatever suits you best and use it!
