Java – Reading a Large File Efficiently

Efficiently read text and binary files in Java

“The world is a book and those who do not travel read only one page.”
― Augustine of Hippo

1. Introduction

What’s the most efficient and easiest way to read a large file in Java? One option is to read the whole file into memory at once. Let us examine some issues that arise when doing so.


Convert Java 8 Streams to Collections

Process data using Java 8 Streams and store in Collections including List, Set or Map.

“If we knew what it was we were doing, it would not be called research, would it?”
― Albert Einstein

1. Introduction

Java 8 Streams provide a cool facility to combine several operations in a single statement and express it concisely. Once the data is processed, you can collect the data in a List (or another collection). In this article, we present several methods to collect the results of these stream operations into collections.
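As a quick taste of what collecting looks like, here is a minimal sketch. The class name CollectExamples and the sample words are made up for illustration; Collectors.toList(), toSet() and toMap() are the standard JDK collectors.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class CollectExamples {
    // Hypothetical sample data, used only for illustration.
    static final List<String> WORDS = Arrays.asList("pear", "apple", "fig", "apple");

    // Collect the upper-cased words into a List; encounter order is kept.
    static List<String> toUpperList() {
        return WORDS.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // Collect into a Set, which drops the duplicate "apple".
    static Set<String> toSet() {
        return WORDS.stream().collect(Collectors.toSet());
    }

    // Collect into a Map from word to length. distinct() is applied first
    // because toMap throws IllegalStateException on duplicate keys.
    static Map<String, Integer> toLengthMap() {
        return WORDS.stream().distinct().collect(Collectors.toMap(w -> w, String::length));
    }

    public static void main(String[] args) {
        System.out.println(toUpperList()); // [PEAR, APPLE, FIG, APPLE]
        System.out.println(toSet().size()); // 3
        System.out.println(toLengthMap().get("fig")); // 3
    }
}
```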


Using Java Collectors

Use Java 8 Streams and Collectors to slice and dice lists including computing sums, averages, partitioning and more.

1. Introduction

Java 8 provides the new Streams facility, which makes many Collection operations easy. Streaming items from a collection and filtering the data are trivial, as are sorting, searching and computing aggregates; that is, if you are familiar with the many Collectors functions available. We present some of these functions here.
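To give a flavor of these functions, the sketch below computes a sum, an average and an even/odd partition. The class name CollectorExamples and the numbers are illustrative assumptions; summingInt, averagingInt and partitioningBy are standard Collectors methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class CollectorExamples {
    // Hypothetical sample data, used only for illustration.
    static final List<Integer> NUMBERS = Arrays.asList(4, 8, 15, 16, 23, 42);

    // Sum of all numbers.
    static int sum() {
        return NUMBERS.stream().collect(Collectors.summingInt(Integer::intValue));
    }

    // Arithmetic mean of all numbers.
    static double average() {
        return NUMBERS.stream().collect(Collectors.averagingInt(Integer::intValue));
    }

    // Splits the numbers into two lists keyed by "is even".
    static Map<Boolean, List<Integer>> evenOdd() {
        return NUMBERS.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        System.out.println(sum());                // 108
        System.out.println(average());            // 18.0
        System.out.println(evenOdd().get(true));  // [4, 8, 16, 42]
        System.out.println(evenOdd().get(false)); // [15, 23]
    }
}
```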


Java HashMap Examples

1. Introduction

A HashMap is a map of keys to values which uses a hash table for implementation. The HashMap organizes the keys into buckets based on the value of the hashCode() of the key. It provides constant-time performance for get and put operations, assuming the hash function spreads the keys evenly among the buckets. What this means is that the time taken by get and put operations does not, on average, depend on the size of the map.
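A minimal sketch of put and get follows. The class name HashMapBasics and the sample entries are chosen here purely for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class HashMapBasics {
    // Builds a small map from sample airport codes to city names.
    static Map<String, String> airportCities() {
        Map<String, String> cities = new HashMap<>();
        cities.put("BUR", "Burbank");
        cities.put("OAK", "Oakland");
        cities.put("OAK", "Oakland, CA"); // same key: the previous value is replaced
        return cities;
    }

    public static void main(String[] args) {
        Map<String, String> cities = airportCities();
        System.out.println(cities.size());     // 2
        System.out.println(cities.get("OAK")); // Oakland, CA
        System.out.println(cities.get("LAX")); // null, since the key is absent
    }
}
```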


Java – Read File Line by Line Using Java 8 Streams

1. Introduction

Java 8 Streams provide a cool facility to apply functional-style operations on collection classes such as List and Set, and on the entries of a Map. These functional-style operations are very expressive and eliminate much of the boilerplate code in processing pipelines which operate on the elements of these collections.

2. A Streams Example

An example is shown below. Here we process a list of airport names as follows:

1. Select airports that start with “B”

2. Convert the airport name to uppercase

3. Sort the list

4. Print each element

List<String> airports =
    Arrays.asList("Birmingham-Shuttlesworth International",
		  "Anchorage International",
		  "Deadhorse",
		  "Phoenix Sky Harbor International",
		  "Tucson International",
		  "Los Angeles International",
		  "San Francisco International",
		  "Burbank Bob Hope Airport",
		  "Long Beach Airport",
		  "Oakland International");

airports
    .stream()
    .filter(a -> a.startsWith("B"))
    .map(String::toUpperCase)
    .sorted()
    .forEach(System.out::println);

// prints the following
// BIRMINGHAM-SHUTTLESWORTH INTERNATIONAL
// BURBANK BOB HOPE AIRPORT

Wouldn’t it be nice to apply these processing pipelines to other sequences such as the lines of a file? In this article, we show how to do exactly that.

2.1 Reading a File Line By Line

To read a file line by line in Java, we use a BufferedReader instance and read in a loop till all the lines are exhausted.

try (BufferedReader in = new BufferedReader(new FileReader(textFile))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here
    }
}
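For readers who want to run the loop, here is a self-contained sketch under a few assumptions: the class name LineLoopExample, the temporary file and its three sample lines are made up, and Files.newBufferedReader with an explicit UTF-8 charset is swapped in for FileReader, which decodes with the platform default charset.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class LineLoopExample {
    // Counts the lines of a file with the readLine() loop.
    static int countLines(Path textFile) throws IOException {
        int count = 0;
        try (BufferedReader in = Files.newBufferedReader(textFile, StandardCharsets.UTF_8)) {
            while (in.readLine() != null) {
                count++; // process the line here
            }
        }
        return count;
    }

    // Writes a small sample file, counts its lines and deletes it again.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("airports", ".txt");
            try {
                Files.write(tmp,
                            Arrays.asList("Deadhorse", "Long Beach Airport", "Oakland International"),
                            StandardCharsets.UTF_8);
                return countLines(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3
    }
}
```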

3. Implement a Spliterator Class

To turn a BufferedReader into a class capable of being used with the Java 8 Streams API, we need to provide an implementation of the Spliterator interface. Shown below is the LineReaderSpliterator class, which implements Spliterator<String> and turns a BufferedReader into a stream of lines.

import java.io.BufferedReader;
import java.io.IOException;
import java.util.Spliterator;
import java.util.function.Consumer;

public class LineReaderSpliterator implements Spliterator<String>
{
    private final BufferedReader reader;
    private IOException exception;

    public LineReaderSpliterator(BufferedReader reader) {
        this.reader = reader;
    }

    // Returns the IOException that ended the stream early, or null if none occurred.
    public IOException ioException() { return exception; }

    public int characteristics() {
        // Lines arrive in file order and readLine() never yields a null element.
        // DISTINCT is not reported because a file may contain duplicate lines.
        return ORDERED | NONNULL;
    }

    public long estimateSize() {
        return Long.MAX_VALUE;
    }

    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String line = reader.readLine();
            if ( line != null ) {
                action.accept(line);
                return true;
            } else return false;
        } catch(IOException ex) {
            this.exception = ex;
            return false;
        }
    }

    public Spliterator<String> trySplit() { return null; }
}

3.1 Constructor

The LineReaderSpliterator is initialized with an instance of BufferedReader which serves as the input line source.

public LineReaderSpliterator(BufferedReader reader) {
    this.reader = reader;
}

3.2 Characteristics

The characteristics of the Spliterator must be indicated by the implementation of the characteristics() method. The result must be a bitwise OR of values from the following:

ORDERED: Indicates that the order of elements is defined. The ordering is expected to be preserved in parallel computations.

DISTINCT: No two elements are equal to each other (as determined by equals()).

SORTED: The sequence is sorted. In our case, the lines may not be sorted so this bit is not set.

SIZED: This bit must be set to indicate that the estimate of size returned by estimateSize() is correct. For our case, we do not know the number of lines in a file so this bit is not set.

NONNULL: Elements are guaranteed to be non-null.

IMMUTABLE: Indicates that the element source cannot be structurally modified (no elements added, replaced or removed) during traversal.

CONCURRENT: Indicates that the element source can be safely modified concurrently with additions, replacements and removals from multiple threads.

SUBSIZED: Indicates that all spliterators returned by trySplit() are themselves SIZED and SUBSIZED.

In our case, the spliterator is specified as ORDERED and NONNULL: lines have a defined order in the file, and readLine() never hands us a null element before EOF. DISTINCT is left out because a file may contain duplicate lines, and a stream is entitled to trust the flag, for example by skipping de-duplication in distinct(). IMMUTABLE is left out because the file can be modified by another process while it is being read.
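The characteristic bits are plain int flags, so they can be combined with | and queried with hasCharacteristics(). The sketch below illustrates this with Spliterators.spliteratorUnknownSize(), a JDK helper that is not used elsewhere in this article; the class name CharacteristicsExample and the sample lines are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

class CharacteristicsExample {
    // Wraps an iterator in a spliterator that reports ORDERED and NONNULL.
    static Spliterator<String> orderedNonNull(Iterator<String> it) {
        return Spliterators.spliteratorUnknownSize(
            it, Spliterator.ORDERED | Spliterator.NONNULL);
    }

    public static void main(String[] args) {
        // Hypothetical lines; note the duplicate "a".
        List<String> lines = Arrays.asList("a", "b", "a");
        Spliterator<String> s = orderedNonNull(lines.iterator());

        System.out.println(s.hasCharacteristics(Spliterator.ORDERED));  // true
        System.out.println(s.hasCharacteristics(Spliterator.NONNULL));  // true
        System.out.println(s.hasCharacteristics(Spliterator.DISTINCT)); // false
        System.out.println(s.hasCharacteristics(Spliterator.SIZED));    // false
    }
}
```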

3.3 Size estimation

Our spliterator does not know the size of the sequence since the number of lines in the BufferedReader is not known in advance. If the size is not known or is unbounded, the method must return Long.MAX_VALUE.

public long estimateSize() {
    return Long.MAX_VALUE;
}
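To see the difference between a known and an unknown size, the following sketch compares a list-backed spliterator with one built by Spliterators.spliteratorUnknownSize(). Both are JDK facilities used here only for comparison; the class name EstimateSizeExample and the data are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;

class EstimateSizeExample {
    // Hypothetical sample data, used only for illustration.
    static final List<String> NAMES = Arrays.asList("a", "b", "c");

    // A list knows its size, so the estimate is exact.
    static long sizedEstimate() {
        return NAMES.spliterator().estimateSize();
    }

    // An iterator-backed spliterator of unknown size reports Long.MAX_VALUE.
    static long unknownEstimate() {
        return Spliterators
            .spliteratorUnknownSize(NAMES.iterator(), Spliterator.ORDERED)
            .estimateSize();
    }

    public static void main(String[] args) {
        System.out.println(sizedEstimate());                     // 3
        System.out.println(unknownEstimate() == Long.MAX_VALUE); // true
    }
}
```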

3.4 Process Next Element

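The method to process the next element is the tryAdvance() method, which accepts a functional interface Consumer<? super T>. Our implementation attempts to read a line and, if successful (EOF not reached), invokes action.accept(). If EOF is reached or an exception occurs, the method returns false to indicate end-of-sequence. In the exception case the IOException is stored, so the caller should check ioException() after the stream finishes to distinguish a read error from a normal end of file.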

public boolean tryAdvance(Consumer<? super String> action) {
    try {
	String line = reader.readLine();
	if ( line != null ) {
	    action.accept(line);
	    return true;
	} else return false;
    } catch(java.io.IOException ex) {
	this.exception = ex;
	return false;
    }
}
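Storing the exception means a read error looks like a normal end of file unless the caller remembers to check for it. One alternative, sketched below as an assumption rather than as part of the class above, is to rethrow the IOException wrapped in an UncheckedIOException, which is also what the JDK's BufferedReader.lines() does. The class name UncheckedLineSpliterator and its two helper methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical variant: same shape as the article's spliterator, but an
// IOException is rethrown unchecked instead of being stored.
class UncheckedLineSpliterator implements Spliterator<String> {
    private final BufferedReader reader;

    UncheckedLineSpliterator(BufferedReader reader) { this.reader = reader; }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String line = reader.readLine();
            if (line == null) return false;     // EOF: end of sequence
            action.accept(line);
            return true;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex); // surfaces in the terminal operation
        }
    }

    @Override public Spliterator<String> trySplit() { return null; }
    @Override public long estimateSize() { return Long.MAX_VALUE; }
    @Override public int characteristics() { return ORDERED | NONNULL; }

    // Reads every line of the given text through the spliterator.
    static List<String> readAll(String text) {
        BufferedReader r = new BufferedReader(new StringReader(text));
        return StreamSupport.stream(new UncheckedLineSpliterator(r), false)
                            .collect(Collectors.toList());
    }

    // Returns true if streaming from an already-closed reader fails loudly.
    static boolean failsOnClosedReader() {
        BufferedReader r = new BufferedReader(new StringReader("a\nb"));
        try {
            r.close(); // any later readLine() throws IOException
            StreamSupport.stream(new UncheckedLineSpliterator(r), false)
                         .collect(Collectors.toList());
            return false;
        } catch (UncheckedIOException expected) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

The trade-off is that callers no longer need to poll for an error, but they must be prepared for an unchecked exception from the terminal operation.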

3.5 Can the Spliterator split?

If the spliterator can be partitioned to return separate ranges of elements, a new Spliterator covering part of the elements must be returned. We cannot partition the input into separate sequences, so we return null.

public Spliterator<String> trySplit() { return null; }
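For contrast, a source that can be partitioned, such as a list, returns a non-null spliterator from trySplit() and keeps the remaining elements for itself. The sketch below shows this with the JDK's list spliterator; the class name TrySplitExample and the data are made up for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitExample {
    // Hypothetical sample data, used only for illustration.
    static final List<String> NAMES = Arrays.asList("a", "b", "c", "d");

    // Splits a list-backed spliterator once and returns the sizes of both parts.
    static long[] splitSizes() {
        Spliterator<String> rest = NAMES.spliterator();
        Spliterator<String> prefix = rest.trySplit(); // non-null: a list can be partitioned
        return new long[] { prefix.estimateSize(), rest.estimateSize() };
    }

    public static void main(String[] args) {
        long[] sizes = splitSizes();
        // The two parts together cover all four elements.
        System.out.println(sizes[0] + sizes[1]); // 4
    }
}
```

Returning null, as our line reader does, simply means a parallel stream cannot divide the work among threads.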

4. Using the Stream

We can now use the Spliterator implementation to convert a BufferedReader into a Java 8 stream as follows:

static private Stream<String> createStreamReader(BufferedReader reader)
{
    LineReaderSpliterator s = new LineReaderSpliterator(reader);
    return StreamSupport.stream(s, false);
}

The earlier streams example can now be written as shown.

try (BufferedReader reader = new BufferedReader(new FileReader(textFile))) {
    createStreamReader(reader)
        .filter(a -> a.startsWith("B"))
        .map(String::toUpperCase)
        .sorted()
        .forEach(System.out::println);
}

With an input file that contains a longer list of airport names than the earlier example, the output looks like this:

BALTIMORE-WASHINGTON INTERNATIONAL
BANGOR INTERNATIONAL
BIRMINGHAM-SHUTTLESWORTH INTERNATIONAL
BRADLEY INTERNATIONAL
BURBANK BOB HOPE AIRPORT
BURLINGTON INTERNATIONAL

The example does not store the names in a data structure. Rather, names are directly read from the file and the processing pipeline is applied to the sequence of elements.
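For comparison, the JDK has offered ready-made equivalents since Java 8: BufferedReader.lines() and Files.lines(Path) both return a Stream<String> of lines. The sketch below applies the same pipeline with Files.lines(); the class name FilesLinesExample, the temporary file and its sample contents are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilesLinesExample {
    // Same pipeline as above, using the JDK's built-in line stream.
    static List<String> airportsStartingWithB(Path textFile) throws IOException {
        // try-with-resources closes the underlying file when the stream is done.
        try (Stream<String> lines = Files.lines(textFile, StandardCharsets.UTF_8)) {
            return lines.filter(a -> a.startsWith("B"))
                        .map(String::toUpperCase)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Writes a small sample file, runs the pipeline and deletes the file.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("airports", ".txt");
            try {
                Files.write(tmp,
                            Arrays.asList("Burbank Bob Hope Airport",
                                          "Deadhorse",
                                          "Birmingham-Shuttlesworth International"),
                            StandardCharsets.UTF_8);
                return airportsStartingWithB(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // [BIRMINGHAM-SHUTTLESWORTH INTERNATIONAL, BURBANK BOB HOPE AIRPORT]
        System.out.println(demo());
    }
}
```

Writing the Spliterator by hand remains useful for understanding how such a stream works, and for sources that the JDK does not already cover.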

Summary

We have demonstrated how to read lines from a file and process them using Java 8 streams. This requires an implementation of the Spliterator interface, which can deliver a “stream” view of any sequence. The advantage of such an approach is the ease of filtering and processing text files.