Java – Read File Line by Line Using Java 8 Streams

1. Introduction

Java 8 Streams provide a facility for applying functional-style operations to Collection classes such as List and Set, and to the key, value and entry views of a Map. These operations are expressive and eliminate much of the boilerplate code otherwise needed for pipelines that process the elements of these collections.

2. A Streams Example

An example is shown below. Here we process a list of airport names as follows:

1. Select airports that start with “B”

2. Convert the airport name to uppercase

3. Sort the list

4. Print each element

List<String> airports =
    Arrays.asList("Birmingham-Shuttlesworth International",
		  "Anchorage International",
		  "Deadhorse",
		  "Phoenix Sky Harbor International",
		  "Tucson International",
		  "Los Angeles International",
		  "San Francisco International",
		  "Burbank Bob Hope Airport",
		  "Long Beach Airport",
		  "Oakland International");

airports
    .stream()
    .filter(a -> a.startsWith("B"))
    .map(String::toUpperCase)
    .sorted()
    .forEach(System.out::println);

// prints the following
// BIRMINGHAM-SHUTTLESWORTH INTERNATIONAL
// BURBANK BOB HOPE AIRPORT

Wouldn’t it be nice to apply these processing pipelines to other sequences such as the lines of a file? In this article, we show how to do exactly that.

4. Reading a File Line By Line

The traditional way to read a file line by line in Java is to wrap it in a BufferedReader and call readLine() in a loop until it returns null, which signals that all the lines are exhausted.

try (BufferedReader in = new BufferedReader(new FileReader(textFile))) {
    String line;
    while ((line = in.readLine()) != null) {
        // process line here
    }
}

3. Implement a Spliterator Class

To turn a BufferedReader into a source that can be used with the Java 8 Streams API, we need to provide an implementation of the Spliterator interface. Shown below is the LineReaderSpliterator class, which implements Spliterator<String> and turns a BufferedReader into a stream of lines.

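import java.io.BufferedReader;
import java.io.IOException;
import java.util.Spliterator;
import java.util.function.Consumer;

public class LineReaderSpliterator implements Spliterator<String>
{
    private final BufferedReader reader;
    private IOException exception;   // set if readLine() fails

    public LineReaderSpliterator(BufferedReader reader) {
        this.reader = reader;
    }

    // Returns the IOException that ended the stream early, or null if none occurred.
    public IOException ioException() { return exception; }

    @Override
    public int characteristics() {
        // Note: DISTINCT is only accurate if the file contains no repeated lines.
        return DISTINCT | NONNULL | IMMUTABLE;
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE;   // the number of lines is not known in advance
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String line = reader.readLine();
            if (line != null) {
                action.accept(line);
                return true;
            } else return false;   // end of file
        } catch (IOException ex) {
            this.exception = ex;   // remember the failure and end the stream
            return false;
        }
    }

    @Override
    public Spliterator<String> trySplit() { return null; }   // cannot be partitioned
}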

3.1 Constructor

The LineReaderSpliterator is initialized with an instance of BufferedReader which serves as the input line source.

public LineReaderSpliterator(BufferedReader reader) {
    this.reader = reader;
}

3.2 Characteristics

The characteristics of the Spliterator are reported by the characteristics() method. The result must be the bitwise OR of zero or more of the following values:

ORDERED: indicates that the order of elements is defined. The ordering is expected to be preserved in parallel computations.

DISTINCT: Indicates that no two elements are equal to each other.

SORTED: The sequence is sorted. In our case, the lines may not be sorted so this bit is not set.

SIZED: This bit must be set to indicate that the estimate of size returned by estimateSize() is correct. For our case, we do not know the number of lines in a file so this bit is not set.

NONNULL: Elements are guaranteed to be non-null.

IMMUTABLE: Indicates that the element source cannot be structurally modified, that is, elements cannot be added, replaced or removed.

CONCURRENT: Indicates that element source can be modified concurrently with additions, replacements and removals from multiple threads.

SUBSIZED: Indicates that all spliterators produced by trySplit() are themselves SIZED and SUBSIZED.

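In our case, the spliterator is specified as DISTINCT, NONNULL and IMMUTABLE. Note that DISTINCT is accurate only when no line repeats in the file; for arbitrary input it should be dropped, and ORDERED is a reasonable addition since lines arrive in file order.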
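As a rough illustration of how these bits are combined and queried, the sketch below ORs two flags together and then inspects the characteristics reported by two standard collections. The class name CharacteristicsDemo and the helper describe() are made up for this example; only Spliterator.hasCharacteristics() and the flag constants come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Spliterator;
import java.util.TreeSet;

class CharacteristicsDemo {
    // Hypothetical helper: lists the names of the characteristic bits a spliterator reports.
    static String describe(Spliterator<?> s) {
        StringBuilder sb = new StringBuilder();
        if (s.hasCharacteristics(Spliterator.ORDERED))    sb.append("ORDERED ");
        if (s.hasCharacteristics(Spliterator.DISTINCT))   sb.append("DISTINCT ");
        if (s.hasCharacteristics(Spliterator.SORTED))     sb.append("SORTED ");
        if (s.hasCharacteristics(Spliterator.SIZED))      sb.append("SIZED ");
        if (s.hasCharacteristics(Spliterator.NONNULL))    sb.append("NONNULL ");
        if (s.hasCharacteristics(Spliterator.IMMUTABLE))  sb.append("IMMUTABLE ");
        if (s.hasCharacteristics(Spliterator.CONCURRENT)) sb.append("CONCURRENT ");
        if (s.hasCharacteristics(Spliterator.SUBSIZED))   sb.append("SUBSIZED ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Bits are combined with | and tested with &.
        int flags = Spliterator.NONNULL | Spliterator.IMMUTABLE;
        System.out.println((flags & Spliterator.NONNULL) != 0);  // true
        System.out.println((flags & Spliterator.SORTED) != 0);   // false

        // Characteristics reported by two JDK collections.
        System.out.println(describe(new ArrayList<>(Arrays.asList("b", "a")).spliterator()));
        System.out.println(describe(new TreeSet<>(Arrays.asList("b", "a")).spliterator()));
    }
}
```

A TreeSet reports SORTED and DISTINCT while an ArrayList does not, which is the kind of information stream operations such as sorted() and distinct() can use to skip work.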

3.3 Size estimation

Our spliterator does not know the size of the sequence since the number of lines in the BufferedReader is not known. If the size is not known or is unbounded, the method must return Long.MAX_VALUE.

public long estimateSize() {
    return Long.MAX_VALUE;
}

3.4 Process Next Element

The method that processes the next element is tryAdvance(), which accepts an action of the functional interface type Consumer<? super T>. Our implementation attempts to read a line and, if successful (EOF not reached), invokes action.accept() and returns true. If EOF is reached or an exception occurs, the method returns false to indicate end-of-sequence; in the exception case, the IOException is saved so that it can be retrieved later with ioException().

public boolean tryAdvance(Consumer<? super String> action) {
    try {
	String line = reader.readLine();
	if ( line != null ) {
	    action.accept(line);
	    return true;
	} else return false;
    } catch(java.io.IOException ex) {
	this.exception = ex;
	return false;
    }
}
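The tryAdvance() contract is the same for every Spliterator, so it can be tried out without a file. The sketch below drives the spliterator of an in-memory list by hand; TryAdvanceDemo and drain() are names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TryAdvanceDemo {
    // Hypothetical helper: pulls every element out of a spliterator,
    // one tryAdvance() call at a time.
    static <T> List<T> drain(Spliterator<T> source) {
        List<T> out = new ArrayList<>();
        // tryAdvance() hands at most one element to the consumer and
        // returns false once nothing is left.
        while (source.tryAdvance(out::add)) {
            // nothing to do here; the consumer already stored the element
        }
        return out;
    }

    public static void main(String[] args) {
        Spliterator<String> s = Arrays.asList("first", "second", "third").spliterator();
        System.out.println(drain(s));                // [first, second, third]
        System.out.println(s.tryAdvance(x -> { }));  // false: the source is exhausted
    }
}
```

This is essentially the loop that a sequential stream runs on our behalf when a terminal operation such as forEach() is invoked.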

3.5 Can the Spliterator split?

If the spliterator can be partitioned into separate ranges of elements, trySplit() must return a new Spliterator covering part of them. We cannot partition the input into separate sequences, so we return null.

public Spliterator<String> trySplit() { return null; }
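For contrast, here is a small sketch of what a splittable source looks like. It calls trySplit() on the spliterator of an ArrayList and on an empty spliterator; TrySplitDemo and splitSizes() are hypothetical names, and the exact way an ArrayList divides its elements is an implementation detail, so the example only relies on the two parts adding up to the original size.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Spliterator;
import java.util.Spliterators;

class TrySplitDemo {
    // Hypothetical helper: returns {size of the split-off part, size of what remains}.
    // When the spliterator declines to split (trySplit() returns null),
    // the first entry is 0.
    static long[] splitSizes(Spliterator<?> s) {
        Spliterator<?> prefix = s.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, s.estimateSize() };
    }

    public static void main(String[] args) {
        long[] sizes = splitSizes(new ArrayList<>(Arrays.asList(1, 2, 3, 4)).spliterator());
        // The two parts together still cover all four elements.
        System.out.println(sizes[0] + sizes[1]);  // 4

        // A spliterator that cannot be split simply returns null.
        System.out.println(Spliterators.emptySpliterator().trySplit() == null);  // true
    }
}
```

Parallel streams use trySplit() to hand ranges to different threads, so a spliterator that always returns null limits how well a parallel pipeline can divide the work.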

4. Using the Stream

We can now use the Spliterator implementation to convert a BufferedReader into a Java 8 stream as follows:

static private Stream<String> createStreamReader(BufferedReader reader)
{
    LineReaderSpliterator s = new LineReaderSpliterator(reader);
    return StreamSupport.stream(s, false);
}

The earlier streams example can now be written as shown.

try (BufferedReader reader = new BufferedReader(new FileReader(textFile))) {
    createStreamReader(reader)
        .filter(a -> a.startsWith("B"))
        .map(String::toUpperCase)
        .sorted()
        .forEach(System.out::println);
}

Check the output shown below:

BALTIMORE-WASHINGTON INTERNATIONAL
BANGOR INTERNATIONAL
BIRMINGHAM-SHUTTLESWORTH INTERNATIONAL
BRADLEY INTERNATIONAL
BURBANK BOB HOPE AIRPORT
BURLINGTON INTERNATIONAL

The example does not load the file into a data structure of our own. Rather, names are read directly from the file as the pipeline pulls them, although sorted() still has to buffer the filtered names internally before they are printed.
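To try the whole pipeline without a file on disk, the sketch below feeds it from a StringReader. It is self-contained, so it carries a condensed copy of the spliterator (named Lines here) that reports ORDERED | NONNULL rather than the flags used in the article, and that rethrows I/O failures as UncheckedIOException instead of storing them; both are choices made for this example only. For comparison it runs the same pipeline on BufferedReader.lines(), the line stream that the JDK itself has offered since Java 8. LinePipelineDemo, customLines() and process() are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LinePipelineDemo {
    // Condensed stand-in for the article's LineReaderSpliterator.
    static final class Lines implements Spliterator<String> {
        private final BufferedReader reader;
        Lines(BufferedReader reader) { this.reader = reader; }

        @Override public int characteristics() { return ORDERED | NONNULL; }
        @Override public long estimateSize() { return Long.MAX_VALUE; }
        @Override public Spliterator<String> trySplit() { return null; }

        @Override public boolean tryAdvance(Consumer<? super String> action) {
            try {
                String line = reader.readLine();
                if (line == null) return false;   // end of input
                action.accept(line);
                return true;
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);   // assumption: fail loudly in the demo
            }
        }
    }

    static Stream<String> customLines(BufferedReader reader) {
        return StreamSupport.stream(new Lines(reader), false);
    }

    // The same filter/map/sort pipeline as before, collected into a list.
    static List<String> process(Stream<String> lines) {
        return lines.filter(a -> a.startsWith("B"))
                    .map(String::toUpperCase)
                    .sorted()
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String text = "Burbank Bob Hope Airport\nDeadhorse\nBangor International\n";
        try (BufferedReader r1 = new BufferedReader(new StringReader(text));
             BufferedReader r2 = new BufferedReader(new StringReader(text))) {
            System.out.println(process(customLines(r1)));  // [BANGOR INTERNATIONAL, BURBANK BOB HOPE AIRPORT]
            System.out.println(process(r2.lines()));       // same result via the built-in stream
        }
    }
}
```

Writing the spliterator by hand remains useful for sources the JDK does not cover; for plain text files, BufferedReader.lines() gives the same stream with less code.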

Summary

We have demonstrated how to read lines from a file and process them using Java 8 streams. This requires implementing the Spliterator interface, which can deliver a “stream” view of any sequence. The advantage of such an approach is the ease of filtering and processing text files.

Difference Between HashMap and Hashtable in Java

1. Introduction

Java provides several ways of storing key-value maps (also known as dictionaries). The most common ones are java.util.HashMap and java.util.Hashtable. Let us explore the difference between these two classes.

2. Synchronization

Synchronization is a mechanism in Java for preventing multiple threads from interfering with each other and eliminating memory consistency errors.

When one variable (a resource) is visible to multiple threads at the same time, consistency issues arise when one thread attempts to modify the value while another thread is accessing it. To prevent these issues, some form of synchronization must be used.

While synchronization helps in eliminating consistency errors, it adds an overhead when used. In a single-threaded program (or when you can guarantee that a single thread will access the resource), you can use HashMap to eliminate this overhead. Create a HashMap as follows:

HashMap<String,Object> map = new HashMap<>();
map.put("currentTime", new Date());

However, when a dictionary is accessed or modified by multiple threads, access to it must be synchronized. Hashtable does this for you, since its methods are synchronized. The following shows how to create a Hashtable:

Hashtable<String,Integer> tbl = new Hashtable<>();
tbl.put("count", 32);
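A minimal sketch of what this synchronization buys: several threads increment the same key of a shared Hashtable. SharedCounterDemo and countWith() are names invented for this example. It relies on merge(), which Hashtable implements as a single synchronized method (Java 8 and later), so each read-modify-write step completes before another thread can start one.

```java
import java.util.Hashtable;
import java.util.Map;

class SharedCounterDemo {
    // Hypothetical helper: starts `threads` workers that each bump the
    // same key `perThread` times, then returns the final count.
    static int countWith(Map<String, Integer> shared, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int n = 0; n < perThread; n++) {
                    // One synchronized call on Hashtable, so updates are not lost.
                    shared.merge("count", 1, Integer::sum);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();   // wait for all workers to finish
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return shared.get("count");
    }

    public static void main(String[] args) {
        System.out.println(countWith(new Hashtable<>(), 4, 10_000));  // 40000
    }
}
```

Passing a plain HashMap to the same helper is not safe: updates may be lost or the map may be corrupted, which is the consistency problem described above.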

3. Using Null Keys or Values

When your dictionary needs to contain null keys or values, you cannot use Hashtable since this is not allowed. You must use HashMap in this case.

If you need multiple threads reading or writing the HashMap, you can wrap the HashMap using Collections.synchronizedMap() as follows:

Map<String,Object> map = Collections.synchronizedMap(new HashMap<>());
map.put(...);

A HashMap can contain one null key and any number of nulls for values.
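The difference is easy to observe directly. In the sketch below, NullHandlingDemo and its two helpers are hypothetical names; the behaviour shown (Hashtable throwing NullPointerException for a null key or a null value) is what the JDK documents.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullHandlingDemo {
    // Hypothetical helper: reports whether the given map accepts a null key.
    static boolean acceptsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return true;
        } catch (NullPointerException ex) {
            return false;
        }
    }

    // Hypothetical helper: reports whether the given map accepts a null value.
    static boolean acceptsNullValue(Map<String, String> map) {
        try {
            map.put("key", null);
            return true;
        } catch (NullPointerException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsNullKey(new HashMap<>()));      // true
        System.out.println(acceptsNullValue(new HashMap<>()));    // true
        System.out.println(acceptsNullKey(new Hashtable<>()));    // false
        System.out.println(acceptsNullValue(new Hashtable<>()));  // false
    }
}
```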

4. Predictable Iteration Order

A subclass of HashMap is LinkedHashMap which maintains a doubly-linked list of the entries in the Map. This allows traversal of the Map entries in a predictable order (in the order that the entries were inserted into the Map). If you need such a predictable ordering of the entries, then you can easily replace the HashMap with a LinkedHashMap as follows:

HashMap<String,Object> map = new LinkedHashMap<>();
map.put(...);

When using a Hashtable, such a predictable iteration order is not possible. If this is required, use a LinkedHashMap with a Collections.synchronizedMap() wrapper as above.
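A short sketch of the insertion-order guarantee, with and without the synchronized wrapper. IterationOrderDemo and keysInIterationOrder() are names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IterationOrderDemo {
    // Hypothetical helper: inserts three keys in a fixed, non-alphabetical
    // order and returns the keys in the order the map iterates over them.
    static List<String> keysInIterationOrder(Map<String, Integer> map) {
        map.put("zulu", 1);
        map.put("alpha", 2);
        map.put("mike", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order.
        System.out.println(keysInIterationOrder(new LinkedHashMap<>()));
        // [zulu, alpha, mike]

        // The synchronized wrapper keeps the order of the map it wraps.
        System.out.println(keysInIterationOrder(
            Collections.synchronizedMap(new LinkedHashMap<>())));
        // [zulu, alpha, mike]
    }
}
```

With a HashMap or a Hashtable the same helper may return the keys in any order, and that order can change between JDK versions.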

5. Iterating using Enumeration

While both HashMap and Hashtable support iteration over the entries using entrySet(), Hashtable also provides an Enumeration of the values through the Hashtable.elements() method. In addition, the Hashtable.keys() method returns an Enumeration over the keys of the Hashtable.

Hashtable<String,Object> tbl = ...;
for(Enumeration<String> keys = tbl.keys() ; keys.hasMoreElements() ; ) {
  System.out.println(keys.nextElement());
}
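A complete, runnable variant of the loop above might look like the following. EnumerationDemo, keysOf() and sumOfValues() are hypothetical names; the keys are collected into a TreeSet only so that the printed result does not depend on Hashtable's unspecified iteration order.

```java
import java.util.Enumeration;
import java.util.Hashtable;
import java.util.Set;
import java.util.TreeSet;

class EnumerationDemo {
    // Hypothetical helper: walks keys() and returns the keys sorted.
    static Set<String> keysOf(Hashtable<String, Integer> tbl) {
        Set<String> result = new TreeSet<>();
        for (Enumeration<String> keys = tbl.keys(); keys.hasMoreElements(); ) {
            result.add(keys.nextElement());
        }
        return result;
    }

    // Hypothetical helper: walks elements(), which enumerates the values.
    static int sumOfValues(Hashtable<String, Integer> tbl) {
        int sum = 0;
        for (Enumeration<Integer> values = tbl.elements(); values.hasMoreElements(); ) {
            sum += values.nextElement();
        }
        return sum;
    }

    public static void main(String[] args) {
        Hashtable<String, Integer> tbl = new Hashtable<>();
        tbl.put("pears", 3);
        tbl.put("apples", 2);
        System.out.println(keysOf(tbl));       // [apples, pears]
        System.out.println(sumOfValues(tbl));  // 5
    }
}
```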

Conclusion

Here is how you can decide when to use HashMap or Hashtable:

  • For use as a shared resource between multiple threads in a single program, a Hashtable is preferred.
  • When the dictionary needs to contain null keys or values, a HashMap must be used.
  • A HashMap can be used in a multi-threaded environment by wrapping it with Collections.synchronizedMap().
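One detail to keep in mind with the last option: the map returned by Collections.synchronizedMap() synchronizes individual calls such as put() and get(), but the JDK documentation requires the caller to synchronize on the map manually while iterating over any of its views. A minimal sketch, with SynchronizedMapDemo and sumValues() as made-up names:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Hypothetical helper: sums the values of a map created with
    // Collections.synchronizedMap().
    static int sumValues(Map<String, Integer> syncMap) {
        int sum = 0;
        // Iteration is not covered by the wrapper's per-call locking,
        // so hold the map's lock for the whole loop.
        synchronized (syncMap) {
            for (int v : syncMap.values()) {
                sum += v;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = Collections.synchronizedMap(new HashMap<>());
        map.put("a", 1);   // single calls are synchronized by the wrapper
        map.put("b", 2);
        System.out.println(sumValues(map));  // 3
    }
}
```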