Java Scanner

“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.”
― Arthur C. Clarke

Introduction

Java provides a Scanner class that can be used as a text parser. It accepts a regular expression as a delimiter and returns tokens separated by the delimiter. Let us look at some usage scenarios of the Scanner class.
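
As a quick illustration of the delimiter idea, here is a minimal sketch that splits a comma-separated string. The class name DelimiterDemo, the helper tokens, and the sample input are made up for this example; only Scanner, useDelimiter, hasNext and next come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Hypothetical helper: collect every token that Scanner produces
    // when the given regular expression is used as the delimiter.
    static List<String> tokens(String text, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A comma with optional whitespace on either side acts as the delimiter.
        System.out.println(tokens("red, green ,blue", "\\s*,\\s*"));
        // prints [red, green, blue]
    }
}
```

Because the delimiter is a regular expression, the whitespace around each comma is consumed along with the comma, so the tokens come back already trimmed.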

Count Words in a File

The default delimiter used by Scanner is whitespace, so it returns text tokens separated by runs of whitespace. Let us use this fact to count the words in a file. The following code prints each word along with its index in the file. Note that Scanner implements Closeable (and therefore AutoCloseable), so we can use it in a try-with-resources block. The Scanner(File) constructor throws FileNotFoundException, so the enclosing method must catch or declare it.

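// Requires java.io.File and java.util.Scanner.
try (Scanner scanner = new Scanner(new File(filename))) {
    int nword = 0;
    // With the default delimiter, each token is one whitespace-separated word.
    while (scanner.hasNext()) {
        String word = scanner.next();
        nword++;
        System.out.printf("%3d) %s%n", nword, word);
    }
}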
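
If only the count is needed, the same loop can be wrapped in a small method. The following is a sketch, not a definitive implementation: the class WordCount and the method countWords are hypothetical names, and the method takes an already constructed Scanner so that it works for a file, a string, or any other source.

```java
import java.util.Scanner;

class WordCount {
    // Hypothetical helper: count the tokens produced by the default
    // (whitespace) delimiter. The caller owns and closes the scanner.
    static int countWords(Scanner scanner) {
        int nword = 0;
        while (scanner.hasNext()) {
            scanner.next();
            nword++;
        }
        return nword;
    }

    public static void main(String[] args) {
        // A String source is used here so the example runs without a file.
        try (Scanner scanner = new Scanner("  to be,\tor not\n\nto be  ")) {
            System.out.println(countWords(scanner)); // prints 6
        }
    }
}
```

Runs of spaces, tabs and line breaks all count as a single delimiter, and punctuation stays attached to its word ("be," is one token).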

Read Text by Paragraph

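Specifying an empty-line regex as the delimiter lets you read text paragraph by paragraph. The pattern sets the multi-line flag (?m) so that ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the whole input. An empty line is a position where both match at once, so the delimiter matches there without consuming any characters, and each token keeps its surrounding line breaks. A line that contains only spaces or tabs is not empty and does not act as a separator.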

try (Scanner scanner = new Scanner(new File(filename))) {
    scanner.useDelimiter("(?m:^$)");
    int ntoken = 0;
    while (scanner.hasNext()) {
        String token = scanner.next();
        ntoken++;
        System.out.printf("%3d) %s%n", ntoken, token);
    }
}
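
The tokens produced this way still carry their leading and trailing line breaks, and several blank lines in a row yield tokens that contain nothing but a line break. A minimal sketch of one way to tidy that up follows; the class ParagraphReader and the method paragraphs are hypothetical names, and trimming and dropping empty tokens is a choice made for this example rather than anything Scanner does itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ParagraphReader {
    // Hypothetical helper: split text on empty lines, trim each token,
    // and skip tokens that are empty after trimming.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter("(?m:^$)");
            while (scanner.hasNext()) {
                String token = scanner.next().trim();
                if (!token.isEmpty()) {
                    result.add(token);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First paragraph,\nsecond line.\n\nSecond paragraph.\n";
        List<String> paras = paragraphs(text);
        for (int i = 0; i < paras.size(); i++) {
            System.out.printf("%3d) %s%n", i + 1, paras.get(i));
        }
    }
}
```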

Scanner Trick: Read Whole File

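To read the whole file into a single String, use the following. The delimiter \A matches only at the beginning of the input, so it never occurs again after the start and the entire content comes back as one token. Note that next() throws NoSuchElementException when the file is empty, so check hasNext() first if that can happen.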

try (Scanner scanner = new Scanner(new File(filename))) {
    scanner.useDelimiter("\\A");
    // An empty file has no token, so fall back to an empty string.
    String all = scanner.hasNext() ? scanner.next() : "";
}
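
The same trick works for any source that Scanner accepts, not just files. Here is a minimal sketch that wraps it in a method taking a Readable; the class WholeInput and the method readAll are hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.Scanner;

class WholeInput {
    // Hypothetical helper: return everything the source provides as one String.
    // Returns an empty string for empty input instead of throwing.
    static String readAll(Readable source) {
        try (Scanner scanner = new Scanner(source)) {
            // \A matches only at the very start of the input,
            // so the whole content forms a single token.
            scanner.useDelimiter("\\A");
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        String all = readAll(new StringReader("line 1\nline 2\n"));
        System.out.print(all);
    }
}
```

Closing the Scanner also closes the underlying source when that source implements Closeable, which is the case for StringReader and for the file-based constructors.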

Summary

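The Java class Scanner is a simple text parser whose behavior is controlled by its delimiter. With the default whitespace delimiter it splits input into words, with an empty-line delimiter it splits input into paragraphs, and with \A it returns the whole input as a single token.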
