“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.”
― Arthur C. Clarke

Introduction
Java provides the java.util.Scanner class, which can be used as a simple text parser. It splits its input into tokens separated by a delimiter, and the delimiter can be any regular expression. Let us look at some usage scenarios of the Scanner class.
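As a quick illustration of a custom regular-expression delimiter, here is a minimal sketch that splits a comma- or semicolon-separated string. The class name DelimiterDemo, the helper split, and the sample input are hypothetical, chosen only for this example. Scanner also accepts a String directly as its input source, which keeps the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
public class DelimiterDemo {
    // Collects every token a Scanner produces when using the given delimiter regex.
    public static List<String> split(String text, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A comma or semicolon, plus any whitespace after it, separates tokens.
        System.out.println(split("apple, banana;cherry", "[,;]\\s*")); // prints [apple, banana, cherry]
    }
}
```

The delimiter [,;]\s* also consumes any whitespace that follows each separator, so the tokens come back without leading spaces.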
Count Words in a File
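The default delimiter used by Scanner is whitespace: one or more whitespace characters such as spaces, tabs, or line breaks. So Scanner returns tokens separated by whitespace, and we can use this fact to count the words in a file. The following code prints each word along with its position in the file. Punctuation stays attached to the neighboring word, so "end." counts as one word. Note that Scanner implements the Closeable interface (and therefore AutoCloseable), so we can use it in a try-with-resources block, which closes the file automatically.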
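// Imports: java.io.File, java.util.Scanner
// new Scanner(File) throws FileNotFoundException; handle or declare it.
try (Scanner scanner = new Scanner(new File(filename))) {
    int nword = 0;
    while (scanner.hasNext()) {
        // The default delimiter is whitespace, so each token is one word.
        String word = scanner.next();
        nword++;
        System.out.printf("%3d) %s%n", nword, word);
    }
}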
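Because the file example needs a file on disk, here is a self-contained sketch of the same counting loop over a String. The class WordCount and the method countWords are hypothetical names for illustration. Since the default delimiter matches one or more whitespace characters, leading, trailing, and repeated whitespace does not produce empty tokens.

```java
import java.util.Scanner;

// Hypothetical helper class for illustration only.
public class WordCount {
    // Counts whitespace-separated tokens, the same notion of "word" as the file example.
    public static int countWords(String text) {
        int nword = 0;
        try (Scanner scanner = new Scanner(text)) {
            // Default delimiter: a run of whitespace counts as a single separator.
            while (scanner.hasNext()) {
                scanner.next();
                nword++;
            }
        }
        return nword;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  the quick\tbrown\n\nfox  ")); // prints 4
    }
}
```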
Read Text by Paragraph
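Specifying an empty-line regex as the delimiter lets you read text one paragraph at a time. The pattern (?m:^$) turns on the multi-line flag, so ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the whole input; ^$ therefore matches an empty line. Because this delimiter matches zero characters, each token keeps its surrounding line breaks, several blank lines in a row produce whitespace-only tokens, and a line containing only spaces does not count as empty.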
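// Imports: java.io.File, java.util.Scanner
try (Scanner scanner = new Scanner(new File(filename))) {
    // Zero-width delimiter that matches an empty line (multi-line mode)
    scanner.useDelimiter("(?m:^$)");
    int npara = 0;
    while (scanner.hasNext()) {
        // Tokens keep their line breaks, so trim them
        String para = scanner.next().trim();
        // Consecutive blank lines yield whitespace-only tokens; skip them
        if (para.isEmpty()) {
            continue;
        }
        npara++;
        System.out.printf("%3d) %s%n", npara, para);
    }
}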
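The sketch below applies the same delimiter to a String so the behavior can be checked without a file. The class Paragraphs and its paragraphs method are hypothetical names. It trims each token, because the zero-width delimiter leaves line breaks in place, and skips the whitespace-only tokens that appear when several blank lines occur in a row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
public class Paragraphs {
    // Splits text into paragraphs separated by empty lines.
    public static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // (?m:^$) matches the zero-width position of an empty line.
            scanner.useDelimiter("(?m:^$)");
            while (scanner.hasNext()) {
                // Remove the line breaks the zero-width delimiter leaves around each token.
                String para = scanner.next().trim();
                // Extra blank lines produce whitespace-only tokens; ignore them.
                if (!para.isEmpty()) {
                    result.add(para);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First paragraph,\nline two.\n\nSecond paragraph.\n";
        for (String p : paragraphs(text)) {
            System.out.println("[" + p + "]");
        }
    }
}
```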
Scanner Trick: Read Whole File
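To read the whole file into a single String, use the following trick. The delimiter \A matches only at the beginning of the input, so the scanner finds no further delimiter and next() returns the entire content as one token. An empty file has no token at all, and next() would throw NoSuchElementException, so check hasNext() first.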
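// Imports: java.io.File, java.util.Scanner
String all;
try (Scanner scanner = new Scanner(new File(filename))) {
    scanner.useDelimiter("\\A");  // \A matches only at the start of input
    // hasNext() guards against an empty file, where next() would throw
    all = scanner.hasNext() ? scanner.next() : "";
}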
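The same trick should work for any Readable, not only a File, since Scanner has a Scanner(Readable) constructor and every java.io.Reader is a Readable. The helper below, readAll in a hypothetical ReadAll class, is a minimal sketch under that assumption; it returns an empty string for empty input instead of letting next() throw.

```java
import java.io.StringReader;
import java.util.Scanner;

// Hypothetical helper class for illustration only.
public class ReadAll {
    // Returns the entire content of a Readable (for example, any Reader) as one String.
    public static String readAll(Readable source) {
        // Closing the Scanner also closes the source if it implements Closeable.
        try (Scanner scanner = new Scanner(source)) {
            // \A matches only at the start of the input, so next() returns everything.
            scanner.useDelimiter("\\A");
            // Empty input has no token; return "" rather than throwing.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        System.out.print(readAll(new StringReader("line 1\nline 2\n")));
    }
}
```

Because the Scanner is closed in try-with-resources, the Reader passed in is closed as well, so callers should not reuse it afterwards.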
Summary
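The Java Scanner class is a convenient tool for simple text parsing. By choosing the delimiter regex appropriately (whitespace for words, an empty line for paragraphs, \A for the whole input), you can accomplish a variety of parsing tasks with very little code.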