“The only way of discovering the limits of the possible is to venture a little way past them into the impossible.”
― Arthur C. Clarke
Introduction
Java provides a Scanner class that can be used as a text parser. It accepts a regular expression as a delimiter and returns tokens separated by the delimiter. Let us look at some usage scenarios of the Scanner class.
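As a minimal sketch of a regular-expression delimiter, the example below splits a comma-separated string while ignoring any whitespace around the commas. The class name DelimiterDemo and the method splitOnCommas are hypothetical names chosen for illustration; only Scanner and its useDelimiter, hasNext, and next methods come from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Splits the input on commas, treating whitespace around each comma
    // as part of the delimiter. (Hypothetical helper for illustration.)
    static List<String> splitOnCommas(String input) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter("\\s*,\\s*");
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [red, green, blue]
        System.out.println(splitOnCommas("red, green ,blue"));
    }
}
```

The Scanner here reads from a String rather than a file, which is a convenient way to try out a delimiter before pointing it at real input.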
Count Words in a File
The default delimiter used by the Scanner is whitespace, so it returns tokens separated by whitespace. We can use this fact to count the words in a file. The following code prints each word together with its index in the file. Note that Scanner implements the AutoCloseable interface, so it can be used in a try-with-resources block.
    try (Scanner scanner = new Scanner(new File(filename))) {
        int nword = 0;
        // hasNext() is true while another whitespace-separated token remains
        while (scanner.hasNext()) {
            String word = scanner.next();
            nword++;
            System.out.printf("%3d) %s%n", nword, word);
        }
    }
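The snippet above omits imports and exception handling. A self-contained version might look like the following sketch, which assumes an ASCII sample written to a temporary file. The names WordCount, countWords, and countWordsInSample are hypothetical, and wrapping the checked exceptions in UncheckedIOException is just one possible choice.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;

class WordCount {
    // Counts the whitespace-separated tokens in the given file.
    static int countWords(File file) {
        try (Scanner scanner = new Scanner(file)) {
            int count = 0;
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
            return count;
        } catch (FileNotFoundException e) {
            // new Scanner(File) declares FileNotFoundException
            throw new UncheckedIOException(e);
        }
    }

    // Writes the sample text to a temporary file, counts its words,
    // and deletes the file afterwards. (Illustration only.)
    static int countWordsInSample(String text) {
        try {
            Path sample = Files.createTempFile("words", ".txt");
            try {
                Files.write(sample, text.getBytes(StandardCharsets.US_ASCII));
                return countWords(sample.toFile());
            } finally {
                Files.deleteIfExists(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints 4
        System.out.println(countWordsInSample("the quick  brown\nfox\n"));
    }
}
```

Runs of spaces and line breaks count as a single delimiter, so the double space and the newlines in the sample do not produce empty words.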
Read Text by Paragraph
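Specifying an empty-line regex as the delimiter allows you to read text paragraph by paragraph. The regular expression pattern (?m:^$) sets the multi-line flag so that ^ and $ match at the beginning and end of each line rather than of the whole input. Because the delimiter itself is zero-width, each token keeps the line breaks around it; trim the token if you do not want them.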
    try (Scanner scanner = new Scanner(new File(filename))) {
        scanner.useDelimiter("(?m:^$)");
        int ntoken = 0;
        while (scanner.hasNext()) {
            String token = scanner.next();
            ntoken++;
            System.out.printf("%3d) %s%n", ntoken, token);
        }
    }
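To see what the paragraph delimiter returns, here is a small sketch that runs it over a String and collects the paragraphs into a list. The names ParagraphDemo and paragraphs are hypothetical. Trimming each token and skipping empty ones are additions not present in the snippet above; they are there because tokens carry their surrounding line breaks, and consecutive blank lines yield tokens that contain only a line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ParagraphDemo {
    // Splits text into paragraphs at empty lines, trimming each paragraph
    // and dropping tokens that contain nothing but whitespace.
    static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Zero-width match at every empty line (multi-line mode)
            scanner.useDelimiter("(?m:^$)");
            while (scanner.hasNext()) {
                String paragraph = scanner.next().trim();
                if (!paragraph.isEmpty()) {
                    result.add(paragraph);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "First line\nsecond line\n\nAnother paragraph\n";
        for (String paragraph : paragraphs(text)) {
            System.out.println("[" + paragraph + "]");
        }
    }
}
```

Note that the pattern only matches lines that are completely empty. A line containing spaces or tabs does not separate paragraphs unless the regex is widened to allow for that.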
Scanner Trick: Read Whole File
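To read the whole file into a single String, use the following. The delimiter here is \A, the regular expression that matches only at the beginning of the input. Since it never matches again, everything in the file is returned as one token. Be aware that next() throws NoSuchElementException if the file is empty.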
    try (Scanner scanner = new Scanner(new File(filename))) {
        scanner.useDelimiter("\\A");
        String all = scanner.next();
    }
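The trick can be wrapped in a small helper that also copes with empty input. The sketch below is one possible shape, not a canonical utility: ReadAllDemo and readAll are hypothetical names, and returning an empty string for empty input is an assumption about the desired behavior. It accepts any Readable, so it can be tried with a StringReader as well as a FileReader.

```java
import java.io.StringReader;
import java.util.Scanner;

class ReadAllDemo {
    // Returns everything the source provides as one string,
    // or "" when the source is empty.
    static String readAll(Readable source) {
        try (Scanner scanner = new Scanner(source)) {
            scanner.useDelimiter("\\A");
            // Guard with hasNext(): next() would throw on empty input
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        String all = readAll(new StringReader("line one\nline two\n"));
        System.out.print(all);
    }
}
```

Closing the Scanner also closes the underlying source when that source implements Closeable, so the caller does not need a second try-with-resources for the reader.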
Summary
The Java Scanner class is a simple text parser. By setting the delimiter appropriately, you can read a file word by word, paragraph by paragraph, or all at once.