“Money may not buy happiness, but I’d rather cry in a Jaguar than on a bus.”
― Françoise Sagan
Contents
1. Introduction
Today let us see how we can implement a pivot table using java 8 streams. Raw data by itself does not deliver much insight to humans. We need some kind of data aggregation to discern patterns in raw data. A pivot table is one such instrument. Other more visual methods of aggregation include graphs and charts.
In previous articles, we showed how to use java 8 streams to perform SQL-like slicing and dicing of raw data. Today’s article will build on some of those examples, so if the material here appears a bit dense to you, I suggest you browse those articles first.
If you do not like this “raw data” approach to creating a pivot table, and would like Excel to be your medium of choice, we have you covered.
2. CSV Data Represented as a POJO
We use the following POJO representing baseball players and their salaries.
public class Player { private int year; private String teamID; private String lgID; private String playerID; private int salary; // defined getters and setters here }
The data comes from a CSV file which is quite simple – no quoted fields, no multi-line fields, and no commas inside a field. We can use a simple Regex pattern to parse the CSV and load the data into a List. The data looks like this and contains about 26428 rows:
yearID,teamID,lgID,playerID,salary 1985,ATL,NL,barkele01,870000 1985,ATL,NL,bedrost01,550000 1985,ATL,NL,benedbr01,545000 1985,ATL,NL,campri01,633333 1985,ATL,NL,ceronri01,625000 ...
The CSV data is loaded using streams with code similar to:
Pattern pattern = Pattern.compile(","); try (BufferedReader in = new BufferedReader(new FileReader(filename));){ List<Player> players = in .lines() .skip(1) .map(line -> { String[] arr = pattern.split(line); return new Player(Integer.parseInt(arr[0]), arr[1], arr[2], arr[3], Integer.parseInt(arr[4])); }) .collect(Collectors.toList()); }
3. Define the Pivot Columns Class
The following class defines the class that we use as the container for the pivot columns. These are the columns that the data is grouped by. If using SQL, these columns would appear in the “GROUP BY” clause.
public class YearTeam { public int year; public String teamID; public YearTeam(int year,String teamID) { this.year = year; this.teamID = teamID; } @Override public boolean equals(Object other) { if ( other == null ) return false; if ( this == other ) return true; if ( other instanceof YearTeam ) { YearTeam yt = (YearTeam)other; if ( year == yt.year && teamID.equals(yt.teamID) ) return true; } return false; } @Override public int hashCode() { int hash = 1; hash = hash * 17 + year; hash = hash * 31 + teamID.hashCode(); return hash; } @Override public String toString() { StringBuilder sbuf = new StringBuilder(); sbuf.append('[').append(year).append(", ").append(teamID) .append(']'); return sbuf.toString(); } }
The fields are defined with the “public” accessor merely for convenience. For your application you can make it private and add getters and/or setters as needed.
The class overrides equals() and hashCode() since it will be used as the key when storing in a Map.
4. Grouping Data using Streams
And here we read the CSV data, create a POJO for each row and group the data using the pivot columns class defined above.
Map<YearTeam,List<Player>> grouped = in .lines() .skip(1) .map(line -> { String[] arr = pattern.split(line); return new Player(Integer.parseInt(arr[0]), arr[1], arr[2], arr[3], Integer.parseInt(arr[4])); }) .collect(Collectors.groupingBy(x-> new YearTeam(x.getYear(), x.getTeamID())));
At this point, the data has been collected in a Map properly grouped by the specified columns.
5. Printing the Pivot Table as CSV
Let us print the data in the pivot table as CSV so we can load into Excel for comparison. While printing the data, we apply the aggregate function summingLong(). Java 8 streams also provides averagingLong() which gives you the average. Need more, or all of it at once? summarizingLong() dumps all of it in your face. Enjoy!
5.1. CSV Column Headers
We use values of the teamID as column headers. Collect and print them as shown. We use a TreeSet here to have it sorted alphabetically.
Set<String> teams = grouped .keySet() .stream() .map(x -> x.teamID) .collect(Collectors.toCollection(TreeSet::new)); System.out.print(','); teams.stream().forEach(t -> System.out.print(t + ",")); System.out.println();
5.2. Printing the Data
And here is the complete pivot table being built and printed. For each year and team, we extract the list of players, perform a summing operation and print the total.
Set<Integer> years = grouped .keySet() .stream() .map(x -> x.year) .collect(Collectors.toSet()); years .stream() .forEach(y -> { System.out.print(y + ","); teams.stream().forEach(t -> { YearTeam yt = new YearTeam(y, t); List<Player> players = grouped.get(yt); if ( players != null ) { long total = players .stream() .collect(Collectors.summingLong(Player::getSalary)); System.out.print(total); } System.out.print(','); }); System.out.println(); });
6. Comparing Output in Excel
Loading the CSV output in Excel shows the data as follows:
Which compared with Excel’s own pivot table shows the data to be identical. (For some reason column “MON
” has appeared first in Excel as shown below. Maybe it is one of the many “features” available. The values are identical though.)
And that, my friends, is one way you can prepare a pivot table from inside java using plain Collections. Go find some cool uses for it!
Summary
A pivot table is a useful data summarizing tool. It is available in most data analytics software including Excel. Here we learnt how to build the same data structure using Java 8 streams. We used grouping-by and summing to achieve the functionality.