“If the facts don’t fit the theory, change the facts.”
― Albert Einstein
1. Introduction
Apache Spark is the newest kid on the block talking big data.
While re-using major components of the Apache Hadoop Framework, Apache Spark lets you execute big data processing jobs that do not neatly fit into the Map-Reduce paradigm. It provides support for many patterns similar to the Java 8 Streams functionality, while letting you run these jobs on a cluster.