Learn how to massage data using a pandas DataFrame and plot the result using matplotlib in this beginner tutorial.
“There is only one thing that makes a dream impossible to achieve: the fear of failure.”
― Paulo Coelho, The Alchemist
Matplotlib is a graphics and charting library for Python. Once data is sliced and diced using pandas, you can use matplotlib to visualize it. In this starter tutorial, we take you through the steps to do just that.
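As a rough illustration of that workflow, the sketch below aggregates a small DataFrame with pandas and hands the result to matplotlib. The data, the column names and the output file name are all made up for this example, and the `Agg` backend is chosen only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumes no display is available
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data, invented for this sketch.
temps = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", "Lima"],
    "temp_c": [4.0, 6.0, 22.0, 24.0],
})

# Slice and dice with pandas: mean temperature per city.
mean_by_city = temps.groupby("city")["temp_c"].mean()

# Visualize with matplotlib: one bar per city.
fig, ax = plt.subplots()
ax.bar(mean_by_city.index, mean_by_city.values)
ax.set_xlabel("city")
ax.set_ylabel("mean temperature (°C)")
fig.savefig("mean_temps.png")  # hypothetical output file name
```

In an interactive session you would typically call `plt.show()` instead of saving to a file.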
Continue reading “Pandas Tutorial – Using Matplotlib”
Demonstrates grouping data in a pandas DataFrame and compares it with SQL.
“Don’t waste your time with explanations: people only hear what they want to hear.”
― Paulo Coelho
Let us learn about the “group by” operation in pandas. While similar to the SQL “group by”, the pandas version is more flexible, since you can plug in user-defined functions at each stage: splitting the data, applying a computation, and combining the results.
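To make the comparison concrete, here is a minimal sketch of a grouping that mirrors SQL, followed by two places where a user-defined function can be used. The `sales` data and the helper names `value_range` and `size_label` are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "amount": [10, 20, 5, 15, 40],
})

# Roughly: SELECT region, SUM(amount) FROM sales GROUP BY region
totals = sales.groupby("region")["amount"].sum()

def value_range(values):
    """User-defined aggregate: gap between the largest and smallest value."""
    return values.max() - values.min()

# Apply step: use the custom function as the aggregate.
spreads = sales.groupby("region")["amount"].agg(value_range)

def size_label(amount):
    """User-defined grouping key derived from each row's amount."""
    return "large" if amount >= 15 else "small"

# Split step: group by a computed key instead of an existing column.
size_key = sales["amount"].map(size_label).rename("size")
size_counts = sales.groupby(size_key)["amount"].count()
```

Plain SQL has no direct counterpart to passing an arbitrary Python function as the aggregate, which is the kind of flexibility the paragraph above refers to.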
Continue reading “Pandas Tutorial – Grouping Examples”
Did you know that you can perform SQL-like selections with a pandas DataFrame? Learn how!
“Always keep your words soft and sweet, just in case you have to eat them.”
― Andy Rooney
In this article, we present SQL-like ways of selecting data from a pandas DataFrame. The SELECT clause is very familiar to database programmers for accessing data within an SQL database. The DataFrame provides similar functionality when working with datasets, but is far more powerful since it supports using predicate functions with a simple syntax.
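As a small sketch of the idea, the selections below are written next to the SQL they roughly correspond to. The `people` data and the `short_name` predicate are hypothetical, chosen only to illustrate the syntax.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
people = pd.DataFrame({
    "name": ["Ann", "Bob", "Cara", "Dev"],
    "age": [34, 19, 52, 27],
    "city": ["Oslo", "Lima", "Oslo", "Pune"],
})

# Roughly: SELECT name, age FROM people WHERE age > 25
over_25 = people.loc[people["age"] > 25, ["name", "age"]]

# Roughly: SELECT * FROM people WHERE city = 'Oslo' AND age < 40
oslo_under_40 = people[(people["city"] == "Oslo") & (people["age"] < 40)]

def short_name(name):
    """User-defined predicate: true for names of at most three letters."""
    return len(name) <= 3

# A predicate function applied to a column yields a boolean mask,
# which then selects the matching rows.
short = people[people["name"].apply(short_name)]
```

Each condition produces a boolean Series, so conditions can be combined with `&` and `|` before being used to index the DataFrame.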
Continue reading “Pandas Tutorial – SQL-like Data Selection”
Learn the various ways of selecting data from a DataFrame.
“Always and never are two words you should always remember never to use.”
― Wendell Johnson
After covering ways of creating a DataFrame and working with it, we now concentrate on extracting data from the DataFrame. You may also be interested in our tutorials on a related data structure – Series; part 1 and part 2.
Continue reading “Pandas Tutorial – Selecting Rows From a DataFrame”
Learn the basics of working with a DataFrame in this pandas tutorial.
“The line between failure and success is so fine. . . that we are often on the line and do not know it.”
― Elbert Hubbard
The DataFrame is the most commonly used data structure in pandas, so it is important to learn the specifics of working with it. After learning various methods of creating a DataFrame, let us now delve into some methods for working with it.
Continue reading “Pandas Tutorial – DataFrame Basics”
We cover commonly used methods of the pandas Series object in this article.
“The truth is not for all men but only for those who seek it.”
― Ayn Rand
The Series is one of the most common pandas data structures. It is similar to a Python list and is used to represent a column of data. After looking into the basics of creating and initializing a pandas Series object, we now delve into some common usage patterns and methods.
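For a feel of how a Series behaves like a labeled list, here is a minimal sketch. The `prices` data is made up for this example, and the operations shown are only a small sample of what a Series supports.

```python
import pandas as pd

# Hypothetical sample data: one column of values with a label per entry.
prices = pd.Series(
    [3.5, 1.25, 4.0],
    index=["apple", "banana", "cherry"],
    name="price",
)

# Look up a single value by its label, much like a dictionary key.
banana_price = prices["banana"]

# Arithmetic applies to every element at once.
doubled = prices * 2

# A boolean condition filters the Series.
expensive = prices[prices > 3]

# Summary methods reduce the Series to a single value.
highest = prices.max()
```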
Continue reading “Python Pandas Tutorial – Series Methods”
Learn the basics of creating a DataFrame in this tutorial series on pandas.
This is the next part of the pandas tutorial. In a previous article, we covered the pandas Series class. Today we are getting started with the main pandas data structure, the DataFrame.
Continue reading “Python Pandas Tutorial – DataFrame”
Learn the basics of pandas Series in this beginner tutorial.
“Come friends, it’s not too late to seek a newer world.”
― Alfred Tennyson
Pandas is a powerful toolkit providing data-analysis tools and structures for the Python programming language.
Among the most important artifacts provided by pandas is the Series. In this article, we introduce the Series class from a beginner’s perspective. That means you do not need to know anything about pandas or data-analysis to understand this tutorial.
Continue reading “Python Pandas Tutorial – Series”
Implement a Pivot Table in Java using Java 8 Streams and Collections.
“Money may not buy happiness, but I’d rather cry in a Jaguar than on a bus.”
― Françoise Sagan
Today let us see how we can implement a pivot table using Java 8 Streams. Raw data by itself does not deliver much insight to humans. We need some kind of data aggregation to discern patterns in raw data. A pivot table is one such instrument. Other, more visual methods of aggregation include graphs and charts.
Continue reading “Java – Pivot Table using Streams”
We demonstrate how to set up an Apache Spark cluster on a single AWS EC2 node and run a couple of jobs.
“If the facts don’t fit the theory, change the facts.”
― Albert Einstein
Apache Spark is the newest kid on the block talking big data.
While reusing major components of the Apache Hadoop framework, Apache Spark lets you execute big data processing jobs that do not neatly fit into the MapReduce paradigm. It supports many patterns similar to those in Java 8 Streams, while letting you run these jobs on a cluster.
Continue reading “Apache Spark – Setup Cluster on AWS”