Pandas Tutorial – Using Matplotlib

Learn how to massage data using pandas DataFrame and plot the result using matplotlib in this beginner tutorial.

“There is only one thing that makes a dream impossible to achieve: the fear of failure.”
― Paulo Coelho, The Alchemist

1. Introduction

Matplotlib is a graphics and charting library for Python. Once data has been sliced and diced using pandas, you can use matplotlib to visualize it. In this starter tutorial, we take you through the steps to do just that.

This is a beginner tutorial, so no prior knowledge of matplotlib is assumed. You do need some knowledge of the pandas DataFrame and Series. Check these articles if you need a refresher.

2. Getting Started

You need the following imports.

```
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
```

If you are working through these examples in a Jupyter Notebook, you need the following declaration:

`%matplotlib inline`
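Outside a notebook the `%matplotlib inline` magic is not available. Here is a minimal sketch of one script equivalent, assuming you want the chart written to an image file. The file name `close_price.png` is hypothetical, and the four values are copied from the sample data used later in this tutorial purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window or display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Four closing prices copied from the sample data shown later in this tutorial.
close = pd.Series([141.80, 141.63, 143.17, 143.34], name="close")

ax = close.plot(figsize=(15, 8))  # pandas returns the matplotlib Axes it drew on
out_path = "close_price.png"      # hypothetical output file name
ax.figure.savefig(out_path)       # write the chart to disk
plt.close(ax.figure)              # release the figure once it is saved
```

If you would rather see an interactive window, drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving.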

Let us now plot some data.

3. Visualizing AAPL Stock Price

Let us get AAPL stock price variation data from NASDAQ for analysis. Go to the NASDAQ site, select historical prices for 6 months and download the data as CSV.

Load the data into a pandas DataFrame for analysis.

```
x = pd.read_csv('big-data/AAPLHistoricalQuotes.csv')
print(x.head())
print(x.dtypes)

# prints
         date   close         volume    open      high     low
0       16:00  141.80     20,350,000  141.60  142.1500  141.01
1  2017/04/12  141.80  20320420.0000  141.60  142.1500  141.01
2  2017/04/11  141.63  30341520.0000  142.94  143.3500  140.06
3  2017/04/10  143.17  18904680.0000  143.60  143.8792  142.90
4  2017/04/07  143.34  16658660.0000  143.73  144.1800  143.27
date       object
close     float64
volume     object
open      float64
high      float64
low       float64
dtype: object
```
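If you want to experiment without downloading the file, the sketch below rebuilds the first few rows in memory. The rows are copied from the output above; the exact CSV layout, in particular the quoting around the comma-formatted volume, is an assumption about what the download looks like.

```python
import io
import pandas as pd

# First rows of the download, copied from the printed output above.
# The comma-formatted volume is quoted so that it stays in one CSV field
# (the quoting is an assumption about the original file).
sample_csv = '''date,close,volume,open,high,low
16:00,141.80,"20,350,000",141.60,142.1500,141.01
2017/04/12,141.80,20320420.0000,141.60,142.1500,141.01
2017/04/11,141.63,30341520.0000,142.94,143.3500,140.06
2017/04/10,143.17,18904680.0000,143.60,143.8792,142.90
2017/04/07,143.34,16658660.0000,143.73,144.1800,143.27
'''

sample = pd.read_csv(io.StringIO(sample_csv))
print(sample.head())
print(sample.dtypes)  # volume is not numeric because of the "20,350,000" entry
```

`pd.read_csv` accepts any file-like object, so `io.StringIO` stands in for the file on disk.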

5. Charting the Close Price

Now let's draw some charts.

`x[['date', 'close']].set_index('date').plot(figsize=(15, 8))`

That results in the graph shown.

Here is how the code works. The following selects the required columns from the DataFrame.

```
print(x[['date', 'close']])

# prints
           date   close
0         16:00  141.80
1    2017/04/12  141.80
2    2017/04/11  141.63
3    2017/04/10  143.17
4    2017/04/07  143.34
5    2017/04/06  143.66
...
```

This sets the date as the index of the DataFrame so it can appear as the x-axis of the chart.

```
print(x[['date', 'close']].set_index('date'))

# prints
             close
date
16:00       141.80
2017/04/12  141.80
2017/04/11  141.63
2017/04/10  143.17
2017/04/07  143.34
2017/04/06  143.66
...
```

And this plots the close price against the date, scaling the figure to 15 by 8 inches.

`x[['date', 'close']].set_index('date').plot(figsize=(15, 8))`

Pretty simple, eh?
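One thing the one-liner glosses over: the date column is plain text, the first row holds a time ("16:00") instead of a date, and the rows run from newest to oldest. The sketch below shows one possible clean-up, assuming the dates follow the `YYYY/MM/DD` pattern seen in the output above. The variable names `df` and `daily` are made up for this example, and the rows are copied from that output.

```python
import io
import pandas as pd

# Date and close columns copied from the printed output above.
sample_csv = '''date,close
16:00,141.80
2017/04/12,141.80
2017/04/11,141.63
2017/04/10,143.17
2017/04/07,143.34
2017/04/06,143.66
'''
df = pd.read_csv(io.StringIO(sample_csv))

# Parse the text into real dates. "16:00" does not match the format,
# so errors='coerce' turns it into NaT (a missing date).
df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d', errors='coerce')

# Drop the row without a date, index by date and sort oldest first.
daily = df.dropna(subset=['date']).set_index('date').sort_index()

ax = daily['close'].plot(figsize=(15, 8))
```

With a datetime index, the x-axis is laid out by actual date instead of by row position.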

6. Modify Data Before Plotting

Let us now plot the daily volume against the date. Now, the daily volume data looks like this:

```
print(x[['date', 'volume']].set_index('date'))

# prints
                   volume
date
16:00          20,350,000
2017/04/12  20320420.0000
2017/04/11  30341520.0000
2017/04/10  18904680.0000
2017/04/07  16658660.0000
2017/04/06  21131040.0000
...
```

See the commas in the volume column of the first row? That is a no-no, since the value is then not a valid float. The commas need to be removed and the column converted to a numeric type before the data can be plotted, which is done like this:

```
print(x[['date', 'volume']].set_index('date').apply(lambda a: a.str.replace(',', '')).apply(pd.to_numeric).head())

# prints
                volume
date
16:00       20350000.0
2017/04/12  20320420.0
2017/04/11  30341520.0
2017/04/10  18904680.0
2017/04/07  16658660.0
```
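The same clean-up can also be applied to the column directly, without going through `apply()`. This is a small sketch of that variant, not the approach used in the rest of this tutorial. The three values are copied from the output above, and the names `volume` and `cleaned` are made up for the example.

```python
import pandas as pd

# Volume values copied from the output above; the first one contains commas.
volume = pd.Series(
    ['20,350,000', '20320420.0000', '30341520.0000'],
    index=['16:00', '2017/04/12', '2017/04/11'],
    name='volume',
)

# Strip the commas as plain text (regex=False), then convert to numbers.
cleaned = pd.to_numeric(volume.str.replace(',', '', regex=False))
print(cleaned)
```

Working on a single Series keeps the string handling away from columns that are already numeric.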

And now plot the data.

`x[['date', 'volume']].set_index('date').apply(lambda a : a.str.replace(',', '')).apply(pd.to_numeric).plot(figsize=(15,8))`

Which results in:

7. Multiple Plots in One Chart

Can we plot the volume and the close price against the date to see if there is a correlation? Sure we can.

```
def joe(a):
    return pd.to_numeric(a.str.replace(',', '')) if a.name == 'volume' else a

x[['date', 'close', 'volume']].set_index('date').apply(joe).plot(figsize=(15,8))
```

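We select three columns from the DataFrame and pass each one through joe(). apply() hands the function one column at a time as a Series, so a.name tells it which column it is looking at. Only volume has its commas removed and is converted to float (as above, but with a single apply()). Plotting the data then results in: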

8. Need a Different Y-Axis

Wait a minute. The close price looks flat! Why is this? The price and the volume are on very different scales (no surprises there): the price is in the hundreds, while the volume is in the tens of millions. We need a second y-axis on the plot. One way to get it is to plot each series separately and put the volume on a secondary axis.

```
def joe(a):
    # Convert only the volume column; leave the others as they are.
    return pd.to_numeric(a.str.replace(',', '')) if a.name == 'volume' else a

xx = x.set_index('date').apply(joe)
xx.close.plot(label='Close', legend=True, figsize=(15, 8))
xx.open.plot(label='Open', legend=True).set_ylabel(r'Price(\$)')
xx.volume.plot(secondary_y=True, label='Volume', legend=True).set_ylabel('Volume')
```

We also set the correct y-label on the plot and add legends. The result is shown below.
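For comparison, a similar two-axis layout can be built with matplotlib alone, using `twinx()` to create a second y-axis that shares the x-axis. This is a sketch of an alternative, not the method used above. The four rows are copied from the sample output earlier in this tutorial, and the names `price_ax` and `volume_ax` are made up for the example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the sample output earlier in this tutorial.
df = pd.DataFrame(
    {
        'close': [141.80, 141.63, 143.17, 143.34],
        'volume': [20320420.0, 30341520.0, 18904680.0, 16658660.0],
    },
    index=['2017/04/12', '2017/04/11', '2017/04/10', '2017/04/07'],
)

fig, price_ax = plt.subplots(figsize=(15, 8))
volume_ax = price_ax.twinx()  # second y-axis sharing the same x-axis

# Each plot() call returns a list with one line; unpack it to keep the handle.
(close_line,) = price_ax.plot(df.index, df['close'], color='tab:blue', label='Close')
(volume_line,) = volume_ax.plot(df.index, df['volume'], color='tab:orange', label='Volume')

price_ax.set_ylabel(r'Price(\$)')
volume_ax.set_ylabel('Volume')

# The two Axes track their lines separately, so build one combined legend.
price_ax.legend(handles=[close_line, volume_line])
```

The pandas `secondary_y=True` option is shorter; the explicit version gives you a separate handle on each axis when you need finer control.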

As final adjustments to the plot, we add a title and a grid and increase the line widths; the open price and the legend stay as before. We also load the data directly from a different site to illustrate loading data from a URL.

```
aapl = pd.read_csv('http://chart.finance.yahoo.com/table.csv?s=AAPL&a=3&b=13&c=2016&d=3&e=13&f=2017&g=d&ignore=.csv')
aapl = aapl.set_index('Date')
aapl.Open.plot(title='AAPL', legend=True, lw=3, figsize=(15, 8))
aapl.Close.plot(legend=True, lw=3).set_ylabel(r'Price(\$)')
aapl.Volume.plot(secondary_y=True, legend=True, lw=3, grid=True).set_ylabel('Volume')
```

The result is shown below.
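Section 7 asked whether the volume and the close price are correlated. A chart can only hint at the answer, so here is a small sketch that puts a number on it with `Series.corr()`. It uses just four rows copied from the sample output earlier in this tutorial, so the figure it prints illustrates the call and says nothing reliable about AAPL.

```python
import pandas as pd

# Four rows copied from the sample output earlier in this tutorial.
df = pd.DataFrame({
    'close': [141.80, 141.63, 143.17, 143.34],
    'volume': [20320420.0, 30341520.0, 18904680.0, 16658660.0],
})

# Series.corr() computes the Pearson correlation coefficient by default.
corr = df['close'].corr(df['volume'])
print(round(corr, 3))
```

The coefficient ranges from -1 to 1; run it on the full six months of data before reading anything into the value.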

And that, my friends, is a simple matplotlib tutorial.

Summary

We went through this simple tutorial on matplotlib. We covered how to load data into a DataFrame, extract the required columns from it and plot the data. We also saw how data can be massaged into the form required for plotting. Finally, we covered how to add multiple graphs to a plot and set the properties of the various elements of the chart.