“There is only one thing that makes a dream impossible to achieve: the fear of failure.”
― Paulo Coelho, The Alchemist
1. Introduction
Matplotlib is a graphics and charting library for Python. Once data has been sliced and diced with pandas, you can use matplotlib to visualize it. This starter tutorial takes you through the steps to do just that.
This is a beginner tutorial, so no prior knowledge of matplotlib is assumed. You do need some knowledge of the pandas DataFrame and Series. Check these articles if you need a refresher.
- Creating a Series and a DataFrame.
- DataFrame Basics.
- Selecting and Extracting data.
- Grouping By Examples.
2. Getting Started
You need the following imports.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
If you are working through these examples in a Jupyter Notebook, you need the following declaration:
%matplotlib inline
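`%matplotlib inline` is IPython syntax, so it will not run in a plain Python script. The following is a minimal sketch for running outside a notebook, assuming a desktop (interactive) Matplotlib backend is available: draw the chart, then call plt.show(). The three close prices come from the data used later in this tutorial.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few close prices from later in this tutorial, indexed by date string
close = pd.Series([141.80, 141.63, 143.17],
                  index=['2017/04/12', '2017/04/11', '2017/04/10'])

ax = close.plot(figsize=(15, 8))
plt.show()  # with an interactive backend, opens a window and blocks until it is closed
```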
Let us now plot some data.
3. Visualizing AAPL Stock Price
Let us get AAPL stock price data from NASDAQ for analysis. Go to the NASDAQ site, select historical prices for 6 months and download the data as CSV.
4. Loading Data into a DataFrame
Load the data into a pandas DataFrame for analysis.
x = pd.read_csv('big-data/AAPLHistoricalQuotes.csv')
print(x.head())
print(x.dtypes)
# prints
#          date   close         volume    open      high     low
# 0       16:00  141.80     20,350,000  141.60  142.1500  141.01
# 1  2017/04/12  141.80  20320420.0000  141.60  142.1500  141.01
# 2  2017/04/11  141.63  30341520.0000  142.94  143.3500  140.06
# 3  2017/04/10  143.17  18904680.0000  143.60  143.8792  142.90
# 4  2017/04/07  143.34  16658660.0000  143.73  144.1800  143.27
# date       object
# close     float64
# volume     object
# open      float64
# high      float64
# low       float64
# dtype: object
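The CSV file itself is not reproduced here. If you want to experiment without it, a minimal sketch can rebuild the first rows printed above from an in-memory string. The header line and the quoting of "20,350,000" are assumptions about the file's layout. The sketch shows that a single value containing commas is enough to stop pandas from parsing the whole volume column as numbers.

```python
import io

import pandas as pd

# Hypothetical stand-in for big-data/AAPLHistoricalQuotes.csv: the first
# rows printed above, with the comma-containing volume value quoted
CSV_TEXT = '''date,close,volume,open,high,low
16:00,141.80,"20,350,000",141.60,142.1500,141.01
2017/04/12,141.80,20320420.0000,141.60,142.1500,141.01
2017/04/11,141.63,30341520.0000,142.94,143.3500,140.06
2017/04/10,143.17,18904680.0000,143.60,143.8792,142.90
2017/04/07,143.34,16658660.0000,143.73,144.1800,143.27
'''

x = pd.read_csv(io.StringIO(CSV_TEXT))

# close parses as float64, but "20,350,000" keeps volume as strings
print(x.dtypes)
```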
5. Charting the Close Price
Now let's draw some charts.
x[['date', 'close']].set_index('date').plot(figsize=(15, 8))
That results in the graph shown.
Here is how the code works. The following selects the required columns from the DataFrame.
print(x[['date', 'close']])
# prints
#          date   close
# 0       16:00  141.80
# 1  2017/04/12  141.80
# 2  2017/04/11  141.63
# 3  2017/04/10  143.17
# 4  2017/04/07  143.34
# 5  2017/04/06  143.66
# ...
This sets the date as the index of the DataFrame so it can appear as the x-axis of the chart.
print(x[['date', 'close']].set_index('date'))
# prints
#              close
# date
# 16:00       141.80
# 2017/04/12  141.80
# 2017/04/11  141.63
# 2017/04/10  143.17
# 2017/04/07  143.34
# 2017/04/06  143.66
# ...
Finally, this plots the close price against the date, with the figure sized 15 inches wide by 8 inches tall.
x[['date', 'close']].set_index('date').plot(figsize=(15, 8))
Pretty simple, eh?
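With date strings as the index, pandas plots the rows in the order they appear in the file. This file lists the newest day first, so time runs from right to left on the x-axis. A possible refinement, not part of the original walkthrough, is to convert the strings to real dates and sort them. The sketch below uses a few rows copied from the output above and assumes the YYYY/MM/DD format seen there. The "16:00" row, which repeats the 2017/04/12 close, does not match that format, so errors='coerce' turns its date into NaT and the row is dropped.

```python
import pandas as pd

# A few (date, close) rows from the output above, newest first as in the file
x = pd.DataFrame({
    'date': ['16:00', '2017/04/12', '2017/04/11', '2017/04/10', '2017/04/07'],
    'close': [141.80, 141.80, 141.63, 143.17, 143.34],
})

# Parse the YYYY/MM/DD strings; '16:00' does not match and becomes NaT
x['date'] = pd.to_datetime(x['date'], format='%Y/%m/%d', errors='coerce')

# Drop the unparseable row, then sort so the x-axis runs oldest to newest
close = x.dropna(subset=['date']).set_index('date')['close'].sort_index()

ax = close.plot(figsize=(15, 8))
```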
6. Modify Data Before Plotting
Let us now plot the daily volume against the date. The daily volume data looks like this:
print(x[['date', 'volume']].set_index('date'))
# prints
#                    volume
# date
# 16:00          20,350,000
# 2017/04/12  20320420.0000
# 2017/04/11  30341520.0000
# 2017/04/10  18904680.0000
# 2017/04/07  16658660.0000
# 2017/04/06  21131040.0000
# ...
See the commas in the volume column of the first row? "20,350,000" is not a valid float, so pandas reads the whole volume column as strings (dtype object in the dtypes output above). The commas have to be removed and the column converted to numbers before the data can be plotted. This is done as follows:
print(x[['date', 'volume']].set_index('date')
      .apply(lambda a: a.str.replace(',', '', regex=False))  # strip the commas
      .apply(pd.to_numeric)                                  # convert strings to numbers
      .head())
# prints
#                 volume
# date
# 16:00       20350000.0
# 2017/04/12  20320420.0
# 2017/04/11  30341520.0
# 2017/04/10  18904680.0
# 2017/04/07  16658660.0
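Instead of cleaning the column after loading, you can let pandas.read_csv handle the separators: its thousands parameter names the thousands-separator character, so values such as "20,350,000" are parsed as numbers at load time. Below is a minimal sketch using in-memory CSV text that mirrors the first rows above. The layout of the real file is assumed.

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring the first rows shown above
CSV_TEXT = '''date,close,volume
16:00,141.80,"20,350,000"
2017/04/12,141.80,20320420.0000
2017/04/11,141.63,30341520.0000
'''

# thousands=',' makes the parser skip ',' inside numeric values
x = pd.read_csv(io.StringIO(CSV_TEXT), thousands=',')
print(x['volume'])
```

With this option the volume column arrives as float64, so the two apply() calls are not needed.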
And now plot the data.
x[['date', 'volume']].set_index('date').apply(lambda a : a.str.replace(',', '')).apply(pd.to_numeric).plot(figsize=(15,8))
The result is shown below.
7. Multiple Plots in One Chart
Can we plot the volume and the close price against the date to see if there is a correlation? Sure we can.
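def fix_volume(a):
    # Strip commas from the volume column and convert it to numbers;
    # every other column is returned unchanged
    if a.name == 'volume':
        return pd.to_numeric(a.str.replace(',', '', regex=False))
    return a

x[['date', 'close', 'volume']].set_index('date').apply(fix_volume).plot(figsize=(15, 8))
We select three columns from the DataFrame and apply a function that removes the commas from the volume column and converts it to float (as above, but with a single apply()). Then we plot the data. The result is shown below.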
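A single apply() works here because DataFrame.apply calls the function once per column and passes each column as a Series whose .name is the column label. Here is a self-contained sketch using three rows copied from the output above.

```python
import pandas as pd

def fix_volume(col):
    # DataFrame.apply passes each column as a Series; col.name is its label
    if col.name == 'volume':
        # Strip thousands separators, then convert the strings to numbers
        return pd.to_numeric(col.str.replace(',', '', regex=False))
    return col  # other columns are returned unchanged

x = pd.DataFrame({
    'date': ['16:00', '2017/04/12', '2017/04/11'],
    'close': [141.80, 141.80, 141.63],
    'volume': ['20,350,000', '20320420.0000', '30341520.0000'],
})

clean = x.set_index('date').apply(fix_volume)
print(clean.dtypes)  # both close and volume are now float64
```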
8. Need a Different Y-Axis
Wait a minute. The close price looks flat! Why? The price and the volume are on very different scales (no surprise there): the price is around 140 dollars, while the volume is in the tens of millions of shares. We need a second y-axis for the volume. One way to get it is to plot each series separately and put the volume on a secondary y-axis.
def fix_volume(a):
    # Same cleanup as above: numeric volume, other columns unchanged
    if a.name == 'volume':
        return pd.to_numeric(a.str.replace(',', '', regex=False))
    return a

xx = x.set_index('date').apply(fix_volume)
xx.close.plot(label='Close', legend=True, figsize=(15, 8))
xx.open.plot(label='Open', legend=True).set_ylabel('Price($)')
xx.volume.plot(secondary_y=True, label='Volume', legend=True).set_ylabel('Volume')
We also label both y-axes and add a legend entry for each series. The result is shown below.
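For reference, you can build the same two-axis layout in plain Matplotlib with Axes.twinx(), which creates a second y-axis sharing the x-axis. As far as we know, pandas' secondary_y=True relies on a twin axis in a similar way, but treat that as an assumption. The sketch uses four rows from the output above, with the volume already converted to numbers, and merges the legend entries of both axes into one legend.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Four rows from the output above, volume already converted to numbers
xx = pd.DataFrame(
    {
        'close': [141.80, 141.63, 143.17, 143.34],
        'open': [141.60, 142.94, 143.60, 143.73],
        'volume': [20320420.0, 30341520.0, 18904680.0, 16658660.0],
    },
    index=pd.to_datetime(['2017-04-12', '2017-04-11', '2017-04-10', '2017-04-07']),
)

fig, ax_price = plt.subplots(figsize=(15, 8))
ax_volume = ax_price.twinx()  # second y-axis sharing the same x-axis

# Prices on the left axis, volume on the right axis
ax_price.plot(xx.index, xx['close'], label='Close')
ax_price.plot(xx.index, xx['open'], label='Open')
# The twin axes starts its own color cycle, so pick a distinct color
ax_volume.plot(xx.index, xx['volume'], color='tab:red', label='Volume')

ax_price.set_ylabel('Price($)')
ax_volume.set_ylabel('Volume')

# Each axes tracks its own lines; combine them into a single legend
lines = list(ax_price.get_lines()) + list(ax_volume.get_lines())
ax_price.legend(lines, [line.get_label() for line in lines], loc='upper left')
```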
9. Final Adjustments
As final adjustments to the plot, we plot the open and close prices with legends, add a title and a grid, and increase the line widths. We also load the data directly from a different site, Yahoo Finance, to illustrate loading data from a URL (this CSV endpoint may no longer be available).
aapl = pd.read_csv('http://chart.finance.yahoo.com/table.csv?s=AAPL&a=3&b=13&c=2016&d=3&e=13&f=2017&g=d&ignore=.csv')
aapl = aapl.set_index('Date')
aapl.Open.plot(title='AAPL', legend=True, lw=3, figsize=(15, 8))
aapl.Close.plot(legend=True, lw=3).set_ylabel('Price($)')
aapl.Volume.plot(secondary_y=True, legend=True, lw=3, grid=True).set_ylabel('Volume')
The result is shown below.
And that, my friends, is a simple matplotlib tutorial.
Summary
In this simple matplotlib tutorial, we covered how to load data into a DataFrame, extract the required columns and plot the data. We also saw how to massage data into the form required for plotting. Finally, we covered how to add multiple graphs to a plot and set the properties of the various elements of the chart.