Pandas Tutorial – Grouping Examples

Demonstrates grouping of data in pandas DataFrame and compares with SQL.

“Don’t waste your time with explanations: people only hear what they want to hear.”
― Paulo Coelho

1. Introduction

Let us learn about the “grouping-by” operation in pandas. While similar to the SQL “group by”, the pandas version is much more powerful since you can use user-defined functions at various points including splitting, applying and combining results.

Pandas Tutorial – SQL-like Data Selection

Did you know that you can perform SQL-like selections with a pandas DataFrame? Learn how!

“Always keep your words soft and sweet, just in case you have to eat them.”
― Andy Rooney

1. Introduction

In this article, we present SQL-like ways of selecting data from a pandas DataFrame. The SELECT clause is very familiar to database programmers for accessing data within an SQL database. The DataFrame provides similar functionality when working with datasets, but is far more powerful since it supports using predicate functions with a simple syntax.

Pandas Tutorial – Selecting Rows From a DataFrame

Learn the various ways of selecting data from a DataFrame.

“Always and never are two words you should always remember never to use.”
― Wendell Johnson

1. Introduction

After covering ways of creating a DataFrame and working with it, we now concentrate on extracting data from the DataFrame. You may also be interested in our tutorials on a related data structure – Series; part 1 and part 2.

Pandas Tutorial – DataFrame Basics

Learn the basics of working with a DataFrame in this pandas tutorial.

“The line between failure and success is so fine. . . that we are often on the line and do not know it.”
― Elbert Hubbard

1. Introduction

The DataFrame is the most commonly used data structure in pandas, so it is important to learn the specifics of working with it. After learning various methods of creating a DataFrame, let us now delve into some methods for working with it.

Python Pandas Tutorial – Series Methods

We cover commonly used methods of the pandas Series object in this article.

“The truth is not for all men but only for those who seek it.”
― Ayn Rand

1. Introduction

The Series is one of the most common pandas data structures. It is similar to a python list and is used to represent a column of data. After looking into the basics of creating and initializing a pandas Series object, we now delve into some common usage patterns and methods.

Python Pandas Tutorial – DataFrame

Learn the basics of creating a DataFrame in this tutorial series on pandas.

1. Introduction

This is the next part of the pandas tutorial. In a previous article, we covered the pandas Series class. Today we are getting started with the main pandas data structure, the DataFrame.

Python Pandas Tutorial – Series

Learn the basics of pandas Series in this beginner tutorial.

“Come friends, it’s not too late to seek a newer world.”
― Alfred Tennyson

1. Introduction

Pandas is a powerful toolkit providing data-analysis tools and structures for the python programming language.

Among the most important artifacts provided by pandas is the Series. In this article, we introduce the Series class from a beginner’s perspective. That means you do not need to know anything about pandas or data-analysis to understand this tutorial.

Parsing XML in Python

1. Introduction

XML can be parsed in python using the xml.etree.ElementTree library. This article shows you how to parse and extract elements, attributes and text from XML using this library.

While this library is easy to use, it loads the whole XML document into memory and hence may not be suitable for processing large XML files or where run-time memory is an issue.
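When memory is a concern, the same module offers incremental parsing through ET.iterparse(). The following is a minimal sketch, assuming a trimmed two-book stand-in for the catalog (the SAMPLE bytes and the iter_titles helper are hypothetical names used only for illustration); a real program would pass a file name instead of an in-memory buffer.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a large file on disk.
SAMPLE = b"""<catalog>
  <book id="bk101"><title>XML Developer's Guide</title></book>
  <book id="bk102"><title>Midnight Rain</title></book>
</catalog>"""

def iter_titles(source):
    """Yield (id, title) for each <book>, clearing each element after use."""
    # The "end" event fires once an element and all its children are parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "book":
            yield elem.get("id"), elem.findtext("title")
            elem.clear()  # drop the children of the book we just processed

titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

Clearing each element after it is handled keeps the tree from growing with the size of the document, at the cost of only being able to look at one book at a time.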

2. Sample XML

Here is a short snippet of the sample XML we will be working with.

<?xml version="1.0"?>
<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications 
    with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
...

3. Getting Started

3.1 Parsing an XML File

It is very simple to parse an XML file using ElementTree. The following returns an ElementTree object which contains all the XML artifacts.

import xml.etree.ElementTree as ET

tree = ET.parse('catalog.xml')

Once you have the ElementTree object, you can get the root Element object using the tree.getroot() method:

root = tree.getroot()
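ET.parse() also accepts an open file object, which makes it easy to try the two steps above without a catalog.xml on disk. This is a small sketch under that assumption; the one-book XML below is a shortened stand-in for the sample file.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for catalog.xml, held in memory as bytes.
xml_bytes = b"""<?xml version="1.0"?>
<catalog>
  <book id="bk101"><author>Gambardella, Matthew</author></book>
</catalog>"""

tree = ET.parse(io.BytesIO(xml_bytes))  # returns an ElementTree
root = tree.getroot()                   # returns the root Element

print(type(tree).__name__)  # ElementTree
print(root.tag)             # catalog
```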

3.2 Parsing an XML String

We use the ElementTree.fromstring() function to parse an XML string. The function returns the root Element directly: a subtle difference compared with ElementTree.parse(), which returns an ElementTree object.

print(ET.fromstring('<a><b>1</b>2</a>'))
# prints "<Element 'a' at ...>"
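To make the difference concrete, the sketch below parses a string, wraps the resulting root in an ElementTree when a tree object is needed, and shows that malformed input raises ET.ParseError. The malformed string is an invented example.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<a><b>1</b>2</a>')  # an Element, not an ElementTree
tree = ET.ElementTree(root)               # wrap it if a tree object is needed

print(root.tag)                # a
print(tree.getroot() is root)  # True

# Malformed XML (here, <b> is never closed) raises ET.ParseError.
try:
    ET.fromstring('<a><b></a>')
    parsed_ok = True
except ET.ParseError:
    parsed_ok = False
print(parsed_ok)               # False
```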

4. Working with Elements

From the Element object, you can get the XML tag name using the tag property.

print(root.tag)    # outputs "catalog"

Get a list of child Elements from any Element object using element.findall('*'). You can find specific children by name by passing the name of the tag, e.g. element.findall('book').

For example, the following code recursively processes all the elements in the XML and prints the name of the tag.

def show(elem):
    print(elem.tag)
    for child in elem.findall('*'):
        show(child)

show(root)

The above code can be modified to show nicely indented output of the tag names:

def show(elem, indent=0):
    print(' ' * indent + elem.tag)
    for child in elem.findall('*'):
        # recurse into the child, one level deeper
        show(child, indent + 1)

show(root)
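As a self-contained variant of the indented walk, the sketch below returns the lines instead of printing them and iterates over the element directly, which yields its children. It also shows elem.iter(), which visits an element and all its descendants in document order. The tag_lines helper and the one-book XML are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<catalog><book id="bk101">'
    '<author>Gambardella, Matthew</author>'
    "<title>XML Developer's Guide</title>"
    '</book></catalog>'
)

def tag_lines(elem, indent=0):
    """Return the tag names of elem and its descendants, indented by depth."""
    lines = [' ' * indent + elem.tag]
    for child in elem:  # iterating an Element yields its direct children
        lines.extend(tag_lines(child, indent + 1))
    return lines

lines = tag_lines(root)
print('\n'.join(lines))

# Without indentation, iter() gives the same traversal order.
flat = [e.tag for e in root.iter()]
print(flat)  # ['catalog', 'book', 'author', 'title']
```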

To find a single element by name, use elem.find(tagName):

print(root.find('book').tag)    # prints "book"
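Note that find() returns None when nothing matches, so chaining .tag or .text onto a failed lookup raises AttributeError. A short sketch, using an invented "magazine" tag that is not in the sample, and comparing against None explicitly rather than relying on the truth value of the element:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<catalog><book id="bk101"><title>Midnight Rain</title></book></catalog>'
)

book = root.find('book')
missing = root.find('magazine')  # hypothetical tag with no match

print(book is not None)  # True
print(missing is None)   # True

# findtext() accepts a default for the no-match case.
price = root.findtext('book/price', default='n/a')
print(price)             # n/a
```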

5. Working with Attributes

XML attributes can be extracted from an Element object using the element.items() method, which returns a sequence of (name, value) pairs. (Do not rely on the pairs being returned in the order the attributes appear in the XML.)

for attrName, attrValue in elem.items():
    print(attrName + '=' + attrValue)

To retrieve a single attribute value by name, use the elem.get(attrName) method:

print(root.find('book').get('id'))    # prints "bk101"

Get a list of all attribute names defined on the element using elem.keys(). It returns an empty list if no attributes are defined.

print(root.find('book').keys())
# prints ['id']

To get all the attributes as a python dictionary, use the elem.attrib property:

print(root.find('book').attrib)
# prints {'id': 'bk101'}
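The attribute methods above can be tried together on a single element. In this sketch the "lang" attribute is invented for illustration and is not part of the sample catalog; get() is shown with a default value for an attribute that is absent.

```python
import xml.etree.ElementTree as ET

# "lang" is a hypothetical attribute added to have more than one pair.
book = ET.fromstring('<book id="bk101" lang="en"/>')

book_id = book.get('id')            # value of an existing attribute
isbn = book.get('isbn', 'unknown')  # default when the attribute is missing
pairs = sorted(book.items())        # sorted, so the order is predictable

print(book_id)  # bk101
print(isbn)     # unknown
print(pairs)    # [('id', 'bk101'), ('lang', 'en')]
```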

6. Retrieving Element Text Content

You can retrieve an element’s text content using the elem.text property as follows:

print(ET.fromstring('<a>Hello<b>1</b>2</a>').text)
# prints "Hello"

print(ET.fromstring('<a>Hello<b>1</b>2</a>').find('b').text)
# prints "1"

Text appearing after the element’s end tag is retrieved using the elem.tail property:

print(ET.fromstring('<a>Hello<b>1</b>2</a>').find('b').tail)
# prints "2"
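When all the text inside an element is needed, including the text and tail of its descendants, elem.itertext() walks the pieces in document order. A minimal sketch using the same string as above:

```python
import xml.etree.ElementTree as ET

a = ET.fromstring('<a>Hello<b>1</b>2</a>')

pieces = list(a.itertext())  # a.text, then b.text, then b.tail
combined = ''.join(pieces)

print(pieces)    # ['Hello', '1', '2']
print(combined)  # Hello12
```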

7. Using Path to Extract Content

Some of the Element object methods support extracting content by using a syntax similar to XPath:

Retrieve a descendant element:

print(root.find('book/author').text)
# prints "Gambardella, Matthew"

To obtain the text of the first matching element, use the elem.findtext() method as follows:

print(root.findtext('book/author'))
# prints "Gambardella, Matthew"

Retrieve and process a list of matching elements using elem.findall():

for e in root.findall('book/author'):
    print(e.text)

# prints the following:
# Gambardella, Matthew
# Ralls, Kim
# Corets, Eva
# Corets, Eva
# Corets, Eva
# Randall, Cynthia
# Thurman, Paula
# Knorr, Stefan
# Kress, Peter
# O'Brien, Tim
# O'Brien, Tim
# Galos, Mike

Find a specific element by position. (Position indexes start at 1.)

print(root.find('book[2]/author').text)
# prints "Ralls, Kim"
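Besides numeric positions, the path syntax supports a few other predicates. The sketch below, run against a two-book stand-in for the catalog, selects by attribute value, uses last() for the final match, and uses .// to search all descendants rather than direct children only.

```python
import xml.etree.ElementTree as ET

# Two-book stand-in for the sample catalog.
root = ET.fromstring("""<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
  </book>
</catalog>""")

# Select a book by attribute value.
title = root.find("book[@id='bk102']/title").text
# Select the last book.
last_author = root.find('book[last()]/author').text
# Search every descendant named "author", at any depth.
all_authors = [a.text for a in root.findall('.//author')]

print(title)        # Midnight Rain
print(last_author)  # Ralls, Kim
print(all_authors)  # ['Gambardella, Matthew', 'Ralls, Kim']
```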

Here is an example to find and concatenate all text content using a reduce operation:

from functools import reduce  # reduce is no longer a builtin in Python 3

print(reduce(lambda x, y: x + '|' + y.text, root.findall("book/author"), ''))

# prints "|Gambardella, Matthew|Ralls, Kim|Corets, Eva|Corets, Eva|Corets, Eva|Randall, Cynthia|Thurman, Paula|Knorr, Stefan|Kress, Peter|O'Brien, Tim|O'Brien, Tim|Galos, Mike"
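The reduce version starts from an empty string, so its result begins with a separator. If that leading '|' is unwanted, one alternative is str.join() over a list comprehension, sketched here against a two-author stand-in for the catalog:

```python
import xml.etree.ElementTree as ET

# Two-book stand-in for the sample catalog.
root = ET.fromstring(
    '<catalog>'
    '<book id="bk101"><author>Gambardella, Matthew</author></book>'
    '<book id="bk102"><author>Ralls, Kim</author></book>'
    '</catalog>'
)

authors = [e.text for e in root.findall('book/author')]
joined = '|'.join(authors)  # separators only between items

print(joined)  # Gambardella, Matthew|Ralls, Kim
```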

8. Conclusion

This article demonstrated some aspects of parsing XML with python. We showed how to parse an XML file or an XML string and extract elements, attributes and text content.
