Python itertools – compress, dropwhile, takewhile, groupby

Boost your Python expertise with these examples from the itertools module.

“Never put off till tomorrow what may be done day after tomorrow just as well.” ― Mark Twain

1. Introduction

We are continuing our series on Python itertools. In this article, we cover compress(), dropwhile(), takewhile() and groupby().

2. What does itertools.compress() do?

Python provides the function itertools.compress(), which filters elements from an iterable based on a second iterable of selectors: an element is kept only when the selector at the same position is true. Here is an example to understand what it does.

from itertools import compress

print list(compress(['TOY', 'HON', 'GM', 'FRD', 'CHRY'], [0, 0, 1, 1, 0]))
# prints
['GM', 'FRD']

Select characters from a string instead of a list. This works since a string can be viewed as an iterable of its characters.

for x in compress('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]):
    print x,
# prints
C H J

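To use a string as the second argument, convert its characters to integers first. Every character, including '0', is a non-empty string and would count as True on its own.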

for x in compress('ABCDEFGHIJKLMNOPQRSTUVWXYZ', map(int, '01101100')):
    print x,
# prints
B C E F

How about a random selection of characters?

import random

for x in compress('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [random.choice([0, 1]) for _ in xrange(10)]):
    print x,
# prints (the result varies from run to run)
A B H J

Does the selector iterable have to yield only 0 or 1? No. Any non-zero number is treated as True and 0 is treated as False.

print list(compress('ABCDEFGHIJKLMNOPQRSTUVWXYZ', map(int, '021024')))
# prints
['B', 'C', 'E', 'F']
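
The same rule extends beyond numbers: compress() simply tests the truth value of each selector. The sketch below is only an illustration (the names letters, mixed, picked and short are made up for it). It also shows that compress() stops as soon as either input runs out, which is why ten selectors against 26 letters worked above. It uses print() with a single argument, so it should run on both Python 2.7 and Python 3.

```python
from itertools import compress

letters = 'ABCDEF'

# Any truthy or falsy value works as a selector, not just 1 and 0.
mixed = [True, None, 'yes', '', 3.5, []]
picked = list(compress(letters, mixed))
print(picked)  # ['A', 'C', 'E']

# compress() stops when the shorter input is exhausted, so letters
# beyond the third selector are never considered.
short = list(compress(letters, [1, 0, 1]))
print(short)  # ['A', 'C']
```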

Is there a way of doing the same without using compress()? Yes, there is:

print [x[0] for x in zip('ABCDEFGHIJKLMNOPQRSTUVWXYZ', [random.choice([0, 1]) for _ in xrange(10)]) if x[1]]
# prints (the result varies from run to run)
['C', 'F', 'G', 'H', 'I', 'J']
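
Because the selectors above are drawn at random on every call, the two approaches cannot be compared directly. A minimal sketch of a side-by-side check, assuming we build the selector list once and reuse it (the seed value and the variable names are arbitrary choices for this illustration):

```python
import random
from itertools import compress

random.seed(42)  # arbitrary seed, only to make the run repeatable
letters = 'ABCDEFGHIJ'
selectors = [random.choice([0, 1]) for _ in range(len(letters))]

with_compress = list(compress(letters, selectors))
with_zip = [ch for ch, keep in zip(letters, selectors) if keep]

# Both approaches pick exactly the same letters.
print(with_compress == with_zip)  # True
```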

3. What the heck is dropwhile()?

What is itertools.dropwhile()? According to the Python 2.7 documentation, it returns an iterator that drops elements from the iterable (the second argument) as long as the predicate (the first argument) returns True, and afterwards returns every element. A mouthful of words which didn’t make sense to me when I first read it, so let us look at some examples. (BTW, in addition to this cryptic description, there are barely any examples on the site. Come on guys, throw us a bone here!)

Here is an example which illustrates what it is doing.

import random
from itertools import dropwhile

def fn(x):
    r = random.choice([True, False])
    print x, '=>', r
    return r

print ''.join(dropwhile(fn, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
# prints
A => True
B => True
C => True
D => False
DEFGHIJKLMNOPQRSTUVWXYZ

The predicate function (the first argument) is invoked with successive items from the list (or any iterable) until it returns False. That item and everything after it are then returned (strictly speaking, an iterator over these items is returned), and the predicate is not called again.

Another way of putting it: the leading items for which the predicate function returns True are dropped from the iterable. Later items are kept even if the predicate would return True for them.
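
A deterministic sketch may make this easier to see than the random predicate. Here is_small and calls are names invented for the illustration; the predicate records every value it is asked about, so we can check that it is no longer consulted after its first False.

```python
from itertools import dropwhile

calls = []

def is_small(x):
    calls.append(x)  # remember every value the predicate is asked about
    return x < 5

result = list(dropwhile(is_small, [1, 4, 6, 4, 1]))
print(result)  # [6, 4, 1]
print(calls)   # [1, 4, 6]
```

The trailing 4 and 1 are smaller than 5, yet they survive because dropping ended at 6.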

Here is another example:

c = random.randint(0, 10)
print c, list(dropwhile(lambda x: x < c, xrange(0, 20)))
# prints
8 [8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Drop characters from a string till you hit the first vowel:

print list(dropwhile(lambda c: c.lower() not in "aeiou", 'chrysler'))
# prints
['e', 'r']
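
The same pattern works on any sequence, not just characters. As a hedged illustration, suppose we have a list of text lines that starts with a comment header (the sample data in lines is made up). dropwhile() can skip the header while leaving later comment lines alone:

```python
from itertools import dropwhile

# Made-up sample data: a comment header followed by content.
lines = [
    '# generated file',
    '# do not edit',
    'name = demo',
    '# trailing note',
    'size = 3',
]

# Only the leading run of comment lines is dropped.
body = list(dropwhile(lambda line: line.startswith('#'), lines))
print(body)  # ['name = demo', '# trailing note', 'size = 3']
```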

4. Some takewhile() Examples

The counterpart of itertools.dropwhile() is itertools.takewhile(). It returns elements from the start of the iterable for as long as the predicate function returns True, and stops at the first element for which it returns False.

from itertools import takewhile

def fn(x):
    r = random.choice([True, False])
    print x, '=>', r
    return r

print ''.join(takewhile(fn, 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))
# prints
A => True
B => True
C => True
D => True
E => False
ABCD
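
Since the output above changes on every run, here is a small deterministic sketch (numbers, head and filtered are illustrative names). It also contrasts takewhile() with ordinary filtering, which is easy to confuse with it:

```python
from itertools import takewhile

numbers = [1, 4, 6, 4, 1]

# takewhile() stops for good at the first element that fails the test.
head = list(takewhile(lambda x: x < 5, numbers))
print(head)  # [1, 4]

# A plain filter keeps every passing element, wherever it appears.
filtered = [x for x in numbers if x < 5]
print(filtered)  # [1, 4, 4, 1]
```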

Select characters from a string till you hit a vowel:

print list(takewhile(lambda c: c.lower() not in "aeiou", 'chrysler'))
# prints
['c', 'h', 'r', 'y', 's', 'l']

Use the two together to split a string at the first occurrence of a character:

print list(takewhile(lambda c: c != 'Y', 'TOYOTA'))
print list(dropwhile(lambda c: c != 'Y', 'TOYOTA'))
# prints
['T', 'O']
['Y', 'O', 'T', 'A']
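
The split can be wrapped in a small function. split_at_first below is a hypothetical helper written for this article, not something itertools provides. The sketch also shows one caveat: this trick relies on passing the same re-iterable string twice. With a one-shot iterator, takewhile() consumes the element that ends it, so that element is lost.

```python
from itertools import dropwhile, takewhile

def split_at_first(text, ch):
    """Hypothetical helper: split text just before the first ch."""
    def not_ch(c):
        return c != ch
    head = ''.join(takewhile(not_ch, text))
    tail = ''.join(dropwhile(not_ch, text))
    return head, tail

print(split_at_first('TOYOTA', 'Y'))  # ('TO', 'YOTA')
print(split_at_first('TOYOTA', 'Z'))  # ('TOYOTA', '')

# With a one-shot iterator, the 'Y' that stops takewhile() is consumed.
it = iter('TOYOTA')
head = ''.join(takewhile(lambda c: c != 'Y', it))
rest = ''.join(it)
print(head + '|' + rest)  # TO|OTA
```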

5. Grouping with groupby

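Python itertools provides the groupby() function, which groups consecutive elements that share the same key and returns an iterator over keys and groups. The input is usually sorted on that key first, so that all elements with equal keys end up in a single group.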

For example, consider this string. It is sorted, so repeated characters sit next to each other. You can use groupby() to group it by character.

from itertools import groupby

arr = 'BCDDIIJKNNOOPPPSTTVX'
grps = []
keys = []
for key, grp in groupby(arr):
    grps.append(list(grp))
    keys.append(key)
print ''.join(keys)
print grps
# prints
BCDIJKNOPSTVX
[['B'], ['C'], ['D', 'D'], ['I', 'I'], ['J'], ['K'], ['N', 'N'], ['O', 'O'], ['P', 'P', 'P'], ['S'], ['T', 'T'], ['V'], ['X']]

The groupby() function actually returns an iterator over the pairs (key, group) for each group in the input sequence. The code snippet above shows how to collect the keys and groups separately if required.
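
To see why sorted input matters, it helps to run groupby() on a string that is not sorted. The sketch below (with illustrative variable names) counts the length of each group. On unsorted input the same key can show up more than once, because groupby() only merges neighbours.

```python
from itertools import groupby

text = 'AABBBAAC'

# Unsorted input: 'A' appears as two separate groups.
unsorted_runs = [(k, len(list(g))) for k, g in groupby(text)]
print(unsorted_runs)  # [('A', 2), ('B', 3), ('A', 2), ('C', 1)]

# Sorting first brings equal characters together.
sorted_runs = [(k, len(list(g))) for k, g in groupby(sorted(text))]
print(sorted_runs)  # [('A', 4), ('B', 3), ('C', 1)]
```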

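Here is an example of binning using the groupby() function. We classify a list of random numbers into even and odd below. Note that we have sorted the list using the same key function as that used for grouping; without that step, groupby() would start a new group every time the parity changes. (The leading space in ' odd' aligns the printed labels; it also makes ' odd' sort before 'even'.)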

arr = sorted([random.randint(0, 20) for x in xrange(30)])
f = lambda x: 'even' if x % 2 == 0 else ' odd'
arr = sorted(arr, key=f)
print arr
for k, g in groupby(arr, f):
    print k, list(g)
# prints
[1, 1, 3, 7, 9, 9, 9, 9, 11, 11, 11, 15, 17, 17, 19, 19, 0, 0, 2, 2, 2, 6, 6, 6, 14, 14, 16, 16, 18, 20]
 odd [1, 1, 3, 7, 9, 9, 9, 9, 11, 11, 11, 15, 17, 17, 19, 19]
even [0, 0, 2, 2, 2, 6, 6, 6, 14, 14, 16, 16, 18, 20]
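
If the bins are needed later rather than just printed, one option is to collect them into a dictionary. This is a minimal sketch with fixed input so the result is repeatable; parity, numbers and bins are names chosen for the illustration. Each group is turned into a list straight away, because the group iterators share the underlying data with groupby() and are no longer valid once the loop moves on to the next key.

```python
from itertools import groupby

def parity(n):
    return 'even' if n % 2 == 0 else 'odd'

numbers = [7, 2, 9, 4, 4, 1, 10]

# Sort and group with the same key function, and materialise each
# group with list() before groupby() advances to the next key.
bins = {key: list(group)
        for key, group in groupby(sorted(numbers, key=parity), key=parity)}

print(bins['even'])  # [2, 4, 4, 10]
print(bins['odd'])   # [7, 9, 1]
```

Because sorted() is stable, the numbers inside each bin keep their original relative order.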

Review

We have covered compress(), dropwhile(), takewhile() and groupby() from the itertools module in this article. compress() accepts an iterable of selectors and selects the elements of another iterable for which the corresponding selector is true. dropwhile() invokes a predicate function with each element from an iterable and, once the predicate first returns False, returns that element and all the remaining ones. takewhile() does the opposite, returning elements from the beginning until the predicate function first returns False. groupby() provides SQL-like grouping functionality for iterables, grouping consecutive elements that share a key.

Here is the previous part of this series on python itertools.