Itertools Building Blocks

There are a lot of goodies included in the itertools library to enable clever functional programming

import itertools

print(dir(itertools))
['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_grouper', '_tee', '_tee_dataobject', 'accumulate', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'filterfalse', 'groupby', 'islice', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee', 'zip_longest']

Let’s look at a few

Infinites

count

Basically works like a cheap enumerate()

for _, count_val in zip(range(100), itertools.count()):
    pass

print(count_val)
99

repeat

Is used to serve up the same value until an end condition is reached

for _, val in zip(range(16), itertools.repeat('na')):
    print(val, end=' ')
else:
    print('Batman')    
na na na na na na na na na na na na na na na na Batman

cycle

Used to iterate endlessly through some series of values until an end condition is reached

Works for letters in a string

roflcopter = itertools.cycle('soi')

for _ in range(1000):
    print(next(roflcopter), end='')
soisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisois

Or values in a list

looney_tunes = itertools.cycle(['Rabbit Season', 'Duck Season'])

for _ in range(7):
    print(next(looney_tunes))
else:
    print('Rabbit Season\nDuck Season, FIRE!')
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Rabbit Season
Duck Season, FIRE!

Boolean Tools

filter and compress

Work very similarly. To illustrate, let’s make a simple function that scans if a letter is a vowel or not

is_vowel = lambda x: x in {'A', 'E', 'I', 'O', 'U'}

And make a list of all letters

letters = [chr(x+65) for x in range(26)]
print(letters)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

The compress() function takes an iterable, and then an iterable of type bool. It retuns an iterable.

compress_result = itertools.compress(letters, map(is_vowel, letters))
compress_result
<itertools.compress at 0x274f3a3c8d0>

Same goes for the stdlib filter function. But the syntax is (evaluation_function, iterable)

filter_result = filter(is_vowel, letters)
filter_result
<filter at 0x274f3a3cc50>

Unpacking both of them into a list, we get the same result

print(list(filter_result))
print(list(compress_result))
['A', 'E', 'I', 'O', 'U']
['A', 'E', 'I', 'O', 'U']

Alternatively, we get the complement letter set using itertools.filterfalse() and the same syntax as filter

print(list(itertools.filterfalse(is_vowel, letters)))
['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y', 'Z']

It’s still unclear to me why somoene would want to use compress() over filter(). Would love an opened issue if anyone has any thoughts…

Short-Circuit Evaluation

takewhile

Goes through an iterable, returning values until some condition is met

circle = ['Duck'] * 10 + ['Goose'] + ['Duck'] * 5

for child in itertools.takewhile(lambda x: x != 'Goose', circle):
    print(child)
print('Goose!')
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Goose!

dropwhile

On the other hand, dropwhile() will skip over values until some condition is met, then will return the rest of the iterable.

In this overly-cute example, we forgot that the alphabet starts at 65, but we’re pretty sure it’s within the first 100 characters.

from itertools import dropwhile, islice

buncha_ascii = [chr(x) for x in range(100)]
print(buncha_ascii)
['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c']

And so we condition on the value being a letter

is_a_letter = lambda x: not x.isalpha()

And drop everything until we find the letter ‘A’, then use islice() to grab 26 letters

for i in islice(dropwhile(is_a_letter, buncha_ascii), 26):
    print(i, end=' ')
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 

Combinatorics

Several handy functions are implemented to support exhaustive combinatorics

Consider two dummy lists.

letters = ['A', 'B', 'C']
numbers = [1, 2, 3]

product

If we want to know how many possible pairings we can get with both lists, we want the cartesian product.

for pair in itertools.product(letters, numbers):
    print(pair)
('A', 1)
('A', 2)
('A', 3)
('B', 1)
('B', 2)
('B', 3)
('C', 1)
('C', 2)
('C', 3)

combinations

Or just considering potential “handshake pairs” from the first list

for pair in itertools.combinations(letters, 2):
    print(pair)
('A', 'B')
('A', 'C')
('B', 'C')

combinations_with_replacement

Or allowing for self-shakes

for pair in itertools.combinations_with_replacement(letters, 2):
    print(pair)
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')

permutations

Finally, we can see all possible orderings of an iterable

for ordering in itertools.permutations(letters):
    print(ordering)
('A', 'B', 'C')
('A', 'C', 'B')
('B', 'A', 'C')
('B', 'C', 'A')
('C', 'A', 'B')
('C', 'B', 'A')

zip_longest

This one’s pretty straight-forward. Whereas the usual zip() function kicks out as soon as one of the iterators hits its StopIteratorException, this will continue until the last one does, subbing in whatever you pass to the fillvalue argument, where appropriate

years = range(2015, 2020)
teams = ['Warriors', 'Cavs', 'Warriors', 'Warriors']

for pair in itertools.zip_longest(years, teams, fillvalue='TBD'):
    print(pair)
(2015, 'Warriors')
(2016, 'Cavs')
(2017, 'Warriors')
(2018, 'Warriors')
(2019, 'TBD')

Reusing Iterables with tee

This one took me a minute to get the hang of.

Suppose we knew we would want to go through the list zero_to_sixty twice

zero_to_sixty = iter(range(0, 61))

And so we make an appropriately-named iterator that starts at the same point

zero_to_sixty_again = zero_to_sixty

And march through the original iterator a few times

next(zero_to_sixty)
0
next(zero_to_sixty)
1
next(zero_to_sixty)
2

Then we want to fire off the second iterator

next(zero_to_sixty_again)
3

But it picks up where the last one left off

next(zero_to_sixty)
4

Indeed, they’re both the exact same object

zero_to_sixty is zero_to_sixty_again
True

tee

Instead, we could have accomplished this using the tee() function, which “tees up” an iterator to serve up the same values as its argument would have

zero_to_sixty = iter(range(0, 61))
zero_to_sixty, zero_to_sixty_again = itertools.tee(zero_to_sixty)

Note: tee returns a tuple that looks like (the original iterable, the teed iterable)

for i in range(10):
    next(zero_to_sixty)
print(next(zero_to_sixty))
10
print(next(zero_to_sixty_again))
0

Smart iteration with groupby

As soon as I stopped thinking about this one in terms of pandas.groupby() and instead lazy iterator construction, it finally clicked.

messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'

This string is constructed to illustrate the general idea of how this function works. In pseudocode:

1. Take the first value as a key
2. Make this key the first value of an iterator

3. while the next value is the exact same:
     a. add this value to the iterator

4. When you encounter a new value, repeat steps 1-3
5. Rinse, repeat until you get the StopIteration exception

Running it on our example string

g = itertools.groupby(messy_str)
list(g)
[('a', <itertools._grouper at 0x274f3a75f60>),
 ('b', <itertools._grouper at 0x274f3a7a048>),
 ('a', <itertools._grouper at 0x274f3a7a080>),
 ('b', <itertools._grouper at 0x274f3a7a0b8>),
 ('a', <itertools._grouper at 0x274f3a7a0f0>),
 ('b', <itertools._grouper at 0x274f3a7a128>),
 ('c', <itertools._grouper at 0x274f3a7a160>),
 ('a', <itertools._grouper at 0x274f3a7a198>),
 ('b', <itertools._grouper at 0x274f3a7a1d0>),
 ('c', <itertools._grouper at 0x274f3a7a208>),
 ('d', <itertools._grouper at 0x274f3a7a240>),
 ('a', <itertools._grouper at 0x274f3a7a278>)]

The second value in each tuple is, itself, an iterator that yields all of the same values, in order.

This becomes abundantly clear when we then iterate through that second iterator

g = itertools.groupby(messy_str)
for key, vals in g:
    for val in vals:
        print(val, end='')
    print()
aaa
bbb
aaa
bbb
aaa
bbb
ccc
aaa
bbb
ccc
ddd
aaa