Itertools Building Blocks
The itertools library includes a lot of goodies that enable clever functional programming.
import itertools
print(dir(itertools))
['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_grouper', '_tee', '_tee_dataobject', 'accumulate', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'filterfalse', 'groupby', 'islice', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee', 'zip_longest']
Let’s look at a few
Infinites
count
count() yields consecutive integers forever, so zipping it with another iterable basically works like a cheap enumerate()
for _, count_val in zip(range(100), itertools.count()):
    pass
print(count_val)
99
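As a small sketch of where count() goes beyond enumerate(): it also accepts start and step arguments. The labels list here is made up purely for illustration.

```python
import itertools

# Hypothetical data, just for illustration.
labels = ['a', 'b', 'c']

# count() accepts optional start and step arguments; zip() stops
# as soon as labels runs out, so the infinite counter is safe here.
numbered = list(zip(itertools.count(start=10, step=5), labels))
print(numbered)  # [(10, 'a'), (15, 'b'), (20, 'c')]

# enumerate() covers the common case, but it always steps by 1.
stepped_by_one = list(enumerate(labels, start=10))
print(stepped_by_one)  # [(10, 'a'), (11, 'b'), (12, 'c')]
```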
repeat
repeat() serves up the same value over and over until an end condition is reached
for _, val in zip(range(16), itertools.repeat('na')):
    print(val, end=' ')
else:
    print('Batman')
na na na na na na na na na na na na na na na na Batman
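A couple of other ways repeat() tends to get used, sketched here with made-up variable names: passing a second argument caps the number of repetitions, and an unbounded repeat() can feed a constant argument to map().

```python
import itertools

# The optional second argument limits how many times the value is served.
chorus = list(itertools.repeat('na', 4))
print(chorus)  # ['na', 'na', 'na', 'na']

# An unbounded repeat() supplies a constant second argument to pow();
# map() stops when the shortest input (range(5)) is exhausted.
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```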
cycle
Used to iterate endlessly through some series of values until an end condition is reached
Works for letters in a string
roflcopter = itertools.cycle('soi')
for _ in range(1000):
    print(next(roflcopter), end='')
soisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisois
Or values in a list
looney_tunes = itertools.cycle(['Rabbit Season', 'Duck Season'])
for _ in range(7):
    print(next(looney_tunes))
else:
    print('Rabbit Season\nDuck Season, FIRE!')
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Rabbit Season
Duck Season, FIRE!
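Instead of calling next() inside a range() loop, one option is to cap the endless cycle with islice(). This is only a sketch of an alternative; the variable names are illustrative.

```python
import itertools

seasons = itertools.cycle(['Rabbit Season', 'Duck Season'])

# islice() takes the first 5 items and then stops, so the
# infinite iterator never runs away from us.
first_five = list(itertools.islice(seasons, 5))
print(first_five)
```

Because the cycle object keeps its position, a later next(seasons) would carry on from where islice() stopped.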
Boolean Tools
filter and compress
These work very similarly. To illustrate, let’s make a simple function that checks whether a letter is a vowel or not
is_vowel = lambda x: x in {'A', 'E', 'I', 'O', 'U'}
And make a list of all letters
letters = [chr(x+65) for x in range(26)]
print(letters)
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
The compress() function takes an iterable of data, and then an iterable of selectors that are evaluated for truthiness (here, values of type bool). It returns an iterator.
compress_result = itertools.compress(letters, map(is_vowel, letters))
compress_result
<itertools.compress at 0x274f3a3c8d0>
The same goes for the built-in filter function, but its syntax is (evaluation_function, iterable)
filter_result = filter(is_vowel, letters)
filter_result
<filter at 0x274f3a3cc50>
Unpacking both of them into a list, we get the same result
print(list(filter_result))
print(list(compress_result))
['A', 'E', 'I', 'O', 'U']
['A', 'E', 'I', 'O', 'U']
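One thing worth keeping in mind when unpacking: these are one-shot iterators. A minimal sketch, rebuilding the same letters and is_vowel from above so the block runs on its own:

```python
letters = [chr(x + 65) for x in range(26)]
is_vowel = lambda x: x in {'A', 'E', 'I', 'O', 'U'}

vowels = filter(is_vowel, letters)

# The first pass consumes the iterator...
first_pass = list(vowels)
# ...so a second pass finds nothing left to yield.
second_pass = list(vowels)

print(first_pass)   # ['A', 'E', 'I', 'O', 'U']
print(second_pass)  # []
```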
Alternatively, we get the complement letter set using itertools.filterfalse() and the same syntax as filter
print(list(itertools.filterfalse(is_vowel, letters)))
['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y', 'Z']
It’s still unclear to me why someone would want to use compress() over filter(). Would love an opened issue if anyone has any thoughts…
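One possible answer, offered as a guess rather than anything official: compress() seems handy when the True/False mask already exists as its own sequence, instead of being computable from each item by a function. The players and showed_up lists below are invented for the example.

```python
import itertools

# Hypothetical data: the mask comes from somewhere else entirely,
# so there is no predicate we could hand to filter().
players = ['Ann', 'Bob', 'Cy', 'Dee']
showed_up = [True, False, True, True]

present = list(itertools.compress(players, showed_up))
print(present)  # ['Ann', 'Cy', 'Dee']

# Getting the same result without compress() means zipping the two
# sequences together and filtering on the mask by hand.
present_by_hand = [name for name, ok in zip(players, showed_up) if ok]
print(present_by_hand)  # ['Ann', 'Cy', 'Dee']
```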
Short-Circuit Evaluation
takewhile
Goes through an iterable, yielding values for as long as a condition holds and stopping at the first value that fails it
circle = ['Duck'] * 10 + ['Goose'] + ['Duck'] * 5
for child in itertools.takewhile(lambda x: x != 'Goose', circle):
    print(child)
print('Goose!')
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Goose!
dropwhile
On the other hand, dropwhile() will skip over values for as long as some condition holds, then will return the rest of the iterable.
In this overly-cute example, we forgot that the uppercase alphabet starts at code point 65, but we’re pretty sure it’s within the first 100 characters.
from itertools import dropwhile, islice
buncha_ascii = [chr(x) for x in range(100)]
print(buncha_ascii)
['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c']
And so we condition on the value not being a letter
is_not_a_letter = lambda x: not x.isalpha()
And drop everything until we find the letter ‘A’, then use islice() to grab 26 letters
for i in islice(dropwhile(is_not_a_letter, buncha_ascii), 26):
    print(i, end=' ')
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
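To see how the two relate, here is a small sketch using a shortened version of the Duck, Duck, Goose circle: with the same predicate, takewhile() and dropwhile() split a sequence into the part before the first failure and everything from that point on. The not_goose name is just for illustration.

```python
from itertools import dropwhile, takewhile

circle = ['Duck'] * 3 + ['Goose'] + ['Duck'] * 2
not_goose = lambda x: x != 'Goose'

# Everything up to (but not including) the first 'Goose'.
before = list(takewhile(not_goose, circle))
# The first 'Goose' and everything after it, including later Ducks,
# since the predicate is never checked again once it fails.
rest = list(dropwhile(not_goose, circle))

print(before)  # ['Duck', 'Duck', 'Duck']
print(rest)    # ['Goose', 'Duck', 'Duck']
```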
Combinatorics
Several handy functions are implemented to support exhaustive combinatorics
Consider two dummy lists.
letters = ['A', 'B', 'C']
numbers = [1, 2, 3]
product
If we want to list every possible pairing of an item from the first list with an item from the second, we want the Cartesian product.
for pair in itertools.product(letters, numbers):
    print(pair)
('A', 1)
('A', 2)
('A', 3)
('B', 1)
('B', 2)
('B', 3)
('C', 1)
('C', 2)
('C', 3)
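Another way to think about product(), sketched below: it gives the same pairs, in the same order, as a nested for loop. It also takes a repeat argument for crossing an iterable with itself; the coin flip example is made up for illustration.

```python
import itertools

letters = ['A', 'B', 'C']
numbers = [1, 2, 3]

# The nested-loop equivalent: the rightmost iterable varies fastest.
nested = [(letter, number) for letter in letters for number in numbers]
same_as_product = nested == list(itertools.product(letters, numbers))
print(same_as_product)  # True

# repeat=2 crosses the iterable with itself.
coin_flips = list(itertools.product('HT', repeat=2))
print(coin_flips)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
```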
combinations
Or just considering potential “handshake pairs” from the first list
for pair in itertools.combinations(letters, 2):
    print(pair)
('A', 'B')
('A', 'C')
('B', 'C')
combinations_with_replacement
Or allowing for self-shakes
for pair in itertools.combinations_with_replacement(letters, 2):
    print(pair)
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
permutations
Finally, we can see all possible orderings of an iterable
for ordering in itertools.permutations(letters):
    print(ordering)
('A', 'B', 'C')
('A', 'C', 'B')
('B', 'A', 'C')
('B', 'C', 'A')
('C', 'A', 'B')
('C', 'B', 'A')
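As a sanity check on the outputs above, the number of results each function yields should line up with the usual counting formulas. A rough sketch follows; n_choose_r is a helper written just for this example, not part of itertools.

```python
import itertools
from math import factorial

def n_choose_r(n, r):
    """Hypothetical helper: the binomial coefficient n! / (r! * (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

letters = ['A', 'B', 'C']
n, r = len(letters), 2

n_combos = len(list(itertools.combinations(letters, r)))
n_with_replacement = len(list(itertools.combinations_with_replacement(letters, r)))
n_orderings = len(list(itertools.permutations(letters)))

print(n_combos, n_choose_r(n, r))                    # 3 3
print(n_with_replacement, n_choose_r(n + r - 1, r))  # 6 6
print(n_orderings, factorial(n))                     # 6 6
```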
zip_longest
This one’s pretty straightforward. Whereas the usual zip() function kicks out as soon as one of the iterators raises StopIteration, this will continue until the last one does, subbing in whatever you pass to the fillvalue argument, where appropriate
years = range(2015, 2020)
teams = ['Warriors', 'Cavs', 'Warriors', 'Warriors']
for pair in itertools.zip_longest(years, teams, fillvalue='TBD'):
    print(pair)
(2015, 'Warriors')
(2016, 'Cavs')
(2017, 'Warriors')
(2018, 'Warriors')
(2019, 'TBD')
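For contrast, a quick sketch of what plain zip() does with the same inputs, and what zip_longest() pads with when no fillvalue is given (it falls back to None):

```python
import itertools

years = range(2015, 2020)
teams = ['Warriors', 'Cavs', 'Warriors', 'Warriors']

# zip() stops at the shorter input, so 2019 is silently dropped.
truncated = list(zip(years, teams))
print(len(truncated))  # 4

# Without fillvalue, zip_longest() pads the gap with None.
padded = list(itertools.zip_longest(years, teams))
print(padded[-1])  # (2019, None)
```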
Reusing Iterables with tee
This one took me a minute to get the hang of.
Suppose we knew we would want to go through the iterator zero_to_sixty twice
zero_to_sixty = iter(range(0, 61))
And so we make an appropriately-named iterator that starts at the same point
zero_to_sixty_again = zero_to_sixty
And march through the original iterator a few times
next(zero_to_sixty)
0
next(zero_to_sixty)
1
next(zero_to_sixty)
2
Then we want to fire off the second iterator
next(zero_to_sixty_again)
3
But it picks up where the last one left off
next(zero_to_sixty)
4
Indeed, they’re both the exact same object
zero_to_sixty is zero_to_sixty_again
True
tee
Instead, we could have accomplished this using the tee() function, which “tees up” an iterator to serve up the same values as its argument would have
zero_to_sixty = iter(range(0, 61))
zero_to_sixty, zero_to_sixty_again = itertools.tee(zero_to_sixty)
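Note: tee returns a tuple of n (default 2) independent iterators. Neither one is the original object, and once an iterator has been teed it shouldn’t be advanced directly anymore, which is why we rebind the name zero_to_sixty here.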
for i in range(10):
    next(zero_to_sixty)
print(next(zero_to_sixty))
10
print(next(zero_to_sixty_again))
0
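A classic use of tee(), sketched here under a made-up name: advance one copy by a single step and zip the two together to get overlapping pairs. The function name sliding_pairs and the sample speeds are illustrative only.

```python
import itertools

def sliding_pairs(iterable):
    """Illustrative helper: yield (item, next_item) for consecutive items."""
    current, ahead = itertools.tee(iterable)
    # Advance the second copy by one; the default of None avoids an
    # exception if the input happens to be empty.
    next(ahead, None)
    return zip(current, ahead)

speeds = [0, 15, 30, 60]
pairs = list(sliding_pairs(speeds))
print(pairs)  # [(0, 15), (15, 30), (30, 60)]
```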
Smart iteration with groupby
As soon as I stopped thinking about this one in terms of pandas.groupby() and started thinking of it as lazy iterator construction, it finally clicked.
messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'
This string is constructed to illustrate the general idea of how this function works. In pseudocode:
1. Take the first value as a key
2. Make this key the first value of an iterator
3. while the next value is the exact same:
a. add this value to the iterator
4. When you encounter a new value, repeat steps 1-3
5. Rinse, repeat until you get the StopIteration exception
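The steps above can be sketched as a small generator. To be clear, this is a simplified, eager stand-in and not how itertools actually implements it: group_runs is a made-up name, and it collects each run into a list rather than handing back a lazy sub-iterator.

```python
import itertools

def group_runs(iterable):
    """Simplified sketch: yield (key, list_of_values) for each run of equal values."""
    iterator = iter(iterable)
    try:
        current = next(iterator)   # step 1: take the first value as the key
    except StopIteration:
        return                     # empty input, nothing to group
    run = [current]                # step 2: the key starts its own run
    for value in iterator:
        if value == current:       # step 3: same value, extend the run
            run.append(value)
        else:                      # step 4: new value, emit the run and start over
            yield current, run
            current, run = value, [value]
    yield current, run             # step 5: input exhausted, emit the final run

runs = list(group_runs('aaabbbccca'))
print(runs)

# The real groupby() produces the same keys and values, just lazily.
lazy_runs = [(key, list(vals)) for key, vals in itertools.groupby('aaabbbccca')]
print(runs == lazy_runs)  # True
```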
Running itertools.groupby() on our example string
g = itertools.groupby(messy_str)
list(g)
[('a', <itertools._grouper at 0x274f3a75f60>),
('b', <itertools._grouper at 0x274f3a7a048>),
('a', <itertools._grouper at 0x274f3a7a080>),
('b', <itertools._grouper at 0x274f3a7a0b8>),
('a', <itertools._grouper at 0x274f3a7a0f0>),
('b', <itertools._grouper at 0x274f3a7a128>),
('c', <itertools._grouper at 0x274f3a7a160>),
('a', <itertools._grouper at 0x274f3a7a198>),
('b', <itertools._grouper at 0x274f3a7a1d0>),
('c', <itertools._grouper at 0x274f3a7a208>),
('d', <itertools._grouper at 0x274f3a7a240>),
('a', <itertools._grouper at 0x274f3a7a278>)]
The second value in each tuple is, itself, an iterator that yields each value in that run, in order.
This becomes abundantly clear when we then iterate through that second iterator
g = itertools.groupby(messy_str)
for key, vals in g:
    for val in vals:
        print(val, end='')
    print()
aaa
bbb
aaa
bbb
aaa
bbb
ccc
aaa
bbb
ccc
ddd
aaa
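Two follow-on uses, sketched with illustrative names: counting the length of each run, and sorting first when the goal is one group per distinct value. The sorting step matters because groupby() only groups neighbors, which is why 'a' shows up several times above.

```python
import itertools

messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'

# Run-length encoding: one (key, count) pair per consecutive run.
run_lengths = [(key, len(list(vals))) for key, vals in itertools.groupby(messy_str)]
print(run_lengths[:3])  # [('a', 3), ('b', 3), ('a', 3)]

# Sorting first puts equal values next to each other, so each
# distinct letter ends up in exactly one group.
totals = {key: len(list(vals)) for key, vals in itertools.groupby(sorted(messy_str))}
print(totals)  # {'a': 15, 'b': 12, 'c': 6, 'd': 3}
```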