Itertools Building Blocks

24 May 2019

The itertools module in the standard library includes a lot of goodies that enable clever functional programming.

import itertools

print(dir(itertools))

['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_grouper', '_tee', '_tee_dataobject', 'accumulate', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'filterfalse', 'groupby', 'islice', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee', 'zip_longest']

Let’s look at a few

Infinites

`count`

Yields consecutive numbers forever (starting at 0 by default), so zipping it with another iterable basically works like a cheap enumerate()

for _, count_val in zip(range(100), itertools.count()):
    pass

print(count_val)
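The loop above leaves count_val at 99. As a small sketch of the optional start and step arguments, here is count() sliced to a finite prefix with islice(); the sample values are made up for illustration.

```python
from itertools import count, islice

# count(start, step) yields start, start + step, ... forever,
# so islice() is used to take a finite prefix.
first_four = list(islice(count(10, 5), 4))
print(first_four)  # [10, 15, 20, 25]

# Zipping count(1) with a list numbers the items,
# much like enumerate(..., start=1) with the pair order flipped.
labelled = list(zip(count(1), ['a', 'b', 'c']))
print(labelled)  # [(1, 'a'), (2, 'b'), (3, 'c')]
```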

`repeat`

Serves up the same value over and over, either forever or, if you pass a second argument, a fixed number of times. Here zip() with a range supplies the end condition

for _, val in zip(range(16), itertools.repeat('na')):
    print(val, end=' ')
else:
    print('Batman')

na na na na na na na na na na na na na na na na Batman
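A minimal sketch of two other ways repeat() tends to show up: with its optional times argument, and as a constant stream fed to map(). The values are illustrative only.

```python
from itertools import repeat

# The optional second argument caps the number of repetitions.
chorus = list(repeat('na', 4))
print(chorus)  # ['na', 'na', 'na', 'na']

# repeat(2) supplies the same exponent for every base;
# map() stops when the shorter input (the range) runs out.
squares = list(map(pow, range(5), repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```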

`cycle`

Iterates endlessly through some series of values; it is up to the consumer to decide when to stop

Works for letters in a string

roflcopter = itertools.cycle('soi')

for _ in range(1000):
    print(next(roflcopter), end='')

soisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisois

Or values in a list

looney_tunes = itertools.cycle(['Rabbit Season', 'Duck Season'])

for _ in range(7):
    print(next(looney_tunes))
else:
    print('Rabbit Season\nDuck Season, FIRE!')

Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Rabbit Season
Duck Season, FIRE!
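Instead of calling next() by hand, cycle() can also be zipped against a finite iterable, which ends the loop on its own. A small sketch, with hypothetical player names, that deals out team colours round-robin:

```python
from itertools import cycle

players = ['Ann', 'Bob', 'Cid', 'Dee', 'Eve']  # hypothetical names
teams = cycle(['red', 'blue'])

# zip() stops when players runs out, so the infinite cycle is safe here.
assignments = list(zip(players, teams))
print(assignments)
# [('Ann', 'red'), ('Bob', 'blue'), ('Cid', 'red'), ('Dee', 'blue'), ('Eve', 'red')]
```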

Boolean Tools

`filter` and `compress`

Work very similarly. To illustrate, let’s make a simple function that checks whether a letter is a vowel or not

is_vowel = lambda x: x in {'A', 'E', 'I', 'O', 'U'}

And make a list of all letters

letters = [chr(x+65) for x in range(26)]
print(letters)

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']

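The compress() function takes an iterable of data, and then an iterable of selectors that are evaluated for truthiness (here, bools). It returns a lazy iterator over the data items whose selector is true.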

compress_result = itertools.compress(letters, map(is_vowel, letters))
compress_result

<itertools.compress at 0x274f3a3c8d0>

Same goes for the built-in filter() function, which also returns a lazy iterator. But the syntax is (evaluation_function, iterable)

filter_result = filter(is_vowel, letters)
filter_result

<filter at 0x274f3a3cc50>

Unpacking both of them into a list, we get the same result

print(list(filter_result))
print(list(compress_result))

['A', 'E', 'I', 'O', 'U']
['A', 'E', 'I', 'O', 'U']

Alternatively, we get the complement letter set using itertools.filterfalse() and the same syntax as filter

print(list(itertools.filterfalse(is_vowel, letters)))

['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y', 'Z']

It’s still unclear to me why someone would want to use compress() over filter(). Would love an opened issue if anyone has any thoughts…
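One possible answer, offered as a sketch rather than a definitive rationale: compress() fits when the mask already exists as its own sequence and can't be recomputed from each item alone. In the hypothetical data below, whether a runner finished is recorded separately from the names, so there is no predicate to hand to filter().

```python
from itertools import compress

runners = ['Ann', 'Bob', 'Cid', 'Dee']   # hypothetical data
finished = [True, False, True, True]     # hypothetical mask, computed elsewhere

# Keep each runner whose matching selector is truthy.
finishers = list(compress(runners, finished))
print(finishers)  # ['Ann', 'Cid', 'Dee']
```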

Short-Circuit Evaluation

`takewhile`

Goes through an iterable, returning values for as long as the condition holds, and stops for good at the first value that fails it

circle = ['Duck'] * 10 + ['Goose'] + ['Duck'] * 5

for child in itertools.takewhile(lambda x: x != 'Goose', circle):
    print(child)
print('Goose!')

Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Goose!
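To make the short-circuit concrete, here is a small sketch comparing takewhile() with filter() on the same made-up readings. takewhile() never looks past the first failure, while filter() checks every item.

```python
from itertools import takewhile

def is_small(x):
    return x < 5

readings = [1, 2, 3, 10, 2, 1]  # illustrative values

taken = list(takewhile(is_small, readings))
filtered = list(filter(is_small, readings))

print(taken)     # [1, 2, 3]
print(filtered)  # [1, 2, 3, 2, 1]
```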

`dropwhile`

On the other hand, dropwhile() will skip over values for as long as some condition holds, then will return the rest of the iterable, untested.

In this overly-cute example, we forgot that the uppercase alphabet starts at code point 65, but we’re pretty sure it’s within the first 100 characters.

from itertools import dropwhile, islice

buncha_ascii = [chr(x) for x in range(100)]
print(buncha_ascii)

['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c']

And so we write a predicate that is true for anything that is not a letter

is_not_a_letter = lambda x: not x.isalpha()

And drop everything until we find the letter ‘A’, then use islice() to grab 26 letters

for i in islice(dropwhile(is_not_a_letter, buncha_ascii), 26):
    print(i, end=' ')

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
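One detail worth a sketch: dropwhile() only discards the leading run. Once the predicate fails a single time, every later item passes through, even ones the predicate would have matched. The sample lines here are made up.

```python
from itertools import dropwhile

lines = ['# header', '# more header', 'data', '# not a header', 'more data']

# Only the leading comment lines are dropped; the later '#' line survives.
body = list(dropwhile(lambda s: s.startswith('#'), lines))
print(body)  # ['data', '# not a header', 'more data']
```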

Combinatorics

Several handy functions are implemented to support exhaustive combinatorics

Consider two dummy lists.

letters = ['A', 'B', 'C']
numbers = [1, 2, 3]

`product`

If we want every possible pairing of an item from one list with an item from the other, we want the Cartesian product.

for pair in itertools.product(letters, numbers):
    print(pair)

('A', 1)
('A', 2)
('A', 3)
('B', 1)
('B', 2)
('B', 3)
('C', 1)
('C', 2)
('C', 3)
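As a quick check on the size of the result, and a sketch of the repeat keyword for taking the product of an iterable with itself (the coin-flip example is purely illustrative):

```python
from itertools import product

letters = ['A', 'B', 'C']
numbers = [1, 2, 3]

# The product has len(letters) * len(numbers) entries.
pairs = list(product(letters, numbers))
print(len(pairs))  # 9

# repeat=2 is shorthand for product('HT', 'HT').
coin_flips = list(product('HT', repeat=2))
print(coin_flips)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
```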

`combinations`

Or just considering potential “handshake pairs” from the first list

for pair in itertools.combinations(letters, 2):
    print(pair)

('A', 'B')
('A', 'C')
('B', 'C')

`combinations_with_replacement`

Or allowing for self-shakes

for pair in itertools.combinations_with_replacement(letters, 2):
    print(pair)

('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')

`permutations`

Finally, we can see all possible orderings of an iterable

for ordering in itertools.permutations(letters):
    print(ordering)

('A', 'B', 'C')
('A', 'C', 'B')
('B', 'A', 'C')
('B', 'C', 'A')
('C', 'A', 'B')
('C', 'B', 'A')
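The sizes of these results line up with the usual counting formulas. A minimal sketch checking them with the math module (comb and perm assume Python 3.8 or later), plus permutations() with an explicit length of 2:

```python
from itertools import combinations, combinations_with_replacement, permutations
from math import comb, factorial, perm  # comb and perm need Python 3.8+

letters = ['A', 'B', 'C']
n = len(letters)

n_pairs = len(list(combinations(letters, 2)))                        # n choose 2
n_self_pairs = len(list(combinations_with_replacement(letters, 2)))  # (n + 1) choose 2
n_orderings = len(list(permutations(letters)))                       # n!

print(n_pairs, n_self_pairs, n_orderings)  # 3 6 6

# With a length argument, permutations() yields ordered pairs.
ordered_pairs = list(permutations(letters, 2))
print(ordered_pairs)
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
```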

`zip_longest`

This one’s pretty straightforward. Whereas the usual zip() function kicks out as soon as one of the iterators raises StopIteration, this will continue until the last one does, subbing in whatever you pass to the fillvalue argument (None by default), where appropriate

years = range(2015, 2020)
teams = ['Warriors', 'Cavs', 'Warriors', 'Warriors']

for pair in itertools.zip_longest(years, teams, fillvalue='TBD'):
    print(pair)

(2015, 'Warriors')
(2016, 'Cavs')
(2017, 'Warriors')
(2018, 'Warriors')
(2019, 'TBD')
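For contrast, a short sketch of what plain zip() does with the same inputs, and what zip_longest() pads with when fillvalue is left out:

```python
from itertools import zip_longest

years = range(2015, 2020)
teams = ['Warriors', 'Cavs', 'Warriors', 'Warriors']

# zip() silently drops 2019 because teams runs out first.
short = list(zip(years, teams))
print(len(short))   # 4

# Without fillvalue, the missing slot is padded with None.
padded = list(zip_longest(years, teams))
print(padded[-1])   # (2019, None)
```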

Reusing Iterables with `tee`

This one took me a minute to get the hang of.

Suppose we knew we would want to go through the iterator zero_to_sixty twice

zero_to_sixty = iter(range(0, 61))

And so we make an appropriately-named iterator that starts at the same point

zero_to_sixty_again = zero_to_sixty

And march through the original iterator a few times

next(zero_to_sixty)

next(zero_to_sixty)

next(zero_to_sixty)

Then we want to fire off the second iterator

next(zero_to_sixty_again)

But it picks up where the last one left off

next(zero_to_sixty)

Indeed, they’re both the exact same object

zero_to_sixty is zero_to_sixty_again

True

`tee`

Instead, we could have accomplished this using the tee() function, which “tees up” an iterator to serve up the same values as its argument would have

zero_to_sixty = iter(range(0, 61))
zero_to_sixty, zero_to_sixty_again = itertools.tee(zero_to_sixty)

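Note: tee returns a tuple of independent iterators (two by default). Once the original has been handed to tee(), it shouldn’t be advanced directly anymore, which is why we rebind the name zero_to_sixty to the first of them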

for i in range(10):
    next(zero_to_sixty)
print(next(zero_to_sixty))

print(next(zero_to_sixty_again))
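The two print calls above give 10 and then 0. Here is the same idea as a self-contained sketch, using the hypothetical names first and second for the two teed iterators:

```python
from itertools import tee

source = iter(range(0, 61))
first, second = tee(source)  # don't touch source again after this

# Advance only the first iterator by ten steps.
for _ in range(10):
    next(first)

a = next(first)   # first has moved on
b = next(second)  # second still starts from the beginning
print(a, b)  # 10 0
```

tee() buffers the values one iterator has seen and the other has not, so memory use grows if one of them runs far ahead.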

Smart iteration with `groupby`

As soon as I stopped thinking about this one in terms of pandas.groupby() and started thinking of it as lazy iterator construction, it finally clicked.

messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'

This string is constructed to illustrate the general idea of how this function works. In pseudocode:

1. Take the first value as a key
2. Make this key the first value of an iterator
3. While the next value is the exact same:
     a. add this value to the iterator
4. When you encounter a new value, repeat steps 1-3
5. Rinse, repeat until you get the StopIteration exception
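The steps above can be sketched in plain Python. This is an eager, simplified stand-in (the helper name simple_groupby is hypothetical, and it builds lists instead of lazy iterators), not how itertools implements it:

```python
from itertools import groupby

def simple_groupby(iterable):
    """Eagerly collect runs of equal consecutive values (hypothetical helper)."""
    groups = []
    for value in iterable:
        if groups and groups[-1][0] == value:
            # Same as the current key: extend the current run.
            groups[-1][1].append(value)
        else:
            # New value: start a new (key, run) pair.
            groups.append((value, [value]))
    return groups

runs = simple_groupby('aaabbbccca')
print(runs)
# [('a', ['a', 'a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c', 'c', 'c']), ('a', ['a'])]

# The real groupby() produces the same runs, just lazily.
lazy_runs = [(key, list(vals)) for key, vals in groupby('aaabbbccca')]
```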

Running it on our example string

g = itertools.groupby(messy_str)
list(g)

[('a', <itertools._grouper at 0x274f3a75f60>),
 ('b', <itertools._grouper at 0x274f3a7a048>),
 ('a', <itertools._grouper at 0x274f3a7a080>),
 ('b', <itertools._grouper at 0x274f3a7a0b8>),
 ('a', <itertools._grouper at 0x274f3a7a0f0>),
 ('b', <itertools._grouper at 0x274f3a7a128>),
 ('c', <itertools._grouper at 0x274f3a7a160>),
 ('a', <itertools._grouper at 0x274f3a7a198>),
 ('b', <itertools._grouper at 0x274f3a7a1d0>),
 ('c', <itertools._grouper at 0x274f3a7a208>),
 ('d', <itertools._grouper at 0x274f3a7a240>),
 ('a', <itertools._grouper at 0x274f3a7a278>)]

The second value in each tuple is, itself, an iterator that yields all of the same values, in order.

This becomes abundantly clear when we then iterate through that second iterator

g = itertools.groupby(messy_str)
for key, vals in g:
    for val in vals:
        print(val, end='')
    print()

aaa
bbb
aaa
bbb
aaa
bbb
ccc
aaa
bbb
ccc
ddd
aaa
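Because groupby() only merges adjacent equal values, 'a' shows up as a key several times above. A small sketch of two common follow-ups: run-length encoding the string as it stands, and sorting first when one group per distinct value is wanted.

```python
from itertools import groupby

messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'

# Run-length encoding: one (key, run length) pair per consecutive run.
run_lengths = [(key, len(list(vals))) for key, vals in groupby(messy_str)]
print(run_lengths[:3])  # [('a', 3), ('b', 3), ('a', 3)]

# Sorting first puts equal values next to each other,
# so each letter gets exactly one group.
totals = {key: len(list(vals)) for key, vals in groupby(sorted(messy_str))}
print(totals)  # {'a': 15, 'b': 12, 'c': 6, 'd': 3}
```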
