# Itertools Building Blocks

There are a lot of goodies included in the `itertools` library to enable clever functional programming.

```
import itertools
print(dir(itertools))
```

```
['__doc__', '__loader__', '__name__', '__package__', '__spec__', '_grouper', '_tee', '_tee_dataobject', 'accumulate', 'chain', 'combinations', 'combinations_with_replacement', 'compress', 'count', 'cycle', 'dropwhile', 'filterfalse', 'groupby', 'islice', 'permutations', 'product', 'repeat', 'starmap', 'takewhile', 'tee', 'zip_longest']
```

Let’s look at a few

## Infinites

`count`

Yields an endless stream of integers, so zipping it against something else basically works like a cheap `enumerate()`

```
for _, count_val in zip(range(100), itertools.count()):
    pass
print(count_val)
```

```
99
```
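
`count()` also accepts a start value and a step, which is where it goes beyond `enumerate()`. A small sketch (the variable name is just for illustration), using `islice()` to cap the otherwise endless stream:

```python
import itertools

# count(start, step) counts forever, so islice() caps it at four values
evens_from_ten = list(itertools.islice(itertools.count(10, 2), 4))
print(evens_from_ten)  # [10, 12, 14, 16]
```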

`repeat`

Is used to serve up the same value until an end condition is reached

```
for _, val in zip(range(16), itertools.repeat('na')):
    print(val, end=' ')
else:
    print('Batman')
```

```
na na na na na na na na na na na na na na na na Batman
```
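
`repeat()` also takes an optional `times` argument, and it is handy for feeding a constant into `map()`. A quick sketch with made-up variable names:

```python
import itertools

# With times=3 the iterator stops on its own
three_nas = list(itertools.repeat('na', 3))
print(three_nas)  # ['na', 'na', 'na']

# Without times it never ends, so map() stops when range(5) runs out
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]
```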

`cycle`

Used to iterate endlessly through some series of values until an end condition is reached

Works for letters in a string

```
roflcopter = itertools.cycle('soi')
for _ in range(1000):
    print(next(roflcopter), end='')
```

```
soisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisoisois
```

Or values in a list

```
looney_tunes = itertools.cycle(['Rabbit Season', 'Duck Season'])
for _ in range(7):
    print(next(looney_tunes))
else:
    print('Rabbit Season\nDuck Season, FIRE!')
```

```
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Duck Season
Rabbit Season
Rabbit Season
Duck Season, FIRE!
```
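
Because `zip()` stops at its shortest argument, pairing a finite list with `cycle()` is a tidy way to hand things out round-robin. A sketch, with hypothetical `chores` and `kids` lists:

```python
import itertools

chores = ['dishes', 'laundry', 'trash', 'vacuum', 'dusting']
kids = ['Bugs', 'Daffy']

# cycle(kids) never ends, so zip() stops when chores runs out
assignments = list(zip(chores, itertools.cycle(kids)))
for chore, kid in assignments:
    print(f'{kid}: {chore}')
```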

## Boolean Tools

`filter` and `compress` work very similarly. To illustrate, let’s make a simple function that checks whether a letter is a vowel

`is_vowel = lambda x: x in {'A', 'E', 'I', 'O', 'U'}`

And make a list of all letters

```
letters = [chr(x+65) for x in range(26)]
print(letters)
```

```
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
```

The `compress()` function takes an iterable, and then *an iterable of* `bool` *values*, one per item. It returns a lazy iterator over the items whose flag is true.

```
compress_result = itertools.compress(letters, map(is_vowel, letters))
compress_result
```

```
<itertools.compress at 0x274f3a3c8d0>
```

Same goes for the built-in `filter` function, but its syntax is (evaluation_function, iterable)

```
filter_result = filter(is_vowel, letters)
filter_result
```

```
<filter at 0x274f3a3cc50>
```

Unpacking both of them into a list, we get the same result

```
print(list(filter_result))
print(list(compress_result))
```

```
['A', 'E', 'I', 'O', 'U']
['A', 'E', 'I', 'O', 'U']
```

Alternatively, we get the complement letter set using `itertools.filterfalse()` and the same syntax as `filter`

`print(list(itertools.filterfalse(is_vowel, letters)))`

```
['B', 'C', 'D', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'V', 'W', 'X', 'Y', 'Z']
```

It’s still unclear to me why someone would want to use `compress()` over `filter()`. Would love an opened issue if anyone has any thoughts…
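
One possible answer, sketched with made-up data: `compress()` seems to fit when the True/False flags already exist as their own sequence instead of being computed from each item, which is awkward to express with `filter()`.

```python
import itertools

players = ['Curry', 'Thompson', 'Green', 'Durant']
# Hypothetical flags from some other source, lined up by position
is_active = [True, False, True, True]

active_players = list(itertools.compress(players, is_active))
print(active_players)  # ['Curry', 'Green', 'Durant']
```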

## Short-Circuit Evaluation

`takewhile`

Goes through an iterable, returning values for as long as some condition holds, and stops at the first value that fails it

```
circle = ['Duck'] * 10 + ['Goose'] + ['Duck'] * 5
for child in itertools.takewhile(lambda x: x != 'Goose', circle):
    print(child)
print('Goose!')
```

```
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Duck
Goose!
```
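
Worth noting: `takewhile()` stops for good at the first failing value, even if later values would pass. A small sketch with illustrative numbers, next to `filter()` for contrast:

```python
import itertools

readings = [1, 2, 3, 10, 1, 2]

# Stops at 10 and never looks at the trailing 1 and 2
small_prefix = list(itertools.takewhile(lambda x: x < 5, readings))
print(small_prefix)  # [1, 2, 3]

# filter(), by contrast, tests every value
all_small = list(filter(lambda x: x < 5, readings))
print(all_small)  # [1, 2, 3, 1, 2]
```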

`dropwhile`

On the other hand, `dropwhile()` will *skip over values* for as long as some condition holds, then will return the rest of the iterable.

In this overly-cute example, we forgot that the alphabet starts at 65, but we’re *pretty sure* it’s within the first 100 characters.

```
from itertools import dropwhile, islice
buncha_ascii = [chr(x) for x in range(100)]
print(buncha_ascii)
```

```
['\x00', '\x01', '\x02', '\x03', '\x04', '\x05', '\x06', '\x07', '\x08', '\t', '\n', '\x0b', '\x0c', '\r', '\x0e', '\x0f', '\x10', '\x11', '\x12', '\x13', '\x14', '\x15', '\x16', '\x17', '\x18', '\x19', '\x1a', '\x1b', '\x1c', '\x1d', '\x1e', '\x1f', ' ', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a', 'b', 'c']
```

And so we condition on the value *not* being a letter, since `dropwhile()` skips values for as long as the test returns `True`

`is_not_a_letter = lambda x: not x.isalpha()`

And drop everything until we find the letter ‘A’, then use `islice()` to grab 26 letters

```
for i in islice(dropwhile(is_not_a_letter, buncha_ascii), 26):
    print(i, end=' ')
```

```
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
```
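
`dropwhile()` only applies its test to the leading values. Once one value fails the test, everything after it comes through untested. A sketch with illustrative data:

```python
from itertools import dropwhile

log_lines = ['# header', '# more header', 'data 1', '# not dropped', 'data 2']

# Leading comment lines are dropped; the later '#' line survives
body = list(dropwhile(lambda line: line.startswith('#'), log_lines))
print(body)  # ['data 1', '# not dropped', 'data 2']
```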

## Combinatorics

Several handy functions are implemented to support exhaustive combinatorics

Consider two dummy lists.

```
letters = ['A', 'B', 'C']
numbers = [1, 2, 3]
```

`product`

If we want to see every possible pairing of an item from one list with an item from the other, we want the Cartesian product.

```
for pair in itertools.product(letters, numbers):
    print(pair)
```

```
('A', 1)
('A', 2)
('A', 3)
('B', 1)
('B', 2)
('B', 3)
('C', 1)
('C', 2)
('C', 3)
```
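
`product()` also takes a `repeat` keyword for crossing an iterable with itself. A small sketch using coin flips as a stand-in example:

```python
import itertools

# Equivalent to product('HT', 'HT'): every outcome of two coin flips
flips = [''.join(outcome) for outcome in itertools.product('HT', repeat=2)]
print(flips)  # ['HH', 'HT', 'TH', 'TT']
```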

`combinations`

Or just considering potential “handshake pairs” from the first list

```
for pair in itertools.combinations(letters, 2):
    print(pair)
```

```
('A', 'B')
('A', 'C')
('B', 'C')
```

`combinations_with_replacement`

Or allowing for self-shakes

```
for pair in itertools.combinations_with_replacement(letters, 2):
    print(pair)
```

```
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
```

`permutations`

Finally, we can see all possible orderings of an iterable

```
for ordering in itertools.permutations(letters):
    print(ordering)
```

```
('A', 'B', 'C')
('A', 'C', 'B')
('B', 'A', 'C')
('B', 'C', 'A')
('C', 'A', 'B')
('C', 'B', 'A')
```
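
As a sanity check on the outputs above, the number of items each function yields can be compared against `math.comb()` and `math.perm()` (Python 3.8+). A sketch, reusing the same three letters; `permutations()` also accepts an optional length as its second argument:

```python
import itertools
import math

letters = ['A', 'B', 'C']
n = len(letters)

pairs = list(itertools.combinations(letters, 2))
pairs_with_self = list(itertools.combinations_with_replacement(letters, 2))
orderings = list(itertools.permutations(letters))
ordered_pairs = list(itertools.permutations(letters, 2))  # optional length r

print(len(pairs), math.comb(n, 2))                # 3 3
print(len(pairs_with_self), math.comb(n + 1, 2))  # 6 6
print(len(orderings), math.factorial(n))          # 6 6
print(len(ordered_pairs), math.perm(n, 2))        # 6 6
```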

`zip_longest`

This one’s pretty straightforward. Whereas the usual `zip()` function kicks out as soon as one of the iterators raises `StopIteration`, this will continue *until the last one does*, subbing in whatever you pass to the `fillvalue` argument, where appropriate.

```
years = range(2015, 2020)
teams = ['Warriors', 'Cavs', 'Warriors', 'Warriors']
for pair in itertools.zip_longest(years, teams, fillvalue='TBD'):
    print(pair)
```

```
(2015, 'Warriors')
(2016, 'Cavs')
(2017, 'Warriors')
(2018, 'Warriors')
(2019, 'TBD')
```
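
If `fillvalue` is left out, the gaps are filled with `None`. A quick sketch with made-up lists, next to plain `zip()` for comparison:

```python
import itertools

shorter = [1, 2]
longer = ['a', 'b', 'c']

truncated = list(zip(shorter, longer))
print(truncated)  # [(1, 'a'), (2, 'b')]

padded = list(itertools.zip_longest(shorter, longer))
print(padded)  # [(1, 'a'), (2, 'b'), (None, 'c')]
```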

## Reusing Iterables with `tee`

This one took me a minute to get the hang of.

Suppose we knew we would want to go through the iterator `zero_to_sixty` twice

`zero_to_sixty = iter(range(0, 61))`

And so we make an appropriately-named iterator that starts at the same point

`zero_to_sixty_again = zero_to_sixty`

And march through the original iterator a few times

`next(zero_to_sixty)`

```
0
```

`next(zero_to_sixty)`

```
1
```

`next(zero_to_sixty)`

```
2
```

Then we want to fire off the second iterator

`next(zero_to_sixty_again)`

```
3
```

But it picks up where the last one left off

`next(zero_to_sixty)`

```
4
```

Indeed, they’re both the exact same object

`zero_to_sixty is zero_to_sixty_again`

```
True
```

`tee`

Instead, we could have accomplished this using the `tee()` function, which “tees up” an iterator to serve up the same values as its argument would have

```
zero_to_sixty = iter(range(0, 61))
zero_to_sixty, zero_to_sixty_again = itertools.tee(zero_to_sixty)
```

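**Note**: `tee` returns a tuple of independent iterators (two by default), each of which serves up the same values. Neither one is the original object, and the original shouldn’t be advanced directly after teeing, which is why the name `zero_to_sixty` is reassigned here.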

```
for i in range(10):
    next(zero_to_sixty)
print(next(zero_to_sixty))
```

```
10
```

`print(next(zero_to_sixty_again))`

```
0
```
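
`tee()` takes an optional second argument for the number of copies. A sketch of one use: two offset copies of the same stream zipped into neighboring pairs. The helper name `neighbors` is made up here; Python 3.10+ ships the same idea as `itertools.pairwise`.

```python
import itertools

def neighbors(iterable):
    """Yield (x0, x1), (x1, x2), ... from a single pass over iterable."""
    first, second = itertools.tee(iterable, 2)
    next(second, None)  # advance the second copy by one
    return zip(first, second)

print(list(neighbors(range(5))))  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```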

## Smart iteration with `groupby`

As soon as I stopped thinking about this one in terms of `pandas.groupby()` and instead thought of it as lazy iterator construction, it finally clicked.

`messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'`

This string is constructed to illustrate the general idea of how this function works. In pseudocode:

```
1. Take the first value as a key
2. Make this key the first value of an iterator
3. while the next value is the exact same:
   a. add this value to the iterator
4. When you encounter a new value, repeat steps 1-3
5. Rinse, repeat until you get the StopIteration exception
```

Running it on our example string

```
g = itertools.groupby(messy_str)
list(g)
```

```
[('a', <itertools._grouper at 0x274f3a75f60>),
('b', <itertools._grouper at 0x274f3a7a048>),
('a', <itertools._grouper at 0x274f3a7a080>),
('b', <itertools._grouper at 0x274f3a7a0b8>),
('a', <itertools._grouper at 0x274f3a7a0f0>),
('b', <itertools._grouper at 0x274f3a7a128>),
('c', <itertools._grouper at 0x274f3a7a160>),
('a', <itertools._grouper at 0x274f3a7a198>),
('b', <itertools._grouper at 0x274f3a7a1d0>),
('c', <itertools._grouper at 0x274f3a7a208>),
('d', <itertools._grouper at 0x274f3a7a240>),
('a', <itertools._grouper at 0x274f3a7a278>)]
```

The second value in each tuple is, itself, an iterator that yields all of the same values, in order.

This becomes abundantly clear when we then iterate *through that second iterator*

```
g = itertools.groupby(messy_str)
for key, vals in g:
    for val in vals:
        print(val, end='')
    print()
```

```
aaa
bbb
aaa
bbb
aaa
bbb
ccc
aaa
bbb
ccc
ddd
aaa
```
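
Two follow-ups, sketched on the same string: the groups make run-length encoding nearly a one-liner, and because `groupby()` only merges *adjacent* equal values, sorting first is what gets you one group per distinct key. The names `runs` and `totals` are just for illustration.

```python
import itertools

messy_str = 'aaabbbaaabbbaaabbbcccaaabbbcccdddaaa'

# Run-length encoding: one (key, run length) pair per consecutive run
runs = [(key, len(list(vals))) for key, vals in itertools.groupby(messy_str)]
print(runs[:3])  # [('a', 3), ('b', 3), ('a', 3)]

# Sorting first puts equal values next to each other
totals = {key: len(list(vals)) for key, vals in itertools.groupby(sorted(messy_str))}
print(totals)  # {'a': 15, 'b': 12, 'c': 6, 'd': 3}
```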