Itertools Recipe: N at a Time

The itertools docs have a ton of slick recipes for using the library to good effect. Some of the code is more useful than illustrative, so I wanted to use these notebooks to break down a few of the functions.


# poor import style, but I want to copy-paste the code
# as-is from the docs

from itertools import *
import itertools

n_at_a_time()

Note: The docs call this function grouper(), but I think this name is a bit clearer.

def n_at_a_time(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n_at_a_time('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Demo

list(n_at_a_time('ABCDEFG', 3, 'x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
list(n_at_a_time(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
list(n_at_a_time('a'*15, 2, 'b'))
[('a', 'a'),
 ('a', 'a'),
 ('a', 'a'),
 ('a', 'a'),
 ('a', 'a'),
 ('a', 'a'),
 ('a', 'a'),
 ('a', 'b')]

Why this works

Okay, this one’s devilishly clever.

Same Object

Understanding what’s going on hinges on remembering that assigning or repeating an iterator never copies it. Unless you use the itertools.tee() function, every name or list slot you create this way references the same object, and so shares the same position in the underlying data.
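To make the difference concrete, here is a minimal sketch contrasting a plain assignment with itertools.tee(). The variable names (numbers, alias, left, right) are made up for illustration.

```python
import itertools

numbers = [1, 2, 3]

# Plain assignment: both names point at one iterator, so they share a position.
it = iter(numbers)
alias = it
first_from_it = next(it)        # consumes 1
first_from_alias = next(alias)  # continues from the same position, so 2

# itertools.tee(): two iterators that advance independently of each other.
left, right = itertools.tee(iter(numbers))
first_from_left = next(left)    # 1
first_from_right = next(right)  # also 1

print(first_from_it, first_from_alias)    # 1 2
print(first_from_left, first_from_right)  # 1 1
```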

For instance, say we’ve got a simple list of numbers. We turn this into a list_iterator and replicate it 4 times.

iterable = [1, 2, 3]

args = [iter(iterable)] * 4
args
[<list_iterator at 0x16a18bfcfd0>,
 <list_iterator at 0x16a18bfcfd0>,
 <list_iterator at 0x16a18bfcfd0>,
 <list_iterator at 0x16a18bfcfd0>]

It should be obvious from the memory address in the __repr__, but for clarity, we can check that each element of the list references the exact same object as the others.

args[0] is args[1]
True
args[2] is args[3]
True

Clever use of zip_longest()

The real magic happens when we pass args into zip_longest() with the * operator.

Basically, this means “unpack and zip the first value of each of these 4 iterators… then the second… and so on.”
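For comparison, this is roughly what that unpacking looks like in the ordinary case, where every argument is a separate iterator. It's a small sketch with made-up inputs (letters, digits), not part of the recipe itself.

```python
from itertools import zip_longest

letters = iter('abc')
digits = iter([1, 2])

# zip_longest(*[letters, digits]) is the same call as zip_longest(letters, digits).
pairs = list(zip_longest(*[letters, digits], fillvalue='?'))

print(pairs)  # [('a', 1), ('b', 2), ('c', '?')]
```

Each iterator keeps its own position here, so every tuple takes one value from each source.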

But because each of these 4 arguments is the exact same iterator, and thereby shares the exact same position in our original iterable, building a single output tuple involves (you guessed it) 4 calls of next() on that one underlying iterator. Each tuple therefore holds 4 consecutive values.
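We can imitate a couple of rounds by hand. This is only a sketch of the idea, using a longer made-up list so that nothing runs out partway through; it is not how zip_longest() is written internally.

```python
iterable = [1, 2, 3, 4, 5, 6, 7, 8]
args = [iter(iterable)] * 4

# One "round" of zipping: ask each of the 4 slots for its next value.
# All 4 slots are the same iterator, so this is 4 consecutive next() calls.
first_chunk = tuple([next(it) for it in args])
second_chunk = tuple([next(it) for it in args])

print(first_chunk)   # (1, 2, 3, 4)
print(second_chunk)  # (5, 6, 7, 8)
```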

This continues until the iterator raises StopIteration, at which point zip_longest() fills the remaining slots of the final tuple with whatever we passed for fillvalue.
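Putting the pieces together, here is a spelled-out generator that should produce the same chunks as the one-liner. The name n_at_a_time_explicit is hypothetical, and this is an illustrative sketch rather than the actual implementation of zip_longest().

```python
def n_at_a_time_explicit(iterable, n, fillvalue=None):
    """Hypothetical helper: same chunks as n_at_a_time(), with each step written out."""
    it = iter(iterable)  # the single shared iterator
    while True:
        chunk = []
        for _ in range(n):
            try:
                chunk.append(next(it))  # n consecutive next() calls per chunk
            except StopIteration:
                break                   # the data ran out partway through (or at the start)
        if not chunk:
            return                      # nothing left at all, so we're done
        # Pad a short final chunk up to length n with the fill value.
        chunk.extend([fillvalue] * (n - len(chunk)))
        yield tuple(chunk)


print(list(n_at_a_time_explicit('ABCDEFG', 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```

It's a lot more code than the two-line recipe, which is a decent argument for the recipe.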