Itertools Recipe: N at a Time
The itertools docs have a ton of slick recipes for using the library to good effect. Some of the code is more useful than illustrative, so I wanted to use these notebooks to break down a few of the functions.
This is
# poor import style, but I want to copy-paste the code
# as-is from the docs
from itertools import *
import itertools
n_at_a_time()
Note: The docs call this function grouper(), but I think this name is a bit clearer.
def n_at_a_time(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n_at_a_time('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
Demo
list(n_at_a_time('ABCDEFG', 3, 'x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
list(n_at_a_time(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
list(n_at_a_time('a'*15, 2, 'b'))
[('a', 'a'),
('a', 'a'),
('a', 'a'),
('a', 'a'),
('a', 'a'),
('a', 'a'),
('a', 'a'),
('a', 'b')]
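A few more cases are worth checking. The sketch below repeats the definition so it runs on its own. It shows three things. When the input length is an exact multiple of n, no fillvalue is used. An empty input yields no chunks at all. Because the input is only consumed through next(), a generator works as well as a string or range.

```python
from itertools import zip_longest

def n_at_a_time(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

# Length is an exact multiple of n: no padding needed
exact = list(n_at_a_time('ABCDEF', 3, 'x'))
print(exact)    # [('A', 'B', 'C'), ('D', 'E', 'F')]

# Empty input: every slot is exhausted immediately, so no chunks
empty = list(n_at_a_time('', 3, 'x'))
print(empty)    # []

# A generator is consumed lazily, two items per chunk
squares = list(n_at_a_time((i * i for i in range(5)), 2))
print(squares)  # [(0, 1), (4, 9), (16, None)]
```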
Why this works
Okay, this one’s devilishly clever.
Same Object
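Understanding what’s going on hinges on remembering that copying a reference to an iterator doesn’t copy the iterator itself. Whether you use the assignment operator or list multiplication like [it] * n, every copy points to the same object and shares its current position. To get genuinely independent iterators, you need the itertools.tee() function.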
For instance, say we’ve got a simple list of numbers. We turn it into a list_iterator and replicate the reference to it 4 times.
iterable = [1, 2, 3]
args = [iter(iterable)] * 4
args
[<list_iterator at 0x16a18bfcfd0>,
<list_iterator at 0x16a18bfcfd0>,
<list_iterator at 0x16a18bfcfd0>,
<list_iterator at 0x16a18bfcfd0>]
It should be obvious from the memory address in the __repr__, but for clarity, we can check whether each element of the list is the exact same object as the others:
args[0] is args[1]
True
args[2] is args[3]
True
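For contrast, here is a small sketch of what itertools.tee() does instead. It returns a tuple of independent iterators over the same data, so advancing one doesn’t move the others. The list-multiplied copies, on the other hand, all share a single position. next(it, None) is used here so that running past the end returns None instead of raising StopIteration.

```python
from itertools import tee

iterable = [1, 2, 3]

# Four references to ONE list_iterator: they share a single position
shared = [iter(iterable)] * 4
# Each next() advances that same iterator, so the 4th call finds it exhausted
shared_firsts = [next(it, None) for it in shared]
print(shared_firsts)       # [1, 2, 3, None]

# tee() builds 4 independent iterators, each starting at the beginning
independent = tee(iterable, 4)
independent_firsts = [next(it, None) for it in independent]
print(independent_firsts)  # [1, 1, 1, 1]
print(independent[0] is independent[1])  # False
```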
Clever use of zip_longest()
The real magic happens when we pass args into zip_longest() with the * operator.
Basically, this means “unpack args into 4 separate arguments, then zip the first value of each of these 4 iterators… then the second… and so on.”
But because all 4 of these are the exact same iterator, sharing a single position in our original iterable, building each output tuple involves (you guessed it) 4 calls of next() on that one iterator. Those calls pull 4 consecutive items from the underlying iterable.
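To see those next() calls directly, here is a rough hand-rolled sketch of how the tuples get built. It illustrates the idea and is not zip_longest()’s actual implementation. Star-unpacking hands the same iterator to zip() 4 times, and filling each position in a tuple calls next() on it once. Plain zip() is used for the first tuple because nothing runs out here.

```python
iterable = range(10)
args = [iter(iterable)] * 4

# zip(*args) is the same call as zip(args[0], args[1], args[2], args[3]);
# taking one tuple from it makes 4 next() calls on the shared iterator
first_via_zip = next(zip(*args))
print(first_via_zip)   # (0, 1, 2, 3)

# Building the next tuple by hand: one next() per position, all on the
# same iterator, so the values keep picking up where the last call left off
second_by_hand = tuple([next(it) for it in args])
print(second_by_hand)  # (4, 5, 6, 7)
```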
This continues until the underlying iterator runs out and next() raises StopIteration. At that point zip_longest() fills the remaining slots of the final tuple with whatever we passed for fillvalue.
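This is also why the recipe reaches for zip_longest() rather than plain zip(). Here is a quick comparison on the same 'ABCDEFG' input as the demo. chunks_with_zip() is a hypothetical helper written only for this comparison. zip() stops as soon as any next() call raises StopIteration, so the partial final chunk is silently dropped, and 'G' is consumed and lost along the way. zip_longest() pads that chunk instead.

```python
from itertools import zip_longest

def chunks_with_zip(iterable, n):
    # Hypothetical helper: same shared-iterator trick, but with plain zip()
    args = [iter(iterable)] * n
    return zip(*args)

def n_at_a_time(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

short = list(chunks_with_zip('ABCDEFG', 3))
padded = list(n_at_a_time('ABCDEFG', 3, 'x'))
print(short)   # [('A', 'B', 'C'), ('D', 'E', 'F')]
print(padded)  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```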