The itertools docs have a ton of slick recipes for using the library to good effect. Some of the code is more useful than illustrative, so I wanted to use these notebooks to break down a few of the functions.
# poor import style, but I want to copy-paste the code
# as-is from the docs
from itertools import *
import itertools
Note: The docs call this function grouper(), but I think this name is a bit clearer.
def n_at_a_time(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # n_at_a_time('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
list(n_at_a_time('ABCDEFG', 3, 'x'))
[('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
list(n_at_a_time(range(10), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
list(n_at_a_time('a'*15, 2, 'b'))
[('a', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'a'), ('a', 'b')]
Why this works
Okay, this one’s devilishly clever.
Understanding what’s going on hinges on remembering that, unless you use the itertools.tee() function, reassigning or replicating an iterator never copies it: every name or list slot references the same object.
For instance, say we’ve got a simple list of numbers. We turn this into a
list_iterator and replicate it 4 times.
iterable = [1, 2, 3]
args = [iter(iterable)] * 4
args
[<list_iterator at 0x16a18bfcfd0>, <list_iterator at 0x16a18bfcfd0>, <list_iterator at 0x16a18bfcfd0>, <list_iterator at 0x16a18bfcfd0>]
It should be obvious from the memory address in the __repr__, but for clarity, we can check that every element of the list references the exact same object as the others.
args[0] is args[1] is args[2] is args[3]
True
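To make the consequence of that shared reference concrete, here is a small sketch. The variable names (`shared`, `independent`, and so on) are mine, not from the docs. Advancing any one of the replicated references advances all of them, whereas `itertools.tee()` hands back iterators that each keep their own position.

```python
from itertools import tee

iterable = [1, 2, 3]

# Replicating the iterator copies the reference, not the iterator itself
shared = [iter(iterable)] * 2
first_from_a = next(shared[0])   # consumes 1
first_from_b = next(shared[1])   # same object, so this yields 2, not 1

# tee() returns separate iterators that each track their own position
independent = tee(iter(iterable), 2)
tee_from_a = next(independent[0])   # 1
tee_from_b = next(independent[1])   # also 1

print(first_from_a, first_from_b)   # 1 2
print(tee_from_a, tee_from_b)       # 1 1
```

The shared version is exactly the behavior n_at_a_time() relies on; with tee() the chunking trick would not work.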
Clever use of
The real magic happens when we pass args to zip_longest() with the * unpacking operator.
Basically, this means “unpack and zip the first value of each of these 4 iterators… then the second… and so on.”
But because each of these 4 list entries references the exact same iterator, and thereby the exact same position in our original iterable, each round of zip_longest() pulls one value per argument, which means (you guessed it) 4 calls of next() on the same underlying iterator. Each output tuple therefore holds 4 consecutive values.
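As a rough illustration of those calls, the sketch below builds the first two output tuples by hand. It only mimics what one round of zipping does and is not how zip_longest() is actually implemented; the names `shared`, `first_chunk`, and `second_chunk` are made up for the example, and the input is chosen to be an exact multiple of 4 so no fill is needed.

```python
iterable = [1, 2, 3, 4, 5, 6, 7, 8]
shared = iter(iterable)
args = [shared] * 4

# One round of zipping asks each argument for its next value in turn.
# All four arguments are the same iterator, so we get four consecutive items.
first_chunk = tuple(next(it) for it in args)
second_chunk = tuple(next(it) for it in args)

print(first_chunk)    # (1, 2, 3, 4)
print(second_chunk)   # (5, 6, 7, 8)
```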
This continues until the shared iterator raises StopIteration, at which point zip_longest() fills the remaining slots of the final tuple with whatever we passed for fillvalue.
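Putting the pieces together, here is a spelled-out version of the same idea with the next() calls and the fill step written explicitly. This is only a sketch for intuition: `n_at_a_time_by_hand` is a hypothetical name, and it is a stand-in for the behavior described above, not the recipe from the docs.

```python
from itertools import zip_longest

def n_at_a_time_by_hand(iterable, n, fillvalue=None):
    """Hypothetical step-by-step stand-in for the zip_longest(*args) trick."""
    it = iter(iterable)
    while True:
        chunk = []
        # Pull up to n consecutive values from the single shared iterator
        for _ in range(n):
            try:
                chunk.append(next(it))
            except StopIteration:
                break
        if not chunk:
            return  # nothing left at all, so no extra all-fill tuple
        # Pad a short final chunk, as zip_longest() does with fillvalue
        chunk.extend([fillvalue] * (n - len(chunk)))
        yield tuple(chunk)

result = list(n_at_a_time_by_hand('ABCDEFG', 3, 'x'))
expected = list(zip_longest(*[iter('ABCDEFG')] * 3, fillvalue='x'))

print(result)              # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
print(result == expected)  # True
```

The one-liner from the docs gets the same result without any explicit loop, which is what makes it so clever.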