Flattening nested iterable containers.
Sources
- https://docs.python.org/3/library/itertools.html#itertools.chain
- https://stackoverflow.com/questions/2158395/flatten-an-irregular-arbitrarily-nested-list-of-lists
- http://simeonvisser.com/posts/python-3-using-yield-from-in-generators-part-1.html
- https://docs.python.org/3/library/collections.abc.html
- https://stackoverflow.com/questions/120886/python-idiom-to-chain-flatten-an-infinite-iterable-of-finite-iterables
- https://iteration-utilities.readthedocs.io/en/latest/index.html
- https://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
The itertools
builtin module
The itertools
python module takes care of this. Two methods help (Explanations for the methods are taken from the official Python docs, see source 1):
chain()
: Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhaustedchain.from_iterable()
: Alternate constructor for chain(). Gets chained inputs from a single iterable argument that is evaluated lazily.
Example for itertools.chain()
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables = [iterable1, iterable2]
target = list(itertools.chain(*iterables))
print(target)
['a', 'b', 'c', 'd', 'e', 'f']
print(list(itertools.chain(iterables)))
['abc', 'def']
Example for itertools.chain.from_iterable()
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables = [iterable1, iterable2]
target = list(itertools.chain.from_iterable(iterables))
print(target)
['a', 'b', 'c', 'd', 'e', 'f']
Notice 1
The last example hints towards the equivalence of itertools.chain.from_iterable(iterables)
and itertools.chain(*iterables)
Considering itertools.chain.()
and itertools.chain.from_iterable()
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables1 = [iterable1, iterable2]
iterables2 = [iterable1, iterable2]
iterables3 = [iterables1, iterables2]
print(f'iterables3 is {iterables3}')
target = list(itertools.chain.from_iterable(iterables3))
print(target)
iterables3 is [['abc', 'def'], ['abc', 'def']]
['abc', 'def', 'abc', 'def']
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables1 = [iterable1, iterable2]
iterables2 = [iterable1, iterable2]
iterables3 = [iterables1, iterables2]
print(f'iterables3 is {iterables3}')
target = list(itertools.chain(*iterables3))
print(target)
iterables3 is [['abc', 'def'], ['abc', 'def']]
['abc', 'def', 'abc', 'def']
Notice 2
This seems to strengthen the equivalence of itertools.chain.from_iterable(iterables)
and itertools.chain(*iterables)
Insight 1
It seems that itertools.chain.from_iterable(iterables)
and itertools.chain(*iterables)
both flatten exactly one level off an iterable.
What happens when we want to use flattening all the way to the bottom?
Flattening to the bottom
Problem description
We have:
- A list of iterables.
- A function that takes this list and flattens it 1 time.
- We can still have iterables as output from step 2.
We want to find out how to gain flattening until no iterables are present anymore. In other words, we require a full decomposition of a possibly hierarchical iterable container.
This problem can be generalized:
- Say we have a given function that we know works
1
time. - We want it to work an arbitrary number of times
Depth and Homogeneity
- There’s a concept of “depth” to the number of times the function can execute: If the function flattens one level of an iterable, the resulting components might not be iterables. Such is the case with the iterable
[1,2,3]
- After flattening, we get1 2 3
. I don’t think this applies to strings in Python3, since in Python3, single characters are considered strings. - We do not assume homogeneity. In other words, if the iterable has a member which is an iterable of depth
1
(as in[1,2,3]
), then all members in the iterable are also iterable of depth1
. This makes the encompassing iterable an iterable of depth2
. We do not assume this as this seems to be redundant from the found solutions. Using list comprehensions, as discussed in source [7] only works for one level.
Checking Distinction
Is there a difference between a single character and a string in Python3?
# No distinction between single character strings and strings in Python3
# Both are strings, and hence - iterables.
depth_1_iterable_1 = "abc"
target = list(itertools.chain(*depth_1_iterable_1))
print(target)
depth_1_iterable_2 = "a"
target = list(itertools.chain(depth_1_iterable_2))
print(target)
['a', 'b', 'c']
['a']
# No distinction between single character strings and strings in Python3
# Both are strings, and hence - iterable.
depth_1_iterable_1 = [0,1,2]
target = list(itertools.chain(depth_1_iterable_1))
print(target)
depth_1_iterable_2 = 0
# The following line results in an error: `TypeError: 'int' object is not iterable`
# target = list(itertools.chain(depth_1_iterable_2))
# print(target)
[0, 1, 2]
Solution
This is described in sources [2], [3], [4]
from collections.abc import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x
sequence = [[[1, 2, 3], [4, 5]], 6]
target = list(flatten(sequence))
print(target)
[1, 2, 3, 4, 5, 6]
sequence = [[["abc", "b", 3], [4, 5]], 6]
target = list(flatten(sequence))
print(target)
['abc', 'b', 3, 4, 5, 6]
Notice 3
Here we still have "abc"
as a single element.
We can try and decompose it as well:
Attempt 1
from collections.abc import Iterable
import itertools
def flatten_with_str(xs):
for x in xs:
if isinstance(x, Iterable):
yield from flatten_with_str(x)
else:
yield x
sequence1 = "abc"
sequence2 = ["abc"]
sequence3 = [[["abc", "b", 3], [4, 5]], 6]
# Will cause crash
# target1 = list(flatten_with_str(sequence1))
# target2 = list(flatten_with_str(sequence2))
# target3 = list(flatten_with_str(sequence))
# print(target1)
# print(target2)
# print(target3)
Notice 4
These cause the Kernel to crash.
I think this is because of never ending recursion: A single character is still as string and thus still an iterable
Attempt 2
from collections.abc import Iterable
import itertools
def flatten_with_str(xs):
for x in xs:
if isinstance(x, Iterable):
if isinstance(x, (str, bytes)) and len(x) == 1:
yield x
else:
yield from flatten_with_str(x)
else:
yield x
sequence1 = "abc"
sequence2 = ["abc"]
sequence3 = [[["abc", "b", 3], [4, 5]], 6]
target1 = list(flatten_with_str(sequence1))
target2 = list(flatten_with_str(sequence2))
target3 = list(flatten_with_str(sequence3))
print(target1)
print(target2)
print(target3)
['a', 'b', 'c']
['a', 'b', 'c']
['a', 'b', 'c', 'b', 3, 4, 5, 6]
More details
Here’s a link to an interesting 3rd party library that deals with flattening and also grouping!
Source [6]: https://iteration-utilities.readthedocs.io/en/latest/index.html