Flattening nested iterable containers.
Sources
- https://docs.python.org/3/library/itertools.html#itertools.chain
- https://stackoverflow.com/questions/2158395/flatten-an-irregular-arbitrarily-nested-list-of-lists
- http://simeonvisser.com/posts/python-3-using-yield-from-in-generators-part-1.html
- https://docs.python.org/3/library/collections.abc.html
- https://stackoverflow.com/questions/120886/python-idiom-to-chain-flatten-an-infinite-iterable-of-finite-iterables
- https://iteration-utilities.readthedocs.io/en/latest/index.html
- https://stackoverflow.com/questions/406121/flattening-a-shallow-list-in-python
The itertools builtin module
The itertools python module takes care of this. Two methods help (Explanations for the methods are taken from the official Python docs, see source 1):
chain(): Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhaustedchain.from_iterable(): Alternate constructor for chain(). Gets chained inputs from a single iterable argument that is evaluated lazily.
Example for itertools.chain()
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables = [iterable1, iterable2]
target = list(itertools.chain(*iterables))
print(target)
['a', 'b', 'c', 'd', 'e', 'f']
print(list(itertools.chain(iterables)))
['abc', 'def']
Example for itertools.chain.from_iterable()
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables = [iterable1, iterable2]
target = list(itertools.chain.from_iterable(iterables))
print(target)
['a', 'b', 'c', 'd', 'e', 'f']
Notice 1
The last example hints towards the equivalence of itertools.chain.from_iterable(iterables) and itertools.chain(*iterables)
Considering itertools.chain.() and itertools.chain.from_iterable()
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables1 = [iterable1, iterable2]
iterables2 = [iterable1, iterable2]
iterables3 = [iterables1, iterables2]
print(f'iterables3 is {iterables3}')
target = list(itertools.chain.from_iterable(iterables3))
print(target)
iterables3 is [['abc', 'def'], ['abc', 'def']]
['abc', 'def', 'abc', 'def']
import itertools
iterable1 = 'abc'
iterable2 = 'def'
iterables1 = [iterable1, iterable2]
iterables2 = [iterable1, iterable2]
iterables3 = [iterables1, iterables2]
print(f'iterables3 is {iterables3}')
target = list(itertools.chain(*iterables3))
print(target)
iterables3 is [['abc', 'def'], ['abc', 'def']]
['abc', 'def', 'abc', 'def']
Notice 2
This seems to strengthen the equivalence of itertools.chain.from_iterable(iterables) and itertools.chain(*iterables)
Insight 1
It seems that itertools.chain.from_iterable(iterables) and itertools.chain(*iterables) both flatten exactly one level off an iterable.
What happens when we want to use flattening all the way to the bottom?
Flattening to the bottom
Problem description
We have:
- A list of iterables.
- A function that takes this list and flattens it 1 time.
- We can still have iterables as output from step 2.
We want to find out how to gain flattening until no iterables are present anymore. In other words, we require a full decomposition of a possibly hierarchical iterable container.
This problem can be generalized:
- Say we have a given function that we know works
1time. - We want it to work an arbitrary number of times
Depth and Homogeneity
- There’s a concept of “depth” to the number of times the function can execute: If the function flattens one level of an iterable, the resulting components might not be iterables. Such is the case with the iterable
[1,2,3]- After flattening, we get1 2 3. I don’t think this applies to strings in Python3, since in Python3, single characters are considered strings. - We do not assume homogeneity. In other words, if the iterable has a member which is an iterable of depth
1(as in[1,2,3]), then all members in the iterable are also iterable of depth1. This makes the encompassing iterable an iterable of depth2. We do not assume this as this seems to be redundant from the found solutions. Using list comprehensions, as discussed in source [7] only works for one level.
Checking Distinction
Is there a difference between a single character and a string in Python3?
# No distinction between single character strings and strings in Python3
# Both are strings, and hence - iterables.
depth_1_iterable_1 = "abc"
target = list(itertools.chain(*depth_1_iterable_1))
print(target)
depth_1_iterable_2 = "a"
target = list(itertools.chain(depth_1_iterable_2))
print(target)
['a', 'b', 'c']
['a']
# No distinction between single character strings and strings in Python3
# Both are strings, and hence - iterable.
depth_1_iterable_1 = [0,1,2]
target = list(itertools.chain(depth_1_iterable_1))
print(target)
depth_1_iterable_2 = 0
# The following line results in an error: `TypeError: 'int' object is not iterable`
# target = list(itertools.chain(depth_1_iterable_2))
# print(target)
[0, 1, 2]
Solution
This is described in sources [2], [3], [4]
from collections.abc import Iterable
def flatten(xs):
for x in xs:
if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
yield from flatten(x)
else:
yield x
sequence = [[[1, 2, 3], [4, 5]], 6]
target = list(flatten(sequence))
print(target)
[1, 2, 3, 4, 5, 6]
sequence = [[["abc", "b", 3], [4, 5]], 6]
target = list(flatten(sequence))
print(target)
['abc', 'b', 3, 4, 5, 6]
Notice 3
Here we still have "abc" as a single element.
We can try and decompose it as well:
Attempt 1
from collections.abc import Iterable
import itertools
def flatten_with_str(xs):
for x in xs:
if isinstance(x, Iterable):
yield from flatten_with_str(x)
else:
yield x
sequence1 = "abc"
sequence2 = ["abc"]
sequence3 = [[["abc", "b", 3], [4, 5]], 6]
# Will cause crash
# target1 = list(flatten_with_str(sequence1))
# target2 = list(flatten_with_str(sequence2))
# target3 = list(flatten_with_str(sequence))
# print(target1)
# print(target2)
# print(target3)
Notice 4
These cause the Kernel to crash.
I think this is because of never ending recursion: A single character is still as string and thus still an iterable
Attempt 2
from collections.abc import Iterable
import itertools
def flatten_with_str(xs):
for x in xs:
if isinstance(x, Iterable):
if isinstance(x, (str, bytes)) and len(x) == 1:
yield x
else:
yield from flatten_with_str(x)
else:
yield x
sequence1 = "abc"
sequence2 = ["abc"]
sequence3 = [[["abc", "b", 3], [4, 5]], 6]
target1 = list(flatten_with_str(sequence1))
target2 = list(flatten_with_str(sequence2))
target3 = list(flatten_with_str(sequence3))
print(target1)
print(target2)
print(target3)
['a', 'b', 'c']
['a', 'b', 'c']
['a', 'b', 'c', 'b', 3, 4, 5, 6]
More details
Here’s a link to an interesting 3rd party library that deals with flattening and also grouping!
Source [6]: https://iteration-utilities.readthedocs.io/en/latest/index.html