Wrapping elements in lists

Sources

Option 1: Stream approach

def condition(item, prev):
    return item - prev <= 15

def grouper_by_stream(iterable, function):
    prev = None
    group = []
    for item in iterable:
        if prev is None or function(item, prev):
            group.append(item)
        else:
            yield group
            group = [item]
        prev = item
    if group:
        yield group

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]
result = dict(enumerate(grouper_by_stream(numbers, condition), 1))

print(result)

{1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]}

numbers = [123, 124, 128, 160, 167, 213, 215, 230, 245, 255, 257, 400, 401, 402, 430]

def cut_indices(numbers):
    # this function iterate over the indices that need to be 'cut'
    for i in range(len(numbers)-1):
        if numbers[i+1] - numbers[i] > 15:
            yield i+1

def splitter(numbers):
    # this function split the original list into sublists.
    px = 0
    for x in cut_indices(numbers):
        yield numbers[px:x]
        px = x
    yield numbers[px:]

def cluster(numbers):
    # using the above result, to form a dict object.
    cluster_ids = range(1,len(numbers))
    return dict(zip(cluster_ids, splitter(numbers)))

result = cluster(numbers)

print(result)

{1: [123, 124, 128], 2: [160, 167], 3: [213, 215, 230, 245, 255, 257], 4: [400, 401, 402], 5: [430]}

Option 2: General Approach

Non-Stream Approach with generator

A non-stream approach would not go hand-in-hand with a generator because a generator traverses iterables element-by-element, which is, by definition, a stream approach.

def grouper_by_general_condition(iterable, function):
    group = []
    for item in iterable:
        if function(item):
            group.append(item)
        else:
            yield group
            group = [item] # Why?
    if group:
        yield group

Non-Stream Approach without generators

A non-stream non-generator approach can be implemented using the itertools module.

Chunking

import itertools

def group_iterables_in_pairs(iterable):
    return list(itertools.pairwise(iterable))

sequence = list(range(7))
target = group_iterables_in_pairs(sequence)

print(target)

# Grouping without padding
def grouper_without_padding(n, iterable):
    it = iter(iterable)
    return iter(lambda: list(itertools.islice(it, n)), [])

target = list(grouper_without_padding(3, sequence))

print(target)

# Grouping with padding
def grouper_with_padding(n, iterable):
    it = iter(iterable)
    return itertools.zip_longest(*[it]*n, fillvalue='x')
                                 
target = list(grouper_with_padding(3, sequence))
# target = list(itertools.zip_longest(*[iter(sequence)]*3, fillvalue='x'))

print(target)

[(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
[[0, 1, 2], [3, 4, 5], [6]]
[(0, 1, 2), (3, 4, 5), (6, 'x', 'x')]

# Under the hood
sequence = list(range(7))
a = [iter(sequence)]*3
print(a)

[<list_iterator object at 0x000001558669A7D0>, <list_iterator object at 0x000001558669A7D0>, <list_iterator object at 0x000001558669A7D0>]

Explanation: Diving to `grouper_with_padding()`

[iter(sequence)]*3 gives us the same iterator 3 times!
This iterator goes over the entire sequence.
*[iter(sequence)]*3 flattens the iterators in order.
Now, Zipping occurs.

The zipping process: Notice that:

Since we have 3 iterables going into the method, this is a generator of 3-tuples.
Since we have 3 iterables which are all the same iterator.

This creates a very interesting behavior:

When we take the first element of the iterator, that element is gone.
The first element in this iterator is now the second element.
But all our iterables arguments are the same iterator!
Hence, the first element in the second iterator is the originally second element in the list.
We thus add the second element to the current tuple, where the first element is first.
According to the same logic, the last element in the tuple is now the third element!

This goes on until the generator is exhausted.

If we still need elements for the last tuple, we use a fill value, fillvalue.

Group by break function: Existing element

from itertools import groupby

def break_function(x):
    return x < 7

def group_by_break_function(sequence, function):
    return [list(g) for k, g in groupby(sequence, function) if k]


sequence = [3,2,5,7,4,5,6,3,8,4,5,7,8,9,5,7,8,4,9,7,6]

# [list(g) for k, g in groupby(a, lambda x:x<7) if k]
target = group_by_break_function(sequence, break_function)

print(target)

[[3, 2, 5], [4, 5, 6, 3], [4, 5], [5], [4], [6]]

# Under the hood
a = [3,2,5,7,4,5,6,3,8,4,5,7,8,9,5,7,8,4,9,7,6]

result = [(k,list(g)) for k, g in groupby(a, lambda x:x<7)]

print(result)

[(True, [3, 2, 5]), (False, [7]), (True, [4, 5, 6, 3]), (False, [8]), (True, [4, 5]), (False, [7, 8, 9]), (True, [5]), (False, [7, 8]), (True, [4]), (False, [9, 7]), (True, [6])]

Explanation: Diving to `groupby()`

This is a great example for:

When sorting should not always happen on a sequence before using groupby().
Where SQL's GROUPBY differs from Python's groupby()

SQL's GROUPBY does not care about the order of keys. As the Python docs states, “it aggregates common elements regardless of their input order”.

However, notice what happens here - The keys are from Python's groupby() are allowed to not be unique!

This allows to to group elements in a sequence according to any function we want, as long as we put a “barrier” between chunks, for which the grouping function returns False. Beautiful!

Other groupby use cases

See: https://stackoverflow.com/a/45873519/8552542

Group by break function: Indices

from itertools import groupby

a = [3,2,5,7,4,5,6,3,8,4,5,7,8,9,5,7,8,4,9,7,6]
indices_list = [5,7,9]
indices_set = set(indices_list)

result = [[x for _,x in list(g)] for _, g in groupby(enumerate(a), lambda x :x[0] in indices_set)]

print(result)

[[3, 2, 5, 7, 4], [5], [6], [3], [8], [4], [5, 7, 8, 9, 5, 7, 8, 4, 9, 7, 6]]