🪨 Chunk Iterable
Published 2022-05-26
Today I'll be discussing the Chunk Iterable Framework. This is a core component of Aphrodite and is used to speed up the processing of data returned by queries.
The main idea behind a chunk iterable is to be able to iterate over some source of data in chunks. Why would we want to iterate over a data source in chunks?
Mainly for performance. Imagine you have an unbounded stream of data. There are two options for processing this data, and they sit at opposite ends of a spectrum.
- Option 1: Process everything in the stream all at once.
- Option 2: Process a single item from the stream at a time.
If the source is unbounded (or larger than your working memory), Option 1 is not possible.
Option 2 is always possible but can be slow. Option 2 is like making 10,000 individual trips to the store to pick up 10,000 packs of M&Ms. You probably can't fit 10,000 packs of M&Ms in a single trip, but you could get a chunk of 500 packs in a single trip, reducing your total trips from 10,000 to 20.
ChunkIterable is the same. If a data source could return a massive amount of data, ChunkIterable streams the results back in chunks. This strikes a nice balance between batch processing and not overwhelming your local resources.
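To make the chunking idea concrete, here is a minimal TypeScript sketch. The `chunked` helper and the `packs` generator are hypothetical names made up for illustration; they are not Aphrodite's actual API. The sketch is synchronous to stay short, whereas a real query source would most likely be asynchronous.

```typescript
// Hypothetical helper (not Aphrodite's API): lazily groups any iterable
// into arrays of at most `size` items.
function* chunked<T>(source: Iterable<T>, size: number): Generator<T[]> {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  let chunk: T[] = [];
  for (const item of source) {
    chunk.push(item);
    if (chunk.length === size) {
      yield chunk;
      chunk = [];
    }
  }
  // Emit the final, possibly smaller, chunk.
  if (chunk.length > 0) {
    yield chunk;
  }
}

// Stand-in for a large data source; items are produced one at a time.
function* packs(count: number): Generator<number> {
  for (let i = 0; i < count; i++) {
    yield i;
  }
}

let trips = 0;
let packsCarried = 0;
for (const chunk of chunked(packs(10000), 500)) {
  trips += 1; // one "trip to the store" per chunk
  packsCarried += chunk.length;
}
console.log(trips, packsCarried); // 20 10000
```

Only one chunk of 500 items is held in memory at a time, so the same loop would also work if `packs` never ended.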
ChunkIterable, conforming to an Iterable interface, also allows you to perform operations like filter & map against chunks.
This is important for Aphrodite as Aphrodite uses map to turn a raw data stream into models and uses filter to apply filters to streams that couldn't be hoisted to the database layer.
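Here is a rough sketch of how map and filter over chunks could look. Everything in it is an assumption made for illustration: the `ChunkIterableSketch` class, the `RawRow` and `UserModel` shapes, and the sample data are hypothetical and are not taken from Aphrodite's source. It is also synchronous, where a real implementation would probably be async.

```typescript
// Hypothetical sketch, not Aphrodite's real class: an iterable of chunks
// whose map and filter run lazily, one chunk at a time.
class ChunkIterableSketch<T> implements Iterable<T[]> {
  private readonly chunks: () => Iterable<T[]>;

  constructor(chunks: () => Iterable<T[]>) {
    this.chunks = chunks;
  }

  [Symbol.iterator](): Iterator<T[]> {
    return this.chunks()[Symbol.iterator]();
  }

  map<U>(fn: (item: T) => U): ChunkIterableSketch<U> {
    const source = this.chunks;
    return new ChunkIterableSketch<U>(function* () {
      for (const chunk of source()) {
        yield chunk.map((item) => fn(item));
      }
    });
  }

  filter(pred: (item: T) => boolean): ChunkIterableSketch<T> {
    const source = this.chunks;
    return new ChunkIterableSketch<T>(function* () {
      for (const chunk of source()) {
        const kept = chunk.filter((item) => pred(item));
        // Skip chunks that end up empty after filtering.
        if (kept.length > 0) {
          yield kept;
        }
      }
    });
  }
}

// Hypothetical raw rows and model shape.
interface RawRow { id: number; name: string }
interface UserModel { id: number; displayName: string }

const rawChunks: RawRow[][] = [
  [{ id: 1, name: "ada" }, { id: 2, name: "bo" }],
  [{ id: 3, name: "cyd" }],
];

const users = new ChunkIterableSketch<RawRow>(() => rawChunks)
  // map: raw rows -> models
  .map((row): UserModel => ({ id: row.id, displayName: row.name.toUpperCase() }))
  // filter: a predicate applied in application code rather than in the database
  .filter((user) => user.displayName.length >= 3);

console.log(JSON.stringify(Array.from(users)));
// [[{"id":1,"displayName":"ADA"}],[{"id":3,"displayName":"CYD"}]]
```

Nothing runs until the result is iterated, so chaining map and filter only builds up a pipeline. Each chunk then flows through every step before the next chunk is read.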