roastcoffea.aggregation.chunk

Chunk-level metrics aggregation.

Functions

aggregate_chunk_metrics(chunk_metrics[, ...])

Aggregate chunk-level metrics.

build_chunk_info(chunk_metrics)

Build chunk_info dict from chunk metrics for throughput plotting.

roastcoffea.aggregation.chunk.aggregate_chunk_metrics(chunk_metrics, section_metrics=None)

Aggregate chunk-level metrics.

Parameters:
  • chunk_metrics (list of dict) – List of per-chunk metrics from the @track_metrics decorator

  • section_metrics (list of dict, optional) – List of section metrics from track_section() and track_memory()

Returns:

Aggregated chunk metrics, including:

  • Number of chunks processed

  • Timing statistics (mean, min, max, std)

  • Memory statistics

  • Per-dataset breakdown

  • Section timing breakdown

Return type:

dict
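The timing statistics listed above can be illustrated with a minimal sketch. This is not the library's implementation: the helper name summarize_chunk_durations and the output key names are hypothetical, the t_start/t_end fields are assumed to be in seconds, and the choice of population standard deviation is an assumption (the library may use a different estimator).

```python
from statistics import mean, pstdev


def summarize_chunk_durations(chunk_metrics):
    """Hypothetical helper: timing statistics over per-chunk durations."""
    # Duration of each chunk, assuming t_start and t_end are in seconds.
    durations = [
        m["t_end"] - m["t_start"]
        for m in chunk_metrics
        if "t_start" in m and "t_end" in m
    ]
    if not durations:
        return {"num_chunks": 0}
    return {
        "num_chunks": len(durations),
        "mean": mean(durations),
        "min": min(durations),
        "max": max(durations),
        # Population standard deviation (an assumption, see above).
        "std": pstdev(durations),
    }


chunks = [
    {"file": "data.root", "t_start": 1.0, "t_end": 2.5},
    {"file": "data.root", "t_start": 2.5, "t_end": 3.5},
]
summary = summarize_chunk_durations(chunks)
print(summary)
```

For the real aggregated output, including the memory, per-dataset, and section breakdowns, call aggregate_chunk_metrics() and inspect the returned dict.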

roastcoffea.aggregation.chunk.build_chunk_info(chunk_metrics)

Build chunk_info dict from chunk metrics for throughput plotting.

Transforms chunk-level metrics collected by @track_metrics into the format expected by plot_throughput_timeline().

Parameters:

chunk_metrics (list of dict) – List of per-chunk metrics dicts from the @track_metrics decorator. Each dict is expected to contain the keys file, entry_start, entry_stop, t_start, t_end, and bytes_read.

Returns:

Dictionary mapping chunk keys to timing/bytes data: {(filename, entry_start, entry_stop): (t_start, t_end, bytes_read)}

Return type:

dict

Examples

>>> chunk_metrics = [
...     {"file": "data.root", "entry_start": 0, "entry_stop": 1000,
...      "t_start": 1.0, "t_end": 2.5, "bytes_read": 50000},
... ]
>>> chunk_info = build_chunk_info(chunk_metrics)
>>> chunk_info
{('data.root', 0, 1000): (1.0, 2.5, 50000)}

Notes

  • Chunks without file/entry metadata are skipped.

  • Chunks without bytes_read default to 0 bytes.
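The two notes above can be sketched as a small stand-alone function. This is a hypothetical re-implementation (build_chunk_info_sketch is not part of roastcoffea) written only from the behaviour documented on this page. It assumes that "without file/entry metadata" means any of file, entry_start, or entry_stop is missing or None, and that t_start and t_end are always present.

```python
def build_chunk_info_sketch(chunk_metrics):
    """Hypothetical sketch of the documented build_chunk_info behaviour."""
    key_fields = ("file", "entry_start", "entry_stop")
    chunk_info = {}
    for m in chunk_metrics:
        # Skip chunks without file/entry metadata.
        if any(m.get(field) is None for field in key_fields):
            continue
        key = (m["file"], m["entry_start"], m["entry_stop"])
        # A missing bytes_read defaults to 0 bytes.
        chunk_info[key] = (m["t_start"], m["t_end"], m.get("bytes_read", 0))
    return chunk_info


chunks = [
    {"file": "data.root", "entry_start": 0, "entry_stop": 1000,
     "t_start": 1.0, "t_end": 2.5, "bytes_read": 50000},
    # No file/entry metadata: skipped.
    {"t_start": 2.5, "t_end": 3.0, "bytes_read": 100},
    # No bytes_read: recorded as 0 bytes.
    {"file": "data.root", "entry_start": 1000, "entry_stop": 2000,
     "t_start": 2.5, "t_end": 4.0},
]
result = build_chunk_info_sketch(chunks)
print(result)
# {('data.root', 0, 1000): (1.0, 2.5, 50000), ('data.root', 1000, 2000): (2.5, 4.0, 0)}
```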