roastcoffea.aggregation.chunk
Chunk-level metrics aggregation.
Functions
aggregate_chunk_metrics(chunk_metrics, section_metrics=None) | Aggregate chunk-level metrics.
build_chunk_info(chunk_metrics) | Build chunk_info dict from chunk metrics for throughput plotting.
- roastcoffea.aggregation.chunk.aggregate_chunk_metrics(chunk_metrics, section_metrics=None)
Aggregate chunk-level metrics.
- Parameters:
- Returns:
Aggregated chunk metrics including:
  - Number of chunks processed
  - Timing statistics (mean, min, max, std)
  - Memory statistics
  - Per-dataset breakdown
  - Section timing breakdown
- Return type:
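The timing statistics listed under Returns (mean, min, max, std) can be illustrated with a minimal sketch. This is not the library's implementation: the helper name `summarize_durations`, the output key names, and the choice of population standard deviation are all assumptions made for illustration. Only the `t_start` and `t_end` fields come from the chunk metrics described on this page.

```python
import statistics


def summarize_durations(chunk_metrics):
    """Hypothetical helper: summarize per-chunk wall time (t_end - t_start)."""
    # Only chunks that carry both timestamps contribute to the statistics.
    durations = [
        m["t_end"] - m["t_start"]
        for m in chunk_metrics
        if "t_start" in m and "t_end" in m
    ]
    if not durations:
        return {"num_chunks": 0}
    return {
        "num_chunks": len(durations),
        "mean": statistics.fmean(durations),
        "min": min(durations),
        "max": max(durations),
        # Assumption: population standard deviation; the library may differ.
        "std": statistics.pstdev(durations),
    }


summary = summarize_durations(
    [
        {"t_start": 0.0, "t_end": 1.0},
        {"t_start": 1.0, "t_end": 4.0},
    ]
)
```

The actual return value of aggregate_chunk_metrics also covers memory, per-dataset, and section breakdowns, which this sketch does not attempt to reproduce.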
- roastcoffea.aggregation.chunk.build_chunk_info(chunk_metrics)
Build chunk_info dict from chunk metrics for throughput plotting.
Transforms chunk-level metrics collected by @track_metrics into the format expected by plot_throughput_timeline().
- Parameters:
chunk_metrics (list of dict) – List of chunk metrics dicts from @track_metrics decorator. Each dict contains: file, entry_start, entry_stop, t_start, t_end, bytes_read
- Returns:
Dictionary mapping chunk keys to timing/bytes data: {(filename, entry_start, entry_stop): (t_start, t_end, bytes_read)}
- Return type:
dict
Examples
>>> chunk_metrics = [
...     {"file": "data.root", "entry_start": 0, "entry_stop": 1000,
...      "t_start": 1.0, "t_end": 2.5, "bytes_read": 50000},
... ]
>>> chunk_info = build_chunk_info(chunk_metrics)
>>> chunk_info
{('data.root', 0, 1000): (1.0, 2.5, 50000)}
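Because each value is a (t_start, t_end, bytes_read) tuple, a per-chunk read rate follows directly from the structure. The sketch below is only an illustration of how the dictionary might be consumed; `chunk_throughput` is a hypothetical helper, not part of roastcoffea, and it assumes the timestamps are in seconds.

```python
def chunk_throughput(chunk_info):
    """Hypothetical helper: bytes per second for each chunk key."""
    rates = {}
    for key, (t_start, t_end, bytes_read) in chunk_info.items():
        duration = t_end - t_start
        # Skip zero or negative durations to avoid division errors.
        if duration > 0:
            rates[key] = bytes_read / duration
    return rates


chunk_info = {("data.root", 0, 1000): (1.0, 2.5, 50000)}
rates = chunk_throughput(chunk_info)
```

For the example chunk, 50000 bytes over 1.5 time units gives roughly 33333 bytes per unit of time.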
Notes
Chunks without file/entry metadata are skipped.
Chunks without bytes_read default to 0 bytes.
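The two notes above can be read as the following behavior. This is a minimal sketch under stated assumptions, not the library's source: `build_chunk_info_sketch` is a hypothetical stand-in, and it assumes that a chunk counts as lacking file/entry metadata when any of `file`, `entry_start`, or `entry_stop` is missing or None, and that `t_start` and `t_end` are always present.

```python
def build_chunk_info_sketch(chunk_metrics):
    """Hypothetical re-creation of the documented transformation."""
    required = ("file", "entry_start", "entry_stop")
    chunk_info = {}
    for m in chunk_metrics:
        # Assumption: a missing or None field means "no file/entry metadata".
        if any(m.get(name) is None for name in required):
            continue
        key = (m["file"], m["entry_start"], m["entry_stop"])
        # bytes_read falls back to 0 when the chunk did not record it.
        chunk_info[key] = (m["t_start"], m["t_end"], m.get("bytes_read", 0))
    return chunk_info


example = build_chunk_info_sketch(
    [
        {"file": "data.root", "entry_start": 0, "entry_stop": 1000,
         "t_start": 1.0, "t_end": 2.5, "bytes_read": 50000},
        # No file/entry metadata: skipped.
        {"t_start": 2.5, "t_end": 3.0, "bytes_read": 10},
        # No bytes_read: kept with 0 bytes.
        {"file": "data.root", "entry_start": 1000, "entry_stop": 2000,
         "t_start": 3.0, "t_end": 4.0},
    ]
)
```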