roastcoffea.visualization.plots.throughput

Throughput and data rate plots.

Visualizations for data processing rates and event throughput.

Functions

plot_throughput_timeline(chunk_info[, ...])

Plot instantaneous data throughput (Gbps) over time.

plot_total_active_tasks_timeline(tracking_data)

Plot total active tasks across all workers over time.

plot_worker_activity_timeline(tracking_data)

Plot active tasks per worker over time.

roastcoffea.visualization.plots.throughput.plot_worker_activity_timeline(tracking_data, output_path=None, figsize=(12, 6), title='Worker Activity Over Time', max_legend_entries=5)

Plot active tasks per worker over time.

Shows the number of active (processing + queued) tasks on each worker, indicating how the workload is distributed across workers.

Parameters:
  • tracking_data (dict or None) – Tracking data containing worker_active_tasks

  • output_path (Path, optional) – Path to save the figure

  • figsize (tuple) – Figure size (width, height) in inches

  • title (str) – Plot title

  • max_legend_entries (int, optional) – Maximum number of workers to show in legend. Default is 5.

Returns:

fig, ax – Matplotlib figure and axes

Return type:

tuple (Figure, Axes)

Raises:

ValueError – If tracking_data is None or missing active tasks data
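The layout of tracking_data and the rule max_legend_entries uses to pick workers are not specified on this page. The sketch below assumes a hypothetical layout {"worker_active_tasks": {worker_id: [(timestamp_s, n_active), ...]}} and shows two things: how the documented ValueError conditions could be checked, and one possible legend rule (label the first N workers in sorted order). The helpers get_worker_active_tasks and legend_labels are illustrative only and are not part of roastcoffea; the legend rule relies on Matplotlib's convention that artists whose label starts with an underscore are left out of the legend.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical layout, assumed for illustration only:
#   {"worker_active_tasks": {worker_id: [(timestamp_s, n_active), ...]}}
ActiveTasks = Dict[str, List[Tuple[float, int]]]


def get_worker_active_tasks(tracking_data: Optional[Dict[str, Any]]) -> ActiveTasks:
    """Raise ValueError in the two documented cases, else return the per-worker series."""
    if tracking_data is None:
        raise ValueError("tracking_data is None")
    active = tracking_data.get("worker_active_tasks")
    if not active:
        raise ValueError("tracking_data has no worker_active_tasks data")
    return active


def legend_labels(worker_ids: Iterable[str], max_legend_entries: int = 5) -> Dict[str, str]:
    """Label the first max_legend_entries workers (sorted order, an assumed rule).

    Remaining workers get "_nolegend_"; Matplotlib omits underscore-prefixed
    labels from the legend, so their lines are still drawn but not listed.
    """
    ordered = sorted(worker_ids)
    return {
        worker: worker if i < max_legend_entries else "_nolegend_"
        for i, worker in enumerate(ordered)
    }
```

In a plotting loop, each worker's line would then be drawn with ax.plot(times, counts, label=labels[worker]), so that ax.legend() lists at most max_legend_entries workers while every worker still appears on the axes.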

roastcoffea.visualization.plots.throughput.plot_total_active_tasks_timeline(tracking_data, output_path=None, figsize=(10, 5), title='Total Active Tasks Over Time')

Plot total active tasks across all workers over time.

Aggregates active tasks from all workers to show overall cluster activity.
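How the per-worker series are combined is not described here. One reasonable approach, sketched below under assumptions, treats each worker's series as a step function (a count holds until that worker's next sample, and is 0 before its first sample) and sums all workers at every timestamp that any worker reported. The input layout {worker_id: [(timestamp_s, n_active), ...]} and the helper total_active_tasks are hypothetical, not the library's implementation.

```python
import bisect
from typing import Dict, List, Tuple

Samples = List[Tuple[float, int]]  # (timestamp_s, n_active), assumed layout


def total_active_tasks(worker_active_tasks: Dict[str, Samples]) -> Samples:
    """Sum each worker's most recent active-task count at every observed timestamp."""
    # Sort each worker's samples by time so bisect can locate the latest one.
    per_worker = [sorted(samples) for samples in worker_active_tasks.values()]
    stamps = [[t for t, _ in samples] for samples in per_worker]
    all_times = sorted({t for samples in per_worker for t, _ in samples})

    totals: Samples = []
    for t in all_times:
        total = 0
        for samples, ts in zip(per_worker, stamps):
            # Index of this worker's last sample taken at or before t (-1 if none yet).
            idx = bisect.bisect_right(ts, t) - 1
            if idx >= 0:
                total += samples[idx][1]
        totals.append((t, total))
    return totals
```

Holding the last value, rather than interpolating, matches the fact that a task count is an integer that stays constant between scheduler observations.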

Parameters:
  • tracking_data (dict or None) – Tracking data with worker_active_tasks

  • output_path (Path, optional) – Save path

  • figsize (tuple) – Figure size

  • title (str) – Plot title

Returns:

fig, ax – Matplotlib figure and axes

Return type:

tuple (Figure, Axes)

Raises:

ValueError – If tracking_data is None or missing active tasks data

roastcoffea.visualization.plots.throughput.plot_throughput_timeline(chunk_info, tracking_data=None, output_path=None, figsize=(12, 6), title='Data Throughput Over Time')

Plot instantaneous data throughput (Gbps) over time.

Computes the instantaneous data rate at each sample point by finding all chunks that were being processed at that moment and summing their individual throughputs.
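The computation described above can be sketched as follows, using the documented chunk_info format {(filename, start, stop): (t0, t1, bytesread)}. This assumes t0 and t1 are in seconds and bytesread is in bytes, takes each chunk's throughput as its bytes (converted to gigabits, 1 Gb = 1e9 bits) divided by its processing time, and samples on an evenly spaced grid. The helper instantaneous_throughput_gbps and its n_samples parameter are hypothetical; the real function may choose sample points differently.

```python
from typing import Dict, List, Tuple

ChunkKey = Tuple[str, int, int]         # (filename, start, stop)
ChunkTiming = Tuple[float, float, int]  # (t0, t1, bytesread)


def instantaneous_throughput_gbps(
    chunk_info: Dict[ChunkKey, ChunkTiming], n_samples: int = 100
) -> List[Tuple[float, float]]:
    """Return (time, Gbps) pairs: at each sample, sum the rates of chunks in flight."""
    if not chunk_info:
        raise ValueError("chunk_info is empty")
    timings = list(chunk_info.values())
    # Per-chunk rate in gigabits per second; zero-duration chunks are skipped.
    rates = [
        (t0, t1, bytesread * 8 / 1e9 / (t1 - t0))
        for t0, t1, bytesread in timings
        if t1 > t0
    ]
    start = min(t0 for t0, _, _ in timings)
    end = max(t1 for _, t1, _ in timings)
    step = (end - start) / max(n_samples - 1, 1)

    samples = []
    for i in range(n_samples):
        t = start + i * step
        # A chunk counts as "being processed" on the half-open interval [t0, t1).
        gbps = sum(rate for t0, t1, rate in rates if t0 <= t < t1)
        samples.append((t, gbps))
    return samples
```

For example, a chunk that reads 250 MB over 2 s contributes 1.0 Gbps while it is in flight; when two chunks overlap, their rates add.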

Optionally overlays worker count on a secondary y-axis if tracking_data is provided.

Parameters:
  • chunk_info (dict) – Per-chunk timing data from metrics. Format: {(filename, start, stop): (t0, t1, bytesread)}

  • tracking_data (dict, optional) – Worker tracking data containing worker_counts, used for the worker-count overlay

  • output_path (Path, optional) – Path to save the figure

  • figsize (tuple) – Figure size (width, height) in inches

  • title (str) – Plot title

Returns:

fig, ax – Matplotlib figure and the primary (throughput) axes; the secondary worker-count axes is not returned

Return type:

tuple (Figure, Axes)

Raises:

ValueError – If chunk_info is empty