BitterBench

Comparative inference runs across local, cloud, and routed execution planes.

Comparative Workbench

BitterBench is growing into a working surface for comparing inference runs side by side.

This workbench now supports durable comparative ingest, multi-plane source summaries, filterable benchmark packets, and direct run inspection for the selected comparison set.

Current State

Comparison drilldown live

BitterMill exports and external-provider packets can now land in one contract and be inspected as comparison sets rather than flat logs.
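As a rough illustration of "comparison sets rather than flat logs", the sketch below groups flat run records into sets. The grouping key (workload plus model) and the dictionary field names are assumptions made for this example; the page does not say how BitterBench actually forms a comparison set.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_into_comparison_sets(runs: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group flat run records by a hypothetical (workload, model) key."""
    sets: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for run in runs:
        # Assumed fields: "workload" and "model" are illustrative names only.
        sets[(run["workload"], run["model"])].append(run)
    return dict(sets)
```

Under this assumption, each resulting set holds the runs of one workload and model across different sources, which is the unit a reader would compare.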

Sources: 0

Runs: 0

Completed: 0

Latest: n/a

Filters

Narrow the benchmark corpus by plane, provider, source, workload, or model.

0 active
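The filtering described here can be sketched as a conjunction over the five named dimensions. `RunRecord`, `filter_runs`, and `active_filter_count` are hypothetical names for illustration, not part of any documented BitterBench API.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RunRecord:
    # Hypothetical fields mirroring the five filter dimensions on this page.
    plane: str      # e.g. "local", "cloud", or "routed"
    provider: str
    source: str
    workload: str
    model: str


def filter_runs(runs: Iterable[RunRecord], **criteria: Optional[str]) -> List[RunRecord]:
    """Keep runs that match every criterion that is set; None means no filter."""
    active = {key: value for key, value in criteria.items() if value is not None}
    return [
        run for run in runs
        if all(getattr(run, key) == value for key, value in active.items())
    ]


def active_filter_count(**criteria: Optional[str]) -> int:
    """Count the filters that are set, as in the '0 active' indicator."""
    return sum(1 for value in criteria.values() if value is not None)
```

For example, `filter_runs(runs, plane="cloud", model=None)` would narrow only by plane and leave the model dimension open.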

Comparison Sets

Select a benchmark packet to inspect its sources and underlying runs.

No comparison sets yet. Ingest BitterMill exports or demo packets to start building a comparative corpus.

Select a comparison set to inspect it.

Runs In Selection

Inspect the underlying benchmark records for the active comparison.

0 of 0 loaded

No runs available for the current selection.

Select a run to inspect its timings and metadata.

Sources

BitterMill cells, routers, and cloud providers should all land in the same source model.

No sources ingested yet. POST a comparative packet or load the demo samples to populate the workbench.

Latest Runs

Recent benchmark records across the currently filtered source set.

No run records ingested yet. Run `api/bin/load_demo_benchmark_data` or POST a comparative payload to the ingest endpoint.
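A minimal sketch of preparing such a payload follows. The URL, the packet shape, and every field name are assumptions for illustration: this page names an ingest endpoint but does not document its address or contract. The request is built but not sent.

```python
import json
from urllib.request import Request

# Hypothetical URL: the real ingest endpoint address is not given on this page.
INGEST_URL = "http://localhost:8000/ingest"

# The three execution planes named in the page tagline.
KNOWN_PLANES = {"local", "cloud", "routed"}


def build_packet(source_name, plane, provider, runs):
    """Assemble one comparative packet; the shape is an assumed one."""
    if plane not in KNOWN_PLANES:
        raise ValueError(f"unknown plane: {plane!r}")
    return {
        "source": {"name": source_name, "plane": plane, "provider": provider},
        "runs": list(runs),
    }


def build_ingest_request(packet, url=INGEST_URL):
    """Prepare (but do not send) a JSON POST request carrying the packet."""
    body = json.dumps(packet).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it once the real endpoint and contract are known.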