Solution

Batch inference built for offline and large-volume workloads

When a workload prioritizes cost, throughput, and callback automation over single-request latency, batch lanes are usually the better operating model.

Separate offline jobs from interactive traffic so cost controls and latency targets stop fighting each other.
Use high, low, and fill lanes to match each batch workload to the right cost structure.
Keep usage, billing, callbacks, and exports aligned so large jobs stay easy to review.
