Benchmark and Profiling

Performance claims must be reproducible. Do not report a single tokens/s number without the runtime tuple and workload shape.

Required Metadata

Report metrics separately:

Metric	Meaning
Prefill tokens/s	Prompt processing throughput.
Decode tokens/s	Generation throughput after prefill.
End-to-end latency	User-visible latency for a request shape.
Peak memory	CPU RAM and GPU VRAM under the tested load.

When comparing against another runtime, align:

If any field differs, label the result as a directional observation rather than a strict benchmark.