How to run and configure benchmarks on inference engines
`<engine_list>` is a space-separated list of engine types (e.g., `vllm sglang`).
`<benchmark_list>` is a space-separated list of benchmark types (e.g., `conversational_short summarization`).
See the `example_yaml` directory for more examples.
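Because both lists are space-separated, a launcher typically expands them into one run per (engine, benchmark) pair. The sketch below illustrates that expansion only; the variable names and the echoed command are placeholders, not the project's actual CLI:

```shell
#!/bin/sh
# Sketch: expand space-separated <engine_list> and <benchmark_list>
# into one run per (engine, benchmark) pair. Replace the echo with
# the project's real benchmark command.
engine_list="vllm sglang"
benchmark_list="conversational_short summarization"

count=0
for engine in $engine_list; do
  for benchmark in $benchmark_list; do
    echo "run engine=$engine benchmark=$benchmark"
    count=$((count + 1))
  done
done
```

With two engines and two benchmarks, this schedules four runs in total.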
| Benchmark Name | Workload Type | Input Length (tokens) | Output Length (tokens) | Prefix Length (tokens) | QPS | Use Case |
|---|---|---|---|---|---|---|
| summarization | LISO | 12000 | 100 | 0 | 2 | Long document summarization, meeting notes |
| rewrite_essay | LILO | 12000 | 3000 | 0 | 2 | Essay rewriting and editing |
| write_essay | SILO | 100 | 3000 | 0 | 2 | Essay generation from short prompts |
| conversational_short | SISO | 100 | 100 | 0 | 10 | Short chat interactions |
| conversational_medium | MISO | 1000 | 100 | 2000 | 5 | Medium-length chat with context |
| conversational_long | LISO | 5000 | 100 | 7000 | 2 | Long conversations with extensive history |

Workload types abbreviate input/output shape: Long/Medium/Short Input, Long/Short Output (e.g., LISO = Long Input, Short Output).
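Each table row corresponds to a benchmark configuration. A hypothetical YAML sketch of the `conversational_short` row is shown below; the field names are illustrative, so check the shipped YAML files for the real schema:

```yaml
# Hypothetical sketch of a benchmark config; field names are
# assumptions, not the project's actual schema.
benchmark_name: conversational_short
workload_type: SISO    # Short Input, Short Output
input_length: 100      # tokens
output_length: 100     # tokens
prefix_length: 0       # no shared prefix
qps: 10                # queries per second
```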
Benchmark configurations are stored in the `src/benchmarks/benchmark_configs` folder.