Installation
It’s recommended to use uv, a very fast Python environment manager. You can install it from the uv installation guide or with the installer script shown below.
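For example, the standalone installer can be run straight from the shell; installing uv with pip is an equally workable alternative:

```bash
# Install uv via the official standalone installer
curl -LsSf https://astral.sh/uv/install.sh | sh

# Or install it with pip
pip install uv
```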
- Manual mode: Starting engines and running benchmarks separately, designed for benchmarking simple test cases and experimental runs.
- Batch mode: Benchmark process initiated by a single command or a single YAML file, suitable for complex and large-scale test case benchmarks, or sharing reproducible experiments with others.
Manual Mode
In manual mode, you start engines first, then run benchmarks. We’ll show how to benchmark NousResearch/Meta-Llama-3.1-8B using vLLM.
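Manual mode assumes vLLM is already available in your environment. If it is not, one way to set it up with uv (an illustrative step, not necessarily this tool’s required setup) is:

```bash
# Create a virtual environment and install vLLM into it with uv
uv venv
source .venv/bin/activate
uv pip install vllm
```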
Starting an Engine
Start a vLLM engine with NousResearch/Meta-Llama-3.1-8B. You can add any vLLM parameters that are compatible with vllm serve after vllm. Optional: you can also set environment variables as needed.
You’ll see logs as the engine starts up; wait until the engine reports it is ready. The process keeps running in the background, and you can check the status of running engines to confirm that the engine is currently running.
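The exact start command comes from this tool’s CLI (omitted here); as a rough sketch of what runs underneath, the vLLM server for this model can also be launched directly, with any optional environment variables set first (the port value and variables below are illustrative assumptions):

```bash
# Optional environment variables (illustrative; adjust to your setup)
export CUDA_VISIBLE_DEVICES=0      # pin the engine to a specific GPU
export HF_TOKEN=...                # only needed for gated Hugging Face models

# Start the vLLM OpenAI-compatible server for the benchmark model;
# any extra `vllm serve` flags (e.g. --max-num-seqs) can be appended.
vllm serve NousResearch/Meta-Llama-3.1-8B --port 8000
```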
Running Benchmarks on the Engine
Once your engine is running, you can run a simple benchmark.
Tips: You may run multiple benchmarks on the same engine, or multiple benchmarks on different engines at the same time. Refer to Run Benchmarks for more details.
The expected output will show metrics like throughput, TTFT (time to first token), and so on. The results are saved to ./results by default. Refer to the Dashboard and Leaderboard for more details.
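Before launching a full benchmark, it can help to confirm the engine actually answers requests. A minimal sanity check against the OpenAI-compatible endpoint that vLLM exposes (assuming the default localhost:8000 used above) looks like:

```bash
# Send a single completion request to the running engine
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NousResearch/Meta-Llama-3.1-8B",
        "prompt": "Hello, my name is",
        "max_tokens": 16
      }'
```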
Stopping an Engine
If you don’t need this engine anymore, or wish to adjust its parameters, stop it. The output confirms that the engine has been stopped successfully.
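The tool’s own stop command is the normal way to do this; purely as a fallback for a vLLM server started by hand with vllm serve, a generic shell approach (an assumption, not part of this tool) is:

```bash
# Stop a manually started vLLM server by matching its command line
# (generic shell fallback, not this tool's stop command)
pkill -f "vllm serve NousResearch/Meta-Llama-3.1-8B"
```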
Batch Mode
For more complex scenarios, where you need to benchmark multiple engines with different engine configurations and different benchmark configurations, you can define your experiments in a YAML file and run them with a single command. Here, we use /example_yaml/Meta-Llama-3.1-8B-varied-max-num-seq.yaml as an example, which runs the same benchmark type with different max-num-seqs configurations.
Tips: You may also refer to other examples in the /example_yaml directory, and see the Run YAML section for more details.
Example YAML Configuration
Take a look at the YAML file. If you want to test more engines or benchmark types, you can keep adding engines with different engine configurations and benchmark configurations.
Run the benchmark configuration; this will automatically run all the experiments defined in the YAML file.
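The max-num-seqs values varied in this example correspond to vLLM’s --max-num-seqs engine argument, which caps how many sequences are batched concurrently. One configuration from the sweep, started by hand for comparison (the port is an illustrative assumption), would look like:

```bash
# One point from the max-num-seqs sweep, launched manually
vllm serve NousResearch/Meta-Llama-3.1-8B --max-num-seqs 128 --port 8000
```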
Viewing Results
After running benchmarks, there are two ways to view and analyze the results.
Dashboard
Here you can visualize and compare results from a single benchmark run, and share them with our community. For a detailed introduction to the dashboard, see Dashboard.
Leaderboard
Here you can compare your results across different benchmark runs, and visit the community leaderboard for further results. For a detailed introduction to the leaderboard, see Leaderboard.
Upload Results
To share your benchmark results with the community, use the upload commands. Unless you use the --no-login flag, you’ll need to log in to authorize the upload. We recommend starting with a single JSON file upload to complete the login process, then using the command to upload all your data. Alternatively, you can first share your results in the dashboard using the “Share Subrun to Global Leaderboard” button. Don’t worry about duplicate submissions: our system automatically deduplicates any repeated data.
