This guide will help you get up and running with Inference Engine Arena quickly. You’ll learn how to install the framework, start an inference engine, and run a simple benchmark.

Installation

We recommend using uv, a fast Python package and environment manager. Install it by following the uv documentation, or with:

curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/Inference-Engine-Arena/inference-engine-arena.git
cd inference-engine-arena

# Install dependencies
uv venv myenv
source myenv/bin/activate
uv pip install -e .
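
Once the install finishes, the arena CLI used throughout this guide should be available inside the activated virtual environment. As a quick sanity check (assuming the package registers an arena console entry point, which the commands later in this guide suggest), you can list the available subcommands:

# Confirm the CLI is installed and list its subcommands
arena --help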

If you encounter any issues, please refer to the Troubleshooting guide or submit an issue on GitHub.

After installing the framework, you can start inference engines and run benchmarks in one of two ways:

  1. Manual mode: start engines and run benchmarks as separate steps; suited to simple test cases and exploratory runs.
  2. Batch mode: drive the whole benchmark process from a single command and a single YAML file; suited to complex, large-scale test matrices and to sharing reproducible experiments with others.

Manual Mode

In manual mode, you start an engine first, then run benchmarks against it. As an example, we’ll benchmark NousResearch/Meta-Llama-3.1-8B served with vLLM.
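
The following is a minimal sketch of that flow. The subcommand names and flags (arena start, arena run, --model, --engine, --benchmark, and the conversational_short benchmark type) are assumptions inferred from the result files shown later in this guide, not confirmed syntax; run arena --help to check the exact interface.

# Sketch only: subcommands and flags below are assumptions; verify with `arena --help`
# 1. Start a vLLM engine serving the target model
arena start vllm --model NousResearch/Meta-Llama-3.1-8B
# 2. Run a benchmark against the running engine
arena run --engine vllm --benchmark conversational_short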

Batch Mode

For more complex scenarios, where you need to benchmark multiple engines with different engine and benchmark configurations, you can define your experiments in a YAML file and run them all with a single command.

Here we use /example_yaml/Meta-Llama-3.1-8B-varied-max-num-seq.yaml as an example; it runs the same benchmark type against several max-num-seqs settings.
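
Assuming the runyaml subcommand mentioned in the tip below takes the YAML path as its argument (treat the exact invocation as an assumption and confirm with arena --help), running the whole experiment looks like:

# Run every engine/benchmark combination defined in the YAML file
# (path shown relative to the repository root; invocation is an assumption)
arena runyaml example_yaml/Meta-Llama-3.1-8B-varied-max-num-seq.yaml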

Tip: see the other examples in the /example_yaml directory, and the runyaml section for more details.

Viewing Results

After running benchmarks, there are two ways to view and analyze the results.

Dashboard

The dashboard lets you visualize and compare results from a single benchmark run and share them with the community.

arena dashboard

For a detailed introduction to the dashboard, see Dashboard.

Leaderboard

The leaderboard lets you compare your results across different benchmark runs and browse the community’s results as well.

arena leaderboard

For a detailed introduction to the leaderboard, see Leaderboard.

Upload Results

To share your benchmark results with the community, use these commands:

# Upload a single result to the global leaderboard
arena upload sub-run-20250420-211509-vllm-Meta-Llama-3-1-8B-conversational-short-582ae937.json
# Upload all results to the global leaderboard
arena upload
# Anonymous data upload
arena upload --no-login

If you don’t use the --no-login flag, you’ll need to log in to authorize the upload. We recommend starting with a single JSON file upload to complete the login process, then running arena upload to submit all your data. Alternatively, you can first share your results from the dashboard using the “Share Subrun to Global Leaderboard” button. Don’t worry about duplicate submissions: our system automatically deduplicates any repeated data.