The Inference Engine Arena Leaderboard provides a powerful way to compare benchmark results across different inference engines, models, and hardware configurations. This guide will walk you through using the leaderboard effectively.

Starting the Local Leaderboard

Local Leaderboard

Start the local leaderboard server from the command line:
# Start the leaderboard server
arena leaderboard
Expected output:
* Running on local URL:  http://0.0.0.0:3004
2025-04-21 13:53:59,354 - httpx - INFO - HTTP Request: GET http://localhost:3004/gradio_api/startup-events "HTTP/1.1 200 OK"
2025-04-21 13:53:59,376 - httpx - INFO - HTTP Request: HEAD http://localhost:3004/ "HTTP/1.1 200 OK"
2025-04-21 13:53:59,516 - httpx - INFO - HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
2025-04-21 13:54:00,390 - httpx - INFO - HTTP Request: GET https://api.gradio.app/v3/tunnel-request "HTTP/1.1 200 OK"
* Running on public URL: https://3346cd9969943b0342.gradio.live
Once the server is running, open your browser and navigate to the local URL, or use the public URL. You can share the public URL to allow others to access the leaderboard remotely.
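If you want to confirm from a script that the server is reachable before sharing the URL, a minimal check such as the one below works; the port 3004 is taken from the sample output above and may differ on your machine.

```python
import urllib.request

# Port taken from the sample output above; adjust if your server runs elsewhere.
LEADERBOARD_URL = "http://localhost:3004/"

try:
    with urllib.request.urlopen(LEADERBOARD_URL, timeout=5) as response:
        print(f"Leaderboard is up (HTTP {response.status}) at {LEADERBOARD_URL}")
except OSError as exc:
    print(f"Leaderboard is not reachable yet: {exc}")
```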

Filtering and Searching

The leaderboard provides powerful filtering capabilities to help you focus on relevant benchmark results.

Filter Panel

The filter panel on the left side of the leaderboard allows you to narrow down results by:
  • Model: Select specific models (NousResearch/Meta-Llama-3.1-8B, NousResearch/Llama-3.2-1B, etc.)
  • Engine: Filter by inference engine (vLLM, SGLang, etc.)
  • Benchmark Type: Choose specific benchmark workloads (conversational_short, summarization, etc.)
  • Hardware: Filter by hardware configuration (NVIDIA H100 80GB HBM3, NVIDIA H100 80GB HBM3 (4x), etc.)
  • Precision: Select a specific engine precision (FP16, FP8, BF16, etc.); this filter only works when Show all precision is checked
  • Advanced Filters: Filter results by Engine Args, Env Vars, Benchmark Config, and Subrun ID; these filters only work when Show details is checked
There are three checkboxes here:
  • Show details: When checked, shows additional columns with engine arguments, environment variables, and benchmark configuration, and enables the Advanced Filters
  • Show custom benchmarks & filter: When checked, shows custom benchmarks and allows filtering on them
  • Show all precision: When checked, shows an additional column with the precision/quantization method and enables filtering by precision
As you select filters, the leaderboard updates in real-time to show only matching results.
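Conceptually, each filter adds an AND condition over the result rows. The sketch below is purely illustrative (the records and field names are hypothetical, not the leaderboard's internal data model) and shows how selecting a model and an engine narrows the results:

```python
# Illustrative only: hypothetical result records and field names,
# not the leaderboard's internal data model.
results = [
    {"model": "NousResearch/Meta-Llama-3.1-8B", "engine": "vLLM",   "precision": "FP8",  "benchmark": "summarization"},
    {"model": "NousResearch/Meta-Llama-3.1-8B", "engine": "SGLang", "precision": "FP16", "benchmark": "summarization"},
    {"model": "NousResearch/Llama-3.2-1B",      "engine": "vLLM",   "precision": "BF16", "benchmark": "conversational_short"},
]

# Each selected filter behaves like an additional AND condition.
selected = {"model": "NousResearch/Meta-Llama-3.1-8B", "engine": "vLLM"}
filtered = [r for r in results if all(r[key] == value for key, value in selected.items())]
print(filtered)  # -> only the vLLM / Meta-Llama-3.1-8B row
```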

Filtered Results

The main leaderboard view displays results in a comprehensive table. You can also check Show details to see detailed information about each selected sub-run.

Table View

The results table shows key metrics for each benchmark run. Columns include:
  • Model: Model name
  • GPU: Hardware information
  • Precision: Engine precision, shown when Show all precision is checked
  • Engine: Inference engine type and version
  • Benchmark: Benchmark type
  • Input Throughput: Tokens per second for input (prompt) processing
  • Output Throughput: Tokens per second for output generation
  • Input $/1M tokens: Price for input tokens (see the pricing sketch after this list)
  • Output $/1M tokens: Price for output tokens
  • TPOT: Time Per Output Token in milliseconds
  • TTFT: Time to First Token in milliseconds
  • Per Request Throughput: Token throughput of each individual request
  • Uploaded Time: When the benchmark was uploaded to the leaderboard
And if you check Show details, you can see further information about these sub-runs:
  • Subrun ID: Subrun ID of the benchmark result
  • Engine Args: Engine arguments used when starting the engine
  • Env Vars: Environment variables set when starting the engine
  • Benchmark Config: Benchmark configuration, such as random-input-len and random-output-len
  • Reproducible Commands: The command to reproduce the benchmark result
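The two $/1M tokens columns relate throughput to hardware cost. As a rough, back-of-the-envelope sketch (the hourly GPU rate and the formula below are assumptions for illustration, not the leaderboard's actual pricing model):

```python
# Rough illustration of how throughput can translate into $/1M tokens.
# The GPU hourly rate and the formula itself are assumptions for this sketch.
GPU_COST_PER_HOUR = 2.0          # hypothetical $/hour for the GPU(s)
output_throughput = 1_000.0      # tokens/s, e.g. from the Output Throughput column

tokens_per_hour = output_throughput * 3600
cost_per_million_tokens = GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000
print(f"~${cost_per_million_tokens:.3f} per 1M output tokens")
```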

Scatter Plot Visualization

The scatter plot view provides a powerful way to visualize relationships between different configurations, and it stays synchronized in real time with the filtered results above.

Scatter Plot

The scatter plot view provides an interactive visualization of benchmark results, accessible below the table view. Each scatter plot displays:
  • Title: Shows the common attributes across all data points (Engine, GPU, Precision, Benchmark, Model)
  • Legend: Highlights the differentiating factors between data points
In the example above, we’re comparing vLLM and SGLang performance on an NVIDIA H100 80GB HBM3 (4x) GPU running the summarization benchmark with the NousResearch/Meta-Llama-3.1-8B model; a standalone sketch of this kind of plot follows the list below. The visualization clearly shows four distinct performance curves:
  • Blue points: Results of SGLang Engine using FP8 quantization
  • Orange points: Results of SGLang Engine without FP8 quantization
  • Green points: Results of vLLM Engine using FP8 quantization
  • Red points: Results of vLLM Engine without FP8 quantization
This visualization enables you to:
  • Compare the performance impact of FP8 quantization
  • Compare vLLM and SGLang head-to-head
  • Identify optimal parameter combinations for FP8-enabled runs
  • Analyze trade-offs between different configurations
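A similar plot is easy to reproduce outside the leaderboard. The sketch below uses matplotlib with made-up data points (illustrative numbers, not real benchmark results) to draw output throughput against TTFT, one series per engine/precision combination:

```python
import matplotlib.pyplot as plt

# Illustrative, made-up data points: (TTFT in ms, output throughput in tokens/s)
# for each engine/precision combination. Real values come from your benchmark runs.
series = {
    "SGLang FP8":  [(120, 4200), (180, 5100), (260, 5600)],
    "SGLang FP16": [(140, 3600), (210, 4300), (300, 4700)],
    "vLLM FP8":    [(130, 4000), (190, 4800), (280, 5300)],
    "vLLM FP16":   [(150, 3400), (220, 4100), (320, 4500)],
}

for label, points in series.items():
    ttft, throughput = zip(*points)
    plt.scatter(ttft, throughput, label=label)

plt.xlabel("TTFT (ms)")
plt.ylabel("Output throughput (tokens/s)")
plt.title("Illustrative engine comparison (made-up data)")
plt.legend()
plt.show()
```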
Our visualization methodology is inspired by NVIDIA CEO Jensen Huang’s GTC March 2025 keynote presentation, which demonstrated the effectiveness of this approach for performance analysis.

Global Leaderboard

The global leaderboard connects you with the broader inference engine community. Visit https://iearena.org/ to view and share your benchmark results with others.

Community Results

The global leaderboard aggregates benchmark results from users worldwide, and it also provides more filtering options than the local leaderboard. In addition to the three checkboxes available on the local leaderboard, it includes one more:
  • Show only results from verified sources: When checked, anonymously uploaded data is not displayed
On the global leaderboard, you can:
  • Discover optimized configurations
  • Reproduce others’ results using the Reproducible Commands and the corresponding environment
  • Compare against others’ results to reduce the time spent tuning your engine

Metrics

Inference Engine Arena collects a comprehensive set of metrics to help you evaluate and compare the performance of different inference engines. This guide explains the key metrics and how to interpret them.

Key Performance Metrics

Throughput Metrics

Input Throughput

Input Throughput measures how quickly the engine can process input tokens (the prompt text).
  • Unit: Tokens per second (tokens/s)
  • Higher is better
  • Example: 10,000 tokens/s means the engine can process 10,000 prompt tokens per second
This metric is particularly important for workloads with long prompts or context windows.

Output Throughput

Output Throughput measures how quickly the engine can generate new tokens (the response text).
  • Unit: Tokens per second (tokens/s)
  • Higher is better
  • Example: 100 tokens/s means the engine can generate 100 new tokens per second
This metric is crucial for generation-heavy workloads where response speed matters.

Total Throughput

Total Throughput measures the combined rate of processing input and generating output tokens.
  • Unit: Tokens per second (tokens/s)
  • Higher is better
This metric provides an overall view of the engine’s processing capacity.

Request Throughput

Request Throughput measures how many complete requests the engine can handle per second.
  • Unit: Requests per second (RPS)
  • Higher is better
  • Example: 5 RPS means the engine can complete 5 full inference requests per second
This is particularly important for high-traffic applications.
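All four throughput metrics are simple rates over the benchmark's wall-clock duration. A minimal sketch of the arithmetic, using made-up aggregate counts for a single run:

```python
# Made-up numbers for illustration: one benchmark run's aggregate counts.
total_input_tokens = 2_000_000   # prompt tokens processed across all requests
total_output_tokens = 400_000    # tokens generated across all requests
total_requests = 1_000
duration_s = 200.0               # wall-clock duration of the run

input_throughput = total_input_tokens / duration_s      # tokens/s
output_throughput = total_output_tokens / duration_s    # tokens/s
total_throughput = (total_input_tokens + total_output_tokens) / duration_s
request_throughput = total_requests / duration_s        # requests/s (RPS)

print(f"Input:    {input_throughput:,.0f} tokens/s")
print(f"Output:   {output_throughput:,.0f} tokens/s")
print(f"Total:    {total_throughput:,.0f} tokens/s")
print(f"Requests: {request_throughput:.1f} RPS")
```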

Latency Metrics

Time to First Token (TTFT)

Time to First Token (TTFT) measures how long it takes from sending a request until receiving the first token of the response.
  • Unit: Milliseconds (ms) or seconds (s)
  • Lower is better
  • Example: 500ms TTFT means users wait half a second before seeing any response
TTFT is critical for interactive applications where user experience depends on perceived responsiveness.

Time Per Output Token (TPOT)

Time Per Output Token (TPOT) measures the average time it takes to generate each output token.
  • Unit: Milliseconds per token (ms/token)
  • Lower is better
  • Example: 20ms/token means each token takes an average of 20ms to generate
TPOT determines how smoothly text is generated after the initial response begins.

End-to-End Latency

End-to-End Latency measures the total time from sending a request until receiving the complete response.
  • Unit: Milliseconds (ms) or seconds (s)
  • Lower is better
This metric is important for understanding the overall user experience.
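For a single request, all three latency metrics can be derived from three timestamps plus the output token count. The sketch below uses made-up timings; averaging the post-first-token time over the remaining tokens is one common convention for TPOT, assumed here rather than taken from the arena's definition:

```python
# Made-up timestamps (seconds) for a single request.
t_sent = 0.0          # request sent
t_first_token = 0.5   # first response token received
t_complete = 4.5      # last token received
num_output_tokens = 201

ttft_ms = (t_first_token - t_sent) * 1000
# One common convention: average the time spent after the first token
# over the remaining tokens (an assumption, not necessarily every tool's definition).
tpot_ms = (t_complete - t_first_token) * 1000 / (num_output_tokens - 1)
e2e_latency_s = t_complete - t_sent

print(f"TTFT: {ttft_ms:.0f} ms, TPOT: {tpot_ms:.0f} ms/token, E2E: {e2e_latency_s:.1f} s")
```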

Memory Metrics

Peak GPU Memory Usage

Peak GPU Memory Usage measures the maximum amount of GPU memory used during inference.
  • Unit: Megabytes (MB) or Gigabytes (GB)
  • Lower is better for a given model
  • Example: 15GB peak memory means you need at least that much VRAM on your GPU
This metric helps determine hardware requirements and how many models can fit on a single GPU.

Memory Efficiency

Memory Efficiency is calculated as throughput per GB of memory used.
  • Unit: Tokens per second per GB (tokens/s/GB)
  • Higher is better
This metric helps compare how efficiently different engines use the available GPU memory.
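Memory efficiency is simply throughput divided by peak memory in GB. A minimal sketch with made-up numbers (in practice the peak figure could come from nvidia-smi or your framework's memory statistics):

```python
total_throughput = 12_000.0   # tokens/s, made-up value for illustration
peak_memory_gb = 15.0         # GB, e.g. the peak GPU memory example above

memory_efficiency = total_throughput / peak_memory_gb
print(f"{memory_efficiency:.0f} tokens/s per GB")
```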

Concurrency Metrics

Scaling Efficiency

Scaling Efficiency measures how throughput increases as concurrency increases.
  • Unit: Percentage (%)
  • Higher is better
  • Example: 90% scaling efficiency from 1 to 8 concurrent requests means throughput increases by 7.2x (rather than the ideal 8x)
This metric helps understand how well the engine utilizes parallelism.

Max Effective Concurrency

Max Effective Concurrency is the concurrency level beyond which additional concurrent requests no longer improve throughput.
  • Unit: Number of concurrent requests
  • Higher is better
This metric helps determine the optimal concurrency setting for your workload.
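Both concurrency metrics fall out of a throughput-versus-concurrency table. The sketch below reproduces the 90% example above and picks the max effective concurrency as the last level that still yields a meaningful gain; the 5% gain threshold and the throughput numbers are arbitrary choices for illustration:

```python
# Made-up throughput measurements (tokens/s) at increasing concurrency levels.
throughput_by_concurrency = {1: 1_000, 2: 1_950, 4: 3_700, 8: 7_200, 16: 7_300, 32: 7_250}

# Scaling efficiency from concurrency 1 to 8: actual speedup / ideal speedup.
speedup = throughput_by_concurrency[8] / throughput_by_concurrency[1]   # 7.2x
scaling_efficiency = speedup / 8                                        # 0.90 -> 90%
print(f"Scaling efficiency (1 -> 8): {scaling_efficiency:.0%}")

# Max effective concurrency: last level whose throughput gain over the previous
# level exceeds a small threshold (5% here, an arbitrary choice for this sketch).
levels = sorted(throughput_by_concurrency)
max_effective = levels[0]
for prev, curr in zip(levels, levels[1:]):
    if throughput_by_concurrency[curr] > throughput_by_concurrency[prev] * 1.05:
        max_effective = curr
print(f"Max effective concurrency: {max_effective}")
```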

Interpreting Benchmark Results

When analyzing benchmark results, consider:
  1. Workload Characteristics: Different engines excel at different types of workloads. Match the metrics that matter most to your specific use case.
  2. Hardware Utilization: Check how efficiently each engine utilizes your hardware. Some engines may perform better on specific GPU architectures.
  3. Trade-offs: There’s often a trade-off between throughput and latency. Decide which is more important for your application.
  4. Scaling: Look at how performance scales with concurrency to understand how the engine will behave under load.

Visualizing Metrics

Inference Engine Arena provides various ways to visualize metrics:
  • Comparative Bar Charts: Compare key metrics across different engines
  • Time Series Graphs: See how metrics evolve during a benchmark run
  • Scaling Curves: Understand how performance scales with concurrency
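As a standalone illustration of a scaling curve (the leaderboard renders its own charts), the short matplotlib sketch below plots the made-up throughput-versus-concurrency data from the concurrency example:

```python
import matplotlib.pyplot as plt

# Made-up data reused from the concurrency example above.
throughput_by_concurrency = {1: 1_000, 2: 1_950, 4: 3_700, 8: 7_200, 16: 7_300, 32: 7_250}

levels = sorted(throughput_by_concurrency)
plt.plot(levels, [throughput_by_concurrency[c] for c in levels], marker="o")
plt.xscale("log", base=2)
plt.xlabel("Concurrent requests")
plt.ylabel("Throughput (tokens/s)")
plt.title("Illustrative scaling curve (made-up data)")
plt.show()
```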