The Inference Engine Arena Leaderboard provides a powerful way to compare benchmark results across different inference engines, models, and hardware configurations. This guide will walk you through using the leaderboard effectively.

Starting the Local Leaderboard

Filtering and Searching

The leaderboard provides powerful filtering capabilities to help you focus on relevant benchmark results.
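The kind of filtering the leaderboard performs can be sketched in plain Python. This is an illustrative model only: the field names (engine, model, gpu, throughput_tok_s) and the sample values are assumptions for demonstration, not the tool's actual result schema.

```python
# Illustrative benchmark records; field names and values are made up.
results = [
    {"engine": "vllm", "model": "llama-3-8b", "gpu": "A100", "throughput_tok_s": 4200.0},
    {"engine": "sglang", "model": "llama-3-8b", "gpu": "A100", "throughput_tok_s": 4550.0},
    {"engine": "vllm", "model": "llama-3-8b", "gpu": "H100", "throughput_tok_s": 7900.0},
]

def filter_results(rows, **criteria):
    """Keep only rows whose fields match every given criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

# Focus on runs from a single hardware configuration:
a100_runs = filter_results(results, gpu="A100")
print([r["engine"] for r in a100_runs])
```

Combining criteria (engine plus model plus GPU) narrows the table the same way stacking filters does in the UI.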

Filtered Results

The main leaderboard view displays results in a comprehensive table. You can also click the Show details button to see detailed information about the selected sub-run.

Scatter Plot Visualization

The scatter plot view provides a powerful way to visualize relationships between different metrics, and it stays synchronized in real time with the filtered results above.

Global Leaderboard

The global leaderboard connects you with the broader inference engine community. Visit https://iearena.org/ to view and share your benchmark results with others.

Metrics

Inference Engine Arena collects a comprehensive set of metrics to help you evaluate and compare the performance of different inference engines. This guide explains the key metrics and how to interpret them.

Key Performance Metrics

Throughput Metrics

Latency Metrics

Memory Metrics

Concurrency Metrics
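To make the metric categories above concrete, here is a minimal sketch of how the headline numbers relate to raw per-request timings. The timing tuples and the specific metric definitions (TTFT as first-token delay, per-token latency over the remaining tokens, throughput as total output tokens over wall-clock time) are common conventions assumed for illustration, not necessarily the tool's exact formulas.

```python
# Hypothetical per-request timings: (start_s, first_token_s, end_s, output_tokens)
requests = [
    (0.00, 0.12, 1.50, 128),
    (0.05, 0.20, 1.80, 150),
    (0.10, 0.25, 2.00, 160),
]

# Latency: time to first token (TTFT) per request.
ttft = [first - start for start, first, _, _ in requests]

# Latency: mean time per output token after the first.
tpot = [(end - first) / (toks - 1) for _, first, end, toks in requests]

# Throughput: total output tokens divided by wall-clock time of the run.
wall_clock = max(end for _, _, end, _ in requests) - min(s for s, _, _, _ in requests)
throughput = sum(toks for *_, toks in requests) / wall_clock

print(f"mean TTFT: {sum(ttft) / len(ttft):.3f} s")
print(f"throughput: {throughput:.1f} tok/s")
```

Memory and concurrency metrics are typically sampled from the serving process itself (e.g. GPU memory in use, requests in flight) rather than derived from timings like these.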

Interpreting Benchmark Results

When analyzing benchmark results, consider:

  1. Workload Characteristics: Different engines excel at different types of workloads. Match the metrics that matter most to your specific use case.

  2. Hardware Utilization: Check how efficiently each engine utilizes your hardware. Some engines may perform better on specific GPU architectures.

  3. Trade-offs: There’s often a trade-off between throughput and latency. Decide which is more important for your application.

  4. Scaling: Look at how performance scales with concurrency to understand how the engine will behave under load.
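The scaling check in point 4 can be sketched numerically: compare measured throughput at each concurrency level against ideal linear scaling from the single-request baseline. The throughput figures below are invented for demonstration.

```python
# Hypothetical measurements: concurrency level -> output tokens/sec.
scaling = {1: 950.0, 4: 3400.0, 16: 9800.0, 64: 15200.0}

base = scaling[1]
for conc, tput in sorted(scaling.items()):
    # 1.0 means perfect linear scaling; values well below 1.0 indicate
    # the engine is saturating (scheduler, KV-cache, or memory bound).
    efficiency = tput / (base * conc)
    print(f"concurrency {conc:>3}: {tput:>8.0f} tok/s, efficiency {efficiency:.2f}")
```

A flattening curve like this one (efficiency dropping from 1.00 toward 0.25) tells you where added load stops buying throughput and starts costing latency instead.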

Visualizing Metrics

Inference Engine Arena provides various ways to visualize metrics:

  • Comparative Bar Charts: Compare key metrics across different engines
  • Time Series Graphs: See how metrics evolve during a benchmark run
  • Scaling Curves: Understand how performance scales with concurrency