This guide covers how to start different inference engines with various configurations using Inference Engine Arena.

Basic Usage

The basic syntax for starting an engine is:

# Start an engine
arena start <engine_type> <model_name_or_path> [engine_args]

Where:

  • <engine_type> is the type of engine (e.g., vllm, sglang)
  • <model_name_or_path> is either a Hugging Face model ID or a local path to a model (local paths are currently not supported)
  • [engine_args] are arguments passed directly to the underlying engine; for vLLM, they are compatible with anything you would pass after vllm serve
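Putting the pieces together, a full invocation might look like the following. The model ID and the extra engine argument here are illustrative, not required values:

```shell
# Start a vLLM engine with a Hugging Face model ID,
# forwarding one vllm serve argument through [engine_args]
arena start vllm meta-llama/Llama-3.1-8B-Instruct --gpu-memory-utilization 0.9
```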

Environment Variables (optional)

Before starting an engine, you can set environment variables to configure advanced behaviors. These are set using standard shell commands:

# Example setting environment variables before starting an engine
export VLLM_USE_V1=1
export HUGGING_FACE_HUB_TOKEN="YOUR_HUGGING_FACE_TOKEN"
export CUDA_VISIBLE_DEVICES=1

Starting vLLM
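A minimal sketch of starting vLLM through the arena. The model ID and engine arguments are illustrative; any flags accepted by vllm serve can be appended:

```shell
# Start vLLM with an illustrative model and common serving flags
arena start vllm meta-llama/Llama-3.1-8B-Instruct \
  --tensor-parallel-size 1 \
  --max-model-len 8192
```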

Starting SGLang
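Starting SGLang follows the same pattern, with engine arguments passed through to SGLang's server. The model ID and the flag shown here are illustrative:

```shell
# Start SGLang with an illustrative model ID;
# --tp-size is an SGLang server argument for tensor parallelism
arena start sglang Qwen/Qwen2.5-7B-Instruct --tp-size 1
```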

Managing Running Engines