This guide covers how to start different inference engines with various configurations using Inference Engine Arena.
Basic Usage
The basic syntax for starting an engine is:
# Start an engine
arena start <engine_type> <model_name_or_path> [engine_args]
Where:
- <engine_type> is the type of engine (e.g., vllm, sglang)
- <model_name_or_path> is either a Hugging Face model ID or a local path to a model (local paths are not currently supported)
- [engine_args] are arguments passed directly to the underlying engine; for vLLM, this is compatible with anything you would pass after vllm serve
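Because engine arguments are forwarded verbatim, any flag the engine's own CLI accepts can be appended. As a sketch (the --tensor-parallel-size value is illustrative and assumes two GPUs are available):
# Flags after the model name are passed straight through to vllm serve
arena start vllm NousResearch/Meta-Llama-3.1-8B --tensor-parallel-size 2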
Environment Variables (optional)
Before starting an engine, you can set environment variables to configure advanced behaviors. These are set using standard shell commands:
# Example setting environment variables before starting an engine
export VLLM_USE_V1=1
export HUGGING_FACE_HUB_TOKEN="YOUR_HUGGING_FACE_TOKEN"
export CUDA_VISIBLE_DEVICES=1
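In this example, VLLM_USE_V1=1 opts into vLLM's V1 engine, HUGGING_FACE_HUB_TOKEN authenticates downloads of gated models from the Hugging Face Hub, and CUDA_VISIBLE_DEVICES=1 restricts the engine to the GPU with index 1.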
Starting vLLM
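# Start vLLM with prefix caching enabled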
arena start vllm NousResearch/Meta-Llama-3.1-8B --enable-prefix-caching
# Start vLLM with environment variables and engine arguments
export VLLM_USE_V1=1
export HUGGING_FACE_HUB_TOKEN="YOUR_HUGGING_FACE_TOKEN"
export CUDA_VISIBLE_DEVICES=1
arena start vllm NousResearch/Meta-Llama-3.1-8B --enable-prefix-caching --quantization fp8
Starting SGLang
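# Start SGLang with a custom chunked-prefill size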
arena start sglang NousResearch/Meta-Llama-3.1-8B --chunked-prefill-size 2048
# Start SGLang with environment variables and engine arguments
export SGL_ENABLE_JIT_DEEPGEMM=1
export HUGGING_FACE_HUB_TOKEN="YOUR_HUGGING_FACE_TOKEN"
export CUDA_VISIBLE_DEVICES=1
arena start sglang NousResearch/Meta-Llama-3.1-8B --enable-torch-compile
Managing Running Engines
To see the status of all running engines:
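# Show the status of all running engines
# (the exact subcommand is assumed here; consult arena --help)
arena status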
This will show the engine type, container ID, model, status, and endpoint for each engine.
To view the logs of a running engine:
# Show recent logs
arena logs vllm
# Follow logs in real-time
arena logs vllm --follow
# View a specific number of lines
arena logs vllm --tail 500
To stop a running engine:
# Stop by engine type
arena stop vllm