How to start and configure inference engines
<engine_type> is the type of engine (e.g., vllm, sglang).
<model_name_or_path> is either a Hugging Face model ID or a path to a model.
[engine_args] are arguments passed directly to the underlying engine, compatible with anything that can follow vllm serve.
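To make the pass-through behavior concrete, here is a minimal sketch of how a launcher could forward these three pieces to the underlying engine. The helper name `build_engine_command` is hypothetical and is not this tool's actual implementation. The sketch assumes the vllm engine is started with `vllm serve` and the sglang engine with `python -m sglang.launch_server --model-path`. The model ID and flags are only examples.

```python
import shlex

# Hypothetical helper for illustration only; the real launcher may differ.
def build_engine_command(engine_type, model_name_or_path, engine_args):
    """Build the argv list for the underlying engine.

    engine_args is appended unchanged, which is what "passed directly
    to the underlying engine" means in practice.
    """
    if engine_type == "vllm":
        base = ["vllm", "serve", model_name_or_path]
    elif engine_type == "sglang":
        # Assumption: sglang is launched through its launch_server module.
        base = ["python", "-m", "sglang.launch_server",
                "--model-path", model_name_or_path]
    else:
        raise ValueError(f"unsupported engine type: {engine_type}")
    return base + list(engine_args)

# Example: a Hugging Face model ID plus two vllm serve options.
cmd = build_engine_command(
    "vllm",
    "facebook/opt-125m",
    ["--port", "8000", "--max-model-len", "2048"],
)
print(shlex.join(cmd))
```

Because the extra arguments are appended without interpretation, any option the engine itself accepts can be supplied, and an option the engine rejects will fail at engine startup rather than in the launcher.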
Basic Usage
Advanced Options
Listing Engines
Viewing Logs
Stopping Engines