First Inference Server
Install the inference packages first:
pip install kt-kernel sglang-kt
Option 1: Registered Models
Use kt run when your model is in the built-in registry:
kt run m2.1
Other registered examples include DeepSeek V3 / R1 / V3.2, DeepSeek V4-Flash, Kimi K2 Thinking, and MiniMax M2 / M2.1. Registry defaults include model-specific --kt-method, attention backend, and serving arguments.
Option 2: Manual SGLang-KT Launch
Use the SGLang launch path when you need full control over model paths and KT arguments:
python -m sglang.launch_server \
--model-path /path/to/model \
--served-model-name my-model \
--tensor-parallel-size 1 \
--kt-weight-path /path/to/kt-weights \
--kt-method FP8 \
--kt-num-gpu-experts 1 \
--disable-shared-experts-fusion
Choose --kt-method from the exact model page or Support Matrix. Do not copy a method from another model family without checking the weight format and hardware backend.