KTransformers

First Inference Server

Install the inference packages first:

pip install kt-kernel sglang-kt

Option 1: Registered Models

Use kt run when your model is in the built-in registry:

kt run m2.1

Other registered examples include DeepSeek V3 / R1 / V3.2, DeepSeek V4-Flash, Kimi K2 Thinking, and MiniMax M2 / M2.1. Registry defaults include model-specific --kt-method, attention backend, and serving arguments.

Option 2: Manual SGLang-KT Launch

Use the SGLang launch path when you need full control over model paths and KT arguments:

python -m sglang.launch_server \
  --model-path /path/to/model \
  --served-model-name my-model \
  --tensor-parallel-size 1 \
  --kt-weight-path /path/to/kt-weights \
  --kt-method FP8 \
  --kt-num-gpu-experts 1 \
  --disable-shared-experts-fusion

Choose --kt-method from the exact model page or Support Matrix. Do not copy a method from another model family without checking the weight format and hardware backend.

Next Steps