Launch a Server
KTransformers serving has two public launch paths:
| Path | Use when |
|---|---|
| kt run <model> | The model is in the built-in KT registry and you want model-specific defaults. |
| python -m sglang.launch_server ... --kt-* | You need explicit paths, custom placement, tensor parallelism, or model-specific experiments. |
Registry Launch
Install the inference packages:
pip install kt-kernel sglang-kt
List or search registered models:
kt model list
kt model search minimax
Start a registered model:
kt run m2.1
Use a dry run before consuming GPU and CPU memory:
kt run m2.1 --dry-run
Registry entries carry model-specific defaults such as --kt-method, the attention backend, parser options, token limits, and placement. Check the Support Matrix before assuming a default applies to a different checkpoint.
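The dry-run-then-launch sequence can also be scripted. The following is a minimal Python sketch, not part of the kt CLI: `run_registered_model` is a hypothetical helper, and it assumes that a failed dry run is reported through a non-zero exit code, which is not confirmed here.

```python
import subprocess


def run_registered_model(model: str, runner=subprocess.run) -> int:
    """Dry-run a registered model first; launch only if the dry run succeeds."""
    dry = runner(["kt", "run", model, "--dry-run"])
    if dry.returncode != 0:
        # Assumption: kt signals a failed dry run with a non-zero exit code.
        return dry.returncode
    # The real launch blocks until the server process exits.
    return runner(["kt", "run", model]).returncode


# Example call (commented out so that importing this file launches nothing):
# run_registered_model("m2.1")
```

The `runner` parameter exists only so the sequence can be exercised without starting a server; by default it is `subprocess.run`.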
Manual SGLang-KT Launch
Use manual launch when you need full control:
python -m sglang.launch_server \
--host 0.0.0.0 \
--port 30000 \
--model-path /path/to/model \
--served-model-name my-model \
--trust-remote-code \
--tensor-parallel-size 1 \
--kt-weight-path /path/to/kt-weights \
--kt-method FP8 \
--kt-cpuinfer 64 \
--kt-threadpool-count 2 \
--kt-num-gpu-experts 32 \
--disable-shared-experts-fusion
The --kt-method value must match the format of the weights at --kt-weight-path. Do not copy a launch command across model families without checking the model page and the support matrix.
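When several launch variants have to be maintained, keeping the flags in one mapping makes the pairing rule easier to enforce. The sketch below is illustrative only: `build_launch_argv` and the settings layout are hypothetical, the flag names are the ones from the manual command above, and the check only verifies that both flags are present, not that the format actually matches the weights.

```python
import shlex


def build_launch_argv(settings: dict) -> list:
    """Turn a settings mapping into an argv list for sglang.launch_server."""
    # The weight format and the weight path only make sense as a pair.
    for key in ("kt-method", "kt-weight-path"):
        if not settings.get(key):
            raise ValueError(f"--{key} is required alongside its counterpart")

    argv = ["python", "-m", "sglang.launch_server"]
    for flag, value in settings.items():
        if value is True:
            # Boolean switch such as --trust-remote-code: emit the flag alone.
            argv.append(f"--{flag}")
        elif value is False or value is None:
            # Disabled switch or unset option: emit nothing.
            continue
        else:
            argv.extend([f"--{flag}", str(value)])
    return argv


settings = {
    "host": "0.0.0.0",
    "port": 30000,
    "model-path": "/path/to/model",
    "served-model-name": "my-model",
    "trust-remote-code": True,
    "tensor-parallel-size": 1,
    "kt-weight-path": "/path/to/kt-weights",
    "kt-method": "FP8",
    "kt-cpuinfer": 64,
    "kt-threadpool-count": 2,
    "kt-num-gpu-experts": 32,
    "disable-shared-experts-fusion": True,
}

# Print the command for review instead of executing it.
print(shlex.join(build_launch_argv(settings)))
```

Printing the command rather than running it keeps the final check against the model page and the support matrix a manual step.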
After Launch
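Loading weights can take a while, so it can help to wait until the server answers before sending requests. A minimal Python sketch, assuming the server exposes GET /health as upstream SGLang does (not confirmed here for sglang-kt) and listens on port 30000 as in the manual command; `http_ok` and `wait_until_ready` are hypothetical helper names:

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str, timeout_s: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready yet.
        return False


def wait_until_ready(base_url, deadline_s=600.0, interval_s=5.0, probe=http_ok):
    """Poll the health route until it answers or the deadline passes."""
    # Assumption: the health route is /health, as in upstream SGLang.
    url = base_url.rstrip("/") + "/health"
    stop_at = time.monotonic() + deadline_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= stop_at:
            return False
        time.sleep(interval_s)


# Example call (commented out so that importing this file sends no requests):
# wait_until_ready("http://127.0.0.1:30000")
```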
Continue with: