KTransformers

Launch a Server

KTransformers serving has two public launch paths:

| Path | Use when |
| --- | --- |
| kt run <model> | The model is in the built-in KT registry and you want model-specific defaults. |
| python -m sglang.launch_server ... --kt-* | You need explicit paths, custom placement, tensor parallelism, or model-specific experiments. |

Registry Launch

Install the inference packages:

pip install kt-kernel sglang-kt

List or search registered models:

kt model list
kt model search minimax

Start a registered model:

kt run m2.1

Use a dry run before consuming GPU and CPU memory:

kt run m2.1 --dry-run
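
The two commands can be chained so that the real launch only happens when the dry run succeeds. The sketch below is a minimal wrapper, not part of the kt CLI: kt_checked_run and the KT_BIN override are hypothetical names, and it assumes that kt run exits with a non-zero status when a dry run fails.

```shell
# Hypothetical helper: dry-run a registry model, then start it only if the
# dry run passed. KT_BIN is an assumed override for the kt executable.
kt_checked_run() {
  local model="$1"
  local kt_bin="${KT_BIN:-kt}"

  if [ -z "$model" ]; then
    echo "usage: kt_checked_run <model>" >&2
    return 2
  fi

  # Assumption: a failed dry run exits with a non-zero status.
  if ! "$kt_bin" run "$model" --dry-run; then
    echo "dry run failed for $model; not starting the server" >&2
    return 1
  fi

  # The dry run passed, so start the server for real.
  "$kt_bin" run "$model"
}
```

With the function defined in the current shell, kt_checked_run m2.1 behaves like kt run m2.1 but stops early if the dry run reports a problem.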

Registry entries carry model-specific defaults such as --kt-method, attention backend, parser options, token limits, and placement defaults. Check the support matrix before assuming those defaults apply to a different checkpoint.

Manual SGLang-KT Launch

Use manual launch when you need full control:

python -m sglang.launch_server \
  --host 0.0.0.0 \
  --port 30000 \
  --model-path /path/to/model \
  --served-model-name my-model \
  --trust-remote-code \
  --tensor-parallel-size 1 \
  --kt-weight-path /path/to/kt-weights \
  --kt-method FP8 \
  --kt-cpuinfer 64 \
  --kt-threadpool-count 2 \
  --kt-num-gpu-experts 32 \
  --disable-shared-experts-fusion
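
Loading weights can take a while, so it helps to wait until the server answers before sending requests. The sketch below polls a health URL with curl. It assumes the server exposes GET /health on the port given to --port, as SGLang's HTTP server does, and that curl is installed. The function name wait_for_server and its three arguments are hypothetical.

```shell
# Hypothetical helper: poll a health URL until it answers or attempts run out.
# Arguments: URL, number of attempts, delay in seconds between attempts.
wait_for_server() {
  local url="${1:-http://127.0.0.1:30000/health}"
  local attempts="${2:-60}"
  local delay="${3:-5}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures; -s: silent; -o /dev/null: drop the body.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "server is ready at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done

  echo "server did not become ready at $url" >&2
  return 1
}
```

For the command above, wait_for_server http://127.0.0.1:30000/health 60 5 polls for roughly five minutes. Large models may need a longer budget.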

The --kt-method value must match the exact format of the weights at --kt-weight-path. Do not copy a launch command across model families without checking the model page and the support matrix.
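
A small pre-flight check can catch one common copy-and-paste mistake, paths that do not exist on the current machine, before any memory is allocated. The sketch below assumes both --model-path and --kt-weight-path point at local directories. check_launch_paths is a hypothetical helper. It cannot tell whether the weights match --kt-method, so that still has to be confirmed against the model page.

```shell
# Hypothetical pre-flight check: confirm both directories exist before launch.
# Arguments: model directory, KT weight directory.
check_launch_paths() {
  local model_path="$1" weight_path="$2"
  local status=0

  if [ ! -d "$model_path" ]; then
    echo "model path not found: $model_path" >&2
    status=1
  fi
  if [ ! -d "$weight_path" ]; then
    echo "KT weight path not found: $weight_path" >&2
    status=1
  fi

  # 0 when both directories exist, 1 when at least one is missing.
  return "$status"
}
```

Running check_launch_paths /path/to/model /path/to/kt-weights before the launch command reports every missing directory in one pass rather than stopping at the first.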

After Launch

Continue with: