Launch a Server

KTransformers serving has two public launch paths:

Path	Use when
`kt run <model>`	The model is in the built-in KT registry and you want model-specific defaults.
`python -m sglang.launch_server ... --kt-*`	You need explicit paths, custom placement, tensor parallelism, or model-specific experiments.

Registry Launch

Install the inference packages:

pip install kt-kernel sglang-kt

List or search registered models:

kt model list
kt model search minimax

Start a registered model:

kt run m2.1

Use a dry run before consuming GPU and CPU memory:

kt run m2.1 --dry-run

Registry entries carry model-specific defaults such as --kt-method, attention backend, parser options, token limits, and placement defaults. Check the Support Matrix before assuming those defaults apply to a different checkpoint.

Manual SGLang-KT Launch

Use manual launch when you need full control:

python -m sglang.launch_server \
  --host 0.0.0.0 \
  --port 30000 \
  --model-path /path/to/model \
  --served-model-name my-model \
  --trust-remote-code \
  --tensor-parallel-size 1 \
  --kt-weight-path /path/to/kt-weights \
  --kt-method FP8 \
  --kt-cpuinfer 64 \
  --kt-threadpool-count 2 \
  --kt-num-gpu-experts 32 \
  --disable-shared-experts-fusion

Both --kt-method and --kt-weight-path must match the exact weight format of the checkpoint. Before copying a launch command across model families, check the model page and the support matrix.
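
If you launch from a script, it can help to build the argument list in one place so that --kt-method and --kt-weight-path are always set together. The following is a minimal sketch, not part of KTransformers or SGLang: `build_launch_args` is a hypothetical helper, and its defaults simply mirror the example values in the command above.

```python
import shlex


def build_launch_args(
    model_path,
    kt_weight_path,
    kt_method,
    *,
    host="0.0.0.0",
    port=30000,
    served_model_name=None,
    tensor_parallel_size=1,
    kt_cpuinfer=64,
    kt_threadpool_count=2,
    kt_num_gpu_experts=32,
):
    """Return an argv list for `python -m sglang.launch_server`.

    Hypothetical helper: it only assembles the flags shown in the manual
    launch command and does not validate them against any model.
    """
    # Refuse to build a command where the method/weight pair is incomplete.
    if not kt_method or not kt_weight_path:
        raise ValueError("--kt-method and --kt-weight-path must be set together")

    args = [
        "python", "-m", "sglang.launch_server",
        "--host", host,
        "--port", str(port),
        "--model-path", model_path,
        "--trust-remote-code",
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--kt-weight-path", kt_weight_path,
        "--kt-method", kt_method,
        "--kt-cpuinfer", str(kt_cpuinfer),
        "--kt-threadpool-count", str(kt_threadpool_count),
        "--kt-num-gpu-experts", str(kt_num_gpu_experts),
        "--disable-shared-experts-fusion",
    ]
    if served_model_name:
        args += ["--served-model-name", served_model_name]
    return args


if __name__ == "__main__":
    argv = build_launch_args(
        "/path/to/model",
        "/path/to/kt-weights",
        "FP8",
        served_model_name="my-model",
    )
    # Print a copy-pasteable shell command instead of starting the server.
    print(shlex.join(argv))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without GPUs; pass the list to `subprocess.run` only once the paths and method have been checked against the model page.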

After Launch
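
Loading weights can take a while, so a script that starts the server may want to wait until it answers before sending requests. The sketch below polls a health endpoint until it returns HTTP 200. It assumes the server listens on port 30000 as in the manual launch example and exposes a `/health` route as upstream SGLang does; `wait_until_ready` and its `probe` parameter are hypothetical names introduced for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(
    base_url="http://127.0.0.1:30000",
    timeout_s=600.0,
    poll_s=5.0,
    probe=None,
):
    """Poll `<base_url>/health` until it succeeds or `timeout_s` elapses.

    Returns True once a probe succeeds, False on timeout. `probe` is an
    optional callable(url) -> bool, mainly useful for testing.
    """

    def http_probe(url):
        # Treat connection errors as "not ready yet" rather than failures.
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    probe = probe or http_probe
    url = base_url.rstrip("/") + "/health"  # assumed route, see lead-in
    deadline = time.monotonic() + timeout_s

    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    ready = wait_until_ready(timeout_s=30.0, poll_s=2.0)
    print("server ready" if ready else "server did not respond in time")
```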

Continue with: