KTransformers

OpenAI-Compatible API

KTransformers inference uses SGLang-KT for serving, so application code should treat the server as an OpenAI-compatible endpoint when possible.

Primary Endpoint

Use:

POST /v1/chat/completions

Typical request fields:

FieldNotes
modelMust match the served model name configured at launch.
messagesChat messages with role and content.
temperature, top_p, max_tokensSampling controls passed to the serving runtime.
streamUse true for Server-Sent Events streaming.

Model Name

Set a stable served model name when launching manually:

python -m sglang.launch_server \
  --model-path /path/to/model \
  --served-model-name my-model \
  --kt-weight-path /path/to/kt-weights \
  --kt-method FP8

Then use my-model in requests.

Compatibility Notes

  • Prefer the OpenAI client path for application integration.
  • For model-specific tool calling or reasoning parsers, use the parser options from the exact model page or registry default.
  • If output formatting differs from a non-KT SGLang deployment, first check the tokenizer chat template, served model name, parser options, and model-specific launch arguments.