OpenAI-Compatible API

KTransformers inference uses SGLang-KT for serving, so application code should treat the server as an OpenAI-compatible endpoint when possible.

Primary Endpoint

Use:

POST /v1/chat/completions

Typical request fields:

Field	Notes
`model`	Must match the served model name configured at launch.
`messages`	Chat messages with `role` and `content`.
`temperature`, `top_p`, `max_tokens`	Sampling controls passed to the serving runtime.
`stream`	Use `true` for Server-Sent Events streaming.

Model Name

Set a stable served model name when launching manually:

python -m sglang.launch_server \
  --model-path /path/to/model \
  --served-model-name my-model \
  --kt-weight-path /path/to/kt-weights \
  --kt-method FP8

Then use my-model in requests.

Compatibility Notes

Prefer the OpenAI client path for application integration.
For model-specific tool calling or reasoning parsers, use the parser options from the exact model page or registry default.
If output formatting differs from a non-KT SGLang deployment, first check the tokenizer chat template, served model name, parser options, and model-specific launch arguments.

OpenAI-Compatible API

Primary Endpoint

Model Name

Compatibility Notes

Related Pages