OpenAI-Compatible API
KTransformers inference uses SGLang-KT for serving, so application code should treat the server as an OpenAI-compatible endpoint when possible.
Primary Endpoint
Use:
POST /v1/chat/completions
Typical request fields:
| Field | Notes |
|---|---|
model | Must match the served model name configured at launch. |
messages | Chat messages with role and content. |
temperature, top_p, max_tokens | Sampling controls passed to the serving runtime. |
stream | Use true for Server-Sent Events streaming. |
Model Name
Set a stable served model name when launching manually:
python -m sglang.launch_server \
--model-path /path/to/model \
--served-model-name my-model \
--kt-weight-path /path/to/kt-weights \
--kt-method FP8
Then use my-model in requests.
Compatibility Notes
- Prefer the OpenAI client path for application integration.
- For model-specific tool calling or reasoning parsers, use the parser options from the exact model page or registry default.
- If output formatting differs from a non-KT SGLang deployment, first check the tokenizer chat template, served model name, parser options, and model-specific launch arguments.