Support Matrix
Checked on: 2026-05-10 Asia/Shanghai.
Consult this matrix before treating a tutorial as evidence of current support. A model is supported only when the model family, checkpoint, KT method/backend, hardware class, and serving or training entry all match.
For the documentation policy behind these labels, see Model Status Policy. For validation steps, see Runtime Smoke Checklist.
Status
| Status | Meaning |
|---|---|
| Current | Code entry exists and the documented interface matches the current repo. |
| Current, narrow | Supported under explicit model, hardware, package, or backend constraints. |
| Needs smoke | Code or docs exist, but the exact runtime tuple should be rerun before production claims. |
| Needs reconciliation | Multiple current-looking paths exist and must be unified before the website makes one recommendation. |
| Not current support | Do not advertise as a current KTransformers capability. |
| Supported direction | The project direction is valid, but model-specific docs need hardware validation. |
| Legacy | Depends on old local_chat.py, ktransformers/server/main.py, balance_serve, or kt_optimize_rule paths. |
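
Several cells in the tables on this page combine labels with " / " (for example "Current / Needs smoke"). The sketch below is one possible way to check such cells against the legend above. The function names are hypothetical and not part of any KTransformers tooling; only the label strings come from this page.

```python
# Labels copied from the Status legend on this page.
DEFINED_STATUSES = {
    "Current",
    "Current, narrow",
    "Needs smoke",
    "Needs reconciliation",
    "Not current support",
    "Supported direction",
    "Legacy",
}


def split_status(cell: str) -> list[str]:
    """Split a composite cell such as 'Current / Needs smoke' into its labels."""
    return [part.strip() for part in cell.split(" / ") if part.strip()]


def undefined_labels(cell: str) -> list[str]:
    """Return the labels in a status cell that the legend does not define."""
    return [label for label in split_status(cell) if label not in DEFINED_STATUSES]


print(undefined_labels("Current / Needs smoke"))
print(undefined_labels("Current / Needs docs"))
```

A check like this would flag a label such as "Needs docs" that appears in a table but has no legend entry.
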
Inference Methods
| Method | Status | Notes |
|---|---|---|
| BF16 | Current | Native BF16 MoE expert inference for documented model paths. |
| FP8 | Current | Native FP8 path used by DeepSeek, MiniMax, Qwen, and GLM-style pages. |
| FP8_PERCHANNEL | Current, narrow | Per-channel FP8; do not generalize to every FP8 checkpoint. |
| RAWINT4 | Current / Needs smoke | Kimi-style native INT4 path; backend behavior differs by CPU ISA. |
| GPTQ_INT4 | Needs smoke | Current inference method, but not a universal INT4 recommendation. |
| AMXINT4 | Current | AMX converted INT4 expert weights. |
| AMXINT8 | Current | AMX converted INT8 expert weights. |
| MXFP4 | Current, narrow | DeepSeek V4-Flash specific. |
| LLAMAFILE | Current, secondary | GGUF / llamafile compatibility backend. |
Inference Models
| Model / family | Precision | Entry | Status |
|---|---|---|---|
| DeepSeek V4-Flash | MXFP4 | kt run deepseek-v4-flash | Needs smoke |
| DeepSeek V3.2 | FP8 registry; AMXINT4 tutorial path | kt run deepseek-v3.2 or model tutorial | Needs reconciliation |
| DeepSeek V3-0324 / R1-0528 | AMXINT4 registry default | kt run deepseek-v3 / kt run deepseek-r1 | Current / Needs docs |
| Kimi K2 Thinking | RAWINT4 | kt run kimi-k2-thinking | Current / Needs smoke |
| Kimi K2.5 | RAWINT4 | Manual SGLang-KT tutorial | Needs smoke |
| MiniMax M2 / M2.1 | FP8 | kt run m2 / kt run m2.1 | Current / Needs smoke |
| MiniMax M2.5 | FP8 | Manual SGLang-KT tutorial | Needs smoke |
| Qwen3 / Qwen3.5 / Qwen3-Coder-Next | BF16, FP8, GPTQ_INT4 examples | Model tutorials and AVX2 docs | Needs smoke |
| GLM-5 / GLM-5.1 | BF16, FP8, FP8_PERCHANNEL | Model tutorials | Needs smoke |
| Ascend NPU old pages | Old server or local_chat paths | Historical docs | Not current support |
| Intel xPU old pages | Old server or old Docker paths | Historical docs | Not current support |
| ROCm old pages | Old local_chat paths | Historical docs | Legacy |
| AMD CPU path | AMD BLIS / CPU-side path | Hardware-specific docs after machine arrives | Supported direction / Needs AMD validation |
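
As an illustration of reading the table above by exact key, the following sketch copies four rows into a dictionary keyed by (model, precision). The dictionary and `lookup_status` are hypothetical helpers for this page, not a KTransformers API, and the table remains the source of truth.

```python
from typing import Optional

# A few rows copied from the Inference Models table; not the full matrix.
INFERENCE_MODELS = {
    ("DeepSeek V4-Flash", "MXFP4"): "Needs smoke",
    ("Kimi K2 Thinking", "RAWINT4"): "Current / Needs smoke",
    ("Kimi K2.5", "RAWINT4"): "Needs smoke",
    ("MiniMax M2.5", "FP8"): "Needs smoke",
}


def lookup_status(model: str, precision: str) -> Optional[str]:
    """Return the documented status, or None when no row matches exactly."""
    return INFERENCE_MODELS.get((model, precision))


print(lookup_status("Kimi K2.5", "RAWINT4"))
print(lookup_status("Kimi K2.5", "FP8"))
```

The second lookup returns `None`: a model paired with a precision that has no row is treated as undocumented rather than inheriting the status of a neighbouring row.
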
Fine-Tuning Backends
Current KT SFT means MoE LoRA SFT through LLaMA-Factory. The KT backend name describes how the CPU experts execute; it does not set the global mixed-precision mode for training.
| KT backend | Actual method | Status | Notes |
|---|---|---|---|
| AMXBF16 | AMXBF16_SFT | Current | Uses BF16 expert checkpoints. |
| AMXINT8 | AMXINT8_SFT | Current | Uses prepared INT8 expert weights. |
| AMXINT4 | AMXINT4_SFT | Current / Needs smoke | Document the weight preparation path. |
| AMX*_SkipLoRA | SkipLoRA SFT variants | Advanced | Not the default quick-start path. |
| AMXINT4_1* / KGroup* | Historical enum-level names | Do not advertise | Not exposed by the current public SFT backend map. |
Fine-Tuning Models
| Model / family | Example | Backend | Status |
|---|---|---|---|
| DeepSeek V2 Lite | deepseek_v2_lora_sft_kt.yaml | AMXBF16, AMXINT8, AMXINT4 | Current / Needs smoke |
| DeepSeek V3-0324 BF16 | deepseek_v3_lora_sft_kt.yaml | AMXBF16, AMXINT8, AMXINT4 | Current / Needs smoke |
| Qwen3-235B-A22B | qwen3moe_lora_sft_kt.yaml | AMXBF16, AMXINT8, AMXINT4 | Current / Needs smoke |
| Qwen3.5-397B-A17B | qwen3_5moe_lora_sft_kt.yaml | AMXINT8 first, BF16/INT4 as applicable | Needs smoke |
| Kimi K2 / K2.5 SFT | Old Kimi SFT guide | Old branch or optimize-rule path | Not current support |
| DPO | Old DPO tutorial | Historical path | Unconfirmed / Needs validation |
Current Validation Evidence
The following evidence comes from the state of a sapphire4-style local cluster. It can inform wording, but it is not a substitute for a fresh model-specific release gate:
| Area | Evidence |
|---|---|
| Hardware preflight | kt doctor passes on sapphire4-class hardware: 8 x NVIDIA GeForce RTX 4090, Intel Xeon Platinum 8488C, AMX/AVX512, 2 NUMA nodes. Disk warning remains for the default home model path. |
| CLI preflight | kt --help and kt version work in the inference env. kt model list has no models registered in that user config, so registry aliases still need per-environment setup. |
| SFT environment | kt-post1-sglangpost2-finalgate-py312-20260430 imports torch 2.9.1+cu130, kt_kernel 0.6.1.post1, ktransformers 0.6.1.post1, and KT Accelerate plugin. |
| Qwen SFT | Historical Qwen3-235B smoke reached global_step=2, train_loss=2.0366, with adapter artifacts. Historical Qwen3.5 smoke reached global_step=2, train_loss=11.4336, with adapter artifacts. |
| DeepSeek SFT | Historical DeepSeek V2 BF16 smoke reached global_step=1, train_loss=2.5078. |
| DeepSeek inference | Historical DeepSeek-V3 SGLang-KT smoke returned HTTP 200 for /model_info and /generate. |
| Kimi inference | Existing Kimi-K2.5 services respond to /model_info, but generation quality still needs a fresh smoke. |
| MiniMax / GLM | Model directories exist, but no same-level current smoke result has been found yet. |
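
The inference rows above mention HTTP 200 responses from `/model_info` and `/generate`. As a minimal sketch, assuming a smoke run records one status code per endpoint, the helper below lists the endpoints that are missing or did not return 200. Treating these two endpoints as the required set is an assumption of this example, and `failed_endpoints` is a hypothetical name.

```python
# Endpoint names taken from the evidence rows above; treating both as
# required for a smoke pass is an assumption of this sketch.
REQUIRED_ENDPOINTS = ("/model_info", "/generate")


def failed_endpoints(status_codes: dict[str, int]) -> list[str]:
    """Return required endpoints that are unrecorded or did not return HTTP 200."""
    return [ep for ep in REQUIRED_ENDPOINTS if status_codes.get(ep) != 200]


# A record with only /model_info leaves /generate outstanding.
print(failed_endpoints({"/model_info": 200}))
print(failed_endpoints({"/model_info": 200, "/generate": 200}))
```

An empty result only shows that the endpoints responded. It says nothing about generation quality, which still needs its own check.
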
Documentation Rule
Write support claims as exact tuples:
model family + checkpoint + KT method/backend + hardware class + serving/training entry + package/version caveat
Do not promote a historical tutorial to current support unless the entry path exists in the current source tree and the runtime smoke has been recorded.
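
The rule above can be sketched as a small record plus a gate. This is an illustrative sketch only: `SupportClaim`, `render`, and `may_promote` are hypothetical names, and the field values in the example are placeholders rather than validated support claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportClaim:
    """One support claim, with the six parts named in the rule above."""
    model_family: str
    checkpoint: str
    kt_method: str          # KT method or backend
    hardware_class: str
    entry: str              # serving or training entry
    version_caveat: str     # package/version caveat

    def render(self) -> str:
        """Join the parts in the documented order, separated by ' + '."""
        return " + ".join([
            self.model_family, self.checkpoint, self.kt_method,
            self.hardware_class, self.entry, self.version_caveat,
        ])


def may_promote(entry_in_source_tree: bool, smoke_recorded: bool) -> bool:
    """A historical tutorial is promoted only when both conditions hold."""
    return entry_in_source_tree and smoke_recorded


# Placeholder values for illustration; not a recorded support claim.
claim = SupportClaim("family-x", "checkpoint-x", "FP8",
                     "hardware-x", "entry-x", "version-x")
print(claim.render())
print(may_promote(entry_in_source_tree=True, smoke_recorded=False))
```

Keeping the two conditions as separate booleans makes it clear which one is missing when a tutorial cannot be promoted yet.
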
Related pages: