KTransformers

Support Matrix

Checked on: 2026-05-10 Asia/Shanghai.

Use this matrix before treating a tutorial as current support. A model is supported only when the model family, checkpoint, KT method/backend, hardware class, and serving or training entry all match.

For the documentation policy behind these labels, see Model Status Policy. For validation steps, see Runtime Smoke Checklist.
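The matching rule above can be sketched as a lookup keyed on the full tuple. This is only an illustration, not part of KTransformers: `SupportTuple`, `MATRIX`, and `lookup_status` are hypothetical names, and the checkpoint and hardware strings in the sample row are placeholders rather than validated values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SupportTuple:
    """One support claim; every field must match for the claim to apply."""
    family: str
    checkpoint: str
    method: str      # KT method/backend, e.g. "FP8"
    hardware: str    # hardware class
    entry: str       # serving or training entry


# Hypothetical sample data: checkpoint and hardware values are placeholders.
MATRIX = {
    SupportTuple("MiniMax M2", "example-checkpoint", "FP8",
                 "example-hardware-class", "kt run m2"): "Current / Needs smoke",
}


def lookup_status(claim: SupportTuple) -> Optional[str]:
    """Return the recorded status, or None when any field differs."""
    return MATRIX.get(claim)
```

Because the whole tuple is the key, changing a single field (for example the method from FP8 to BF16) returns `None` instead of silently inheriting the status of a neighbouring row.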

Status

| Status | Meaning |
| --- | --- |
| Current | Code entry exists and the documented interface matches the current repo. |
| Current, narrow | Supported under explicit model, hardware, package, or backend constraints. |
| Needs smoke | Code or docs exist, but the exact runtime tuple should be rerun before production claims. |
| Needs reconciliation | Multiple current-looking paths exist and must be unified before the website makes one recommendation. |
| Not current support | Do not advertise as a current KTransformers capability. |
| Supported direction | The project direction is valid, but model-specific docs need hardware validation. |
| Legacy | Depends on old local_chat.py, ktransformers/server/main.py, balance_serve, or kt_optimize_rule paths. |
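The Legacy definition lists concrete path markers, so a tutorial can be screened for them mechanically. The sketch below is a hypothetical helper (`find_legacy_markers` is not a KTransformers function) that does a plain substring search. A hit is a prompt for manual review, not a final classification.

```python
# Path markers taken from the Legacy row of the status table.
LEGACY_MARKERS = (
    "local_chat.py",
    "ktransformers/server/main.py",
    "balance_serve",
    "kt_optimize_rule",
)


def find_legacy_markers(tutorial_text: str) -> list[str]:
    """Return the legacy markers that appear in the given tutorial text."""
    return [marker for marker in LEGACY_MARKERS if marker in tutorial_text]
```

A tutorial whose commands mention none of these markers can still be outdated; the check only catches the dependencies named in the table.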

Inference Methods

| Method | Status | Notes |
| --- | --- | --- |
| BF16 | Current | Native BF16 MoE expert inference for documented model paths. |
| FP8 | Current | Native FP8 path used by DeepSeek, MiniMax, Qwen, and GLM-style pages. |
| FP8_PERCHANNEL | Current, narrow | Per-channel FP8; do not generalize to every FP8 checkpoint. |
| RAWINT4 | Current / Needs smoke | Kimi-style native INT4 path; backend behavior differs by CPU ISA. |
| GPTQ_INT4 | Needs smoke | Current inference method, but not a universal INT4 recommendation. |
| AMXINT4 | Current | AMX converted INT4 expert weights. |
| AMXINT8 | Current | AMX converted INT8 expert weights. |
| MXFP4 | Current, narrow | DeepSeek V4-Flash specific. |
| LLAMAFILE | Current, secondary | GGUF / llamafile compatibility backend. |

Inference Models

| Model / family | Precision | Entry | Status |
| --- | --- | --- | --- |
| DeepSeek V4-Flash | MXFP4 | kt run deepseek-v4-flash | Needs smoke |
| DeepSeek V3.2 | FP8 registry; AMXINT4 tutorial path | kt run deepseek-v3.2 or model tutorial | Needs reconciliation |
| DeepSeek V3-0324 / R1-0528 | AMXINT4 registry default | kt run deepseek-v3 / kt run deepseek-r1 | Current / Needs docs |
| Kimi K2 Thinking | RAWINT4 | kt run kimi-k2-thinking | Current / Needs smoke |
| Kimi K2.5 | RAWINT4 | Manual SGLang-KT tutorial | Needs smoke |
| MiniMax M2 / M2.1 | FP8 | kt run m2 / kt run m2.1 | Current / Needs smoke |
| MiniMax M2.5 | FP8 | Manual SGLang-KT tutorial | Needs smoke |
| Qwen3 / Qwen3.5 / Qwen3-Coder-Next | BF16, FP8, GPTQ_INT4 examples | Model tutorials and AVX2 docs | Needs smoke |
| GLM-5 / GLM-5.1 | BF16, FP8, FP8_PERCHANNEL | Model tutorials | Needs smoke |
| Ascend NPU old pages | Old server or local_chat paths | Historical docs | Not current support |
| Intel xPU old pages | Old server or old Docker paths | Historical docs | Not current support |
| ROCm old pages | Old local_chat paths | Historical docs | Legacy |
| AMD CPU path | AMD BLIS / CPU-side path | Hardware-specific docs after machine arrives | Supported direction / Needs AMD validation |
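Several rows carry compound statuses such as "Current / Needs smoke". One way to work with them is to split each cell into individual labels and flag the rows that still need a smoke rerun. The helper names below are hypothetical, and the three sample rows are copied from the table above.

```python
def split_status(status: str) -> list[str]:
    """Split a compound cell such as 'Current / Needs smoke' into labels."""
    return [part.strip() for part in status.split("/")]


def needs_rerun(status: str) -> bool:
    """True when any label asks for a smoke rerun before production claims."""
    return "Needs smoke" in split_status(status)


# Rows copied from the table above: (model, status).
ROWS = [
    ("DeepSeek V4-Flash", "Needs smoke"),
    ("DeepSeek V3.2", "Needs reconciliation"),
    ("Kimi K2 Thinking", "Current / Needs smoke"),
]

# Models whose exact runtime tuple should be rerun.
to_rerun = [model for model, status in ROWS if needs_rerun(status)]
```

Treating the labels as a set rather than a single string keeps "Current / Needs smoke" from being read as plain "Current".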

Fine-Tuning Backends

Current KT SFT means MoE LoRA SFT through LLaMA-Factory. The KT backend name describes how the CPU-side experts are executed; it does not set the global mixed-precision mode for training.

| KT backend | Actual method | Status | Notes |
| --- | --- | --- | --- |
| AMXBF16 | AMXBF16_SFT | Current | Uses BF16 expert checkpoints. |
| AMXINT8 | AMXINT8_SFT | Current | Uses prepared INT8 expert weights. |
| AMXINT4 | AMXINT4_SFT | Current / Needs smoke | Document the weight preparation path. |
| AMX*_SkipLoRA | SkipLoRA SFT variants | Advanced | Not the default quick-start path. |
| AMXINT4_1* / KGroup* | Historical enum-level names | Do not advertise | Not exposed by the current public SFT backend map. |
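The relationship between backend names and methods in the table can be illustrated with a small mapping. This is a sketch that mirrors the first three rows only; `SFT_BACKEND_TO_METHOD` and `resolve_sft_method` are hypothetical names and do not claim to reproduce the project's actual backend map.

```python
# Mirrors the first three table rows; not the project's actual data structure.
SFT_BACKEND_TO_METHOD = {
    "AMXBF16": "AMXBF16_SFT",
    "AMXINT8": "AMXINT8_SFT",
    "AMXINT4": "AMXINT4_SFT",
}


def resolve_sft_method(backend: str) -> str:
    """Return the SFT method for a public backend name, or raise ValueError."""
    try:
        return SFT_BACKEND_TO_METHOD[backend]
    except KeyError:
        raise ValueError(
            f"{backend!r} is not in the public SFT backend map"
        ) from None
```

Names outside the mapping, including the historical enum-level ones, fail loudly instead of falling back to a default backend.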

Fine-Tuning Models

| Model / family | Example | Backend | Status |
| --- | --- | --- | --- |
| DeepSeek V2 Lite | deepseek_v2_lora_sft_kt.yaml | AMXBF16, AMXINT8, AMXINT4 | Current / Needs smoke |
| DeepSeek V3-0324 BF16 | deepseek_v3_lora_sft_kt.yaml | AMXBF16, AMXINT8, AMXINT4 | Current / Needs smoke |
| Qwen3-235B-A22B | qwen3moe_lora_sft_kt.yaml | AMXBF16, AMXINT8, AMXINT4 | Current / Needs smoke |
| Qwen3.5-397B-A17B | qwen3_5moe_lora_sft_kt.yaml | AMXINT8 first, BF16/INT4 as applicable | Needs smoke |
| Kimi K2 / K2.5 SFT | Old Kimi SFT guide | Old branch or optimize-rule path | Not current support |
| DPO | Old DPO tutorial | Historical path | Unconfirmed / Needs validation |

Current Validation Evidence

The following evidence was collected on a sapphire4-class local cluster. It can inform wording, but it does not replace a fresh model-specific release gate:

| Area | Evidence |
| --- | --- |
| Hardware preflight | kt doctor passes on sapphire4-class hardware: 8 x NVIDIA GeForce RTX 4090, Intel Xeon Platinum 8488C, AMX/AVX512, 2 NUMA nodes. Disk warning remains for the default home model path. |
| CLI preflight | kt --help and kt version work in the inference env. kt model list has no models registered in that user config, so registry aliases still need per-environment setup. |
| SFT environment | kt-post1-sglangpost2-finalgate-py312-20260430 imports torch 2.9.1+cu130, kt_kernel 0.6.1.post1, ktransformers 0.6.1.post1, and KT Accelerate plugin. |
| Qwen SFT | Historical Qwen3-235B smoke reached global_step=2, train_loss=2.0366, with adapter artifacts. Historical Qwen3.5 smoke reached global_step=2, train_loss=11.4336, with adapter artifacts. |
| DeepSeek SFT | Historical DeepSeek V2 BF16 smoke reached global_step=1, train_loss=2.5078. |
| DeepSeek inference | Historical DeepSeek-V3 SGLang-KT smoke returned HTTP 200 for /model_info and /generate. |
| Kimi inference | Existing Kimi-K2.5 services respond to /model_info, but generation quality still needs a fresh smoke. |
| MiniMax / GLM | Model directories exist, but no same-level current smoke result has been found yet. |
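The CLI checks named in the table (kt doctor, kt --help, kt version, kt model list) could be rerun in one pass with a small wrapper. The sketch below rests on an assumption the table does not confirm: that each command signals success with exit code 0. `run_command` and `preflight` are hypothetical helpers, and the runner is injectable so the logic can be exercised without the kt CLI installed.

```python
import subprocess
from typing import Callable, Sequence

# Commands named in the evidence table above.
PREFLIGHT_COMMANDS = (
    ("kt", "doctor"),
    ("kt", "--help"),
    ("kt", "version"),
    ("kt", "model", "list"),
)


def run_command(cmd: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the binary is missing)."""
    try:
        return subprocess.run(list(cmd), capture_output=True, text=True).returncode
    except FileNotFoundError:
        return 127


def preflight(runner: Callable[[Sequence[str]], int] = run_command) -> dict[str, bool]:
    """Map each command line to True when it exits with code 0 (assumed success)."""
    return {" ".join(cmd): runner(cmd) == 0 for cmd in PREFLIGHT_COMMANDS}
```

A passing exit code does not cover warnings such as the disk warning noted above, so the command output still needs to be read.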

Documentation Rule

Write support claims as exact tuples:

model family + checkpoint + KT method/backend + hardware class + serving/training entry + package/version caveat

Do not promote a historical tutorial to current support unless the entry path exists in the current source tree and the runtime smoke has been recorded.
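The promotion rule and the tuple format can be expressed as two small helpers. This is a sketch with hypothetical names (`TutorialEvidence`, `can_promote`, `format_claim`); the two boolean fields correspond to the two conditions stated above.

```python
from dataclasses import dataclass


@dataclass
class TutorialEvidence:
    entry_in_source_tree: bool   # entry path exists in the current source tree
    smoke_recorded: bool         # a runtime smoke has been recorded


def can_promote(evidence: TutorialEvidence) -> bool:
    """Both conditions from the rule above must hold."""
    return evidence.entry_in_source_tree and evidence.smoke_recorded


def format_claim(family: str, checkpoint: str, method: str,
                 hardware: str, entry: str, caveat: str) -> str:
    """Join the six tuple fields in the order the rule lists them."""
    return " + ".join([family, checkpoint, method, hardware, entry, caveat])
```

Requiring all six arguments makes an incomplete claim fail at call time instead of producing a shortened tuple.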

Related pages: