Precision and Quantization
KTransformers method names describe the KT expert execution path and the expected weight format. They are not interchangeable labels.
Method Summary
| Method | Weight format / backend | Use when |
|---|---|---|
| BF16 | Native BF16 expert weights | Quality and simplicity are preferred, and the hardware supports the path. |
| FP8 | Native FP8 expert weights | The model family provides a compatible FP8 checkpoint. |
| FP8_PERCHANNEL | Per-channel FP8 | The exact model page says per-channel FP8 is required. |
| RAWINT4 | Native INT4 weights shared by CPU/GPU paths | Kimi-style native INT4 model paths. |
| GPTQ_INT4 | GPTQ INT4 checkpoint path | Only for examples that explicitly document GPTQ_INT4. |
| AMXINT8 | Converted AMX INT8 expert weights | Intel AMX CPU path with prepared CPU weights. |
| AMXINT4 | Converted AMX INT4 expert weights | Intel AMX CPU path where INT4 quality/performance is acceptable. |
| MXFP4 | DeepSeek V4-Flash native MXFP4 | Narrow, model-specific path. |
| LLAMAFILE | GGUF / llamafile backend | Compatibility-oriented CPU backend. |
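Before choosing AMXINT8 or AMXINT4, you can check the Intel AMX prerequisite by reading the CPU flags that Linux exposes. This is a rough pre-flight sketch, not KTransformers' own detection. It assumes a Linux host with /proc/cpuinfo and treats the amx_tile and amx_int8 flags as the minimum requirement. The helper names cpu_flags and amx_available are hypothetical.

```python
# Hypothetical pre-flight check: does this Linux host advertise Intel AMX?
# Assumption: the AMXINT8/AMXINT4 paths need at least the amx_tile and
# amx_int8 CPU flags; KTransformers' own detection may check more or differ.
from pathlib import Path

REQUIRED_AMX_FLAGS = {"amx_tile", "amx_int8"}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the union of 'flags' entries from /proc/cpuinfo-style text."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def amx_available(cpuinfo_text: str) -> bool:
    # True only if every required AMX flag is present.
    return REQUIRED_AMX_FLAGS <= cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    text = path.read_text() if path.exists() else ""
    print("AMX flags present:", amx_available(text))
```

A missing flag is a strong hint to pick a non-AMX method. A present flag does not prove the AMX path is supported for a given model.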
Weight Path Rule
--kt-weight-path must point to the CPU-side weights expected by the selected method:
- BF16, FP8, RAWINT4: often the same native checkpoint, or a model-specific native weight path.
- AMXINT8, AMXINT4: converted KT CPU expert weights.
- LLAMAFILE: GGUF weight directory.
- MXFP4: the exact DeepSeek V4-Flash weight layout.
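The weight path rule can be written as a lookup table. This hypothetical sketch encodes only the rules listed above and refuses to guess for any other method. WeightKind, WEIGHT_PATH_RULES and expected_weight_kind are illustrative names, not KTransformers APIs.

```python
# Hypothetical lookup: which kind of CPU-side weights --kt-weight-path should
# point at for a given --kt-method. Undocumented methods raise instead of
# being guessed.
from enum import Enum


class WeightKind(Enum):
    NATIVE_CHECKPOINT = "native checkpoint or model-specific native weight path"
    CONVERTED_KT_CPU = "converted KT CPU expert weights"
    GGUF_DIR = "GGUF weight directory"
    V4_FLASH_LAYOUT = "exact DeepSeek V4-Flash weight layout"


# FP8_PERCHANNEL and GPTQ_INT4 are absent because this section gives no
# weight-path rule for them.
WEIGHT_PATH_RULES = {
    "BF16": WeightKind.NATIVE_CHECKPOINT,
    "FP8": WeightKind.NATIVE_CHECKPOINT,
    "RAWINT4": WeightKind.NATIVE_CHECKPOINT,
    "AMXINT8": WeightKind.CONVERTED_KT_CPU,
    "AMXINT4": WeightKind.CONVERTED_KT_CPU,
    "LLAMAFILE": WeightKind.GGUF_DIR,
    "MXFP4": WeightKind.V4_FLASH_LAYOUT,
}


def expected_weight_kind(method: str) -> WeightKind:
    # Exact, case-sensitive match: method names are not interchangeable
    # labels. Whether the real CLI normalizes case is not assumed here.
    try:
        return WEIGHT_PATH_RULES[method]
    except KeyError:
        raise ValueError(
            f"no documented --kt-weight-path rule for {method!r}"
        ) from None
```

For example, expected_weight_kind("AMXINT4") returns WeightKind.CONVERTED_KT_CPU. expected_weight_kind("GPTQ_INT4") raises ValueError, because this section states no weight-path rule for it.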
Conversion Boundary
Do not claim a quantized path is supported only because a conversion script exists. Public documentation should also identify:
- source checkpoint format
- conversion command and output layout
- selected --kt-method
- CPU ISA requirement
- smoke result or "Needs smoke" status
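The checklist above can be tracked per conversion path. This minimal sketch assumes a hypothetical ConversionDoc record; its field names are illustrative, not a KTransformers schema. An entry with empty fields reports what is missing. A complete entry with no recorded smoke result falls back to the "Needs smoke" status.

```python
# Hypothetical checklist for a conversion-path doc entry; field names are
# illustrative, not a KTransformers schema.
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ConversionDoc:
    source_format: str = ""
    conversion_command: str = ""
    output_layout: str = ""
    kt_method: str = ""
    cpu_isa: str = ""
    smoke_result: Optional[str] = None  # None: not smoke-tested yet

    def missing(self) -> list[str]:
        """Names of required text fields that are still empty."""
        return [
            f.name
            for f in fields(self)
            if f.name != "smoke_result" and not getattr(self, f.name).strip()
        ]

    def status(self) -> str:
        gaps = self.missing()
        if gaps:
            return "Incomplete: " + ", ".join(gaps)
        # Complete entries without a recorded smoke result stay "Needs smoke".
        return self.smoke_result or "Needs smoke"
```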