Precision and Quantization

KTransformers method names describe the KT expert execution path and the expected weight format. They are not interchangeable labels.

Method Summary

Method	Weight format / backend	Use when
`BF16`	Native BF16 expert weights	Quality and simplicity are preferred, and hardware supports the path.
`FP8`	Native FP8 expert weights	The model family provides a compatible FP8 checkpoint.
`FP8_PERCHANNEL`	Per-channel FP8	The exact model page says per-channel FP8 is required.
`RAWINT4`	Native INT4 weights shared by CPU/GPU paths	The model follows a Kimi-style native INT4 path.
`GPTQ_INT4`	GPTQ INT4 checkpoint path	The example you are following explicitly documents `GPTQ_INT4`.
`AMXINT8`	Converted AMX INT8 expert weights	You are on the Intel AMX CPU path and have prepared CPU weights.
`AMXINT4`	Converted AMX INT4 expert weights	You are on the Intel AMX CPU path and INT4 quality/performance is acceptable.
`MXFP4`	DeepSeek V4-Flash native MXFP4	You are running DeepSeek V4-Flash; this is a narrow, model-specific path.
`LLAMAFILE`	GGUF / llamafile backend	You need a compatibility-oriented CPU backend.

`--kt-weight-path` must point to the CPU-side weights expected by the selected method:

`BF16`, `FP8`, `RAWINT4`: often the same native checkpoint or a model-specific native weight path.
`AMXINT8`, `AMXINT4`: converted KT CPU expert weights.
`LLAMAFILE`: GGUF weight directory.
`MXFP4`: exact DeepSeek V4-Flash weight layout.
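
The pairing above can be kept as a small lookup so that a launch script fails early when the method and weight path do not match. This is a minimal sketch, not part of KTransformers: `EXPECTED_CPU_WEIGHTS` and `describe_weight_path` are hypothetical names, and the only automated check (that a `LLAMAFILE` directory has `.gguf` files at its top level) is an assumption about directory layout. `FP8_PERCHANNEL` and `GPTQ_INT4` are left out on purpose, because the list above gives no weight-path rule for them.

```python
from pathlib import Path

# Hypothetical lookup transcribed from the list above; not a KTransformers API.
EXPECTED_CPU_WEIGHTS = {
    "BF16": "the native checkpoint or a model-specific native weight path",
    "FP8": "the native checkpoint or a model-specific native weight path",
    "RAWINT4": "the native checkpoint or a model-specific native weight path",
    "AMXINT8": "converted KT CPU expert weights",
    "AMXINT4": "converted KT CPU expert weights",
    "LLAMAFILE": "a GGUF weight directory",
    "MXFP4": "the exact DeepSeek V4-Flash weight layout",
}


def describe_weight_path(method: str, weight_path: str) -> str:
    """Return a reminder of what weight_path should contain for this method."""
    if method not in EXPECTED_CPU_WEIGHTS:
        # Methods without a rule in the list need the model page instead.
        raise ValueError(f"no weight-path rule recorded for method {method!r}")

    path = Path(weight_path)
    if not path.exists():
        raise FileNotFoundError(f"weight path does not exist: {path}")

    # Assumption: a GGUF weight directory holds *.gguf files at its top level.
    if method == "LLAMAFILE" and not any(path.glob("*.gguf")):
        raise ValueError(f"{path} contains no .gguf files")

    return f"{method}: {path} should hold {EXPECTED_CPU_WEIGHTS[method]}"
```

A launcher might call `describe_weight_path("AMXINT4", "/path/to/converted-weights")` before starting the server and log the returned string. The check confirms only that the path exists; it cannot tell whether the weights were actually converted for that method.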

Having a conversion script does not automatically make a quantized path ready to use. Check these details together: