KTransformers

Precision and Quantization

KTransformers method names describe the KT expert execution path and the expected weight format. They are not interchangeable labels.

Method Summary

Method          Weight format / backend                      Use when
--------------  -------------------------------------------  ----------------------------------------------------------------
BF16            Native BF16 expert weights                   Quality and simplicity are preferred, and hardware supports the path.
FP8             Native FP8 expert weights                    The model family provides a compatible FP8 checkpoint.
FP8_PERCHANNEL  Per-channel FP8                              The exact model page says per-channel FP8 is required.
RAWINT4         Native INT4 weights shared by CPU/GPU paths  Kimi-style native INT4 model paths.
GPTQ_INT4       GPTQ INT4 checkpoint path                    Use only for examples that explicitly document GPTQ_INT4.
AMXINT8         Converted AMX INT8 expert weights            Intel AMX CPU path with prepared CPU weights.
AMXINT4         Converted AMX INT4 expert weights            Intel AMX CPU path where INT4 quality/performance is acceptable.
MXFP4           DeepSeek V4-Flash native MXFP4               Narrow model-specific path.
LLAMAFILE       GGUF / llamafile backend                     Compatibility-oriented CPU backend.

Weight Path Rule

--kt-weight-path must point to the CPU-side weights expected by the selected method:

  • BF16, FP8, RAWINT4: often the native checkpoint itself, or a model-specific native weight path.
  • AMXINT8, AMXINT4: converted KT CPU expert weights.
  • LLAMAFILE: GGUF weight directory.
  • MXFP4: exact DeepSeek V4-Flash weight layout.
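
The rule above can be sketched as a small lookup. This is an illustrative sketch only: EXPECTED_CPU_WEIGHTS and kt_weight_args are hypothetical names, not part of KTransformers, and the table simply transcribes the list above. Methods that the list does not cover (for example FP8_PERCHANNEL or GPTQ_INT4) are rejected so that the model page gets checked instead of guessed at. The sketch only builds the two flags; the launcher command they are passed to is deliberately left out.

```python
from __future__ import annotations

# Hypothetical lookup transcribed from the list above; not part of KTransformers.
EXPECTED_CPU_WEIGHTS = {
    "BF16": "native checkpoint or model-specific native weight path",
    "FP8": "native checkpoint or model-specific native weight path",
    "RAWINT4": "native checkpoint or model-specific native weight path",
    "AMXINT8": "converted KT CPU expert weights",
    "AMXINT4": "converted KT CPU expert weights",
    "LLAMAFILE": "GGUF weight directory",
    "MXFP4": "exact DeepSeek V4-Flash weight layout",
}


def kt_weight_args(method: str, weight_path: str) -> list[str]:
    """Return the method and weight-path flags as an argv fragment.

    Raises ValueError when the method is not covered by the rule above.
    """
    if method not in EXPECTED_CPU_WEIGHTS:
        raise ValueError(
            f"{method!r} is not covered by the weight path rule; check the model page"
        )
    return ["--kt-method", method, "--kt-weight-path", weight_path]


if __name__ == "__main__":
    method = "AMXINT8"
    # Remind the operator what the path must contain before building the flags.
    print(f"{method} expects: {EXPECTED_CPU_WEIGHTS[method]}")
    print(kt_weight_args(method, "/path/to/converted-cpu-weights"))
```

The helper does not inspect the directory contents; it only keeps the method name and the expected weight kind side by side so a mismatch is easier to spot in review.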

Conversion Boundary

Do not claim that a quantized path is supported merely because a conversion script exists. Public documentation should also identify:

  • source checkpoint format
  • conversion command and output layout
  • selected --kt-method
  • CPU ISA requirement
  • smoke result or "Needs smoke" status
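
One way to keep this checklist honest is to record each item explicitly and flag the ones still missing. The following is a minimal sketch under stated assumptions: QuantPathDoc and missing_items are hypothetical names introduced here for illustration, and the field names simply mirror the bullets above. The smoke field defaults to "Needs smoke", so an untested path is labeled as such instead of being left blank.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class QuantPathDoc:
    """Hypothetical record of what public docs state about one quantized path."""

    source_checkpoint_format: str = ""
    conversion_command: str = ""
    output_layout: str = ""
    kt_method: str = ""
    cpu_isa_requirement: str = ""
    smoke_status: str = "Needs smoke"  # replaced by the smoke result once one exists


def missing_items(doc: QuantPathDoc) -> list[str]:
    """Return the checklist fields that are still empty, in declaration order."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


if __name__ == "__main__":
    draft = QuantPathDoc(kt_method="AMXINT4")
    print("still undocumented:", missing_items(draft))
    print("smoke status:", draft.smoke_status)
```

A draft with only the method filled in reports the four remaining items, which matches the point of this section: a conversion script alone does not complete the documentation.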