KTransformers

SFT Backends and Precision

KT SFT currently runs experts on AMX CPU backends. The backend name describes how expert weights are executed on the CPU; it is separate from the global mixed-precision setting used for training.

Backend Summary

| Backend | Use when | Weight requirement |
| --- | --- | --- |
| AMXBF16 | You want BF16 expert execution. | BF16 expert checkpoint. |
| AMXINT8 | You need lower CPU memory use than BF16. | KT-prepared INT8 expert weights. |
| AMXINT4 | You need the most aggressive current KT SFT compression path. | KT-prepared INT4 expert weights. |

DeepSeek V3 FP8 Checkpoints

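DeepSeek V3-family public checkpoints are often released as FP8. Running SFT on such a checkpoint with current KT does not mean native FP8 SFT. Training still goes through one of the AMX SFT backends above, and the preparation step depends on the backend you target: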

| Target backend | What to prepare |
| --- | --- |
| AMXBF16 | Convert the original FP8 model to a BF16 checkpoint first, then use the BF16 expert path. |
| AMXINT8 | Prepare KT INT8 expert weights from the source checkpoint. |
| AMXINT4 | Prepare KT INT4 expert weights from the source checkpoint. |
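The mapping above can be kept next to a launch script as a small lookup, so that an unsupported target (for example a request for native FP8) fails early with a clear message. This is a minimal sketch, not part of KT: the `Backend` enum, the `PREPARATION` table, and `preparation_for` are hypothetical names introduced here for illustration, and the strings simply restate the table.

```python
from enum import Enum


class Backend(str, Enum):
    """The three AMX SFT backends named in the table (hypothetical enum)."""
    AMXBF16 = "AMXBF16"
    AMXINT8 = "AMXINT8"
    AMXINT4 = "AMXINT4"


# Preparation step per backend, restating the table for an FP8 source checkpoint.
PREPARATION = {
    Backend.AMXBF16: "Convert the original FP8 model to a BF16 checkpoint, "
                     "then use the BF16 expert path.",
    Backend.AMXINT8: "Prepare KT INT8 expert weights from the source checkpoint.",
    Backend.AMXINT4: "Prepare KT INT4 expert weights from the source checkpoint.",
}


def preparation_for(backend_name: str) -> str:
    """Return the preparation step for a backend name, or raise ValueError."""
    try:
        backend = Backend(backend_name.strip().upper())
    except ValueError:
        valid = ", ".join(b.value for b in Backend)
        raise ValueError(
            f"Unknown SFT backend {backend_name!r}; expected one of: {valid}"
        ) from None
    return PREPARATION[backend]
```

Calling `preparation_for("amxint8")` returns the INT8 preparation step, while `preparation_for("FP8")` raises a `ValueError`, which mirrors the point that FP8 is a source format here and not an SFT backend.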

For BF16, use a validated FP8-to-BF16 conversion path before starting SFT. Existing upstream DeepSeek docs include fp8_cast_bf16.py; KT also has CPU-side expert conversion scripts for AMX INT8/INT4 preparation. Record the exact conversion script, source checkpoint, output checkpoint, and checksum or file count in the experiment notes.
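One way to capture the record described above is to hash every file in the converted checkpoint and write the result as JSON alongside the experiment notes. The sketch below uses only the Python standard library; the function names, the JSON field names, and the example paths are assumptions made for illustration and are not part of KT or the DeepSeek tooling.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoint shards are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_conversion_record(script: str, source_dir: str, output_dir: str) -> dict:
    """Collect the conversion script, both checkpoint paths, file count, and checksums."""
    output = Path(output_dir)
    files = sorted(p for p in output.rglob("*") if p.is_file())
    return {
        "conversion_script": script,
        "source_checkpoint": str(source_dir),
        "output_checkpoint": str(output),
        "file_count": len(files),
        # Keys are paths relative to the output checkpoint directory.
        "sha256": {p.relative_to(output).as_posix(): sha256_file(p) for p in files},
    }


def write_conversion_record(record: dict, notes_path: str) -> None:
    """Write the record as stable, human-readable JSON."""
    text = json.dumps(record, indent=2, sort_keys=True) + "\n"
    Path(notes_path).write_text(text, encoding="utf-8")
```

A typical call would be `write_conversion_record(build_conversion_record("fp8_cast_bf16.py", "/path/to/fp8", "/path/to/bf16"), "notes/conversion.json")`, with placeholder paths. Writing the notes file outside the output checkpoint directory keeps the file count and checksums stable across reruns.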

Kimi SFT

Kimi SFT is not part of the current public KT SFT support page. Treat older Kimi SFT material as historical until the current LLaMA-Factory path is implemented and smoke-tested.

DPO

DPO support is not promoted as current KT SFT support yet. See DPO Status before publishing any DPO claim.