SFT Backends and Precision

KT SFT currently runs experts on AMX CPU backends. The backend name describes how expert weights are executed on the CPU; it is separate from the global mixed-precision setting used for training.

Backend Summary

Backend	Use when	Weight requirement
`AMXBF16`	You want BF16 expert execution.	BF16 expert checkpoint.
`AMXINT8`	You need lower CPU memory use than BF16.	KT-prepared INT8 expert weights.
`AMXINT4`	You need the most aggressive current KT SFT compression path.	KT-prepared INT4 expert weights.
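The summary table above can be encoded as a small lookup so that a launch script rejects an unknown backend name early. This is a minimal sketch, not part of KT itself: `BackendInfo`, `BACKENDS`, and `weight_requirement` are hypothetical names introduced here for illustration; only the three backend identifiers and their weight requirements come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendInfo:
    """One row of the backend summary table (hypothetical helper type)."""
    use_when: str
    weight_requirement: str


# Mirrors the backend summary table; keys are the backend names from this page.
BACKENDS = {
    "AMXBF16": BackendInfo(
        "BF16 expert execution",
        "BF16 expert checkpoint",
    ),
    "AMXINT8": BackendInfo(
        "lower CPU memory use than BF16",
        "KT-prepared INT8 expert weights",
    ),
    "AMXINT4": BackendInfo(
        "most aggressive current KT SFT compression path",
        "KT-prepared INT4 expert weights",
    ),
}


def weight_requirement(backend: str) -> str:
    """Return the expert weight format a backend expects.

    Raises ValueError for any name that is not in the summary table.
    """
    try:
        return BACKENDS[backend].weight_requirement
    except KeyError:
        known = ", ".join(sorted(BACKENDS))
        raise ValueError(
            f"unknown backend {backend!r}; expected one of: {known}"
        ) from None
```

Calling `weight_requirement("AMXINT8")` returns the INT8 row's requirement, while a name such as `"FP8"` raises `ValueError`, which matches the point that the backend choice is limited to the three AMX paths.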

DeepSeek V3 FP8 Checkpoints

DeepSeek V3-family public checkpoints are often released as FP8. Running KT SFT on such a checkpoint does not mean native FP8 SFT. The checkpoint must first be prepared for one of the AMX SFT backends above:

Target backend	What to prepare
`AMXBF16`	Convert the original FP8 model to a BF16 checkpoint first, then use the BF16 expert path.
`AMXINT8`	Prepare KT INT8 expert weights from the source checkpoint.
`AMXINT4`	Prepare KT INT4 expert weights from the source checkpoint.

For BF16, use a validated FP8-to-BF16 conversion path before starting SFT. Existing upstream DeepSeek docs include `fp8_cast_bf16.py`. For AMX INT8/INT4 preparation, KT also has CPU-side expert conversion scripts. In the experiment notes, record the exact conversion script, the source checkpoint, the output checkpoint, and a checksum or file count.
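The record-keeping step above can be automated with a short script. The following is a sketch under stated assumptions: `checkpoint_manifest` and `conversion_record` are hypothetical helpers written for this page, not KT or DeepSeek tooling, and the combined SHA-256 over relative paths plus file contents is just one reasonable choice of checksum.

```python
import hashlib
import json
from pathlib import Path


def checkpoint_manifest(checkpoint_dir):
    """Return the file count and one combined SHA-256 for a checkpoint directory."""
    root = Path(checkpoint_dir)
    # Sort by relative POSIX path so the digest does not depend on listing order.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    digest = hashlib.sha256()
    for path in files:
        # Hash the relative path too, so a renamed shard changes the checksum.
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        with path.open("rb") as fh:
            # Read in 1 MiB chunks to avoid loading large shards into memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
    return {"file_count": len(files), "sha256": digest.hexdigest()}


def conversion_record(script, source_dir, output_dir):
    """Build a JSON-serializable record of one conversion for experiment notes."""
    return {
        "conversion_script": script,
        "source_checkpoint": str(source_dir),
        "output_checkpoint": str(output_dir),
        "output_manifest": checkpoint_manifest(output_dir),
    }


def write_record(record, notes_path):
    """Write the record as indented JSON next to the other experiment notes."""
    Path(notes_path).write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
```

A typical use would be to call `conversion_record` with the script name and the two checkpoint directories once conversion finishes, then pass the result to `write_record`. Hashing a full checkpoint reads every byte, so on very large models the file count alone may be the cheaper check.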

Kimi SFT

Kimi SFT is not covered by the current public KT SFT support page. Treat older Kimi SFT material as historical until the current LLaMA-Factory path is implemented and smoke-tested.

DPO

DPO is not yet within the KT SFT support scope. See DPO Status.