SFT Backends and Precision
Current KT SFT uses AMX CPU expert backends. The backend name describes how expert weights run on the CPU; it is separate from the global mixed-precision training setting.
Backend Summary
| Backend | Use when | Weight requirement |
|---|---|---|
| AMXBF16 | You want BF16 expert execution. | BF16 expert checkpoint. |
| AMXINT8 | You need lower CPU memory use than BF16. | KT-prepared INT8 expert weights. |
| AMXINT4 | You need the most aggressive current KT SFT compression path. | KT-prepared INT4 expert weights. |
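
The table can be encoded as a small lookup so that a launch script fails early when a backend name and its prepared weights do not match. This is only a sketch: `BACKEND_WEIGHTS` and `required_weights` are hypothetical names for illustration, not part of KT or LLaMA-Factory, and the strings simply mirror the table.

```python
# Hypothetical helper, not a KT API: mirrors the Backend Summary table.
BACKEND_WEIGHTS = {
    "AMXBF16": "BF16 expert checkpoint",
    "AMXINT8": "KT-prepared INT8 expert weights",
    "AMXINT4": "KT-prepared INT4 expert weights",
}


def required_weights(backend: str) -> str:
    """Return the weight requirement for a backend name, or raise ValueError."""
    key = backend.strip().upper()
    if key not in BACKEND_WEIGHTS:
        known = ", ".join(sorted(BACKEND_WEIGHTS))
        raise ValueError(f"unknown SFT backend {backend!r}; expected one of: {known}")
    return BACKEND_WEIGHTS[key]
```

Rejecting unknown names (for example an `FP8` backend) keeps the distinction between the checkpoint format and the expert backend explicit.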
DeepSeek V3 FP8 Checkpoints
DeepSeek V3-family public checkpoints are often released as FP8. Current KT SFT does not train natively in FP8; starting from an FP8 checkpoint means preparing weights for one of the AMX SFT backends above:
| Target backend | What to prepare |
|---|---|
| AMXBF16 | Convert the original FP8 model to a BF16 checkpoint first, then use the BF16 expert path. |
| AMXINT8 | Prepare KT INT8 expert weights from the source checkpoint. |
| AMXINT4 | Prepare KT INT4 expert weights from the source checkpoint. |
For BF16, use a validated FP8-to-BF16 conversion path before starting SFT. Existing upstream DeepSeek docs include fp8_cast_bf16.py; KT also has CPU-side expert conversion scripts for AMX INT8/INT4 preparation. Record the exact conversion script, source checkpoint, output checkpoint, and checksum or file count in the experiment notes.
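
One way to capture the file count and checksums for the experiment notes is a short script like the one below. It is a minimal sketch that uses only the Python standard library; the function names and the JSON layout are assumptions made for illustration, not a KT or DeepSeek convention.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoint shards are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_manifest(checkpoint_dir):
    """Return the file count and per-file SHA-256 for a checkpoint directory."""
    root = Path(checkpoint_dir)
    files = sorted(p for p in root.rglob("*") if p.is_file())
    return {
        "checkpoint": str(root),
        "file_count": len(files),
        "sha256": {p.relative_to(root).as_posix(): sha256_file(p) for p in files},
    }


def conversion_record(script, source_dir, output_dir):
    """Bundle the conversion script name with source and output manifests."""
    return {
        "conversion_script": script,
        "source": checkpoint_manifest(source_dir),
        "output": checkpoint_manifest(output_dir),
    }


def write_record(record, path):
    """Write the record as indented JSON next to the experiment notes."""
    Path(path).write_text(json.dumps(record, indent=2, sort_keys=True))
```

For example, `write_record(conversion_record("fp8_cast_bf16.py", source_dir, output_dir), "conversion.json")` would store one record per conversion, where `source_dir` and `output_dir` are the checkpoint directories you used. Hashing every shard of a large checkpoint is slow; recording only `file_count` is the cheaper alternative the text allows.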
Kimi SFT
Kimi SFT is not part of the current public KT SFT support page. Treat old Kimi SFT material as historical until the current LLaMA-Factory path is implemented and smoke-tested.
DPO
DPO is not yet promoted as part of current KT SFT support. See DPO Status before publishing any DPO claim.