DeepSeek SFT

DeepSeek is one of the primary model families for KTransformers SFT. Model-level training TPS, commands, and blockers are tracked on the DeepSeek model page; this page covers cross-cutting SFT notes.

Examples

Model	Example	Backend notes
DeepSeek V2 Lite	`examples/ktransformers/train_lora/deepseek_v2_lora_sft_kt.yaml`	Candidate for `AMXBF16`, `AMXINT8`, and `AMXINT4` smoke tests.
DeepSeek V3-0324	`examples/ktransformers/train_lora/deepseek_v3_lora_sft_kt.yaml`	FP8 source checkpoints must be converted or prepared before AMX SFT.

Precision Mapping

DeepSeek V3 SFT requires the source weights to match the target backend:

SFT backend	How to use DeepSeek V3 source weights
`AMXBF16`	Use a BF16 checkpoint. If the original release is FP8, convert it to BF16 first.
`AMXINT8`	Prepare KT INT8 expert weights and point `kt_weight_path` to that directory.
`AMXINT4`	Prepare KT INT4 expert weights and point `kt_weight_path` to that directory.

The training mixed-precision setting is independent of the expert backend: training can still run in BF16 while the expert backend is `AMXINT8` or `AMXINT4`.
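The mapping in this section can be written down as a small lookup, which is handy for checking a run plan before launch. This is a minimal sketch only: `preparation_steps` and `BACKEND_EXPERT_FORMAT` are hypothetical names for illustration and are not part of KTransformers or LLaMA-Factory. Only the backend names and `kt_weight_path` come from this page.

```python
# Hypothetical helper for illustration; not a KTransformers or LLaMA-Factory API.
# It only restates the precision-mapping table as data.
BACKEND_EXPERT_FORMAT = {
    "AMXBF16": "BF16",
    "AMXINT8": "INT8",
    "AMXINT4": "INT4",
}


def preparation_steps(backend, source_dtype):
    """Return the preparation steps implied by the mapping for one backend."""
    if backend not in BACKEND_EXPERT_FORMAT:
        raise ValueError(f"unknown SFT backend: {backend!r}")

    steps = []
    if backend == "AMXBF16":
        # An FP8 release has to be converted before it can serve as a BF16 checkpoint.
        if source_dtype.upper() == "FP8":
            steps.append("convert FP8 checkpoint to BF16")
        steps.append("use BF16 checkpoint")
    else:
        expert_format = BACKEND_EXPERT_FORMAT[backend]
        steps.append(f"prepare KT {expert_format} expert weights")
        steps.append("point kt_weight_path to that directory")
    return steps


if __name__ == "__main__":
    for name in BACKEND_EXPERT_FORMAT:
        print(name, "->", preparation_steps(name, "FP8"))
```

The sketch deliberately says nothing about the training mixed-precision setting, since that is chosen separately from the expert backend.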

Validation Status

Validation reports for the DeepSeek V2 Lite and DeepSeek V3 SFT examples should state the runtime configuration explicitly:

LLaMA-Factory commit + Python/torch versions + model path + target backend + conversion path + command + loss trace + adapter outputs
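One way to keep such a report complete is to collect the items in a small record and flag anything left empty. The sketch below assumes every item is stored as a plain string. `ValidationRecord` and `missing_fields` are hypothetical names, not an existing KTransformers or LLaMA-Factory structure.

```python
from dataclasses import dataclass, fields


# Hypothetical record for illustration; the field names mirror the items listed above.
@dataclass
class ValidationRecord:
    llamafactory_commit: str = ""
    python_version: str = ""
    torch_version: str = ""
    model_path: str = ""
    target_backend: str = ""
    conversion_path: str = ""
    command: str = ""
    loss_trace: str = ""
    adapter_outputs: str = ""


def missing_fields(record):
    """Return the names of fields that are empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


if __name__ == "__main__":
    # Only the backend is filled in here, so the other items are reported as missing.
    record = ValidationRecord(target_backend="AMXINT8")
    print(missing_fields(record))
```

A report would count as complete only when `missing_fields` returns an empty list.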