DeepSeek SFT

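DeepSeek is the core model family for KTransformers SFT. For model-level Training TPS, launch commands, and blockers, the DeepSeek model pages are authoritative; this page only covers the cross-cutting notes that apply to DeepSeek SFT.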

Current validation results

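Model	Training status on the model page
DeepSeek-V2-Lite	BF16/KT LoRA SFT passed a 12-step run, `Training TPS=23.32`; carries a fused expert LoRA caveat
DeepSeek V4-Flash	Training preflight failed: the current training stack does not recognize `deepseek_v4`
DeepSeek-V3.2	Training preflight failed: the current training stack does not recognize `deepseek_v32`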

Model	Example	Backend notes
DeepSeek-V2-Lite	Along the lines of `examples/ktransformers/train_lora/deepseek_v2_lora_sft_kt.yaml`	The route that passed is BF16/KT + fused expert LoRA with caveats, not the upstream-clean route.
DeepSeek V3 / V3.2 / V4 series	Along the lines of `examples/ktransformers/train_lora/deepseek_v3_lora_sft_kt.yaml`	Preflight currently fails for V4 and V3.2; there is no recommended command until the training stack is upgraded.
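The preflight failures above can be caught before a run is launched. The following is a minimal sketch, not the training stack's own preflight: it assumes the identifiers `deepseek_v4` and `deepseek_v32` are the `model_type` values in each checkpoint's `config.json`, and both `preflight_model_type` and the allow-list `ASSUMED_SUPPORTED_MODEL_TYPES` are hypothetical names introduced here for illustration.

```python
import json
from pathlib import Path

# Illustrative allow-list only (an assumption, not read from any library).
# Replace it with the model types your installed training stack registers.
ASSUMED_SUPPORTED_MODEL_TYPES = frozenset({"deepseek_v2"})


def preflight_model_type(model_dir, supported=ASSUMED_SUPPORTED_MODEL_TYPES):
    """Return (ok, model_type) for the checkpoint stored in model_dir.

    ok is True only when config.json declares a model_type that is
    present in the supplied allow-list.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        model_type = json.load(f).get("model_type")
    return model_type in supported, model_type
```

Calling `preflight_model_type("/path/to/checkpoint")` before building a command makes an unrecognized model type fail fast instead of partway through setup.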

Before running DeepSeek V3 SFT, confirm how the source weights map to the target backend:

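SFT backend	How to use the DeepSeek V3 source weights
`AMXBF16`	Use a BF16 checkpoint. If the original release is FP8, convert it to BF16 first.
`AMXINT8`	Prepare KT INT8 expert weights and point `kt_weight_path` at that directory.
`AMXINT4`	Prepare KT INT4 expert weights and point `kt_weight_path` at that directory.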

Training mixed precision can still be BF16 even when the expert backend is INT8 or INT4.
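The backend table can be turned into a small consistency check that runs before training. This is a sketch under stated assumptions: only the backend names and `kt_weight_path` come from this page, while `check_backend_weights`, `BACKEND_EXPERT_FORMAT`, and the `source_is_fp8` flag are hypothetical helpers that are not part of KTransformers or LLaMA-Factory.

```python
# Expert weight format each SFT backend expects, taken from the table above.
# None means the backend reads the BF16 checkpoint directly.
BACKEND_EXPERT_FORMAT = {
    "AMXBF16": None,
    "AMXINT8": "INT8",
    "AMXINT4": "INT4",
}


def check_backend_weights(backend, kt_weight_path=None, source_is_fp8=False):
    """Return a list of problems; an empty list means the setup is consistent."""
    if backend not in BACKEND_EXPERT_FORMAT:
        return [f"unknown SFT backend: {backend}"]

    problems = []
    expert_format = BACKEND_EXPERT_FORMAT[backend]
    if expert_format is None:
        # AMXBF16 needs a BF16 checkpoint, so an FP8 release must be converted.
        if source_is_fp8:
            problems.append("convert the FP8 release to BF16 before using AMXBF16")
    elif not kt_weight_path:
        # Quantized backends need prepared KT expert weights on disk.
        problems.append(
            f"{backend} needs kt_weight_path pointing at KT {expert_format} expert weights"
        )
    return problems
```

The check deliberately says nothing about mixed precision, since BF16 training precision is valid with any of the three backends.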

Each main-table entry for DeepSeek Training keeps the following:

LLaMA-Factory commit + Python/torch versions + model path + target backend + command.sh + YAML + loss trajectory + adapter artifacts
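One way to keep those items together is a single record per run. The sketch below is an assumption about how such a record could be structured, not the format the project actually stores; `TrainingRunRecord`, its field names, and `missing_fields` are hypothetical.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class TrainingRunRecord:
    """One run's evidence, mirroring the items listed above."""
    llamafactory_commit: str
    python_version: str
    torch_version: str
    model_path: str
    target_backend: str
    command_sh: str          # path to the saved command.sh
    yaml_config: str         # path to the YAML used for the run
    loss_trajectory: List[float]
    adapter_path: str        # directory holding the adapter artifacts


def missing_fields(record):
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(record).items() if not value]
```

A record with a non-empty `missing_fields` result is incomplete and should not be promoted to the main table.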

The current DeepSeek-V2-Lite caveat must not be removed: no-shim/no-force failed again on retest, and no-force+shim is still blocked by an expert LoRA target mismatch.