DeepSeek SFT
DeepSeek is one of the primary model families for KTransformers SFT. Current documentation should focus on LoRA SFT of the MoE models DeepSeek V2 Lite and the DeepSeek V3 family through LLaMA-Factory.
Examples
| Model | Example | Backend notes |
|---|---|---|
| DeepSeek V2 Lite | examples/ktransformers/train_lora/deepseek_v2_lora_sft_kt.yaml | Candidate for AMXBF16, AMXINT8, and AMXINT4 smoke tests. |
| DeepSeek V3-0324 | examples/ktransformers/train_lora/deepseek_v3_lora_sft_kt.yaml | FP8 source checkpoints must be converted or prepared before AMX SFT. |
Required Precision Explanation
Every DeepSeek V3 SFT tutorial must include this table:
| SFT backend | How to use DeepSeek V3 source weights |
|---|---|
| AMXBF16 | Use a BF16 checkpoint. If the original release is FP8, convert it to BF16 first. |
| AMXINT8 | Prepare KT INT8 expert weights and point kt_weight_path to that directory. |
| AMXINT4 | Prepare KT INT4 expert weights and point kt_weight_path to that directory. |
The training mixed-precision setting can remain BF16 even when the expert backend is INT8 or INT4.
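The precision rules above can be sketched as a small pre-flight check. This is only an illustration, not part of KTransformers or LLaMA-Factory: the helper name check_plan and the source_dtype argument are hypothetical, and only the backend names and kt_weight_path come from the table.

```python
from typing import List, Optional

# Preparation rules copied from the precision table; keys are SFT backend names.
PREPARATION = {
    "AMXBF16": "Use a BF16 checkpoint; convert an FP8 release to BF16 first.",
    "AMXINT8": "Prepare KT INT8 expert weights and set kt_weight_path.",
    "AMXINT4": "Prepare KT INT4 expert weights and set kt_weight_path.",
}


def check_plan(
    backend: str,
    source_dtype: str,
    kt_weight_path: Optional[str] = None,
) -> List[str]:
    """Return the problems found in a planned DeepSeek V3 SFT run.

    Hypothetical helper: it only encodes the table, it does not inspect
    any checkpoint or directory on disk.
    """
    if backend not in PREPARATION:
        return [f"unknown backend: {backend}"]

    problems: List[str] = []
    if backend == "AMXBF16" and source_dtype.upper() != "BF16":
        # An FP8 release must be converted before AMXBF16 SFT.
        problems.append(f"AMXBF16 needs a BF16 checkpoint, got {source_dtype}")
    if backend in ("AMXINT8", "AMXINT4") and not kt_weight_path:
        # INT8/INT4 backends read prepared KT expert weights from this path.
        problems.append(f"{backend} needs kt_weight_path to be set")
    return problems
```

Calling check_plan("AMXBF16", "FP8") returns one problem, while check_plan("AMXINT4", "FP8", "/path/to/kt_int4") returns an empty list. The training mixed-precision setting is deliberately not an input, since it can stay BF16 for every backend.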
Validation Status
DeepSeek V2 Lite and DeepSeek V3 SFT should keep the status "Current / Needs smoke" until the exact runtime tuple, from sapphire4 or an equivalent environment, is recorded:
LLaMA-Factory commit + Python/torch versions + model path + target backend + conversion path + command + loss trace + adapter outputs
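One way to keep that tuple complete is to record it as a structured object and list what is still missing. The sketch below is an assumption about how this could be tracked, not an existing tool: the class SmokeRecord and both helper functions are hypothetical, and the field names simply mirror the tuple elements above.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SmokeRecord:
    """One recorded smoke run; every field mirrors an element of the tuple."""

    llama_factory_commit: Optional[str] = None
    python_version: Optional[str] = None
    torch_version: Optional[str] = None
    model_path: Optional[str] = None
    target_backend: Optional[str] = None
    conversion_path: Optional[str] = None
    command: Optional[str] = None
    loss_trace: Optional[str] = None       # e.g. path to the saved loss log
    adapter_outputs: Optional[str] = None  # e.g. path to the saved adapter


def missing_fields(record: SmokeRecord) -> List[str]:
    """Names of tuple elements that are still empty or unset."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


def can_leave_needs_smoke(record: SmokeRecord) -> bool:
    """True only when every element of the runtime tuple is recorded."""
    return not missing_fields(record)


# Usage: a partially filled record stays at "Current / Needs smoke".
record = SmokeRecord(target_backend="AMXINT8", python_version="<python version>")
print(missing_fields(record))
print(can_leave_needs_smoke(record))
```

Treating empty strings as missing is a design choice of this sketch; it keeps a placeholder field from accidentally counting as recorded.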