KTransformers

DeepSeek SFT

DeepSeek is a primary model family for KTransformers SFT. Current documentation should focus on LoRA SFT of the DeepSeek V2 Lite and DeepSeek V3-family MoE models through LLaMA-Factory.

Examples

Model | Example | Backend notes
DeepSeek V2 Lite | examples/ktransformers/train_lora/deepseek_v2_lora_sft_kt.yaml | Candidate for AMXBF16, AMXINT8, and AMXINT4 smoke.
DeepSeek V3-0324 | examples/ktransformers/train_lora/deepseek_v3_lora_sft_kt.yaml | FP8 source checkpoints must be converted or prepared before AMX SFT.

Required Precision Explanation

Every DeepSeek V3 SFT tutorial must include this table:

SFT backend | How to use DeepSeek V3 source weights
AMXBF16 | Use a BF16 checkpoint. If the original release is FP8, convert it to BF16 first.
AMXINT8 | Prepare KT INT8 expert weights and point kt_weight_path to that directory.
AMXINT4 | Prepare KT INT4 expert weights and point kt_weight_path to that directory.

Training mixed precision can still be BF16 while the expert backend is INT8 or INT4.
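
The backend rules above can be expressed as a small pre-flight check. The following is a minimal sketch, not part of KTransformers or LLaMA-Factory: SftPlan, check_plan, and the checkpoint_dtype and train_precision fields are hypothetical names introduced for illustration. Only the backend names and kt_weight_path come from this page.

```python
from dataclasses import dataclass
from typing import List, Optional

# Backend names taken from the table above.
KNOWN_BACKENDS = ("AMXBF16", "AMXINT8", "AMXINT4")


@dataclass
class SftPlan:
    """Hypothetical description of one planned SFT run (illustration only)."""
    backend: str                          # one of KNOWN_BACKENDS
    checkpoint_dtype: str                 # dtype of the source checkpoint, e.g. "bf16" or "fp8"
    kt_weight_path: Optional[str] = None  # directory of prepared KT expert weights
    train_precision: str = "bf16"         # training mixed precision, independent of the backend


def check_plan(plan: SftPlan) -> List[str]:
    """Return a list of problems; an empty list means the plan matches the table."""
    if plan.backend not in KNOWN_BACKENDS:
        return [f"unknown backend: {plan.backend}"]

    problems: List[str] = []
    if plan.backend == "AMXBF16":
        # AMXBF16 reads a BF16 checkpoint directly, so FP8 releases must be converted.
        if plan.checkpoint_dtype != "bf16":
            problems.append("AMXBF16 needs a BF16 checkpoint; convert the FP8 release first")
    elif not plan.kt_weight_path:
        # AMXINT8 and AMXINT4 read prepared KT expert weights from kt_weight_path.
        problems.append(f"{plan.backend} needs kt_weight_path pointing at prepared KT expert weights")

    # train_precision is deliberately not checked against the backend:
    # BF16 training with an INT8 or INT4 expert backend is a valid combination.
    return problems
```

For example, check_plan(SftPlan("AMXINT8", "fp8", kt_weight_path="/hypothetical/kt_int8")) returns an empty list, while check_plan(SftPlan("AMXBF16", "fp8")) reports that the checkpoint must be converted first.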

Validation Status

DeepSeek V2 Lite and DeepSeek V3 SFT should stay at "Current / Needs smoke" until the exact runtime tuple (sapphire4 or equivalent) is recorded:

LLaMA-Factory commit + Python/torch versions + model path + target backend + conversion path + command + loss trace + adapter outputs
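
One way to keep the tuple honest is to store it as a record and refuse to change the status while any element is blank. This is a minimal sketch under that assumption: SmokeRecord, missing_fields, and is_complete are hypothetical names, and the field list simply mirrors the tuple above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SmokeRecord:
    """Hypothetical record of one smoke run; one field per element of the tuple."""
    llamafactory_commit: str = ""
    python_version: str = ""
    torch_version: str = ""
    model_path: str = ""
    target_backend: str = ""
    conversion_path: str = ""   # write "none" explicitly if no conversion was needed
    command: str = ""
    loss_trace: str = ""        # e.g. path to the captured training log
    adapter_outputs: str = ""   # e.g. path to the saved LoRA adapter directory


def missing_fields(record: SmokeRecord) -> List[str]:
    """Names of tuple elements that are still empty or whitespace-only."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


def is_complete(record: SmokeRecord) -> bool:
    """True only when every element of the tuple has been recorded."""
    return not missing_fields(record)
```

A record with only target_backend filled in reports the other eight fields as missing, so the entry would stay at "Current / Needs smoke".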