Qwen SFT
Qwen MoE SFT is part of the current KTransformers SFT direction and runs through LLaMA-Factory. Treat the current examples as the source of truth, and publish only the backend routes that have been smoke-tested.
Examples
| Model | Example | Recommended publication order |
|---|---|---|
| Qwen3-235B-A22B | examples/ktransformers/train_lora/qwen3moe_lora_sft_kt.yaml | Validate AMXBF16, then INT8/INT4 if prepared weights exist. |
| Qwen3.5-397B-A17B | examples/ktransformers/train_lora/qwen3_5moe_lora_sft_kt.yaml | Start with AMXINT8, then add BF16/INT4 only after separate smoke records. |
Backend Wording
Use the same backend wording as the DeepSeek pages:
| Backend | Documentation rule |
|---|---|
| AMXBF16 | BF16 expert checkpoint path. |
| AMXINT8 | kt_weight_path points to prepared INT8 expert weights. |
| AMXINT4 | kt_weight_path points to prepared INT4 expert weights; validate separately. |
Do not imply that a model tutorial supports all three backends unless all three routes have documented weight preparation and smoke results.
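
The rule above can be sketched as a small check that a docs maintainer might run before listing backends on a tutorial page. This is only an illustration: `BackendRoute`, `publishable_backends`, and `may_claim_all_backends` are hypothetical names, not part of KTransformers or LLaMA-Factory, and the two boolean fields stand in for whatever evidence the page actually links to.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Backend names as used in the table above.
BACKENDS = ("AMXBF16", "AMXINT8", "AMXINT4")


@dataclass(frozen=True)
class BackendRoute:
    """Hypothetical record describing one backend route of a model tutorial."""
    backend: str
    weight_prep_documented: bool  # weight preparation steps are written down
    smoke_result_recorded: bool   # a smoke-test record exists for this route


def publishable_backends(routes: Iterable[BackendRoute]) -> List[str]:
    """Return the backends that meet both conditions, in table order."""
    ok = {
        r.backend
        for r in routes
        if r.weight_prep_documented and r.smoke_result_recorded
    }
    return [b for b in BACKENDS if b in ok]


def may_claim_all_backends(routes: Iterable[BackendRoute]) -> bool:
    """True only when all three routes are documented and smoke-tested."""
    return publishable_backends(routes) == list(BACKENDS)


# Placeholder input for illustration, not a real validation status.
routes = [
    BackendRoute("AMXBF16", True, True),
    BackendRoute("AMXINT8", True, True),
    BackendRoute("AMXINT4", True, False),  # no smoke record yet
]
print(publishable_backends(routes))
print(may_claim_all_backends(routes))
```

With the placeholder input, only the AMXBF16 and AMXINT8 routes qualify, so the tutorial should name those two backends rather than imply support for all three.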
Validation Status
Qwen SFT pages should record:
- model checkpoint and revision
- training YAML
- Accelerate config
- kt_config
- kt_weight_path if using INT8 or INT4
- first training steps and adapter output files
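
One way to keep these records consistent is a small structure with a completeness check. The sketch below is an assumption about how such a record could be modeled. `ValidationRecord` and `missing_fields` are hypothetical names. Storing `kt_config` as a plain string is also an assumption. Values in angle brackets are placeholders, not real paths or revisions.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Backends that require prepared expert weights via kt_weight_path.
QUANTIZED_BACKENDS = {"AMXINT8", "AMXINT4"}


@dataclass
class ValidationRecord:
    """Hypothetical container for the items a Qwen SFT page should record."""
    backend: str
    checkpoint: str
    revision: str
    training_yaml: str
    accelerate_config: str
    kt_config: str
    kt_weight_path: Optional[str] = None  # needed for INT8 or INT4 only
    first_steps_log: str = ""
    adapter_output_files: List[str] = field(default_factory=list)


def missing_fields(record: ValidationRecord) -> List[str]:
    """List the required items that are still empty in the record."""
    required = (
        "checkpoint",
        "revision",
        "training_yaml",
        "accelerate_config",
        "kt_config",
        "first_steps_log",
    )
    missing = [name for name in required if not getattr(record, name)]
    if not record.adapter_output_files:
        missing.append("adapter_output_files")
    if record.backend in QUANTIZED_BACKENDS and not record.kt_weight_path:
        missing.append("kt_weight_path")
    return missing


# Placeholder record for illustration, not a real validation result.
record = ValidationRecord(
    backend="AMXINT8",
    checkpoint="Qwen3-235B-A22B",
    revision="<revision>",
    training_yaml="examples/ktransformers/train_lora/qwen3moe_lora_sft_kt.yaml",
    accelerate_config="<accelerate-config>",
    kt_config="<kt_config>",
    first_steps_log="<first-steps-log>",
    adapter_output_files=["<adapter-file>"],
)
print(missing_fields(record))
```

Here the record names an INT8 backend but no prepared weights, so the check reports `kt_weight_path` as the only missing item.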