Qwen SFT

Qwen MoE SFT is part of the current KTransformers SFT direction and is driven through LLaMA-Factory. Per-model Training TPS, commands, and blockers are tracked on the Qwen model page.

Examples

Model	Example	Validation notes
Qwen3-235B-A22B	`examples/ktransformers/train_lora/qwen3moe_lora_sft_kt.yaml`	Validate `AMXBF16`, then INT8/INT4 if prepared weights exist.
Qwen3.5-397B-A17B	`examples/ktransformers/train_lora/qwen3_5moe_lora_sft_kt.yaml`	Start with `AMXINT8`, then add BF16/INT4 only after separate target-environment validation.

Backend Mapping

Backend names and weight requirements follow the same conventions as DeepSeek:

Backend	Weight requirement
`AMXBF16`	BF16 expert checkpoint path.
`AMXINT8`	`kt_weight_path` points to prepared INT8 expert weights.
`AMXINT4`	`kt_weight_path` points to prepared INT4 expert weights; validate separately.

One backend passing validation does not imply that the other two backends are also ready; each route needs matching weight preparation and target-environment validation.
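
The backend table above can be turned into a small pre-flight check before launching a run. The following is a minimal sketch, not part of KTransformers or LLaMA-Factory: the function name `check_backend` and the dictionary `NEEDS_PREPARED_WEIGHTS` are hypothetical, and only the backend names and `kt_weight_path` come from this page.

```python
# Hypothetical pre-flight check; the mapping mirrors the Backend Mapping table.
# None means the backend uses the BF16 expert checkpoint path rather than
# prepared quantized weights.
NEEDS_PREPARED_WEIGHTS = {
    "AMXBF16": None,
    "AMXINT8": "INT8",
    "AMXINT4": "INT4",
}


def check_backend(backend, kt_weight_path=None):
    """Return a list of problems for one backend choice (empty if none found)."""
    if backend not in NEEDS_PREPARED_WEIGHTS:
        return [f"unknown backend: {backend!r}"]
    precision = NEEDS_PREPARED_WEIGHTS[backend]
    if precision is not None and not kt_weight_path:
        return [
            f"{backend} needs kt_weight_path pointing to "
            f"prepared {precision} expert weights"
        ]
    return []
```

An empty result only means the configuration is internally consistent; it does not replace target-environment validation of each backend.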

Usage Notes

For INT8/INT4 backends, confirm that `kt_weight_path` points to prepared expert weights.
Start with a short-step training run to confirm loss logging and adapter outputs.
Training summaries on the model page report end-to-end Training TPS; re-validate on different hardware or package versions.
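
After a short-step run, the two things to confirm (loss logging and adapter outputs) can be checked with a small script. This is a sketch under assumptions: the log file name `trainer_log.jsonl`, the `loss` key, and the `adapter_model.*` file pattern are assumed rather than taken from this page, so adjust them to whatever your run actually writes. The helper name `check_short_run` is hypothetical.

```python
import json
from pathlib import Path


def check_short_run(output_dir, log_name="trainer_log.jsonl"):
    """Summarize loss logging and adapter files found in a training output dir.

    Assumes a JSON-lines log where logged steps carry a "loss" key, and
    adapter weights saved as adapter_model.* (both are assumptions).
    """
    out = Path(output_dir)

    # Collect loss values from the JSON-lines log, if it exists.
    losses = []
    log_path = out / log_name
    if log_path.is_file():
        for line in log_path.read_text().splitlines():
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if "loss" in record:
                losses.append(record["loss"])

    # List adapter weight files written to the top level of the output dir.
    adapters = sorted(p.name for p in out.glob("adapter_model.*"))

    return {"loss_points": len(losses), "adapter_files": adapters}
```

A result with zero loss points or no adapter files after a short run suggests the run should be inspected before committing to a full training job.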