KTransformers

Qwen SFT

Qwen MoE SFT is part of the current KTransformers SFT direction, which runs through LLaMA-Factory. Treat the current examples as the source of truth, and publish only the backend routes that have been smoke-tested.

Examples

Model | Example | Recommended publication order
Qwen3-235B-A22B | examples/ktransformers/train_lora/qwen3moe_lora_sft_kt.yaml | Validate AMXBF16, then INT8/INT4 if prepared weights exist.
Qwen3.5-397B-A17B | examples/ktransformers/train_lora/qwen3_5moe_lora_sft_kt.yaml | Start with AMXINT8, then add BF16/INT4 only after separate smoke records.

Backend Wording

Use the same backend wording as for DeepSeek:

Backend | Documentation rule
AMXBF16 | BF16 expert checkpoint path.
AMXINT8 | kt_weight_path points to prepared INT8 expert weights.
AMXINT4 | kt_weight_path points to prepared INT4 expert weights; validate separately.

Do not imply that a model tutorial supports all three backends unless all three routes have documented weight preparation and smoke results.
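
The rule above can be sketched as a small check that a docs author might run before listing backends on a tutorial page. This is a minimal illustration, not part of KTransformers or LLaMA-Factory: the names BackendStatus and publishable_backends are hypothetical, and the status values are placeholders rather than real validation results.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BackendStatus:
    """Hypothetical per-backend documentation status for one model tutorial."""
    name: str                     # e.g. "AMXBF16", "AMXINT8", "AMXINT4"
    weight_prep_documented: bool  # weight preparation steps are written down
    smoke_tested: bool            # a smoke result has been recorded


def publishable_backends(statuses: List[BackendStatus]) -> List[str]:
    """Return only the backends that have both documented weights and a smoke result."""
    return [
        s.name
        for s in statuses
        if s.weight_prep_documented and s.smoke_tested
    ]


# Placeholder values for illustration only; they do not describe any real model.
statuses = [
    BackendStatus("AMXBF16", weight_prep_documented=True, smoke_tested=True),
    BackendStatus("AMXINT8", weight_prep_documented=True, smoke_tested=False),
    BackendStatus("AMXINT4", weight_prep_documented=False, smoke_tested=False),
]

print(publishable_backends(statuses))
```

With these placeholder values only AMXBF16 would be listed; the other two routes stay off the page until their missing piece is filled in.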

Validation Status

Qwen SFT pages should record:

  • model checkpoint and revision
  • training YAML
  • Accelerate config
  • kt_config
  • kt_weight_path if using INT8 or INT4
  • first training steps and adapter output files
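
One way to keep these records consistent is a small structure with a completeness check. The sketch below is an assumption about how such a record could be modeled, not an existing KTransformers or LLaMA-Factory API: SmokeRecord, missing_fields, and the field names first_steps_log and adapter_files are hypothetical, and the angle-bracket values are placeholders to be filled in per run.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Backends that need prepared expert weights via kt_weight_path.
QUANTIZED_BACKENDS = {"AMXINT8", "AMXINT4"}


@dataclass
class SmokeRecord:
    """Hypothetical validation record for one Qwen SFT smoke run."""
    model_checkpoint: str
    revision: str
    training_yaml: str
    accelerate_config: str
    kt_config: str
    backend: str
    kt_weight_path: Optional[str] = None  # only required for INT8 / INT4
    first_steps_log: str = ""             # excerpt of the first training steps
    adapter_files: List[str] = field(default_factory=list)


def missing_fields(record: SmokeRecord) -> List[str]:
    """List the items from the checklist that are still empty."""
    missing = []
    for name in (
        "model_checkpoint",
        "revision",
        "training_yaml",
        "accelerate_config",
        "kt_config",
        "first_steps_log",
    ):
        if not getattr(record, name):
            missing.append(name)
    if not record.adapter_files:
        missing.append("adapter_files")
    if record.backend in QUANTIZED_BACKENDS and not record.kt_weight_path:
        missing.append("kt_weight_path")
    return missing


# Placeholder record: everything in angle brackets must be replaced per run.
record = SmokeRecord(
    model_checkpoint="<model checkpoint>",
    revision="<revision>",
    training_yaml="examples/ktransformers/train_lora/qwen3moe_lora_sft_kt.yaml",
    accelerate_config="<accelerate config path>",
    kt_config="<kt_config path>",
    backend="AMXINT8",
)

print(missing_fields(record))
```

A record is ready to publish when missing_fields returns an empty list; for an AMXBF16 run, kt_weight_path may stay unset.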