Fine-Tuning Overview
Fine-tuning is a first-class KTransformers workflow, alongside inference serving. The current public path focuses on MoE LoRA SFT through LLaMA-Factory: train a small LoRA adapter on a local heterogeneous (CPU + GPU) workstation, then serve the adapted model with KTransformers inference on the same local setup.
The design goal is to make large MoE models practical to fine-tune and run on a single workstation. GPU resources handle attention, the shared paths, and residual LoRA capacity, while the KT CPU expert backends keep the large expert weights in CPU memory so they do not exhaust GPU memory.
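To make the GPU/CPU split concrete, the sketch below routes base-model weights by parameter name. It is a hypothetical illustration, not the KTransformers placement logic: it assumes Hugging Face-style MoE names, where routed experts sit under `.experts.` and shared experts under `.shared_experts.`, and the functions `place_weight` and `split_by_device` are made up for this page. It covers base weights only and says nothing about where LoRA adapter weights live.

```python
# Hypothetical sketch: route base-model weights to GPU or CPU by name.
# Assumes Hugging Face-style MoE parameter names such as
#   "model.layers.3.mlp.experts.17.up_proj.weight"    (routed expert)
#   "model.layers.3.mlp.shared_experts.up_proj.weight" (shared path)
# This is NOT the KTransformers placement API; it only illustrates the split.


def place_weight(param_name: str) -> str:
    """Return "cpu" for routed-expert weights and "gpu" for everything else."""
    # Routed experts match ".experts."; ".shared_experts." does not,
    # because the character before "experts" there is "_", not ".".
    if ".experts." in param_name:
        return "cpu"  # large expert weights: handled by a KT CPU expert backend
    # In this sketch everything else (attention, shared experts, norms,
    # embeddings, router) is assumed to stay on the GPU.
    return "gpu"


def split_by_device(param_names):
    """Group parameter names into {"gpu": [...], "cpu": [...]}, keeping order."""
    groups = {"gpu": [], "cpu": []}
    for name in param_names:
        groups[place_weight(name)].append(name)
    return groups
```

Calling `split_by_device` on a model's `state_dict()` keys would show how much of the parameter count falls on the CPU side; for MoE models the routed experts usually dominate, which is why keeping them off the GPU matters.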
Where to Start
| Goal | Page |
|---|---|
| Run the first SFT example | First LoRA SFT run |
| Understand the LLaMA-Factory config shape | LoRA SFT with LLaMA-Factory |
| Choose AMXBF16, AMXINT8, or AMXINT4 | SFT Backends and Precision |
| Prepare BF16, INT8, or INT4 expert weights | Weight Preparation |
| Fine-tune DeepSeek MoE models | DeepSeek SFT |
| Fine-tune Qwen MoE models | Qwen SFT |
| Check which model tutorials are current | Fine-Tuning Model Tutorials |
| Check old or experimental pages | Legacy and Experimental SFT |
| Check DPO status | DPO Status |
Current Public Surface
The current public fine-tuning path is:
LLaMA-Factory training YAML
+ use_kt: true
+ Accelerate KT config
+ ktransformers[sft]
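The four pieces of this path can be checked before a run with a small pre-flight script. This is a hypothetical helper, not part of KTransformers or LLaMA-Factory: only `use_kt` comes from this page, while `stage` and `finetuning_type` are standard LLaMA-Factory training keys assumed here to be `sft` and `lora`. The parsed training YAML is represented as a plain `dict`, and the Accelerate KT config is only checked for existence; its contents and how it is passed to training are covered in LoRA SFT with LLaMA-Factory.

```python
# Hypothetical pre-flight check for the current public KT SFT path.
# Only `use_kt` is taken from this page; `stage` and `finetuning_type`
# are standard LLaMA-Factory keys, assumed here to be "sft" and "lora".
from importlib.util import find_spec
from pathlib import Path


def check_kt_sft_setup(train_cfg: dict, accelerate_cfg_path: str) -> list:
    """Return a list of problems; an empty list means every piece is present."""
    problems = []
    # 1. LLaMA-Factory training YAML (already parsed into a dict) + use_kt: true
    if train_cfg.get("use_kt") is not True:
        problems.append("training YAML must set use_kt: true")
    if train_cfg.get("stage") != "sft":
        problems.append("expected stage: sft")
    if train_cfg.get("finetuning_type") != "lora":
        problems.append("expected finetuning_type: lora")
    # 2. Accelerate KT config: only existence is checked, not its contents.
    if not Path(accelerate_cfg_path).is_file():
        problems.append(f"Accelerate KT config not found: {accelerate_cfg_path}")
    # 3. ktransformers[sft]: this only confirms the package is importable;
    #    it cannot tell whether the [sft] extra was installed.
    if find_spec("ktransformers") is None:
        problems.append('ktransformers is not importable; install "ktransformers[sft]"')
    return problems
```

Keeping the check as a list of messages, rather than raising on the first failure, reports every missing piece in one pass.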
The public KT SFT backend names are:
| Backend | Role |
|---|---|
| AMXBF16 | BF16 expert backend. |
| AMXINT8 | INT8 expert backend with prepared KT weights. |
| AMXINT4 | INT4 expert backend with prepared KT weights. |
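For a rough sense of why the lower-precision backends matter, the sketch below estimates nominal expert-weight storage per backend from bytes per weight: 2 for BF16, 1 for INT8, and 0.5 for INT4. It is an illustration, not a KT API; `Backend`, `BACKENDS`, and `expert_weight_gib` are hypothetical names, and the estimate ignores quantization scales and any layout overhead in prepared KT weights.

```python
# Illustrative sketch: nominal expert-weight storage for each public
# KT SFT backend. Names here are hypothetical, not a KTransformers API.
# Real prepared KT weights also carry scales/layout overhead (ignored).
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    name: str
    bytes_per_weight: float  # nominal storage per expert weight


BACKENDS = {
    b.name: b
    for b in (
        Backend("AMXBF16", 2.0),  # 16-bit brain float
        Backend("AMXINT8", 1.0),  # 8-bit integer
        Backend("AMXINT4", 0.5),  # 4-bit integer, two weights per byte
    )
}


def expert_weight_gib(backend_name: str, num_expert_params: int) -> float:
    """Nominal expert-weight size in GiB for a backend (no overhead)."""
    backend = BACKENDS[backend_name]  # KeyError for unknown backend names
    return num_expert_params * backend.bytes_per_weight / 2**30
```

For example, 10^9 expert parameters come to about 1.86 GiB with AMXBF16 and about 0.47 GiB with AMXINT4, before overhead.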
SkipLoRA variants exist for advanced experiments, but they are not the default getting-started path.
What Is Not the Current Path
Do not treat the old kt-sft, automatic patching, kt_optimize_rule, or the old Kimi SFT tutorials as the current public path. Keep them as historical material unless they are rerun and rewritten around the current LLaMA-Factory integration.
For exact model and backend status, see Support Matrix.