Fine-Tuning Overview

Fine-tuning is a first-class KTransformers workflow, alongside inference serving. The current public path is LoRA supervised fine-tuning (SFT) of MoE models through LLaMA-Factory: train a small adapter on a local heterogeneous (CPU plus GPU) workstation, then serve it through the same local KTransformers inference stack.

The design goal is to make large MoE models practical to run and fine-tune on a workstation you own. The GPU handles attention, shared paths, and the remaining LoRA capacity, while the KT CPU expert backends keep the large expert weights off the GPU so that they do not exhaust GPU memory.

Where to Start

Goal	Page
Run the first SFT example	First LoRA SFT run
Understand the LLaMA-Factory config shape	LoRA SFT with LLaMA-Factory
Choose `AMXBF16`, `AMXINT8`, or `AMXINT4`	SFT Backends and Precision
Prepare BF16, INT8, or INT4 expert weights	Weight Preparation
Fine-tune DeepSeek MoE models	DeepSeek SFT
Fine-tune Qwen MoE models	Qwen SFT
Check which model tutorials are current	Fine-Tuning Model Tutorials
Check old or experimental pages	Legacy and Experimental SFT
Check DPO status	DPO Status

Current Public Surface

The current public fine-tuning path is:

LLaMA-Factory training YAML
  + use_kt: true
  + Accelerate KT config
  + ktransformers[sft]
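
As a rough illustration of the first two pieces, the sketch below renders a flat training config that carries the `use_kt: true` switch. Only `use_kt` comes from this page. The other keys (`model_name_or_path`, `stage`, `do_train`, `finetuning_type`, `output_dir`) are common LLaMA-Factory options assumed here for context, both paths are placeholders, and `render_flat_yaml` is a hypothetical helper. The Accelerate KT config and the `ktransformers[sft]` install are not shown.

```python
# Sketch only: a flat training config with the KT switch enabled.
# Every key except use_kt, and every value, is an illustrative assumption.


def render_flat_yaml(config: dict) -> str:
    """Render a flat dict of scalars as YAML text (no nesting, no quoting)."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            text = "true" if value else "false"  # YAML booleans are lowercase
        else:
            text = str(value)
        lines.append(f"{key}: {text}")
    return "\n".join(lines) + "\n"


training_config = {
    "model_name_or_path": "/path/to/moe-model",  # placeholder path
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "use_kt": True,  # the switch named on this page
    "output_dir": "saves/kt-lora-sft",  # placeholder path
}

print(render_flat_yaml(training_config), end="")
```

See "LoRA SFT with LLaMA-Factory" for the config shape that is actually supported.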

The public KT SFT backend names are:

Backend	Role
`AMXBF16`	BF16 expert backend.
`AMXINT8`	INT8 expert backend with prepared KT weights.
`AMXINT4`	INT4 expert backend with prepared KT weights.
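
A script that accepts a backend name may want to reject anything outside this table before a run starts. The sketch below is one way to do that. The three names and their expert precisions are taken from the table, while `KT_SFT_BACKENDS` and `expert_precision` are hypothetical names that are not part of KTransformers or LLaMA-Factory. The lookup is assumed to be case-sensitive.

```python
# Backend names come from the table above; the helper itself is hypothetical.
KT_SFT_BACKENDS = {
    "AMXBF16": "BF16",
    "AMXINT8": "INT8",
    "AMXINT4": "INT4",
}


def expert_precision(backend: str) -> str:
    """Return the expert precision for a public KT SFT backend name."""
    try:
        return KT_SFT_BACKENDS[backend]
    except KeyError:
        known = ", ".join(sorted(KT_SFT_BACKENDS))
        raise ValueError(
            f"unknown KT SFT backend {backend!r}; expected one of: {known}"
        ) from None
```

For example, `expert_precision("AMXINT4")` returns `"INT4"`, and a misspelled or lowercase name raises `ValueError` with the list of accepted names.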

SkipLoRA variants exist for advanced experiments, but they are not the default getting-started path.

What Is Not the Current Path

Do not treat the old `kt-sft` flow, automatic patching, `kt_optimize_rule`, or the old Kimi SFT tutorials as the current public path. Treat them as historical material until they are rerun and rewritten around the current LLaMA-Factory integration.
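
When migrating older configs or notes, a quick text scan can flag files that still mention the legacy names above. This is a minimal sketch, assuming plain substring matching is good enough. `LEGACY_MARKERS` and `find_legacy_markers` are hypothetical names, and a hit is only a hint that the file predates the current integration.

```python
# The two identifiers below are the legacy names listed on this page.
LEGACY_MARKERS = ("kt-sft", "kt_optimize_rule")


def find_legacy_markers(text: str) -> list[str]:
    """Return the legacy names that occur in text, in LEGACY_MARKERS order."""
    return [marker for marker in LEGACY_MARKERS if marker in text]
```

A config that contains only `use_kt: true` yields an empty list, while one that still sets `kt_optimize_rule` is reported.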

For the exact status of each model and backend, see the Support Matrix.