KTransformers

Fine-Tuning Overview

Fine-tuning is a first-class KTransformers workflow, parallel to inference serving. The current public path focuses on MoE LoRA SFT through LLaMA-Factory: train a small adapter on a local heterogeneous (CPU plus GPU) workstation, then serve it through the same local KTransformers inference stack.

The design target is practical workstation ownership of large MoE models. GPU resources handle attention, shared paths, and residual LoRA capacity, while the KT CPU expert backends keep the large expert weights in host memory so that they do not exhaust GPU memory.

Where to Start

| Goal | Page |
| --- | --- |
| Run the first SFT example | First LoRA SFT run |
| Understand the LLaMA-Factory config shape | LoRA SFT with LLaMA-Factory |
| Choose AMXBF16, AMXINT8, or AMXINT4 | SFT Backends and Precision |
| Prepare BF16, INT8, or INT4 expert weights | Weight Preparation |
| Fine-tune DeepSeek MoE models | DeepSeek SFT |
| Fine-tune Qwen MoE models | Qwen SFT |
| Check which model tutorials are current | Fine-Tuning Model Tutorials |
| Check old or experimental pages | Legacy and Experimental SFT |
| Check DPO status | DPO Status |

Current Public Surface

The current public fine-tuning path is:

LLaMA-Factory training YAML
  + use_kt: true
  + Accelerate KT config
  + ktransformers[sft]
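
As a rough illustration of that shape, the sketch below builds a training config as a Python dict, checks it, and renders it as flat YAML. Only `use_kt: true` comes from this page. The other keys (`model_name_or_path`, `stage`, `do_train`, `finetuning_type`, `output_dir`) are assumed to be the usual LLaMA-Factory LoRA SFT fields, the paths are placeholders, and `check_kt_sft_config` and `to_yaml_lines` are hypothetical helpers that belong to neither project.

```python
# Hypothetical preflight check for a LLaMA-Factory training config
# that is meant to run through the KT SFT path.

def check_kt_sft_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    if config.get("use_kt") is not True:
        problems.append("use_kt must be true to enable the KT path")
    if config.get("stage") != "sft":
        problems.append("stage should be 'sft'")
    if config.get("finetuning_type") != "lora":
        problems.append("finetuning_type should be 'lora'")
    return problems


def to_yaml_lines(config: dict) -> str:
    """Serialize a flat dict of scalars to YAML without third-party packages."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # YAML booleans are lowercase
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


example_config = {
    "model_name_or_path": "/path/to/moe-model",  # placeholder
    "stage": "sft",
    "do_train": True,
    "finetuning_type": "lora",
    "use_kt": True,  # the one key taken from this page
    "output_dir": "/path/to/adapter-output",  # placeholder
}

print(check_kt_sft_config(example_config))
print(to_yaml_lines(example_config))
```

The Accelerate KT config and the `ktransformers[sft]` install are separate pieces and are not modeled here; the pages listed under Where to Start give the exact files.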

The public KT SFT backend names are:

| Backend | Role |
| --- | --- |
| AMXBF16 | BF16 expert backend. |
| AMXINT8 | INT8 expert backend with prepared KT weights. |
| AMXINT4 | INT4 expert backend with prepared KT weights. |
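
To get a feel for what the backend choice means for host memory, the sketch below estimates the size of the expert weights from the nominal bit width of each format (16, 8, and 4 bits). It is a back-of-the-envelope calculation, not a KTransformers API: `expert_weight_gib` is a hypothetical helper, the parameter count is made up, and quantization scales and layout padding in the prepared weights are ignored.

```python
# Nominal storage width per expert weight for each public KT SFT backend.
# These are the plain bit widths of the formats; real prepared weights also
# carry scales and layout padding, which this sketch ignores.
BITS_PER_WEIGHT = {"AMXBF16": 16, "AMXINT8": 8, "AMXINT4": 4}


def expert_weight_gib(num_expert_params: int, backend: str) -> float:
    """Estimate the host memory (in GiB) needed to hold the expert weights."""
    bits = BITS_PER_WEIGHT[backend]
    return num_expert_params * bits / 8 / 2**30


# Hypothetical model with 100 billion expert parameters.
for name in BITS_PER_WEIGHT:
    print(f"{name}: {expert_weight_gib(100_000_000_000, name):.1f} GiB")
```

Each step down in precision halves the estimate, which is the main reason to consider AMXINT8 or AMXINT4 when BF16 expert weights do not fit in system RAM.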

SkipLoRA variants exist for advanced experiments, but they are not the default getting-started path.

What Is Not the Current Path

Do not treat the old kt-sft flow, automatic patching, kt_optimize_rule, or the old Kimi SFT tutorials as the current public path. They remain historical material until they are rerun and rewritten around the current LLaMA-Factory integration.
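
When migrating older configs or scripts, a quick text scan can flag leftovers from those flows. The sketch below is a hypothetical helper, not a KTransformers tool. It only looks for the two literal names mentioned above, `kt-sft` and `kt_optimize_rule`, and it assumes those names appear verbatim in the files being checked.

```python
import re

# Names this page lists as not part of the current public path.
LEGACY_MARKERS = ("kt-sft", "kt_optimize_rule")


def find_legacy_markers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, marker) pairs for each legacy name found in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for marker in LEGACY_MARKERS:
            # Match the marker only as a whole token, not inside a longer name.
            pattern = rf"(?<![\w-]){re.escape(marker)}(?![\w-])"
            if re.search(pattern, line):
                hits.append((line_number, marker))
    return hits


sample = "use_kt: true\nkt_optimize_rule: rules.yaml\n# copied from the kt-sft days\n"
for line_number, marker in find_legacy_markers(sample):
    print(f"line {line_number}: found legacy marker {marker!r}")
```

A hit is only a prompt to review the file by hand; the scan cannot tell whether the surrounding config still works with the current integration.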

For exact model and backend status, see Support Matrix.