KTransformers

LoRA SFT with LLaMA-Factory

KTransformers fine-tuning is driven through LLaMA-Factory. Install LLaMA-Factory first, then install the KT SFT dependencies from LLaMA-Factory's KT requirements file:

cd /path/to/LLaMA-Factory
pip install -e .
pip install -r requirements/ktransformers.txt

The KT requirements file should contain:

ktransformers[sft]

Installed Components

ktransformers[sft] brings in the KT SFT stack:

Package          Role
ktransformers    User-facing package and KT integration.
kt-kernel        CPU expert kernels and SFT backend implementations.
transformers-kt  KT-compatible Transformers integration.
accelerate-kt    KT-aware Accelerate config support.

The ktransformers[sft] extra does not install sglang-kt; that package belongs to inference serving.
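
To confirm what the install actually pulled in, a small check along the following lines may help. It is a minimal sketch that assumes the installed distribution names match the package names listed above; the helper name installed_versions is hypothetical and not part of KTransformers or LLaMA-Factory.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the component list above (assumed to match
# the names pip records for the installed packages).
EXPECTED = ["ktransformers", "kt-kernel", "transformers-kt", "accelerate-kt"]
# Inference-serving package that the SFT extra is not expected to install.
NOT_EXPECTED = ["sglang-kt"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(EXPECTED).items():
        print(f"{name}: {ver or 'MISSING'}")
    for name, ver in installed_versions(NOT_EXPECTED).items():
        print(f"{name}: {ver or 'not installed'}")
```

Finding sglang-kt installed is not an error in itself; it only means the package was installed separately, for example for inference serving in the same environment.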

Example Layout

Current examples live under examples/ktransformers/ in LLaMA-Factory:

examples/ktransformers/train_lora/*.yaml
examples/ktransformers/accelerate/fsdp2_kt_*.yaml

The training YAML enables KT with:

use_kt: true

The Accelerate config selects the KT backend with:

kt_config:
  enabled: true
  kt_backend: AMXBF16
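
Because the KT settings are split across two files, it can be useful to check both before launching. The sketch below is a rough line-based pre-flight check, not a YAML parser: it ignores nesting and quoting, and the helper names yaml_flag_is_set and kt_backend are hypothetical.

```python
import re


def yaml_flag_is_set(text, key, value):
    """Return True if some line reads `key: value` (indentation and a trailing comment allowed)."""
    pattern = rf"^[ \t]*{re.escape(key)}[ \t]*:[ \t]*{re.escape(value)}[ \t]*(#.*)?$"
    return re.search(pattern, text, flags=re.MULTILINE) is not None


def kt_backend(text):
    """Return the value of the first `kt_backend:` line, or None if there is none."""
    match = re.search(r"^[ \t]*kt_backend[ \t]*:[ \t]*([^\s#]+)", text, flags=re.MULTILINE)
    return match.group(1) if match else None


# Inline strings standing in for the two files; in practice, read the
# training YAML and the Accelerate config from disk instead.
train_yaml = "use_kt: true\n"
accel_yaml = "kt_config:\n  enabled: true\n  kt_backend: AMXBF16\n"

print("use_kt enabled:", yaml_flag_is_set(train_yaml, "use_kt", "true"))
print("kt_backend:", kt_backend(accel_yaml))
```

For anything beyond a quick sanity check, loading the files with a real YAML parser and inspecting the kt_config mapping directly is the more robust choice.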

Launch Shape

Use the LLaMA-Factory KT examples as the starting point:

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch \
  --config_file examples/ktransformers/accelerate/fsdp2_kt_int8.yaml \
  src/train.py \
  examples/ktransformers/train_lora/qwen3_5moe_lora_sft_kt.yaml
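
When the launch is scripted rather than typed, the same command can be assembled programmatically so that the exact arguments are easy to log. This is a minimal sketch; build_launch is a hypothetical helper, and the paths are the ones from the command above, relative to the LLaMA-Factory root.

```python
import os
import shlex


def build_launch(accelerate_config, train_yaml, gpu_ids):
    """Return (env, argv) equivalent to the accelerate launch command above."""
    env = dict(os.environ)
    # Restrict the visible GPUs the same way the shell prefix does.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    argv = [
        "accelerate", "launch",
        "--config_file", accelerate_config,
        "src/train.py",
        train_yaml,
    ]
    return env, argv


env, argv = build_launch(
    "examples/ktransformers/accelerate/fsdp2_kt_int8.yaml",
    "examples/ktransformers/train_lora/qwen3_5moe_lora_sft_kt.yaml",
    [0, 1, 2, 3],
)
# Print the command in a copy-pasteable form for the run log.
print(f"CUDA_VISIBLE_DEVICES={env['CUDA_VISIBLE_DEVICES']} {shlex.join(argv)}")
# To actually start training from the LLaMA-Factory root:
#   import subprocess
#   subprocess.run(argv, env=env, check=True)
```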

The global training mixed precision is separate from the KT backend name. Current examples usually use BF16 training while selecting a KT expert backend through kt_config.

Before treating an example as production-ready, record the exact runtime tuple in the Runtime Smoke Checklist.