KTransformers Documentation

KTransformers is a CPU-GPU heterogeneous computing project for large MoE model inference and LoRA fine-tuning. The documentation follows a task-first structure: choose inference or fine-tuning, install the packages for that path, and run a model. From there, move on to model-specific tutorials, technical background, hardware boundaries, and command references.

Current Public Surface

| Task | Public packages | Primary entry |
| --- | --- | --- |
| Inference serving | kt-kernel, sglang-kt | kt run or python -m sglang.launch_server with --kt-* arguments |
| LoRA SFT | ktransformers[sft] through LLaMA-Factory | LLaMA-Factory training YAML with use_kt: true and an Accelerate KT config |

The older local_chat.py, ktransformers/server/main.py, balance_serve, and kt_optimize_rule paths are historical. Treat them as unsupported unless a page explicitly marks them as revalidated.

Getting Started

Inference

Fine-Tuning

Advanced Features

Supported Models and Platforms

Technical Work

Developer and Command Reference