KTransformers Documentation
KTransformers is a CPU-GPU heterogeneous computing project for efficient large MoE model inference and LoRA fine-tuning. The current documentation is organized around two user paths:
- Inference: kt-kernel + sglang-kt
- Fine-tuning: ktransformers[sft] + LLaMA-Factory
Getting Started
- Installation - Choose the package set for inference or fine-tuning
- First inference server - Start from kt run or SGLang-KT
- First LoRA SFT run - Start from LLaMA-Factory examples
- Support matrix - Check model, precision, backend, and validation status before using a tutorial
Main References
- Inference - Serving path, methods, and SGLang-KT usage
- Fine-tuning - LLaMA-Factory integration and KT AMX backends
- AMX Backend - AMX architecture, weight conversion, and launch flow
- Layerwise Prefill - Long-context prefill acceleration principles and tuning strategy