KTransformers

KTransformers Documentation

KTransformers is a CPU-GPU heterogeneous computing project for efficient inference and LoRA fine-tuning of large Mixture-of-Experts (MoE) models. The documentation is organized around two user paths:

  • Inference: kt-kernel + sglang-kt
  • Fine-tuning: ktransformers[sft] + LLaMA-Factory

Getting Started

Main References

  • Inference - Serving path, methods, and SGLang-KT usage
  • Fine-tuning - LLaMA-Factory integration and KT AMX backends
  • AMX Backend - AMX architecture, weight conversion, and launch flow
  • Layerwise Prefill - Long-context prefill acceleration principles and tuning strategy