KTransformers Documentation

KTransformers (pronounced "Quick Transformers") is a flexible, Python-centric framework that applies advanced kernel optimizations and placement/parallelism strategies to Transformers models.

Getting Started

- Installation - Install KTransformers for CPU-GPU hybrid MoE inference

Optimization Techniques

- AMX Backend - Learn the AMX backend architecture, weight conversion, and the SGLang launch flow
- Layerwise Prefill - Learn long-context prefill acceleration principles and tuning strategy