KTransformers Documentation
KTransformers (pronounced Quick Transformers) is a flexible, Python-centric framework designed to enhance your Transformers experience with advanced kernel optimizations and placement/parallelism strategies.
Getting Started
- Installation - Install KTransformers for CPU-GPU hybrid MoE inference
Optimization Techniques
- AMX Backend - Learn the AMX backend architecture, weight conversion, and the SGLang launch flow
- Layerwise Prefill - Learn the principles of long-context prefill acceleration and tuning strategies