KTransformers

CPU/GPU Requirements

KTransformers serving depends on both GPU memory and the throughput of the MoE experts that run on the CPU. A usable setup is defined by the whole tuple: GPU, CPU ISA, memory capacity, NUMA layout, model checkpoint, and KT method.

Baseline Requirements

Component  Guidance
OS         Linux x86-64 for current public packages.
Python     Python 3.10, 3.11, or 3.12 for kt-kernel wheels.
GPU        NVIDIA Ampere or newer is the current main path for serving.
CPU        AVX2 minimum for compatibility paths; AVX512 or AMX for higher-throughput paths.
Memory     Large MoE models need a large amount of system RAM; the exact amount depends on the method and the CPU weight format.
NUMA       Multi-socket systems should tune --kt-threadpool-count and CPU placement.

Current Hardware Scope

Platform               Status on the website
NVIDIA GPU + x86 CPU   Main documented path. sapphire4-style systems are the current validation target for NVIDIA/AMX workflows.
AMD CPU path           Supported direction; publish only after AMD hardware validation records exact tuples.
Ascend NPU             Not currently publicly supported.
Intel xPU              Not currently publicly supported.
ROCm                   Historical documentation only, until a current package path is validated.

Planning Rule

Start from the model support tuple:

model + method + CPU backend + GPU count + system RAM + package versions
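One way to keep this rule concrete is to record the tuple as data before tuning anything. The sketch below is a hypothetical record (the class, its field names, and the example values are illustrative, not part of KTransformers) plus a back-of-envelope check that CPU-resident expert weights fit in system RAM. The estimate is parameters × bits per weight ÷ 8 bytes; the 0.8 headroom factor is an assumption meant to leave room for the OS and runtime overhead, so measure real usage before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportTuple:
    """Hypothetical record of one model support tuple."""
    model: str
    method: str
    cpu_backend: str
    gpu_count: int
    system_ram_gib: float
    package_versions: tuple  # pairs of (package name, exact version) as recorded

    def label(self) -> str:
        pkgs = ", ".join(f"{name}=={ver}" for name, ver in self.package_versions)
        return (f"{self.model} + {self.method} + {self.cpu_backend} + "
                f"{self.gpu_count} GPU(s) + {self.system_ram_gib:g} GiB RAM + [{pkgs}]")


def cpu_weight_gib(cpu_params: float, bits_per_weight: float) -> float:
    """Bytes for CPU-resident weights = params * bits / 8, reported in GiB."""
    return cpu_params * bits_per_weight / 8 / 2**30


def fits_in_ram(t: SupportTuple, cpu_params: float, bits_per_weight: float,
                headroom: float = 0.8) -> bool:
    """Assumed rule of thumb: weights should use at most `headroom` of system RAM."""
    return cpu_weight_gib(cpu_params, bits_per_weight) <= t.system_ram_gib * headroom
```

For example, 2**31 CPU-resident parameters at 4 bits per weight come to 1 GiB. Pass the real parameter count and the bit width of the CPU weight format for your method.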

Then tune:

  • --kt-cpuinfer
  • --kt-threadpool-count
  • --kt-num-gpu-experts
  • prefill threshold and deferred experts, if applicable
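The tuning step can be organized as a small sweep over the three flags named above, benchmarking each candidate under the same workload. The sketch below only builds argument lists. The candidate values (all physical cores versus leaving four free, one thread pool per NUMA node versus a single pool) and the helper names are assumptions, and it deliberately omits the prefill-threshold and deferred-expert settings because their flag names are not given here. Append the generated arguments to whatever launch command your installation documents.

```python
from itertools import product


def kt_args(cpuinfer: int, threadpool_count: int, num_gpu_experts: int) -> list:
    """Render one candidate configuration as command-line arguments."""
    return [
        "--kt-cpuinfer", str(cpuinfer),
        "--kt-threadpool-count", str(threadpool_count),
        "--kt-num-gpu-experts", str(num_gpu_experts),
    ]


def candidate_grid(physical_cores: int, numa_nodes: int, gpu_expert_options):
    """Yield candidate argument lists to benchmark (heuristic starting points).

    Assumptions: compare using every physical core with leaving four free, and
    compare one thread pool per NUMA node with a single pool.
    """
    cpuinfer_options = sorted({physical_cores, max(1, physical_cores - 4)})
    pool_options = sorted({1, max(1, numa_nodes)})
    for cpuinfer, pools, gpu_experts in product(cpuinfer_options, pool_options, gpu_expert_options):
        yield kt_args(cpuinfer, pools, gpu_experts)
```

A grid keeps every combination explicit, so each measured result can be stored next to the support tuple it was taken on.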

Production Claim Boundary

Do not write "supports hardware X" unless at least one model/method tuple has been smoke-tested on that hardware class. Hardware support claims should name the specific tuples that were tested rather than state support generically.
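The claim rule can be applied mechanically if smoke-test results are kept as records. In this sketch the record shape and the hardware-class names are hypothetical. A claim is generated only for a hardware class with at least one passing model/method tuple, and the claim lists those tuples instead of stating support generically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SmokeTest:
    """Hypothetical record of one smoke test on a hardware class."""
    hardware_class: str
    model: str
    method: str
    passed: bool


def validated_tuples(records) -> dict:
    """Map each hardware class to the sorted (model, method) pairs that passed."""
    by_hw = {}
    for r in records:
        if r.passed:
            by_hw.setdefault(r.hardware_class, set()).add((r.model, r.method))
    return {hw: sorted(pairs) for hw, pairs in by_hw.items()}


def support_claim(hardware_class: str, records):
    """Return a specific claim, or None if nothing passed on this hardware class."""
    pairs = validated_tuples(records).get(hardware_class)
    if not pairs:
        return None
    listed = "; ".join(f"{model} with {method}" for model, method in pairs)
    return f"Smoke-tested on {hardware_class}: {listed}"
```

A failed or missing smoke test yields None, which maps directly to "do not publish a support claim for this hardware class".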