KTransformers

Weight Preparation

The KTransformers SFT backend you select must match the model weight format. AMXBF16, AMXINT8, and AMXINT4 are not interchangeable flags on the same checkpoint directory.

Preparation Matrix

| Target backend | Input checkpoint | Output expected by training |
|---|---|---|
| AMXBF16 | BF16 expert checkpoint. | The model path points to BF16 experts. |
| AMXINT8 | Prefer BF16 or a validated source precision. FP8 input may require extra accuracy checks. | kt_weight_path points to prepared INT8 expert weights. |
| AMXINT4 | Prefer BF16 or a validated source precision. FP8 input is more aggressive and needs a smoke record. | kt_weight_path points to prepared INT4 expert weights. |
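
The matrix above can be turned into a pre-flight check before training starts. The following is a minimal sketch, assuming the training configuration has been loaded into a plain dict. The `backend` and `model_path` keys and the function name `check_weight_config` are hypothetical; only `kt_weight_path` and the backend names come from the matrix.

```python
from typing import Mapping, Optional

# Backends named in the matrix above. True means prepared expert weights
# (kt_weight_path) are expected; False means the model path holds BF16 experts.
NEEDS_KT_WEIGHT_PATH = {"AMXBF16": False, "AMXINT8": True, "AMXINT4": True}


def check_weight_config(cfg: Mapping[str, Optional[str]]) -> list[str]:
    """Return a list of problems; an empty list means no mismatch was found."""
    backend = cfg.get("backend")  # hypothetical key name
    if backend not in NEEDS_KT_WEIGHT_PATH:
        return [f"unknown backend: {backend!r}"]

    problems = []
    if not cfg.get("model_path"):  # hypothetical key name
        problems.append("model_path is not set")
    if NEEDS_KT_WEIGHT_PATH[backend] and not cfg.get("kt_weight_path"):
        problems.append(
            f"{backend} expects kt_weight_path to point to prepared expert weights"
        )
    return problems
```

The check only confirms that the expected paths are set. It does not inspect the weight files, so it does not replace a smoke run.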

Record the source checkpoint, target backend, conversion command, output directory, and file count together. If the source model is FP8, also record which route was used: FP8 -> BF16 -> SFT, or FP8 -> KT INT8/INT4 expert preparation.
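
One way to keep these fields together is a small JSON record written next to the conversion logs. This is a sketch only: the `PreparationRecord` class, its field names, and the helper functions are illustrative assumptions, not part of KTransformers.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PreparationRecord:
    source_checkpoint: str
    target_backend: str      # e.g. "AMXINT8"
    route: str               # e.g. "FP8 -> BF16 -> SFT"
    conversion_command: str
    output_dir: str
    file_count: int = 0      # filled in by write_record


def count_files(directory: str) -> int:
    """Count regular files under the directory, recursively."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())


def write_record(record: PreparationRecord, dest: str) -> None:
    """Count the files in output_dir, then save the whole record as JSON."""
    record.file_count = count_files(record.output_dir)
    Path(dest).write_text(json.dumps(asdict(record), indent=2))
```

The file count is taken before the record is written, so saving the record inside the output directory does not inflate the count.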

DeepSeek V3 FP8 Source Weights

DeepSeek V3-family SFT tutorials must explicitly explain the three routes:

| Route | Documentation wording |
|---|---|
| FP8 -> BF16 -> AMXBF16 | Convert the FP8 checkpoint to BF16 first, then use the BF16 KT SFT backend. |
| FP8/BF16 -> AMXINT8 | Prepare KT INT8 expert weights and set kt_weight_path. Prefer BF16 source weights when available. |
| FP8/BF16 -> AMXINT4 | Prepare KT INT4 expert weights and set kt_weight_path. Treat this as a higher-compression route that needs a separate smoke record. |

Do not describe any of these routes as native FP8 fine-tuning. The current public KT SFT backends are AMX BF16, INT8, and INT4.
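
The three routes can also be written as a lookup table, so that a tutorial script refuses any combination that is not documented, such as an FP8 training target. This is a sketch under assumptions: the `ROUTES` table and the `plan_route` function are hypothetical names, and the step strings paraphrase the route descriptions above.

```python
# (source precision, target backend) -> ordered steps, following the routes above.
INT8_STEPS = ["prepare KT INT8 expert weights", "set kt_weight_path"]
INT4_STEPS = [
    "prepare KT INT4 expert weights",
    "set kt_weight_path",
    "keep a separate smoke record",
]

ROUTES = {
    ("FP8", "AMXBF16"): [
        "convert the FP8 checkpoint to BF16",
        "train with the BF16 KT SFT backend",
    ],
    ("FP8", "AMXINT8"): INT8_STEPS,
    ("BF16", "AMXINT8"): INT8_STEPS,
    ("FP8", "AMXINT4"): INT4_STEPS,
    ("BF16", "AMXINT4"): INT4_STEPS,
}


def plan_route(source_precision: str, backend: str) -> list[str]:
    """Return the documented steps, or raise if the combination is not listed."""
    key = (source_precision, backend)
    if key not in ROUTES:
        raise ValueError(f"no documented route from {source_precision} to {backend}")
    return list(ROUTES[key])  # copy, so callers cannot mutate the table
```

For example, `plan_route("FP8", "AMXBF16")` returns the two-step conversion route, while `plan_route("FP8", "FP8")` raises `ValueError`.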

Smoke Record

For each prepared weight directory, capture:

  • source model path and revision
  • conversion script and command
  • target backend
  • output path and file count
  • training YAML and Accelerate config
  • first training steps, loss values, and adapter output files
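
The last item in the list above can be summarized automatically once the first steps have run. The sketch below assumes the loss values have already been collected into a list and that adapter files are written to a single directory. The `smoke_summary` function and its output keys are hypothetical.

```python
import math
from pathlib import Path


def smoke_summary(losses: list[float], adapter_dir: str) -> dict:
    """Summarize the first training steps and the adapter output files."""
    adapter_files = sorted(
        p.name for p in Path(adapter_dir).iterdir() if p.is_file()
    )
    return {
        "steps_recorded": len(losses),
        "loss_values": list(losses),
        # NaN or inf in the first steps usually means the run is not usable.
        "all_losses_finite": all(math.isfinite(x) for x in losses),
        "adapter_files": adapter_files,
    }
```

A finite loss and a non-empty adapter file list show only that the run started and produced output. They say nothing about accuracy, which still needs the extra checks noted for FP8 input.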