Weight Preparation
The KTransformers SFT backend you select must match the format of the model weights. Do not treat AMXBF16, AMXINT8, and AMXINT4 as interchangeable flags on the same checkpoint directory.
Preparation Matrix
| Target backend | Input checkpoint | Output expected by training |
|---|---|---|
| AMXBF16 | BF16 expert checkpoint. | The model path points to BF16 experts. |
| AMXINT8 | Prefer BF16 or a validated source precision. FP8 input may require extra accuracy checks. | kt_weight_path points to prepared INT8 expert weights. |
| AMXINT4 | Prefer BF16 or a validated source precision. FP8 input is a more aggressive conversion and needs a smoke test. | kt_weight_path points to prepared INT4 expert weights. |
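The matrix above can be turned into a small pre-flight check. The following is a minimal sketch, not part of KTransformers: `check_backend_config` and `NEEDS_KT_WEIGHT_PATH` are hypothetical names, and the only rule encoded is the one stated in the "Output expected by training" column (the INT8 and INT4 backends expect kt_weight_path to be set).

```python
from typing import Optional

# Whether each backend expects kt_weight_path, taken from the matrix above.
# This mapping and the helper below are illustrative, not a KTransformers API.
NEEDS_KT_WEIGHT_PATH = {"AMXBF16": False, "AMXINT8": True, "AMXINT4": True}


def check_backend_config(backend: str, kt_weight_path: Optional[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    if backend not in NEEDS_KT_WEIGHT_PATH:
        return [f"unknown backend: {backend!r}"]
    problems = []
    if NEEDS_KT_WEIGHT_PATH[backend] and not kt_weight_path:
        problems.append(
            f"{backend} expects kt_weight_path to point to prepared expert weights"
        )
    return problems
```

The check only confirms that a path was supplied. It cannot tell whether the directory actually holds weights prepared for that backend, which is why the preparation details still need to be recorded.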
Record the source checkpoint, target backend, conversion command, output directory, and file count together. If the source model is FP8, also record whether the route was FP8 -> BF16 -> SFT or FP8 -> KT INT8/INT4 expert preparation.
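One way to keep these fields together is a small JSON record written next to the prepared weights. This is a sketch under assumptions: `PreparationRecord`, `count_files`, and `write_record` are hypothetical names, and the JSON layout is illustrative rather than a format KTransformers reads.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class PreparationRecord:
    """Fields listed in the paragraph above; the class itself is hypothetical."""
    source_checkpoint: str
    target_backend: str
    conversion_command: str
    output_dir: str
    file_count: int
    route: str  # free-form, e.g. "FP8 -> BF16 -> SFT"


def count_files(directory: str) -> int:
    """Count regular files under `directory`, including subdirectories."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())


def write_record(record: PreparationRecord, destination: str) -> None:
    """Serialize the record as indented JSON at `destination`."""
    Path(destination).write_text(json.dumps(asdict(record), indent=2) + "\n")
```

Call `count_files` before writing the record, or write the record outside the output directory, so the record file does not change the count it reports.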
DeepSeek V3 FP8 Source Weights
DeepSeek V3-family SFT tutorials must explicitly explain the three routes:
| Route | Documentation wording |
|---|---|
| FP8 -> BF16 -> AMXBF16 | Convert the FP8 checkpoint to BF16 first, then use the BF16 KT SFT backend. |
| FP8/BF16 -> AMXINT8 | Prepare KT INT8 expert weights and set kt_weight_path. Prefer BF16 source weights when available. |
| FP8/BF16 -> AMXINT4 | Prepare KT INT4 expert weights and set kt_weight_path. Treat this as a higher-compression route that needs a separate smoke record. |
Do not describe this as native FP8 fine-tuning. The current public KT SFT backends are AMX BF16/INT8/INT4.
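The three routes can be summarized as a lookup from source precision and target backend to the steps a tutorial should spell out. This is a minimal sketch: `plan_route` is a hypothetical helper and the step strings paraphrase the table above; it does not run any conversion.

```python
def plan_route(source_precision: str, target_backend: str) -> list[str]:
    """Return the documented steps for a source precision and KT SFT backend."""
    source = source_precision.upper()
    backend = target_backend.upper()
    if source not in {"FP8", "BF16"}:
        raise ValueError(f"unsupported source precision: {source_precision!r}")

    if backend == "AMXBF16":
        steps = []
        if source == "FP8":
            # FP8 is never trained natively; it is converted to BF16 first.
            steps.append("convert the FP8 checkpoint to BF16")
        steps.append("train with the BF16 KT SFT backend")
        return steps

    if backend in {"AMXINT8", "AMXINT4"}:
        bits = backend[len("AMX"):]  # "INT8" or "INT4"
        steps = [
            f"prepare KT {bits} expert weights",
            "set kt_weight_path to the prepared weights",
        ]
        if backend == "AMXINT4":
            steps.append("keep a separate smoke record for this route")
        return steps

    raise ValueError(f"unknown backend: {target_backend!r}")
```

No branch returns a "train in FP8" step, which mirrors the rule that none of these routes is native FP8 fine-tuning.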
Smoke Record
For each prepared weight directory, capture:
- source model path and revision
- conversion script and command
- target backend
- output path and file count
- training YAML and Accelerate config
- first training steps, loss values, and adapter output files
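The last item in the list above can be captured with a short helper that summarizes the first loss values and lists whatever the run wrote to the adapter output directory. This is a sketch under assumptions: `summarize_smoke_run` is a hypothetical name, the loss values are assumed to be collected by hand or from your own training logs, and no particular adapter file names are assumed.

```python
import math
from pathlib import Path


def summarize_smoke_run(losses: list[float], adapter_dir: str) -> dict:
    """Summarize the first training steps and the adapter output files."""
    directory = Path(adapter_dir)
    # List files only if the directory exists; a missing directory is itself
    # a finding worth recording rather than an error.
    adapter_files = (
        sorted(p.name for p in directory.iterdir() if p.is_file())
        if directory.is_dir()
        else []
    )
    return {
        "steps_recorded": len(losses),
        "losses": list(losses),
        "all_losses_finite": all(math.isfinite(x) for x in losses),
        "adapter_dir_exists": directory.is_dir(),
        "adapter_files": adapter_files,
    }
```

A non-finite loss or an empty `adapter_files` list does not identify the cause on its own, but it is a cheap signal that the prepared weights and the selected backend should be rechecked before a longer run.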