Weight Preparation

The KTransformers SFT backend you select must match the model weight format. Do not treat `AMXBF16`, `AMXINT8`, and `AMXINT4` as interchangeable flags on the same checkpoint directory.

Preparation Matrix

Target backend	Input checkpoint	Output expected by training
`AMXBF16`	BF16 expert checkpoint.	The model path points to BF16 experts.
`AMXINT8`	Prefer BF16 or a validated source precision. FP8 input may require extra accuracy checks.	`kt_weight_path` points to prepared INT8 expert weights.
`AMXINT4`	Prefer BF16 or a validated source precision. FP8 input is a more aggressive conversion and needs a smoke test.	`kt_weight_path` points to prepared INT4 expert weights.
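
The matrix above can be encoded as a small pre-flight check. This is a minimal sketch and not part of KTransformers: the function name `check_weight_config` and the lookup table are hypothetical, and only the backend names and `kt_weight_path` come from the matrix.

```python
from typing import List, Optional

# Backends from the preparation matrix. True means the training config is
# expected to supply prepared expert weights through kt_weight_path.
NEEDS_KT_WEIGHT_PATH = {
    "AMXBF16": False,
    "AMXINT8": True,
    "AMXINT4": True,
}


def check_weight_config(backend: str, kt_weight_path: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the pairing looks consistent."""
    if backend not in NEEDS_KT_WEIGHT_PATH:
        return [f"unknown backend: {backend!r}"]
    problems = []
    if NEEDS_KT_WEIGHT_PATH[backend] and not kt_weight_path:
        problems.append(
            f"{backend} expects kt_weight_path to point to prepared expert weights"
        )
    return problems
```

The check only covers the pairing of backend and `kt_weight_path`. It does not inspect the weight files themselves, so it complements a smoke test instead of replacing one.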

Record the source checkpoint, target backend, conversion command, output directory, and file count together. If the source model is FP8, also record whether the route was FP8 -> BF16 -> SFT or FP8 -> KT INT8/INT4 expert preparation.
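
One way to keep these items together is to write them into a single record next to the prepared weights. The sketch below assumes a plain dictionary serialized as JSON; the helper names `count_files` and `build_prep_record` and the key names are illustrative, not a KTransformers format.

```python
import json
from pathlib import Path
from typing import Optional


def count_files(directory: str) -> int:
    """Count regular files under the directory, recursively."""
    return sum(1 for p in Path(directory).rglob("*") if p.is_file())


def build_prep_record(
    source_checkpoint: str,
    target_backend: str,
    conversion_command: str,
    output_dir: str,
    route: Optional[str] = None,
) -> dict:
    """Collect the preparation facts in one place (hypothetical layout)."""
    return {
        "source_checkpoint": source_checkpoint,
        "target_backend": target_backend,
        "conversion_command": conversion_command,
        "output_dir": output_dir,
        "file_count": count_files(output_dir),
        # For FP8 sources, e.g. "FP8 -> BF16 -> SFT".
        "route": route,
    }
```

Calling `json.dumps(build_prep_record(...), indent=2)` and saving the result alongside the output directory gives a record that can be compared across conversion runs.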

DeepSeek V3 FP8 Source Weights

DeepSeek V3-family SFT tutorials must explicitly explain the three routes:

Route	Documentation wording
FP8 -> BF16 -> `AMXBF16`	Convert the FP8 checkpoint to BF16 first, then use the BF16 KT SFT backend.
FP8/BF16 -> `AMXINT8`	Prepare KT INT8 expert weights and set `kt_weight_path`. Prefer BF16 source weights when available.
FP8/BF16 -> `AMXINT4`	Prepare KT INT4 expert weights and set `kt_weight_path`. Treat this as a higher-compression route that needs separate target-environment validation.

Do not describe any of these routes as native FP8 fine-tuning. The current public KT SFT backends are `AMXBF16`, `AMXINT8`, and `AMXINT4`.
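
To keep route wording consistent across tutorials and records, the route label can be derived from the source precision and the target backend. This is a sketch under the assumption that only FP8 and BF16 sources are in scope; `describe_route` is a hypothetical helper, not a KTransformers function.

```python
def describe_route(source_precision: str, backend: str) -> str:
    """Return a route label such as 'FP8 -> BF16 -> AMXBF16'."""
    source = source_precision.upper()
    if source not in ("FP8", "BF16"):
        raise ValueError(f"unsupported source precision: {source_precision!r}")
    if backend == "AMXBF16":
        # FP8 checkpoints are converted to BF16 first; there is no native FP8 route.
        return "FP8 -> BF16 -> AMXBF16" if source == "FP8" else "BF16 -> AMXBF16"
    if backend in ("AMXINT8", "AMXINT4"):
        return f"{source} -> {backend}"
    raise ValueError(f"unknown backend: {backend!r}")
```

For example, `describe_route("FP8", "AMXBF16")` returns `"FP8 -> BF16 -> AMXBF16"`, which makes the intermediate BF16 conversion explicit in the record.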

Smoke Record

For each prepared weight directory, capture:

source model path and revision
conversion script and command
target backend
output path and file count
training YAML and Accelerate config
first training steps, loss values, and adapter output files
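
The checklist above can be enforced with a small completeness check before a smoke record is filed. The field names below are an assumed mapping of the checklist items to dictionary keys, and `missing_fields` is a hypothetical helper.

```python
from typing import List

# One key per checklist item; the names are illustrative, not a fixed schema.
REQUIRED_FIELDS = (
    "source_model_path",
    "source_revision",
    "conversion_script",
    "conversion_command",
    "target_backend",
    "output_path",
    "file_count",
    "training_yaml",
    "accelerate_config",
    "first_step_losses",
    "adapter_output_files",
)


def missing_fields(record: dict) -> List[str]:
    """Return the checklist keys that are absent or empty in the record."""
    empty_values = (None, "", [], {})
    return [key for key in REQUIRED_FIELDS if record.get(key) in empty_values]
```

A record is complete when `missing_fields(record)` returns an empty list. A `file_count` of 0 is treated as present so that an empty output directory shows up in the record instead of being hidden.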