Miles supports both ends of Moonshot's MoE line: the 1T-parameter Kimi K2 (Instruct and Thinking variants), with 32B parameters active per token, and the compact Moonlight 16B-A3B, which fits on a single 8× H100 node. Moonlight is a handy single-node test target before scaling K2 across 16 nodes. K2-Thinking is also the canonical target for INT4 QAT.
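The parameter counts above translate into some quick memory arithmetic. The sketch below estimates weight memory only. It assumes decimal GB, 80 GB of HBM per H100, 2 bytes per parameter for BF16 and 0.5 for INT4, and the rounded totals quoted above (1T for K2, 16B for Moonlight). It ignores gradients, optimizer states, activations, and KV cache, so it is a lower bound and not a description of Miles' actual memory plan. The helper names (`weight_gb`, `report`) are hypothetical.

```python
# Back-of-envelope weight-memory arithmetic for the two Moonshot MoE sizes.
# Assumptions (not from Miles docs): decimal GB, 80 GB per H100, 8 GPUs per node,
# weights only (no gradients, optimizer state, activations, or KV cache).

BYTES_PER_PARAM = {"bf16": 2.0, "int4": 0.5}
H100_GB = 80
GPUS_PER_NODE = 8

MODELS = {
    # name: (total params, active params per token), rounded as in the text
    "Kimi K2": (1.0e12, 32e9),
    "Moonlight 16B-A3B": (16e9, 3e9),
}


def weight_gb(params: float, dtype: str) -> float:
    """Memory needed to hold `params` weights in `dtype`, in decimal GB."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def report() -> dict:
    """Per-model active fraction, BF16/INT4 weight size, and node count for BF16 weights."""
    node_gb = H100_GB * GPUS_PER_NODE  # HBM on one 8x H100 node
    rows = {}
    for name, (total, active) in MODELS.items():
        bf16 = weight_gb(total, "bf16")
        rows[name] = {
            "active_fraction": active / total,
            "bf16_gb": bf16,
            "int4_gb": weight_gb(total, "int4"),
            "bf16_nodes_for_weights": bf16 / node_gb,
        }
    return rows


if __name__ == "__main__":
    for name, row in report().items():
        print(
            f"{name}: {row['active_fraction']:.1%} active, "
            f"BF16 {row['bf16_gb']:.0f} GB, INT4 {row['int4_gb']:.0f} GB, "
            f"{row['bf16_nodes_for_weights']:.2f} nodes for BF16 weights"
        )
```

Under these assumptions, Moonlight's BF16 weights (about 32 GB) occupy a small slice of one node's 640 GB, while K2's BF16 weights alone (about 2 TB) already span more than three nodes before any training state is counted. Quantizing K2 to INT4 shrinks the weights to roughly 500 GB.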
Variants
Fastest path to train
Moonlight on a single 8× H100 node is the smallest Moonshot recipe and a good MoE smoke test:
… model_type patch that lets Miles treat K2 as a DeepSeek-V3-shaped architecture).

