

Miles supports both ends of Moonshot’s MoE line: the 1 T-parameter Kimi K2 (Instruct and Thinking variants) with 32 B parameters active per token, and the compact Moonlight-16B-A3B, which fits on a single 8× H100 node and makes a handy single-node test target before scaling K2 across 16 nodes. K2-Thinking is also the canonical target for INT4 QAT.

Variants

Model              | Active / Total | HF ID                        | Recipe
Kimi-K2-Instruct   | 32 B / 1 T     | moonshotai/Kimi-K2-Instruct  | kimi-k2
Kimi-K2-Thinking   | 32 B / 1 T     | moonshotai/Kimi-K2-Thinking  | kimi-k2
Moonlight-16B-A3B  | 3 B / 16 B     | moonshotai/Moonlight-16B-A3B | moonlight

Fastest path to train

Moonlight on a single 8× H100 node is the smallest Moonshot recipe and a good MoE smoke test:
# From the Miles checkout: fetch the weights, then launch the recipe script.
cd /root/miles
hf download moonshotai/Moonlight-16B-A3B --local-dir /root/Moonlight-16B-A3B
bash scripts/run-moonlight-16B-A3B.sh
See the Moonlight page for the full walkthrough, or Kimi K2 for the 16-node K2-Thinking recipe (including the one-line model_type patch that lets Miles treat K2 as a DeepSeek-V3-shaped architecture).
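For orientation, the patch is just an edit to model_type in the checkpoint's config.json. A minimal sketch, assuming the checkpoint was downloaded to /root/Kimi-K2-Thinking and ships with model_type set to kimi_k2 (both are assumptions; the Kimi K2 page documents the exact steps):

# Hedged sketch: the path and the original "kimi_k2" value are assumptions.
sed -i 's/"model_type": "kimi_k2"/"model_type": "deepseek_v3"/' \
    /root/Kimi-K2-Thinking/config.json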

Which variant do I pick?

  • Single-node MoE smoke test → Moonlight-16B-A3B (moonlight).
  • Frontier-scale instruction-tuned MoE → Kimi-K2-Instruct (kimi-k2).
  • Reasoning-style training, INT4 QAT target → Kimi-K2-Thinking (kimi-k2).
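Whichever K2 variant you pick, the weights come down the same way as Moonlight's above. A minimal sketch for K2-Thinking, where the local target directory is an assumption:

# The HF ID comes from the variants table; the target directory is illustrative.
# A 1 T-parameter checkpoint is a very large download.
hf download moonshotai/Kimi-K2-Thinking --local-dir /root/Kimi-K2-Thinking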