Miles is a high-performance, enterprise-ready reinforcement learning (RL) framework optimized for large-scale model post-training. It couples SGLang for high-throughput rollout with Megatron-LM for scalable training, and ships the precision, stability, and observability features needed to run RL at trillion-parameter scale. “A journey of a thousand miles begins with a single rollout.” True to that motto, Miles focuses on the low-level system optimizations that make large-scale RL stable, efficient, and reproducible.

Core features

  • Fast and stable support for the latest models. Day-0 enablement of frontier releases such as DeepSeek-V4, with rapid follow-on support for new architectures including GLM-5, Qwen 3.6, and Nemotron-3-Super.
  • Unified low-precision training. Precision is customizable across the rollout and training engines, with BF16, FP8, MXFP8, and INT4 QAT recipes available now and an NVFP4 training recipe in progress.
  • Efficient Rollout Routing Replay (R3). For MoE models, expert routing captured during inference is replayed during the trainer’s forward pass, eliminating the routing mismatch that destabilizes large-scale MoE RL. A routing-result cache and an overlapped device-to-host (D2H) copy keep the overhead low in both single-turn and multi-turn RL; a sketch follows this list.
  • Speculative rollout with online MTP-SFT. Miles keeps the draft model’s acceptance rate high throughout training by fine-tuning the MTP layers on-policy; the second sketch below shows the idea.
  • LoRA training and serving. Both SFT and RL recipes support LoRA adapters, and the same adapters load directly into SGLang for rollout — no separate merge or conversion step.
  • Native agentic rollout. Tool use, multi-turn dialogue, search, code execution, and multi-agent co-evolution are all supported through clean Python extension points.
  • Minimal core, maximal extension. Twenty-plus plug-points let you replace the rollout, reward, loss, or filter without forking the trainer; the last sketch below illustrates the pattern.
  • Broad hardware support. First-class on NVIDIA Hopper (H100, H200) and Blackwell (B100, B200, GB200, GB300), with AMD MI300X / MI325 / MI350 / MI355X also supported via ROCm.
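
To make the replay mechanism concrete, here is a minimal sketch of the R3 idea in PyTorch. All names are illustrative and this is not the Miles API: a top-k router records its expert assignments during rollout, caches them on the host, and replays them during the trainer’s forward pass.

```python
import torch

class ReplayableRouter(torch.nn.Module):
    """Top-k MoE router that records routing at rollout time (hypothetical sketch)."""

    def __init__(self, hidden_size: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = torch.nn.Linear(hidden_size, num_experts, bias=False)
        self.top_k = top_k
        self.recorded: list[torch.Tensor] = []  # the routing-result cache
        self.replay = False                     # False: rollout, True: training

    def forward(self, hidden: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        probs = self.gate(hidden).softmax(dim=-1)
        if self.replay:
            # Training pass: reuse the expert indices captured during rollout,
            # so token-to-expert assignment matches what was actually sampled.
            indices = self.recorded.pop(0).to(hidden.device)
            weights = probs.gather(-1, indices)
        else:
            # Rollout pass: route normally, then stage the result to the host
            # cache. A real implementation would stage through pinned memory
            # so the non_blocking D2H copy overlaps with ongoing GPU work.
            weights, indices = probs.topk(self.top_k, dim=-1)
            self.recorded.append(indices.to("cpu", non_blocking=True))
        return weights, indices

# Rollout records the routing; flipping the flag replays it in training.
router = ReplayableRouter(hidden_size=64, num_experts=8)
tokens = torch.randn(2, 5, 64)
router(tokens)             # rollout: record
router.replay = True
w, idx = router(tokens)    # training: replay
```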
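
The online MTP-SFT loop can be sketched the same way. Assuming a frozen trunk and a trainable draft head (both hypothetical names, not the Miles interface), each step fits the head to tokens the policy just sampled:

```python
import torch
import torch.nn.functional as F

def mtp_sft_step(mtp_head: torch.nn.Module,
                 optimizer: torch.optim.Optimizer,
                 trunk_hidden: torch.Tensor,          # [batch, seq, hidden], trunk frozen
                 target_ids: torch.Tensor) -> float:  # [batch, seq] freshly sampled tokens
    """One supervised step for the draft head on fresh rollout data."""
    logits = mtp_head(trunk_hidden)                   # [batch, seq, vocab]
    loss = F.cross_entropy(logits.flatten(0, 1), target_ids.flatten())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: a linear draft head trained on random "rollout" tokens.
vocab, hidden = 1000, 64
head = torch.nn.Linear(hidden, vocab)
opt = torch.optim.AdamW(head.parameters(), lr=1e-4)
print(mtp_sft_step(head, opt, torch.randn(2, 8, hidden), torch.randint(0, vocab, (2, 8))))
```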
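
Finally, as a hedged illustration of the plug-point idea, here is one common way such extension points are built: a name-to-callable registry. The registry, decorator, and reward function below are hypothetical, not Miles’s actual interface.

```python
from typing import Callable

# Hypothetical plug-point registry mapping config keys to user callables.
REWARD_REGISTRY: dict[str, Callable[[str, str], float]] = {}

def register_reward(name: str):
    """Decorator that exposes a user reward function under a config key."""
    def decorator(fn: Callable[[str, str], float]) -> Callable[[str, str], float]:
        REWARD_REGISTRY[name] = fn
        return fn
    return decorator

@register_reward("ends_with_answer")
def ends_with_answer(prompt: str, completion: str) -> float:
    """Toy reward: 1.0 if the completion ends with the expected answer."""
    return 1.0 if completion.strip().endswith("42") else 0.0

# The trainer resolves the configured reward by name instead of hard-coding
# it, which is what lets users swap it in without forking the trainer.
reward_fn = REWARD_REGISTRY["ends_with_answer"]
print(reward_fn("What is 6 * 7?", "The answer is 42"))  # 1.0
```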

Supported models

See Models for the full list of supported models; each model name there links to a recipe page with exact conversion commands, launch scripts, and parallelism settings.

Supported hardware

  • NVIDIA: GB300, GB200, B200, B100, H200, H100, A100.
  • AMD: MI300X, MI325, MI350, MI355X (via ROCm).

See Platforms.

Start here

  1. Installation — Docker, bare metal, AMD.
  2. Quick Start — a working training run in under an hour.
  3. Core concepts — the four objects in every Miles job.
  4. Training backend — Megatron-LM, parallelism, checkpoints, and hooks.
  5. Training script walkthrough — every argument group in a launch script, annotated.

Contribute