Miles is a high-performance, enterprise-ready reinforcement learning (RL) framework optimized for large-scale model post-training. It couples SGLang for high-throughput rollout with Megatron-LM for scalable training, and ships the precision, stability, and observability features needed to run RL at trillion-parameter scale.

“A journey of a thousand miles begins with a single rollout.” — Miles focuses on the low-level system optimizations that make large-scale RL stable, efficient, and reproducible.

Documentation Index
Fetch the complete documentation index at: https://www.radixark.com/llms.txt
Use this file to discover all available pages before exploring further.
Core features
- Fast and stable support for the latest models. Day-0 enablement of frontier releases such as DeepSeek-V4, with rapid follow-on support for new architectures including GLM-5, Qwen 3.6, and Nemotron-3-Super.
- Unified low-precision training. Customizable precision across the rollout and training engines, with unified BF16, FP8, MXFP8, and INT4 QAT recipes available now and an NVFP4 training recipe in progress.
- Efficient Rollout Routing Replay (R3). For MoE models, expert routing captured during inference is replayed during the trainer’s forward pass, eliminating the mismatch that destabilizes large-scale MoE RL. Optimized with a routing-result cache and overlapped device-to-host (D2H) copy to reduce overhead in both single-turn and multi-turn RL.
- Speculative rollout with online MTP-SFT. Miles keeps the draft model’s acceptance rate high through training by fine-tuning MTP layers on-policy.
- LoRA training and serving. Both SFT and RL recipes support LoRA adapters, and the same adapters load directly into SGLang for rollout — no separate merge or conversion step.
- Native agentic rollout. Tool use, multi-turn dialogue, search, code execution, and multi-agent co-evolution are all supported through clean Python extension points.
- Minimal core, maximal extension. Twenty-plus plug-points let you replace the rollout, reward, loss, or filter without forking the trainer.
- Broad hardware support. First-class on NVIDIA Hopper (H100, H200) and Blackwell (B100, B200, GB200, GB300), with AMD MI300X / MI325 / MI350 / MI355X also supported via ROCm.
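The R3 idea above can be sketched in a few lines: during rollout the hard top-k expert assignments are captured per token, and during the trainer's forward pass those assignments are replayed instead of being re-derived from the trainer's (slightly different) router logits. The code below is a minimal toy illustration, not Miles' actual implementation — the layer, function, and argument names are invented for this sketch.

```python
import torch

def moe_forward(x, router, experts, k=2, replay_indices=None):
    """Toy MoE layer illustrating Rollout Routing Replay (R3).

    If replay_indices is given (captured during rollout), reuse those
    hard expert assignments instead of re-deriving them from the
    trainer's router logits.
    """
    logits = router(x)                                 # [tokens, n_experts]
    if replay_indices is None:
        topk_idx = logits.topk(k, dim=-1).indices      # rollout-time routing
    else:
        topk_idx = replay_indices                      # replayed routing
    # Gate weights are still computed from the trainer's logits so
    # gradients flow through the router; only the hard assignment
    # is replayed, which removes the train/inference routing mismatch.
    gates = torch.softmax(logits.gather(-1, topk_idx), dim=-1)
    out = torch.zeros_like(x)
    for slot in range(k):
        for e, expert in enumerate(experts):
            mask = topk_idx[:, slot] == e
            if mask.any():
                out[mask] += gates[mask, slot, None] * expert(x[mask])
    return out, topk_idx
```

A rollout pass would return `topk_idx` alongside the tokens; the trainer then passes it back as `replay_indices` so both engines use identical expert assignments.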
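To make the "minimal core, maximal extension" claim concrete, here is one common way such plug-points are built: a small registry plus a decorator, so a user swaps in a custom reward (or rollout, loss, filter) without forking the trainer. This is a generic sketch under that assumption — the registry, decorator, and reward names below are hypothetical, not Miles' actual extension API.

```python
from typing import Callable, Dict

# Hypothetical plug-point registry; Miles' real API may differ.
_PLUGINS: Dict[str, Callable] = {}

def register(name: str):
    """Decorator that installs a custom component under a plug-point name."""
    def wrap(fn: Callable) -> Callable:
        _PLUGINS[name] = fn
        return fn
    return wrap

@register("reward")
def length_penalty_reward(prompt: str, completion: str) -> float:
    # Example custom reward: linearly discourage overly long completions.
    return 1.0 - min(len(completion) / 1024, 1.0)

def get(name: str) -> Callable:
    """Resolve a plug-point; the trainer would call this at setup time."""
    return _PLUGINS[name]
```

The trainer only ever calls `get("reward")`, so replacing the behavior is a matter of registering a different function before launch.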
Supported models
Each model name links to its recipe page.

Supported hardware
- NVIDIA: GB300, GB200, B200, B100, H200, H100, A100.
- AMD: MI300X, MI325, MI350, MI355X (via ROCm).
Latest updates
- [2026/02] Complete argument reference. CLI Reference
- [2026/01] INT4 W4A16 QAT. INT4 Quantization-Aware Training
- [2026/01] Unified VLM/LLM multi-turn rollout. Multi-Agent Co-Evolution
- [2025/12] Rollout Routing Replay (R3) for MoE. Rollout Routing Replay (R3)
- [2025/11] Unified FP8 pipeline generally available. FP8 and Low Precision
- [2025/11] Speculative decoding with online MTP-SFT. Speculative Decoding
Start here
- Installation — Docker, bare metal, AMD.
- Quick Start — a working training run in under an hour.
- Core concepts — the four objects in every Miles job.
- Training backend — Megatron-LM, parallelism, checkpoints, and hooks.
- Training script walkthrough — every argument group in a launch script, annotated.
Contribute
- GitHub: github.com/radixark/miles
- Slack: slack.sglang.ai, channel #miles
- Contributing: developer guide

