The model recipes show you how to train a model. The examples below show you how to build something useful with Miles: tools, search, multi-agent, distillation, and async rollout. Each example follows the same template:
- What you’ll learn — the takeaway in one sentence.
- Prerequisites — what you need installed/downloaded first.
- Files — what’s in the example directory.
- Quick start — single command to run.
- Walkthrough — annotated tour of the key code.
- What’s happening underneath — the moving parts you can’t see.
- Tuning knobs — the levers that matter.
- Troubleshooting — the failure modes we’ve actually hit.
- Variations — common adaptations.
The catalog
Fully Async Rollout
Continuous background generation with a queue between rollout and training.
Up to 2× end-to-end speedup.
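To make the queue between rollout and training concrete, here is a minimal sketch using only the Python standard library. It is not the Miles implementation: `rollout_worker`, `train_loop`, `run`, and the sample dictionaries are hypothetical names chosen for illustration, and the "generation" and "training" steps are placeholders.

```python
import queue
import threading


def rollout_worker(q, num_samples):
    """Producer: stands in for the generation engine (hypothetical)."""
    for i in range(num_samples):
        sample = {"id": i, "response": f"response-{i}"}  # placeholder generation
        q.put(sample)  # blocks while the queue is full, which gives back-pressure
    q.put(None)  # sentinel: no more samples will arrive


def train_loop(q, batch_size):
    """Consumer: drains the queue in batches; a real trainer would step here."""
    batches, batch = [], []
    while True:
        sample = q.get()
        if sample is None:
            break
        batch.append(sample)
        if len(batch) == batch_size:
            batches.append(batch)  # placeholder for one optimizer step
            batch = []
    if batch:
        batches.append(batch)  # final partial batch
    return batches


def run(num_samples=10, batch_size=4, max_queue=8):
    """Generation runs in the background while the caller consumes batches."""
    q = queue.Queue(maxsize=max_queue)
    producer = threading.Thread(
        target=rollout_worker, args=(q, num_samples), daemon=True
    )
    producer.start()
    batches = train_loop(q, batch_size)
    producer.join()
    return batches
```

The bounded queue is the design choice worth noting: a larger `max_queue` lets generation run further ahead of training, at the cost of training on samples produced by an older policy.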
Search-R1 (Tool Use)
Multi-turn rollout where the model can issue <search>... actions, get
observations from a retrieval server, and produce a final answer.
ReTool (Code Execution)
SFT + RL pipeline for tool-augmented reasoning. Sandboxed Python code execution
interleaved with thinking.
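The two tool-use examples above (Search-R1 and ReTool) share one pattern: generate a step, detect an action, run a tool, append the observation, and repeat. The sketch below illustrates that loop under stated assumptions. The closing `</search>` tag, the `<code>` tag, the `<observation>` wrapper, and every function name are hypothetical rather than taken from Miles. The retrieval server is replaced by a keyword-matching stub, and the subprocess call only adds a timeout and Python's isolated mode, so it is not a real security sandbox.

```python
import re
import subprocess
import sys

# Assumed action format: <search>query</search> or <code>source</code>.
ACTION_RE = re.compile(r"<(search|code)>(.*?)</\1>", re.DOTALL)


def run_search(query, corpus):
    """Stub retrieval: return up to three documents sharing a word with the query."""
    words = query.lower().split()
    hits = [doc for doc in corpus if any(w in doc.lower() for w in words)]
    return "\n".join(hits[:3]) or "no results"


def run_code(source, timeout=5.0):
    """Run Python in a separate process with a timeout (not a real sandbox)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "error: timed out"
    return (proc.stdout + proc.stderr).strip()


def rollout(generate, prompt, corpus, max_turns=4):
    """Multi-turn loop: generate, run any requested tool, append the observation."""
    context = prompt
    for _ in range(max_turns):
        step = generate(context)  # generate is any callable: context -> text
        context += step
        match = ACTION_RE.search(step)
        if match is None:
            break  # no action in this step, so treat it as the final answer
        kind, payload = match.group(1), match.group(2).strip()
        if kind == "search":
            observation = run_search(payload, corpus)
        else:
            observation = run_code(payload)
        context += f"\n<observation>{observation}</observation>\n"
    return context


def scripted(steps):
    """Test helper: a fake model that replays fixed steps in order."""
    it = iter(steps)
    return lambda context: next(it)
```

A scripted model such as `scripted(["<search>capital France</search>", "Answer: Paris"])` is enough to exercise the loop without a real policy. In training, the observation tokens would typically be masked out of the loss, since the model did not produce them.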
Multi-Agent Co-Evolution
Two specialized agents (e.g. doctor + patient) train together and improve
each other.
Reproducibility Recipe
Bit-stable training across reruns. Determinism flags, seeds, and what to
watch.
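As a rough illustration of what "determinism flags and seeds" usually means in a Python training stack, the sketch below seeds the common random number generators and turns on PyTorch's deterministic mode when PyTorch is installed. It does not show Miles's own flags, which are documented in the example itself. `seed_everything` and `fingerprint` are hypothetical helper names, and NumPy and PyTorch are treated as optional.

```python
import os
import random


def seed_everything(seed):
    """Seed the usual RNGs; enable deterministic kernels if PyTorch is present."""
    # PYTHONHASHSEED only affects interpreters started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    # Required by cuBLAS for deterministic matmuls on CUDA 10.2 and later.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)  # seeds CPU and all CUDA devices
        torch.use_deterministic_algorithms(True)  # error on nondeterministic ops
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False  # autotuning can change kernels
    except ImportError:
        pass  # PyTorch is optional in this sketch


def fingerprint(seed, n=5):
    """Draw a few numbers after seeding; equal fingerprints mean equal RNG state."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]
```

Comparing a cheap fingerprint like this across two runs, and then the first few loss values, is a quick way to spot where reruns start to diverge.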
SFT on OpenHermes
Plain SFT (no RL) — sometimes you just need a quick fine-tune.
Where to start
- Never used Miles for anything beyond GRPO? → Fully Async Rollout.
- Want tool use / RAG? → Search-R1, then ReTool.
- VLM / multi-agent? → Multi-Agent Co-Evolution.
- Replay an old result? → Reproducibility Recipe.

