Examples - Miles

The model recipes show you how to train a model. The examples below show you how to build something useful with Miles — tools, search, multi-agent, distillation, and async rollout. Each example follows the same template:

What you’ll learn — the takeaway in one sentence.
Prerequisites — what you need installed/downloaded first.
Files — what’s in the example directory.
Quick start — single command to run.
Walkthrough — annotated tour of the key code.
What’s happening underneath — the moving parts you can’t see.
Tuning knobs — the levers that matter.
Troubleshooting — the failure modes we’ve actually hit.
Variations — common adaptations.

The catalog

Fully Async Rollout

Continuous background generation with a queue between rollout and training. Up to 2× end-to-end speedup.

Search-R1 (Tool Use)

Multi-turn rollout where the model can issue <search>... actions, get observations from a retrieval server, and produce a final answer.

ReTool (Code Execution)

SFT + RL pipeline for tool-augmented reasoning. Sandboxed Python code execution interleaved with thinking.

Multi-Agent Co-Evolution

Two specialized agents (e.g. doctor + patient) train together and improve each other.

Reproducibility Recipe

Bit-stable training across reruns. Determinism flags, seeds, and what to watch.

SFT on OpenHermes

Plain SFT (no RL) — sometimes you just need a quick fine-tune.
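
To make the "queue between rollout and training" idea from the Fully Async Rollout entry concrete, here is a minimal sketch using only the Python standard library. It is not Miles code: `generate_rollout`, `rollout_worker`, `train`, and `run` are hypothetical names, and the "model" is a stand-in that returns dummy tokens. It only illustrates the producer/consumer pattern, where generation keeps running in the background while the trainer consumes batches.

```python
import queue
import threading


def generate_rollout(step):
    # Hypothetical stand-in for model generation; a real rollout
    # would call an inference engine here.
    return {"id": step, "tokens": [step] * 4}


def rollout_worker(buffer, num_rollouts):
    # Producer: generates continuously in the background.
    # put() blocks when the bounded queue is full, which applies backpressure.
    for step in range(num_rollouts):
        buffer.put(generate_rollout(step))
    buffer.put(None)  # sentinel: no more rollouts


def train(buffer, batch_size):
    # Consumer: pulls rollouts as they arrive and groups them into batches.
    batches, batch = [], []
    while True:
        item = buffer.get()
        if item is None:
            break
        batch.append(item["id"])
        if len(batch) == batch_size:
            batches.append(batch)  # a real trainer would take an optimizer step here
            batch = []
    if batch:
        batches.append(batch)  # final partial batch
    return batches


def run(num_rollouts=8, batch_size=4, max_queue=2):
    buffer = queue.Queue(maxsize=max_queue)
    producer = threading.Thread(target=rollout_worker, args=(buffer, num_rollouts))
    producer.start()
    batches = train(buffer, batch_size)
    producer.join()
    return batches
```

The bounded queue (`max_queue`) is the main design choice in this sketch: a small bound limits how far generation can run ahead of training, while a larger bound lets generation run further ahead. How the actual example balances this is covered in its own page.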

Where to start

Never used Miles for anything beyond GRPO? → Fully Async Rollout.
Want tool use / RAG? → Search-R1, then ReTool.
VLM / multi-agent? → Multi-Agent Co-Evolution.
Replay an old result? → Reproducibility Recipe.
