

A reading guide before you start patching.

The processes

A Miles run is four kinds of processes wrapped in a Ray cluster:
  • Trainer ranks — Megatron processes that load torch_dist checkpoints and run the RL loop.
  • SGLang servers — independent HTTP services that produce rollouts.
  • Miles Router — FastAPI proxy that distributes rollout requests, preserves metadata (R3), and enforces health checks.
  • Data Source — Python object owned by the trainer; reads prompt JSONL and acts as a buffer between rollout and training.
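The data source's buffering role can be sketched as follows. This is a minimal illustration with hypothetical class and method names, not the real implementation (which lives in miles/rollout/data_source.py): it reads prompt JSONL on one side and queues finished rollout samples for the trainer on the other.

```python
import json
from collections import deque

class PromptDataSource:
    """Sketch: reads prompt JSONL, buffers rollout samples for training."""

    def __init__(self, jsonl_lines):
        self.prompts = [json.loads(line) for line in jsonl_lines]
        self._cursor = 0
        self._buffer = deque()  # finished rollout samples awaiting training

    def next_prompts(self, n):
        """Hand the next n prompts to the rollout side."""
        batch = self.prompts[self._cursor:self._cursor + n]
        self._cursor += len(batch)
        return batch

    def put_samples(self, samples):
        """Rollout side deposits completed samples here."""
        self._buffer.extend(samples)

    def get_train_batch(self, n):
        """Trainer side drains up to n buffered samples."""
        return [self._buffer.popleft() for _ in range(min(n, len(self._buffer)))]
```

The deque is the whole trick: rollout and training touch only opposite ends of it, so neither needs to know the other's pace.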

The package layout

miles/
├── backends/
│   ├── megatron_utils/   # fp32 markers, optimizer offload helpers, weight sync
│   ├── sglang_utils/     # SGLang glue
│   ├── training_utils/   # loss / GRPO / PPO / GSPO / REINFORCE++ plumbing
│   └── experimental/
│       └── fsdp_utils/   # FSDP-flavoured trainer (in progress)
├── ray/                  # Ray actors + rollout driver
├── rollout/
│   ├── sglang_rollout.py # default rollout function
│   ├── data_source.py    # buffer + JSONL loader
│   ├── filter_hub/       # built-in filters
│   └── inference_rollout/# experimental refactor
├── router/               # FastAPI proxy + middleware engine (router.py)
└── utils/                # async, types, IO, distributed helpers, arguments.py

train.py and train_async.py are the two entry points. They’re thin: ~200 lines each. Most logic lives in the modules above.
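A hypothetical outline of what a thin sync entry point does, in terms of the four processes above. The function and parameter names here are illustrative, not the real train.py (which wires up Ray actors); the point is the shape of the loop.

```python
def train_loop(data_source, generate_rollout, train_step, sync_weights, num_iters):
    """Illustrative sync loop: prompts -> rollouts -> update -> weight sync."""
    metrics = {}
    for it in range(num_iters):
        prompts = data_source.next_prompts(8)   # 1. pull prompts from the data source
        samples = generate_rollout(prompts)     # 2. rollouts via router + SGLang servers
        metrics = train_step(samples)           # 3. trainer ranks run the RL update
        sync_weights()                          # 4. push fresh weights to SGLang
    return metrics
```

Everything interesting is behind the four callables, which is why the real entry points stay at ~200 lines.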

A request’s life

For a single GRPO iteration, requests follow the sync path: the trainer pulls prompts from the data source, sends them through the Miles Router to the SGLang servers, and trains on the returned samples. The async path (train_async.py + --rollout-function-path fully_async_rollout.generate_rollout_fully_async) decouples rollout requests from the trainer loop and uses a continuously-running worker instead.
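The sync/async distinction can be sketched with a plain producer/consumer, assuming nothing about the real worker beyond what the paragraph above says: the rollout worker runs continuously and the trainer consumes whatever is ready. All names here are illustrative.

```python
import queue
import threading

def fully_async_sketch(prompt_iter, generate, train_step, num_train_steps):
    """Sketch: rollout worker produces continuously; trainer drains a queue."""
    q = queue.Queue(maxsize=4)  # bounded, so rollout can't run away from training

    def rollout_worker():
        for prompt in prompt_iter:
            q.put(generate(prompt))  # keeps producing regardless of trainer pace

    threading.Thread(target=rollout_worker, daemon=True).start()

    results = []
    for _ in range(num_train_steps):
        results.append(train_step(q.get()))  # trainer never waits on a specific prompt
    return results
```

The contrast with the sync loop is the queue: in the sync path each training step issues its own rollout request and blocks on it; here the two sides only meet at the buffer.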

Where common changes go

You want to …                     Edit
Add a new RL algorithm            miles/backends/training_utils/loss.py + enum in miles/utils/arguments.py
Add a new built-in reward type    miles/rollout/sglang_rollout.py (rm dispatch)
Add a new built-in filter         miles/rollout/filter_hub/
Wrap a new model architecture     miles_plugins/models/<model>.py + mbridge
Add a new flag                    miles/utils/arguments.py
Change weight sync                miles/backends/megatron_utils/update_weight/ and miles/utils/distributed_utils.py
Change rollout buffer             miles/rollout/data_source.py
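For the first row, a hedged sketch of the two pieces involved: an enum entry (mirroring whatever enum lives in miles/utils/arguments.py) and a loss function (mirroring miles/backends/training_utils/loss.py). The enum name, the function, and its signature are all hypothetical; the loss shown is a standard PPO-style clipped surrogate written with plain floats for clarity.

```python
import math
from enum import Enum

class AdvantageEstimator(str, Enum):  # hypothetical stand-in for the real enum
    GRPO = "grpo"
    MY_ALGO = "my_algo"  # the new entry you would add

def my_algo_policy_loss(logprobs, old_logprobs, advantages, clip=0.2):
    """Standard PPO clipped surrogate, per-token, averaged."""
    losses = []
    for lp, old_lp, adv in zip(logprobs, old_logprobs, advantages):
        ratio = math.exp(lp - old_lp)                       # importance ratio
        unclipped = ratio * adv
        clipped = max(min(ratio, 1 + clip), 1 - clip) * adv  # clamp the ratio
        losses.append(-min(unclipped, clipped))              # pessimistic bound
    return sum(losses) / len(losses)
```

The real loss functions operate on tensors and are dispatched on the enum; the two-file pattern (loss + enum) is the part the table is pointing at.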

Extension points (the right way)

The trainer is plug-in-friendly. Most extensions don’t need a code change inside Miles — just pass a --something-path my_pkg.thing. See Customization for the full list. If you find yourself patching the trainer to make something work, that’s a sign we’re missing a hook. Open an issue.
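The general mechanism behind a --something-path my_pkg.thing flag is dotted-path resolution, which the standard library makes short. This is a generic sketch of that pattern, not Miles's own loader:

```python
import importlib

def load_by_path(dotted_path):
    """Resolve 'my_pkg.thing' to the attribute it names."""
    module_name, _, attr = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)

# e.g. load_by_path("json.dumps") returns the json.dumps function
```

Writing your extension as an importable attribute and passing its dotted path is what keeps these hooks code-change-free.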

Tests

tests/
├── fast/             # fast, no GPU
├── ci/               # CI-gated suite
├── e2e/              # end-to-end (spins up Ray + SGLang)
└── utils/            # shared test helpers

Run pytest tests/fast for a quick check; run tests/e2e before landing anything that touches the train loop.

Where to look first when reading the code

If you have 30 minutes and want to understand Miles end-to-end:
  1. train.py — the loop, top-to-bottom.
  2. miles/rollout/sglang_rollout.py:generate_rollout — how prompts become samples.
  3. miles/backends/training_utils/loss.py — the loss and advantage computation.
  4. miles/router/router.py — the FastAPI proxy.
  5. miles/utils/distributed_utils.py — weight sync.
That’s the spine. Everything else hangs off it.