Fully async rollout splits Miles into two concurrent loops:
- A background rollout worker keeps SGLang generation in flight and pushes completed samples into a queue.
- The trainer drains the queue, runs optimizer steps, and syncs updated weights back to rollout engines.
Overlapping the two loops moves total step time from `rollout_time + train_time` toward `max(rollout_time, train_time)`.
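The producer/consumer shape of the two loops can be sketched with Python's stdlib `threading` and `queue`; names like `rollout_worker` and `trainer` are illustrative stand-ins, not Miles APIs:

```python
import queue
import threading

sample_queue = queue.Queue(maxsize=64)  # completed rollout samples

def rollout_worker(num_samples):
    # Background producer: keeps generation in flight and pushes
    # finished samples into the queue as they complete.
    for i in range(num_samples):
        sample = {"id": i, "tokens": [i] * 4}  # stand-in for SGLang output
        sample_queue.put(sample)               # blocks only if the queue is full

def trainer(num_steps, batch_size):
    # Consumer: drains the queue, runs an optimizer step, and would
    # sync updated weights back to the rollout engines.
    for step in range(num_steps):
        batch = [sample_queue.get() for _ in range(batch_size)]
        # optimizer.step(); sync_weights_to_rollout_engines()
        print(f"step {step}: trained on {len(batch)} samples")

producer = threading.Thread(target=rollout_worker, args=(8,))
producer.start()
trainer(num_steps=2, batch_size=4)
producer.join()
```

Because the producer runs concurrently, the trainer only blocks on `get()` when generation falls behind, which is exactly the overlap the formula above describes.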
## When to use it
| Use fully async when | Stay synchronous when |
|---|---|
| Rollout is a large part of wall time | Debugging a new recipe |
| The run is long enough to amortize queue warm-up | You need the strictest possible on-policy cadence |
| SGLang engines can keep many requests in flight | Queue depth stays at zero even after tuning concurrency |
| You can tolerate slightly older samples in exchange for throughput | You are validating loss math or reward plumbing |
## Enable it

Switch the entrypoint from `train.py` to `train_async.py` and provide a rollout function that owns the background worker:
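A minimal skeleton of such a rollout function is sketched below; the function name, arguments, and the loop body are illustrative assumptions, not the exact Miles interface:

```python
import queue
import threading

def generate_rollout_async(num_groups, data_queue):
    """Illustrative rollout function: it owns the background worker
    that keeps the trainer-facing queue populated."""
    def worker():
        for gid in range(num_groups):
            # Placeholder: real code would submit prompts to the SGLang
            # engines here and put completed sample groups on the queue.
            data_queue.put({"group": gid})
    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return t
```

The key property is that the function returns immediately after starting the worker, so generation and training proceed concurrently.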
## Queue model
The queue is the contract. If it stays populated, the trainer does not wait for generation. If it is empty, rollout is still the bottleneck and async cannot hide it.

## Tuning knobs
| Knob | What it changes |
|---|---|
| `--rollout-batch-size` | Target amount of work the async producer keeps in flight |
| `--sglang-server-concurrency` | Per-engine request concurrency |
| `--global-batch-size` | Number of samples the trainer drains per step |
| `--num-steps-per-rollout` | Number of optimizer steps per queue drain cycle |
| `--max-weight-staleness` | When the rollout engine's weight version lags the trainer's by more than this, the worker recycles the stale group instead of feeding it to the loss |
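The staleness rule in the last row amounts to a version comparison; the `weight_version` field, `handle_group` helper, and recycle path below are assumptions for illustration:

```python
MAX_WEIGHT_STALENESS = 2  # stand-in for --max-weight-staleness

def handle_group(group, trainer_weight_version, train_batch, recycle_queue):
    """Route a finished rollout group: train on it if its weights are
    fresh enough, otherwise recycle it instead of feeding it to the loss."""
    staleness = trainer_weight_version - group["weight_version"]
    if staleness > MAX_WEIGHT_STALENESS:
        recycle_queue.append(group)   # regenerate with current weights
    else:
        train_batch.append(group)

fresh = {"weight_version": 9}
stale = {"weight_version": 5}
batch, recycled = [], []
handle_group(fresh, 10, batch, recycled)   # staleness 1: kept for training
handle_group(stale, 10, batch, recycled)   # staleness 5: recycled
```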
## What to monitor
The reference worker logs progress to stdout, not wandb. A useful line to grep for is the recycle message emitted when a group exceeds `--max-weight-staleness`.

