A Miles training job is a loop over four objects. Once you understand what each one is and how data flows between them, every flag in the system has an obvious home.
The four objects
| Object | Role | Lives in |
|---|---|---|
| Prompt dataset | Source of input examples | JSONL on disk (or --data-source-path) |
| Rollout (SGLang engines) | Generates responses given prompts | One or more SGLang servers behind a router |
| Reward model | Maps (prompt, response, label) → score | Built-in (--rm-type) or custom (--custom-rm-path) |
| Actor (Megatron / FSDP) | The model being trained | Megatron torch_dist checkpoint, or HF directory under FSDP |
| Reference | Frozen copy of the actor for KL anchoring | Loaded from --ref-load, never updated |
The training loop
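As a rough mental model, the data flow between the objects in the table can be sketched in a few lines of Python. Everything below is a hypothetical stand-in (`generate`, `reward`, `train_step`, and `training_loop` are illustrative names, not Miles APIs): the rollout is a string formatter, the reward is a suffix check, and the actor update only bumps a version counter. The point is the order in which the objects touch the data, not the internals of any of them.

```python
def generate(actor_version, prompt, n_samples):
    """Stand-in for the rollout engines: n_samples responses per prompt."""
    return [f"v{actor_version}:{prompt}:{i}" for i in range(n_samples)]


def reward(prompt, response, label):
    """Stand-in for the reward model: (prompt, response, label) -> score."""
    return 1.0 if response.endswith(label) else 0.0


def train_step(actor_version, reference_version, scored_samples):
    """Stand-in for an actor update. The reference is read, never written."""
    return actor_version + 1


def training_loop(dataset, num_rollouts, prompts_per_rollout, n_samples):
    actor_version = 0
    reference_version = 0  # frozen copy of the actor, used only as an anchor
    log = []
    for step in range(num_rollouts):
        # 1. Prompt dataset: take the next slice of input examples.
        lo = step * prompts_per_rollout
        prompts = dataset[lo:lo + prompts_per_rollout]

        # 2. Rollout + 3. Reward: generate responses, then score each one.
        scored = []
        for example in prompts:
            for response in generate(actor_version, example["prompt"], n_samples):
                score = reward(example["prompt"], response, example["label"])
                scored.append((example["prompt"], response, score))

        # 4. Actor: update on the scored samples; the reference stays fixed.
        actor_version = train_step(actor_version, reference_version, scored)
        log.append({
            "actor": actor_version,
            "reference": reference_version,
            "samples": len(scored),
        })
    return log
```

Calling `training_loop(data, num_rollouts=2, prompts_per_rollout=2, n_samples=3)` on a four-example dataset runs two iterations, each handing six scored samples to the actor while the reference version never changes.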
The four-knob invariant
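An invariant of this kind is easiest to read as bookkeeping: whatever one rollout produces, training must consume exactly. The sketch below assumes the four knobs are a rollout batch size (prompts per rollout), a number of samples per prompt, a global training batch size, and a number of optimizer steps per rollout. These parameter names are illustrative assumptions; take the actual flag names from the CLI Reference.

```python
def check_batch_invariant(rollout_batch_size, n_samples_per_prompt,
                          global_batch_size, num_steps_per_rollout):
    """Return the samples per rollout, or raise if the two halves disagree.

    Assumed relation (not quoted from the Miles docs):
        rollout_batch_size * n_samples_per_prompt
            == global_batch_size * num_steps_per_rollout
    """
    produced = rollout_batch_size * n_samples_per_prompt   # sampling half
    consumed = global_batch_size * num_steps_per_rollout   # training half
    if produced != consumed:
        raise ValueError(
            f"rollout produces {produced} samples but training consumes {consumed}"
        )
    return produced
```

For example, 32 prompts with 8 samples each yield 256 samples, which matches 4 optimizer steps at a global batch size of 64. Changing any one of the four values without adjusting another makes the check fail.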
Two knobs govern the sampling half of the loop and two govern the training half; a single equation locks all four together.
Where every flag goes
Use this map when reading any launch script:
| Argument group | Concerns |
|---|---|
| MODEL_ARGS | Architecture constants (layers, hidden size, rotary base, …) |
| CKPT_ARGS | Filesystem paths for the actor / reference / save directory |
| ROLLOUT_ARGS | Prompt dataset, batch knobs, sampling parameters, reward type |
| EVAL_ARGS | Eval dataset, cadence, sampling overrides for evaluation |
| PERF_ARGS | Parallelism (TP/PP/CP/EP/ETP), recomputation, dynamic batching |
| GRPO_ARGS | RL algorithm, KL, clipping, entropy bonus, advantage estimator |
| OPTIMIZER_ARGS | Learning rate, schedule, weight decay, Adam betas |
| SGLANG_ARGS | Engine TP, memory fraction, log level, --sglang-* passthrough |
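To make the grouping concrete, here is a minimal Python sketch of how named argument groups flatten into one command line. The flags `--ref-load` and `--rm-type` appear earlier on this page; their values, the `train.py` entry point, and the `build_command` helper are placeholders invented for illustration, and real launch scripts may assemble the groups differently.

```python
import shlex

# Each group holds only the flags that belong to its concern.
CKPT_ARGS = ["--ref-load", "/path/to/reference"]      # placeholder path
ROLLOUT_ARGS = ["--rm-type", "PLACEHOLDER_RM_TYPE"]   # placeholder value


def build_command(entry_point, *groups):
    """Flatten argument groups, in order, into a single argv list."""
    command = ["python", entry_point]
    for group in groups:
        command.extend(group)
    return command


def render(command):
    """Render an argv list as a shell-safe string for logging or review."""
    return shlex.join(command)
```

Keeping each group in its own list means a flag can be located, or a whole concern swapped out, without scanning the full command line.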
Next
- Training Backend — Megatron-LM, parallelism, checkpoints, and hooks.
- Argument Groups — where each launch-script array belongs.
- Training Script Walkthrough — the launch script group by group, plus execution modes (colocation, sync/async, dynamic sampling, …).
- CLI Reference — every flag, grouped and fully cataloged.

