Most of Miles’s behavior can be replaced with user-supplied Python by passing a
--*-path flag. This page lists every such hook, the function signature it expects,
and the default it replaces.
At a glance
| Stage | Flag | Replaces |
|---|---|---|
| Rollout | --rollout-function-path | The whole rollout loop |
| | --custom-generate-function-path | A single sample’s generation |
| | --data-source-path | How prompts are loaded |
| | --eval-function-path | The eval rollout |
| Reward | --custom-rm-path | Reward computation |
| | --custom-reward-post-process-path | Reward normalization |
| Filtering | --dynamic-sampling-filter-path | Per-group filter (DAPO) |
| | --buffer-filter-path | Buffer dequeue filter |
| | --rollout-sample-filter-path | Per-sample loss filter |
| | --rollout-all-samples-process-path | Inspect all samples post-rollout |
| | --rollout-data-postprocess-path | Mutate samples post-logprob |
| Training | --custom-loss-function-path | The loss formula |
| | --custom-tis-function-path | Importance sampling correction |
| | --custom-pg-loss-reducer-function-path | Loss reduction (Dr.GRPO) |
| | --custom-convert-samples-to-train-data-path | Sample to tensor batch |
| Megatron hooks | --custom-megatron-init-path | After Megatron init |
| | --custom-megatron-before-log-prob-hook-path | Before logprob compute |
| | --custom-megatron-before-train-step-hook-path | Before each train step |
| Logging | --custom-rollout-log-function-path | Train-rollout logging |
| | --custom-eval-rollout-log-function-path | Eval-rollout logging |
| Routing | --miles-router-middleware-paths | Router middleware |
| Model | --custom-model-provider-path | Megatron model factory |
Rollout
--rollout-function-path
Replace the entire rollout function. Use this only for fundamentally different flows
such as multi-agent co-evolution.
Default: miles.rollout.sglang_rollout.generate_rollout, or
miles.rollout.inference_rollout.inference_rollout_common.InferenceRolloutFn when
enable_experimental_rollout_refactor() is on.
Reference: examples/multi_agent/rollout_with_multi_agents.py.
--custom-generate-function-path
Replace just the generation step inside the default rollout. Most tool-use, RAG, and
multi-turn workflows live here.
Reference: examples/search-r1/generate_with_search.py.
--data-source-path
Default: miles.rollout.data_source.RolloutDataSourceWithBuffer.
--eval-function-path
Same signature as --rollout-function-path. Defaults to whatever rollout function is
configured.
Reward
--custom-rm-path
Built-in alternatives are selected with --rm-type: math, dapo, deepscaler, f1, gpqa,
ifbench, remote_rm (with --rm-url), random.
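When none of the built-in reward types fit, a custom reward function can be as small as an exact-match check. The sketch below is illustrative only: the expected signature is not shown on this page, so the `(args, sample)` argument list, the `async` form, and the `response` and `label` field names are all assumptions, and `Sample` here is a stand-in dataclass rather than Miles's own class.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Sample:
    """Stand-in for Miles's Sample; field names are assumptions."""
    response: str
    label: str


async def exact_match_rm(args, sample: Sample) -> float:
    # Assumed signature: returns 1.0 when the stripped response equals
    # the stripped label, otherwise 0.0.
    return 1.0 if sample.response.strip() == sample.label.strip() else 0.0


if __name__ == "__main__":
    # args is unused in this sketch, so None is passed in its place.
    print(asyncio.run(exact_match_rm(None, Sample(" 42 ", "42"))))
```

A function like this would be referenced from --custom-rm-path; check the hook's actual signature before adapting it.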
--custom-reward-post-process-path
Hook to normalize rewards differently from the default GRPO normalization.
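One alternative to mean-and-std normalization is to subtract each group's mean and leave the scale untouched. The helper below sketches only that arithmetic on plain Python lists. The hook's real signature and return type are not shown on this page, so `center_rewards_per_group` and its parameters are hypothetical names.

```python
from statistics import mean


def center_rewards_per_group(rewards: list[float], group_size: int) -> list[float]:
    """Subtract each group's mean reward; do not divide by the std.

    `rewards` is assumed to be flat, with consecutive runs of `group_size`
    entries belonging to the same prompt.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("rewards must split evenly into groups of group_size")
    centered: list[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        group_mean = mean(group)
        centered.extend(r - group_mean for r in group)
    return centered
```

For example, `center_rewards_per_group([1.0, 0.0, 1.0, 1.0], 2)` centers the first pair around 0.5 and maps the second, all-equal pair to zeros.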
Filtering
--dynamic-sampling-filter-path
Per-group filter; runs after scoring, before queueing for training.
Reference: miles.rollout.filter_hub.dynamic_sampling_filters.check_reward_nonzero_std.
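The idea behind a nonzero-std check is that a group whose samples all received the same reward carries no relative signal, so it can be dropped before training. A minimal sketch of that rule follows. The `(args, samples)` signature, the plain `bool` return value, and the `reward` field are assumptions, and `Sample` is a stand-in dataclass; the shipped filter may return a richer object.

```python
from dataclasses import dataclass
from statistics import pstdev


@dataclass
class Sample:
    """Stand-in for Miles's Sample; only the reward is modelled."""
    reward: float


def keep_group_with_reward_spread(args, samples: list[Sample]) -> bool:
    # Keep the group only when its rewards are not all identical.
    rewards = [s.reward for s in samples]
    return len(rewards) > 1 and pstdev(rewards) > 0.0
```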
--buffer-filter-path
Pops samples from the rollout buffer at dequeue time. The default is
pop_first in miles/rollout/data_source.py.
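A first-in, first-out dequeue can be sketched as follows. This is not the source of pop_first: the argument list `(args, rollout_id, buffer, num_samples)` is an assumption, and the buffer is modelled as a plain list of sample groups.

```python
def pop_oldest_groups(args, rollout_id, buffer: list, num_samples: int) -> list:
    """Remove and return up to `num_samples` groups from the front of `buffer`.

    The buffer is mutated in place so that popped groups are not served twice.
    """
    count = min(len(buffer), max(num_samples, 0))
    popped = buffer[:count]
    del buffer[:count]
    return popped
```

A custom filter could apply a different policy at the same point, for instance preferring the newest groups or skipping groups that are too stale.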
--rollout-sample-filter-path
Per-sample, in-place. Set s.remove_sample = True to exclude a sample from the loss
(advantage normalization still uses it).
The framework passes data: list[list[Sample]] — a list of
n_samples_per_prompt-size groups — so iterate the outer list once to reach Sample
objects:
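A sketch of such a filter, under stated assumptions: the `(args, data)` argument list, the `response_length` field, and the `MAX_RESPONSE_TOKENS` threshold are hypothetical, and `Sample` is a stand-in dataclass. Only `remove_sample` and the `list[list[Sample]]` shape come from the description above.

```python
from dataclasses import dataclass

MAX_RESPONSE_TOKENS = 4096  # hypothetical threshold, chosen for illustration


@dataclass
class Sample:
    """Stand-in for Miles's Sample; response_length is an assumed field."""
    response_length: int
    remove_sample: bool = False


def drop_overlong_samples(args, data: list[list[Sample]]) -> None:
    # Outer list: one entry per prompt group. Inner list: that group's samples.
    for group in data:
        for s in group:
            if s.response_length > MAX_RESPONSE_TOKENS:
                # Excluded from the loss; the sample itself stays in the group.
                s.remove_sample = True
```

The function mutates the samples in place and returns nothing, matching the per-sample, in-place contract described above.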
--rollout-all-samples-process-path
Runs after rollout completes and can see all samples, including filtered ones.
Useful for logging or analysis.
--rollout-data-postprocess-path
Runs after log probabilities have been computed but before training. Useful for
updating loss masks based on per-token logprobs.
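As an illustration of a logprob-based mask update, the helper below zeroes the loss mask wherever a token's log probability falls under a floor. It works on plain lists and is not the hook itself: the hook's signature and the layout of the data it receives are not shown on this page, and `mask_low_logprob_tokens` and `min_log_prob` are hypothetical names.

```python
def mask_low_logprob_tokens(
    loss_mask: list[int],
    log_probs: list[float],
    min_log_prob: float = -20.0,
) -> list[int]:
    """Return a new mask with entries zeroed where log_probs < min_log_prob.

    Entries that were already 0 stay 0; the input lists are not modified.
    """
    if len(loss_mask) != len(log_probs):
        raise ValueError("loss_mask and log_probs must have the same length")
    return [m if lp >= min_log_prob else 0 for m, lp in zip(loss_mask, log_probs)]
```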
Training
--custom-loss-function-path
Replace the GRPO/PPO loss. Requires --loss-type custom_loss. Useful for novel
objectives or multi-objective work.
--custom-tis-function-path
Importance sampling correction for off-policy training when train and inference
diverge.
Reference: examples/train_infer_mismatch_helper/mis.py.
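One common form of this correction is truncated importance sampling: weight each token by the ratio of training-side to rollout-side probability, capped at a maximum. The sketch below shows only that arithmetic on plain lists. It is not the content of mis.py, the real hook presumably operates on tensors, and the function name and `clip_max` parameter are hypothetical.

```python
import math


def truncated_is_weights(
    train_log_probs: list[float],
    rollout_log_probs: list[float],
    clip_max: float = 2.0,
) -> list[float]:
    """Per-token weights min(exp(train - rollout), clip_max)."""
    if len(train_log_probs) != len(rollout_log_probs):
        raise ValueError("log-prob lists must have the same length")
    return [
        min(math.exp(t - r), clip_max)
        for t, r in zip(train_log_probs, rollout_log_probs)
    ]
```

When the two engines agree on a token, the weight is 1.0; when the training side assigns much higher probability than the rollout side, the cap keeps the weight bounded.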
--custom-pg-loss-reducer-function-path
Reference: examples/DrGRPO/custom_reducer.py.
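The Dr. GRPO idea is to divide each sequence's summed token loss by a fixed constant rather than by that sequence's own length, which removes the length-dependent scaling. The sketch below illustrates the arithmetic on plain lists. It is not the code in custom_reducer.py, and the function name, argument layout, and `divisor` parameter are assumptions.

```python
def constant_divisor_reduce(
    token_losses: list[list[float]],
    loss_masks: list[list[int]],
    divisor: float,
) -> float:
    """Mean over sequences of (masked token-loss sum / divisor)."""
    if divisor <= 0 or not token_losses:
        raise ValueError("need a positive divisor and at least one sequence")
    per_sequence = []
    for losses, mask in zip(token_losses, loss_masks):
        # Sum only the tokens that participate in the loss.
        masked_sum = sum(loss * m for loss, m in zip(losses, mask))
        per_sequence.append(masked_sum / divisor)
    return sum(per_sequence) / len(per_sequence)
```

With a per-sequence mean, a 2-token and a 4-token sequence with equal per-token loss would contribute equally; with a constant divisor, the longer sequence contributes proportionally more.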
--custom-convert-samples-to-train-data-path
Megatron hooks
| Flag | Signature |
|---|---|
| --custom-megatron-init-path | def custom_init(args) -> None |
| --custom-megatron-before-log-prob-hook-path | def custom_hook(args, model, store_prefix) -> None |
| --custom-megatron-before-train-step-hook-path | def custom_hook(args, rollout_id, step_id, model, optimizer, opt_param_scheduler) -> None |
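Skeletons matching the three signatures in the table might look like the following. The function bodies only log, the logger name is hypothetical, and the two hook functions are given distinct names here so they can live in one module; nothing is assumed about the objects passed in beyond the parameter names listed above.

```python
import logging

logger = logging.getLogger("miles_custom_hooks")  # hypothetical logger name


def custom_init(args) -> None:
    # Called once after Megatron initialisation.
    logger.info("custom init hook called")


def before_log_prob_hook(args, model, store_prefix) -> None:
    # Called before log-probability computation.
    logger.info("before log-prob compute, store_prefix=%s", store_prefix)


def before_train_step_hook(
    args, rollout_id, step_id, model, optimizer, opt_param_scheduler
) -> None:
    # Called before each training step.
    logger.info("before train step: rollout=%s step=%s", rollout_id, step_id)
```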
Logging
Return True to suppress Miles’s default logging, or False to layer on top.

