
Miles launch scripts are built from bash arrays. The grouping is deliberately boring: each array owns one operational concern, and the script expands every array into the train.py or train_async.py command line. Use this page to decide where a flag belongs. Use the CLI Reference when you need the full default and type for an individual flag.
| Group | Owns | Typical source |
| --- | --- | --- |
| MODEL_ARGS | Architecture constants and plugin specs | scripts/models/<family>.sh |
| CKPT_ARGS | Actor, reference, HF tokenizer/config, save paths | Launch script |
| ROLLOUT_ARGS | Prompt data, sampling, reward, train/eval batch flow | Launch script |
| EVAL_ARGS | Evaluation datasets and eval-only sampling overrides | Launch script |
| PERF_ARGS | Parallelism, recomputation, dynamic batching | Recipe defaults |
| GRPO_ARGS | RL objective, KL, clipping, entropy, advantage estimator | Recipe defaults |
| OPTIMIZER_ARGS | Learning rate, schedule, weight decay, Adam betas | Recipe defaults |
| SGLANG_ARGS | Rollout engine topology and --sglang-* passthrough | Deployment shape |
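The array-to-command pattern can be sketched as below. The arrays and flag values are illustrative placeholders, not a real recipe:

```shell
# Each array owns one concern; the launch script expands them all
# into a single train.py invocation. Values here are placeholders.
MODEL_ARGS=(--num-layers 32 --hidden-size 4096)
CKPT_ARGS=(--load /ckpts/actor --save /ckpts/actor)

CMD=(python train.py "${MODEL_ARGS[@]}" "${CKPT_ARGS[@]}")
echo "${CMD[@]}"
```

Quoted `"${ARRAY[@]}"` expansion keeps each flag and value as its own argument, which is why the grouping stays robust as recipes grow.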

MODEL_ARGS - architecture constants

MODEL_ARGS tells Megatron what model it is instantiating. Megatron cannot infer all architecture details from a HuggingFace checkpoint, so each recipe sources a matching file from scripts/models/. Common entries:
| Flag family | Example |
| --- | --- |
| Transformer shape | --num-layers, --hidden-size, --num-attention-heads |
| Tokenizer/model dimensions | --seq-length, --max-position-embeddings, --vocab-size |
| Rotary and attention variants | --rotary-base, --rotary-percent, --kv-channels |
| MoE architecture | --num-experts, --moe-router-topk, --moe-grouped-gemm |
| Plugin specs | --spec miles_plugins.models.qwen3_5 get_qwen3_5_spec |
Keep these values aligned with the checkpoint’s config.json. If one checkpoint in a family changes rotary base, vocab padding, or normalization epsilon, override the sourced defaults in the launch script.
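As a shape sketch only, a dense-model architecture block might look like the following. Every value here is hypothetical and must be replaced with the numbers from the actual checkpoint's config.json:

```shell
# Hypothetical dense-model MODEL_ARGS; real values come from the
# checkpoint's config.json, as noted above.
MODEL_ARGS=(
  --num-layers 36
  --hidden-size 4096
  --num-attention-heads 32
  --seq-length 4096
  --max-position-embeddings 32768
  --rotary-base 1000000
)
```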

CKPT_ARGS - checkpoint paths

CKPT_ARGS wires the three model roles in a run:
| Role | Flag |
| --- | --- |
| HuggingFace directory for tokenizer, config, and SGLang boot | --hf-checkpoint |
| Frozen reference model for KL anchoring | --ref-load |
| Actor resume point | --load |
| Actor output directory | --save |
--load and --save usually point to the same directory. If --load has no latest_checkpointed_iteration.txt, Miles warm-starts the actor from --ref-load.
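The resume rule above can be sketched as plain shell logic. The paths are throwaway placeholders; only the selection logic mirrors the documented behavior:

```shell
# Sketch of the warm-start rule: resume from --load if it contains a
# latest_checkpointed_iteration.txt marker, otherwise fall back to --ref-load.
LOAD_DIR=/tmp/miles-demo/actor
REF_DIR=/tmp/miles-demo/ref
mkdir -p "$LOAD_DIR" "$REF_DIR"

if [ -f "$LOAD_DIR/latest_checkpointed_iteration.txt" ]; then
  START_FROM="$LOAD_DIR"   # resume the actor in place
else
  START_FROM="$REF_DIR"    # fresh run: warm-start from the reference
fi
echo "starting from: $START_FROM"
```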

ROLLOUT_ARGS - sampling and reward

ROLLOUT_ARGS controls data entering the loop and how many samples each rollout produces.
| Concern | Flags |
| --- | --- |
| Prompt data | --prompt-data, --input-key, --label-key, --apply-chat-template |
| Rollout volume | --rollout-batch-size, --n-samples-per-prompt, --num-rollout |
| Training consumption | --global-batch-size, --num-steps-per-rollout |
| Sampling | --rollout-temperature, --rollout-top-p, --rollout-max-response-len |
| Reward | --rm-type, --custom-rm-path |
| Filtering | --over-sampling-batch-size, --dynamic-sampling-filter-path |
The rollout volume and training consumption must satisfy the four-knob invariant.
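The four-knob invariant page defines the exact relation; assuming it is rollout-batch-size × n-samples-per-prompt = global-batch-size × num-steps-per-rollout (samples produced per rollout equal samples consumed per rollout), a launch script can assert it before starting. The numbers below are hypothetical:

```shell
# Hypothetical knob values; the check assumes samples produced per
# rollout must equal samples consumed by training per rollout.
ROLLOUT_BATCH_SIZE=32
N_SAMPLES_PER_PROMPT=8
GLOBAL_BATCH_SIZE=128
NUM_STEPS_PER_ROLLOUT=2

PRODUCED=$((ROLLOUT_BATCH_SIZE * N_SAMPLES_PER_PROMPT))
CONSUMED=$((GLOBAL_BATCH_SIZE * NUM_STEPS_PER_ROLLOUT))
if [ "$PRODUCED" -ne "$CONSUMED" ]; then
  echo "invariant violated: $PRODUCED produced vs $CONSUMED consumed" >&2
  exit 1
fi
echo "ok: $PRODUCED samples per rollout"
```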

EVAL_ARGS - evaluation overrides

Evaluation reuses the rollout stack but usually runs with a different dataset and more deterministic sampling. Common entries:
| Concern | Flags |
| --- | --- |
| Cadence | --eval-interval |
| Dataset | --eval-prompt-data |
| Eval group size | --n-samples-per-eval-prompt |
| Eval-only generation | --eval-max-response-len, --eval-top-p, --eval-temperature |
Flags not set in EVAL_ARGS inherit from ROLLOUT_ARGS.
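A sketch of an eval override block, with hypothetical cadence, path, and sampling values; anything omitted here falls back to the matching ROLLOUT_ARGS setting:

```shell
# Hypothetical EVAL_ARGS: near-deterministic sampling on a held-out set.
EVAL_ARGS=(
  --eval-interval 20
  --eval-prompt-data /data/eval.jsonl
  --n-samples-per-eval-prompt 1
  --eval-temperature 0.0
  --eval-max-response-len 16384
)
```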

PERF_ARGS - parallelism and memory

PERF_ARGS controls how training is sharded and how activation memory is managed.
| Concern | Flags |
| --- | --- |
| Tensor parallelism | --tensor-model-parallel-size, --sequence-parallel |
| Pipeline parallelism | --pipeline-model-parallel-size |
| Context parallelism | --context-parallel-size |
| Expert parallelism | --expert-model-parallel-size, --expert-tensor-parallel-size |
| Recomputation | --recompute-granularity, --recompute-method, --recompute-num-layers |
| Dynamic batching | --use-dynamic-batch-size, --max-tokens-per-gpu |
Megatron exposes TP, PP, CP, EP, and ETP, but not every product of those dimensions is valid or worth using for every model. Start from the recipe’s tested combination and see parallelism compatibility before changing more than one dimension.
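One basic validity constraint can be sketched as follows: the world size must be divisible by the product of the model-parallel dimensions, with the remainder becoming data parallelism. This is a simplified check under that assumption (EP interacts differently; see the compatibility page), with hypothetical sizes:

```shell
# Simplified layout check: TP x PP x CP must divide the world size;
# what remains is the data-parallel degree. Values hypothetical.
WORLD_SIZE=32
TP=4; PP=2; CP=1
MODEL_PARALLEL=$((TP * PP * CP))
DP=$((WORLD_SIZE / MODEL_PARALLEL))
if [ $((DP * MODEL_PARALLEL)) -ne "$WORLD_SIZE" ]; then
  echo "invalid layout: $MODEL_PARALLEL does not divide $WORLD_SIZE" >&2
  exit 1
fi
echo "data-parallel replicas: $DP"
```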

GRPO_ARGS - RL objective

GRPO_ARGS controls the policy-gradient objective and the stability terms around it.
| Concern | Flags |
| --- | --- |
| Algorithm | --advantage-estimator |
| KL | --use-kl-loss, --kl-loss-coef, --kl-loss-type |
| Clipping | --eps-clip, --eps-clip-high |
| Entropy | --entropy-coef |
| Loss reduction | --calculate-per-token-loss |
| Precision/off-policy safety | --use-tis |
Whether the KL weight is zero is recipe-specific. Passing --use-kl-loss --kl-loss-coef 0.00 still loads the reference model and logs KL; it does not remove the reference model from the run.
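A sketch of a GRPO objective block with hypothetical coefficient values, including the zero-weight KL pattern discussed above:

```shell
# Hypothetical GRPO_ARGS: KL is loaded and logged but carries zero
# weight in the loss; clipping and entropy values are illustrative.
GRPO_ARGS=(
  --advantage-estimator grpo
  --use-kl-loss
  --kl-loss-coef 0.00
  --eps-clip 0.2
  --entropy-coef 0.0
)
```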

OPTIMIZER_ARGS - optimizer schedule

OPTIMIZER_ARGS carries the optimizer choice and scalar schedule. Common entries:
| Concern | Flags |
| --- | --- |
| Optimizer | --optimizer |
| Learning rate | --lr, --min-lr, --lr-decay-style |
| Adam | --adam-beta1, --adam-beta2, --adam-eps |
| Regularization | --weight-decay, --clip-grad |
Post-training is sensitive to large updates. Most recipes start near 1e-6 and use a constant schedule unless the model page says otherwise.
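A sketch following that guidance, with a constant schedule pinned near 1e-6; the Adam betas and regularization values are hypothetical and should come from the recipe:

```shell
# Hypothetical OPTIMIZER_ARGS: small constant LR for post-training;
# betas, weight decay, and grad clip are illustrative defaults.
OPTIMIZER_ARGS=(
  --optimizer adam
  --lr 1e-6
  --min-lr 1e-6
  --lr-decay-style constant
  --adam-beta1 0.9
  --adam-beta2 0.95
  --weight-decay 0.1
  --clip-grad 1.0
)
```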

SGLANG_ARGS - rollout engine passthrough

SGLANG_ARGS configures the inference side. Miles owns --rollout-num-gpus-per-engine; everything prefixed with --sglang- is forwarded to python -m sglang.launch_server after removing the prefix. Common entries:
| Concern | Flags |
| --- | --- |
| Engine tensor parallelism | --rollout-num-gpus-per-engine |
| Engine memory | --sglang-mem-fraction-static |
| Context length | --sglang-context-length |
| MoE serving | --sglang-enable-ep-moe, --sglang-enable-dp-attention |
| Debugging | --sglang-log-level |
SGLang parallelism is separate from trainer parallelism. For example, --rollout-num-gpus-per-engine maps to the SGLang server’s TP size, not Megatron’s --tensor-model-parallel-size.
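The prefix-stripping forwarding rule can be sketched in a few lines of bash; the flags and values below are placeholders, and only the rewrite of --sglang-* into a plain server flag reflects the documented behavior:

```shell
# Sketch of the passthrough rule: strip the --sglang- prefix and hand
# the remainder to sglang.launch_server. Flag values are placeholders.
SGLANG_ARGS=(--sglang-mem-fraction-static 0.7 --sglang-context-length 8192)

SERVER_ARGS=()
for arg in "${SGLANG_ARGS[@]}"; do
  # Anchored substitution: only a leading --sglang- becomes --.
  SERVER_ARGS+=("${arg/#--sglang-/--}")
done
echo "${SERVER_ARGS[@]}"
```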