
Miles emits per-rollout metrics to stdout and (optionally) Weights & Biases. SGLang and Ray write their own logs to their default directories.

What gets logged by default

Each rollout iteration emits a structured row to stdout (illustrative shape — exact fields depend on backend and config):
[trainer] iter 12/3000 | loss=0.412 reward=0.61 kl=0.018
                      | rollout=18.4s train=22.1s p2p=2.1s  (total 42.6s)
                      | grad_norm=0.93 lr=1.0e-06
When --use-wandb is set, metrics also go to wandb under the train/, rollout/, and perf/ namespaces (see miles/utils/wandb_utils.py).
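To make the namespacing concrete, here is a minimal sketch of a namespaced wandb.log call; the keys and values are illustrative, not Miles's exact field names:

import wandb

wandb.init(project="miles")
# Illustrative keys only; the exact fields depend on backend and config.
wandb.log(
    {
        "train/loss": 0.412,
        "train/grad_norm": 0.93,
        "rollout/reward": 0.61,
        "perf/rollout_time_s": 18.4,
        "perf/train_time_s": 22.1,
    },
    step=12,  # the rollout iteration
)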

Enabling wandb

ray job submit --address=auto -- \
  python3 train.py ... \
    --use-wandb \
    --wandb-project miles \
    --wandb-group qwen3-30b-grpo
Available flags: --use-wandb, --wandb-project, --wandb-group. WANDB_API_KEY should be supplied via Ray’s env_vars rather than baked into the launch script.
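One way to do that, sketched with Ray's job-submission SDK (the dashboard address and entrypoint are placeholders):

import os

from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://127.0.0.1:8265")  # Ray dashboard address
client.submit_job(
    entrypoint="python3 train.py --use-wandb --wandb-project miles",
    # Forward the key from the submitting shell instead of hard-coding it.
    runtime_env={"env_vars": {"WANDB_API_KEY": os.environ["WANDB_API_KEY"]}},
)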

What to watch

| Signal | Healthy pattern | Red flag |
| --- | --- | --- |
| loss | Slow decay over hundreds of iterations | Spike → crash within an iteration |
| raw_reward | Trending up, with healthy variance | Saturates near a single value (collapse) |
| kl_loss | Bounded, drifts up over time | Sudden jump (policy diverged from ref); only logged when --use-kl-loss |
| train_rollout_logprob_abs_diff | Stable and small (≪ 1.0) | Climbing without bound → train/inference precision drift |
| entropy_loss | Slowly decreasing | Falls to ~0 too fast (mode collapse) |
| grad_norm | < clip_grad (1.0 by default) | Repeatedly hitting the clip threshold |
| rollout_time / train_time | Roughly balanced | One ≫ the other → resource imbalance |
| train/pg_clipfrac | < 0.2 | > 0.5 means the policy is moving fast → drop LR |
Panel names follow what loss.py and the rollout logger emit; Miles's wandb metrics live under the train/, rollout/, perf/, multi_turn/, and passrate/ namespaces.
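As a concrete example of acting on these signals, here is a sketch that pulls a finished run's history through the wandb public API and checks the train/pg_clipfrac red flag; the entity/project/run path is a placeholder:

import wandb

run = wandb.Api().run("my-team/miles/abc123")  # placeholder run path
hist = run.history(keys=["train/pg_clipfrac"], pandas=True)
if (hist["train/pg_clipfrac"] > 0.5).any():
    print("train/pg_clipfrac exceeded 0.5: policy is moving fast, drop the LR")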

Custom loggers

Replace the default rollout logger with your own to push to internal systems:
from statistics import mean

import statsd  # any metrics client works; python-statsd shown as an example

client = statsd.StatsClient()  # defaults to localhost:8125

def my_log(rollout_id, args, samples, extra, rollout_time) -> bool:
    client.gauge("miles.reward", mean(s.reward for s in samples))
    return False   # also keep default logging

Then register it:
--custom-rollout-log-function-path my_pkg.logging.my_log

Profiling

| Tool | When |
| --- | --- |
| nvidia-smi dmon -s u | Quick sanity check on GPU utilization |
| nsys profile | Deep CUDA-level profiling |
| py-spy dump --pid <ray worker> | Find Python-side stalls |
| ray timeline | Inspect Ray task scheduling |
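The last row also has a programmatic form; a minimal sketch using Ray's Python API against a running cluster:

import ray

ray.init(address="auto")  # attach to the running cluster
# Dump the task-scheduling timeline; open it in chrome://tracing or Perfetto.
ray.timeline(filename="/tmp/ray-timeline.json")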

Built-in PyTorch profiler

The PyTorch profiler is wired into Miles via miles/utils/profile_utils.py. Flags differ by backend. Megatron lets you choose which sub-loop to profile:
--profile-target train_overall    # or train_actor, train_log_probs (multi-arg)
FSDP additionally exposes the standard FSDPArgs profiling window:
--use-pytorch-profiler
--profile-step-start 10
--profile-step-end 12
--memory-snapshot-path snapshot.pickle
--tensorboard-dir /data/tb-run-42
Open the trace in chrome://tracing or Perfetto.
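For orientation, the window flags above map roughly onto a torch.profiler schedule. A minimal sketch (not Miles's actual wiring; train_step is a placeholder):

from torch.profiler import profile, schedule, tensorboard_trace_handler

def train_step():
    pass  # placeholder for one training iteration

# Roughly --profile-step-start 10 / --profile-step-end 12, traced to TensorBoard.
prof = profile(
    schedule=schedule(wait=10, warmup=0, active=2),
    on_trace_ready=tensorboard_trace_handler("/data/tb-run-42"),
)
with prof:
    for _ in range(15):
        train_step()
        prof.step()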

Where the log files live

| Source | Path |
| --- | --- |
| Trainer stdout | wherever you redirected ray job submit (or the Ray dashboard) |
| Ray workers | ~/.ray/session_latest/logs/ |
| wandb local cache | wandb/run-<id>/files/ |
| FSDP profiler / memory snapshot | --tensorboard-dir, --memory-snapshot-path |

Router endpoints

The router exposes a small FastAPI surface used internally by Miles:
| Endpoint | Method | What |
| --- | --- | --- |
| /add_worker | POST | Register an SGLang engine |
| /list_workers | GET | List registered workers |
| any other path | GET / POST / PUT / DELETE | Proxied to a selected SGLang worker, e.g. /generate, /v1/chat/completions, /health |
Middleware plugins (e.g. radix-tree caching) can mount additional routes — see miles/router/middleware_hub/.
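A sketch of exercising these endpoints with requests; the router address, the /add_worker payload shape, and the /generate body are assumptions to verify against the router source:

import requests

ROUTER = "http://127.0.0.1:30000"  # placeholder router address

# Register an SGLang engine (query-parameter shape is an assumption).
requests.post(f"{ROUTER}/add_worker", params={"url": "http://127.0.0.1:30001"})
print(requests.get(f"{ROUTER}/list_workers").json())

# Any other path is proxied to a worker, e.g. a raw generation call.
resp = requests.post(
    f"{ROUTER}/generate",
    json={"text": "Hello", "sampling_params": {"max_new_tokens": 8}},
)
print(resp.json())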