Advanced Features

This section covers the Miles features that the Core-features section of the homepage points at: low-precision training (FP8 / MXFP8 / INT4 QAT), Rollout Routing Replay for MoE, speculative decoding, and LoRA training and serving.

Low Precision RL

The unified FP8 path: quantization is matched between training and inference, while the backward pass and master weights stay in BF16.

INT4 QAT

W4A16 quantization-aware training for fitting large models on a single 8-GPU node.
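As a rough illustration of what "W4A16" means, the sketch below quantizes weights to 4-bit integers and immediately dequantizes them (the "fake quantization" step that quantization-aware training typically runs in the forward pass), while activations would stay in 16-bit. The function name `fake_quant_int4`, the symmetric per-group scheme, and the group size are assumptions made for this example, not Miles' actual implementation.

```python
def fake_quant_int4(weights, group_size=4):
    """Quantize-dequantize a flat list of weights to symmetric INT4, per group.

    Hypothetical helper for illustration only: each group shares one scale,
    and values are rounded to integers in [-8, 7] before being scaled back.
    """
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to the INT4 level 7.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        for w in group:
            q = max(-8, min(7, round(w / scale)))  # 4-bit integer code
            out.append(q * scale)                  # dequantized value
    return out


# One group whose largest magnitude is 7.0, so the scale is exactly 1.0.
quantized = fake_quant_int4([7.0, -3.4, 0.6, 0.0])
```

In an actual QAT setup the rounding step is non-differentiable, so gradients are usually passed through it unchanged (a straight-through estimator); that part is omitted here.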

Rollout Routing Replay (R3)

Capture expert routing decisions during inference and replay them during training; this is the mechanism that keeps MoE RL stable.
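The capture-and-replay idea can be sketched with a toy top-k router. In this minimal example, small numerical differences between the rollout and training logits would select different experts, and replaying the recorded indices removes that mismatch. The `route` function, its `replay` parameter, and the choice to recompute gate weights from the training logits are all assumptions made for illustration, not the Miles API.

```python
import math


def route(logits, k, replay=None):
    """Pick k experts and their gate weights for one token.

    Hypothetical helper: when `replay` is given, the recorded expert indices
    are reused instead of recomputing top-k from `logits`.
    """
    if replay is None:
        experts = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    else:
        experts = list(replay)
    # Softmax over the selected experts only (an assumed gating scheme).
    peak = max(logits[i] for i in experts)
    exps = [math.exp(logits[i] - peak) for i in experts]
    total = sum(exps)
    weights = [e / total for e in exps]
    return experts, weights


# Rollout (inference engine): record which experts were chosen.
rollout_logits = [1.00, 0.99, 1.01, -2.0]
recorded_experts, _ = route(rollout_logits, k=2)

# Training: slightly different numerics would pick different experts...
train_logits = [0.99, 1.01, 1.00, -2.0]
fresh_experts, _ = route(train_logits, k=2)

# ...so the recorded routing is replayed instead.
replayed_experts, replayed_weights = route(train_logits, k=2, replay=recorded_experts)
```

Here `fresh_experts` differs from `recorded_experts`, while `replayed_experts` matches the rollout exactly, so the training forward pass activates the same experts that generated the sample.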

Speculative Decoding

Draft + target speculative rollout, with online MTP-SFT for the draft.

LoRA Training and Serving

Train LoRA adapters with SFT or RL and serve them through SGLang from the same checkpoint.
