Miles supports NVIDIA’s Nemotron-3 line: a Mamba + Attention hybrid that, in the Super tier, adds MoE and ships natively in FP8. All three variants load via the Megatron AutoBridge path, so there is no offline HF →
torch_dist conversion step.
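For orientation, the snippet below sketches what that load path looks like underneath. It assumes the Megatron Bridge package (`megatron.bridge`) is installed and that the Nano HF ID from the table below is registered with its AutoBridge; exact call names can differ between releases, and Miles drives this step for you, so treat it as an illustration rather than the Miles API.

```python
# Sketch: pull the dense Nano checkpoint straight from Hugging Face into a
# Megatron model provider. Weights are mapped in memory when the model is
# built, so no converted torch_dist checkpoint is written to disk first.
# Assumes Megatron Bridge is installed and recognizes this HF ID.
from megatron.bridge import AutoBridge

bridge = AutoBridge.from_hf_pretrained("nvidia/Nemotron-3-Nano-4B")
provider = bridge.to_megatron_provider()
```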
Variants
| Model | Active / total params | HF ID | Recipe |
|---|---|---|---|
| Nemotron-3-Nano | 4 B / 4 B (dense) | nvidia/Nemotron-3-Nano-4B | nemotron-3-nano |
| Nemotron-3-Nano MoE | 3 B / 30 B | nvidia/Nemotron-3-Nano-30B-A3B | nemotron-3-nano-moe |
| Nemotron-3-Super | 12 B / 120 B (FP8) | nvidia/Nemotron-3-Super-120B-A12B-FP8 | nemotron-3-super |
Fastest path to train
Nemotron-3-Nano (dense, 4 B) is the smallest variant and runs on a single 8-GPU node; see the launch sketch after the list below.
Which variant do I pick?
- Smallest, single-node smoke test → Nemotron-3-Nano (nemotron-3-nano).
- Mid-scale hybrid MoE → Nemotron-3-Nano MoE (nemotron-3-nano-moe).
- Frontier-scale FP8-native MoE → Nemotron-3-Super (nemotron-3-super).
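As a launch sketch only: this page does not show Miles' actual entry point or flags, so the `train.py` script name and `--recipe` flag below are placeholders; only the standard `torchrun --nproc_per_node` invocation is assumed to exist. Swap in whatever entry point your Miles checkout provides, keeping the recipe name from the variants table.

```python
# Smoke-test launch for the dense Nano recipe on one 8-GPU node.
# NOTE: `train.py` and `--recipe` are placeholders, not the documented
# Miles entry point; substitute your checkout's actual launcher.
import subprocess

subprocess.run(
    [
        "torchrun", "--nproc_per_node", "8",  # one node, 8 GPUs
        "train.py",                            # placeholder entry point
        "--recipe", "nemotron-3-nano",         # recipe name from the table above
    ],
    check=True,
)
```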
Pairs well with
- Backends Beyond Megatron — the AutoBridge path Nemotron rides on.
- Low Precision RL — Super ships natively in FP8.

