Miles ships recipes for the DeepSeek family across two generations: DeepSeek-V4 Flash introduces sparse multi-head latent attention with a learned indexer and KV compressors (8 nodes of H200), while V3 / R1 remain the canonical 16-node, 671B-parameter recipes (BF16 training with 128×128 block-wise FP8 rollout, DeepEP, and DAPO-style dynamic sampling).
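As a rough illustration of the rollout-side quantization mentioned above, the sketch below quantizes a 2-D BF16 weight with one scale per 128×128 block. The helper name and the e4m3 target are assumptions chosen for illustration, not the quantizer Miles actually ships:

import torch

def quantize_blockwise_fp8(w: torch.Tensor, block: int = 128):
    """Per-128x128-block FP8 (e4m3) quantization of a 2-D weight (illustrative sketch)."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    # View the matrix as a grid of (block x block) tiles.
    tiles = w.float().reshape(rows // block, block, cols // block, block)
    # One scale per tile, chosen so the tile's absmax maps to the e4m3 max (448).
    absmax = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12)
    scales = absmax / 448.0
    q = (tiles / scales).to(torch.float8_e4m3fn)
    return q.reshape(rows, cols), scales.reshape(rows // block, cols // block)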

Variants

Model               Active / Total   HF ID                               Recipe
DeepSeek-V4-Pro     49 B / 1.6 T     TBA                                 deepseek-v4-pro
DeepSeek-V4-Flash   13 B / 284 B     sgl-project/DeepSeek-V4-Flash-FP8   deepseek-v4-flash
DeepSeek-V3         37 B / 671 B     deepseek-ai/DeepSeek-V3             deepseek
DeepSeek-R1         37 B / 671 B     deepseek-ai/DeepSeek-R1             deepseek
A validated DeepSeek-V4-Pro recipe is not yet available — see radixark/miles#1046 for tracking.

Fastest path to train

DeepSeek-V4-Flash needs 8 nodes of 8× H200 and the radixark/miles:deepseek-v4 image:
cd /root/miles
python scripts/run_deepseek_v4.py full-train \
   --model-name DeepSeek-V4-Flash-FP8 \
   --num-nodes 8 --num-gpus-per-node 8
DeepSeek-R1 needs 16 nodes of 8× H100:
cd /root/miles
bash scripts/run-deepseek-r1.sh              # full 16-node run
See the DeepSeek-V4 Flash page for the V4 architecture summary, parallelism layouts, and known workarounds. See the DeepSeek R1 / V3 page for the V3 flow: FP8 → BF16 conversion, the Megatron parallelism layout (TP8 / PP4 / EP32 / CP4), a per-argument walkthrough, and the alternate Python launcher (scripts/run_deepseek.py).
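The quoted R1 / V3 layout is consistent with the 16-node footprint. The quick check below is only a sanity calculation; in particular, reading EP32 as expert sharding over the combined TP8 × CP4 group is an assumption for illustration, not something the recipe states:

tp, pp, cp, ep = 8, 4, 4, 32
nodes, gpus_per_node = 16, 8
world_size = nodes * gpus_per_node        # 128 GPUs
dp = world_size // (tp * pp * cp)         # 8 * 4 * 4 = 128, so data parallel = 1
assert tp * pp * cp * dp == world_size
# EP32 shards the MoE experts; one common Megatron reading is EP = TP * CP (8 * 4 = 32),
# but that mapping is assumed here rather than taken from the recipe.
print(f"{world_size} GPUs total, DP = {dp}")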

Pairs well with