GLM4

1. Model Introduction

GLM-Z1-9B-0414 is a dense reasoning-tuned model from Zhipu AI’s GLM-4 series, sized for single-node experimentation. Key highlights:

Dense 9 B architecture: fits comfortably on a single 8-GPU node.
Reasoning-tuned: post-trained for step-by-step reasoning under the GLM-Z1 line.
Compatible RL recipe: GRPO with DAPO-style rollout, drop-in replacement for other dense Qwen / LLaMA-class workloads.

2. Supported Variants

Model	HF ID
GLM-Z1-9B-0414	zai-org/GLM-Z1-9B-0414

3. Environment Setup

3.1 Download model + datasets

hf download zai-org/GLM-Z1-9B-0414 --local-dir /root/GLM-Z1-9B-0414
hf download --repo-type dataset zhuzilin/dapo-math-17k --local-dir /root/dapo-math-17k
hf download --repo-type dataset zhuzilin/aime-2024     --local-dir /root/aime-2024

3.2 HF → Megatron `torch_dist` conversion

cd /root/miles
source scripts/models/glm4-9B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
   ${MODEL_ARGS[@]} \
   --hf-checkpoint /root/GLM-Z1-9B-0414 \
   --save          /root/GLM-Z1-9B-0414_torch_dist

4. Launch

4.1 Quick start

cd /root/miles
bash scripts/run-glm4-9B.sh                    # 8 GPU
bash scripts/run-glm4-9B-4xgpu-radixtree.sh    # 4 GPU smoke test

5. Recipe Configuration

5.1 Parallelism

Script	TP	PP	CP	EP	`max_tokens_per_gpu`	actor / rollout GPUs	GPUs
`run-glm4-9B.sh`	2	1	2	1	4608	4 / 4 (non-colocate)	8 (1 × 8)
`run-glm4-9B-4xgpu-radixtree.sh`	2	1	1	1	2304	4 / 2	4 (1 × 4)

5.2 Algorithm

GRPO across both scripts:

GRPO_ARGS=(
   --advantage-estimator grpo
   --use-kl-loss
   --kl-loss-coef 0.00
   --eps-clip 0.2
   --eps-clip-high 0.28
)

5.3 Rollout & SGLang

# run-glm4-9B.sh
SGLANG_ARGS=(
   --rollout-num-gpus-per-engine 2
)

# run-glm4-9B-4xgpu-radixtree.sh
SGLANG_ARGS=(
   --rollout-num-gpus-per-engine 2
)

1. Model Introduction

2. Supported Variants

3. Environment Setup

3.1 Download model + datasets

3.2 HF → Megatron `torch_dist` conversion

4. Launch

4.1 Quick start

5. Recipe Configuration

5.1 Parallelism

5.2 Algorithm

5.3 Rollout & SGLang

5.4 Optimizer

5.5 Notable quirks

6. Pairs Well With

Documentation Index

​1. Model Introduction

​2. Supported Variants

​3. Environment Setup

​3.1 Download model + datasets

​3.2 HF → Megatron torch_dist conversion

​4. Launch

​4.1 Quick start

​5. Recipe Configuration

​5.1 Parallelism

​5.2 Algorithm

​5.3 Rollout & SGLang

​5.4 Optimizer

​5.5 Notable quirks

​6. Pairs Well With

1. Model Introduction

2. Supported Variants

3. Environment Setup

3.1 Download model + datasets

3.2 HF → Megatron `torch_dist` conversion

4. Launch

4.1 Quick start

5. Recipe Configuration

5.1 Parallelism

5.2 Algorithm

5.3 Rollout & SGLang

5.4 Optimizer

5.5 Notable quirks

6. Pairs Well With