1. Model Introduction
Qwen3.6 is the next iteration of Alibaba's Qwen3 line, focused on agentic-coding workflows and on preserving reasoning context across long sessions. The family ships two variants, a sparse MoE (Qwen3.6-35B-A3B) and a dense GDN-backbone model (Qwen3.6-27B), both with native hybrid reasoning (thinking by default), built-in tool calling, and multimodal text / image / video input. Context windows reach 262K and extend past 1M. Weights are Apache 2.0, available in BF16 and FP8. The dense Qwen3.6-27B is the single-GPU-friendly variant. In miles it reuses the Qwen3.5 Megatron spec (miles_plugins.models.qwen3_5.get_qwen3_5_spec); architecturally it's a wider, deeper Qwen3.5 with the gated-attention design preserved.
Key highlights:
- Dense GDN backbone: 27B parameters, single-GPU-friendly footprint (see the rough parameter sanity check after this list).
- Attention-output gate: shared with Qwen3.5, trained alongside attention weights.
- Extended rotary base: `--rotary-base 10000000`, `--rotary-percent 0.25`.
- Larger vocabulary: 248320 tokens.
- Shape: hidden-size 5120, ffn-hidden-size 17408, 64 layers.
- Long context: 262K tokens, extensible past 1M.
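As a rough back-of-envelope check that these shape numbers are consistent with the quoted 27B, the sketch below assumes a plain gated-MLP transformer stack with untied embeddings and full-width QKVO projections, and deliberately ignores GDN-specific parameters, GQA head sizing, norms, and the attention-output gate:

```bash
# Rough parameter count from the published shape (approximate by design:
# GDN layers, GQA, norms, and the attention-output gate are not modeled).
h=5120; ffn=17408; layers=64; vocab=248320
emb=$((2 * vocab * h))           # untied input embedding + output head
mlp=$((3 * h * ffn * layers))    # gated MLP: gate, up, down projections
attn=$((4 * h * h * layers))     # QKVO at full width (upper bound vs. GQA)
echo "$(( (emb + mlp + attn) / 1000000000 ))B"   # prints 26B, close to the nominal 27B
```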
2. Supported Variants
| Model | HF ID |
|---|---|
| Qwen3.6-27B | Qwen/Qwen3.6-27B |
3. Environment Setup
3.1 Download model + datasets
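A minimal sketch of the model pull with the standard Hugging Face CLI (the datasets this recipe trains on are not named in this section, so only the model download is shown):

```bash
pip install -U "huggingface_hub[cli]"
# HF ID from the variants table below; pick any local directory you like.
huggingface-cli download Qwen/Qwen3.6-27B --local-dir ./Qwen3.6-27B
```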
3.2 HF → Megatron torch_dist conversion
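This page does not show the converter command itself; the sketch below uses a hypothetical tools/convert_hf_to_torch_dist.py path and flag names, so substitute your framework's real entry point:

```bash
# Hypothetical invocation: the script path and flags are placeholders for
# the framework's actual HF -> Megatron torch_dist converter.
python tools/convert_hf_to_torch_dist.py \
    --hf-checkpoint ./Qwen3.6-27B \
    --save ./Qwen3.6-27B_torch_dist
```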
4. Launch
4.1 Quick start
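No launch command survives in this section; a hedged sketch, assuming the recipe script named in section 5.5 is the entry point (the exact invocation is an assumption about your checkout):

```bash
# Hypothetical quick start: scripts/models/qwen3.6-27B.sh is the recipe file
# quoted in section 5.5, but how it is invoked here is assumed.
bash scripts/models/qwen3.6-27B.sh
```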
5. Recipe Configuration
5.1 Parallelism
| TP | PP | CP | EP | max_tokens_per_gpu | SGLang mem-fraction-static | CPU Adam | GPUs |
|---|---|---|---|---|---|---|---|
| 4 | 1 | 1 | 1 | 8192 | 0.5 | ✓ | 8 (1 × 8) |
`--sequence-parallel` is enabled. Activation checkpointing is on (`--recompute-granularity full --recompute-method uniform --recompute-num-layers 1`).
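Expressed as flags, the row above maps onto standard Megatron-LM options plus the recompute settings already quoted; `--max-tokens-per-gpu` is an assumed spelling for the table's max_tokens_per_gpu column:

```bash
PERF_ARGS=(
   --tensor-model-parallel-size 4     # TP = 4
   --pipeline-model-parallel-size 1   # PP = 1
   --context-parallel-size 1          # CP = 1
   --expert-model-parallel-size 1     # EP = 1 (dense model, no experts)
   --sequence-parallel                # enabled alongside TP
   --recompute-granularity full       # full activation checkpointing
   --recompute-method uniform
   --recompute-num-layers 1
   --max-tokens-per-gpu 8192          # assumed spelling; value from the table
)
```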
5.2 Algorithm
GRPO with low-variance KL.
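The section does not spell the estimator out, but "low-variance KL" conventionally denotes the k3 estimator (Schulman's approximating-KL note), which is non-negative and unbiased. Per sampled token, with policy $\pi_\theta$, reference $\pi_{\text{ref}}$, and ratio $r = \pi_{\text{ref}}(a \mid s) / \pi_\theta(a \mid s)$:

$$\hat{k}_3 = r - 1 - \log r, \qquad \mathbb{E}_{a \sim \pi_\theta}\!\left[\hat{k}_3\right] = \mathrm{KL}\!\left(\pi_\theta \,\|\, \pi_{\text{ref}}\right).$$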
5.3 Rollout & SGLang
`--rollout-num-gpus-per-engine 1` follows the Qwen3.5 line; SGLang TP > 1 has been problematic on this family. If your SGLang version carries the fix for sglang#21039, you can raise it.
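For concreteness, a minimal sketch of the rollout flags; `--rollout-num-gpus-per-engine` is quoted from the recipe, while the mem-fraction spelling below is an assumption matched to the 0.5 value in the section 5.1 table:

```bash
ROLLOUT_ARGS=(
   --rollout-num-gpus-per-engine 1    # one GPU per SGLang engine (engine TP = 1)
   --sglang-mem-fraction-static 0.5   # assumed flag spelling; value from 5.1
)
# Once your SGLang carries the sglang#21039 fix, raising
# --rollout-num-gpus-per-engine enables engine TP > 1.
```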
5.4 Optimizer
CPU Adam is enabled (`--optimizer-cpu-offload --overlap-cpu-optimizer-d2h-h2d --use-precision-aware-optimizer`). Offloading the optimizer states to CPU frees GPU memory for a 27B model trained on a single 8-GPU node.
5.5 Notable quirks
From scripts/models/qwen3.6-27B.sh:
- `--spec miles_plugins.models.qwen3_5 get_qwen3_5_spec`: Qwen3.6 reuses the Qwen3.5 spec (gated attention, FP32 A_log).
- `--rotary-base 10000000`, `--rotary-percent 0.25`.
- `--vocab-size 248320`.
- `--apply-layernorm-1p`, `--qk-layernorm`, `--group-query-attention`.
- `--attention-output-gate`.
The spec keeps A_log in FP32 through Megatron's mixed-precision pipeline.
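Assembled from the flags quoted above plus the shape numbers in section 1, the model block of scripts/models/qwen3.6-27B.sh plausibly reads as follows; `--num-layers`, `--hidden-size`, and `--ffn-hidden-size` are standard Megatron spellings, but this is a reconstruction, not a verbatim excerpt:

```bash
MODEL_ARGS=(
   --spec miles_plugins.models.qwen3_5 get_qwen3_5_spec   # reuse Qwen3.5 spec
   --num-layers 64
   --hidden-size 5120
   --ffn-hidden-size 17408
   --vocab-size 248320
   --rotary-base 10000000
   --rotary-percent 0.25
   --apply-layernorm-1p
   --qk-layernorm
   --group-query-attention
   --attention-output-gate
)
```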

