In agentic / multi-turn workflows, Miles uses SGLang’s pretokenized prefix mechanism so the conversation history is not re-tokenized every turn. That requires the chat template to satisfy an append-only invariant: rendering messages[1..N] must produce a string that is an exact prefix of rendering messages[1..N+1].
Some community templates violate this. They use `loop.last` or other
context-dependent Jinja logic that rewrites already-rendered turns, and the result is silent
tokenization drift, divergent log-probabilities, and gradient blow-up after a few
iterations of multi-turn RL.
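As a hypothetical illustration (not any specific model's shipped template), a closing token that is emitted only on the final message breaks the invariant: once message N+1 is appended, message N is re-rendered without that token, so the old rendering is no longer a prefix of the new one.

```jinja
{%- for message in messages -%}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{#- Violation: this token depends on loop.last, so the text rendered for
    the last message changes as soon as another message is appended. -#}
{%- if loop.last -%}<|endoftext|>{%- endif -%}
{%- endfor -%}
```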
Miles ships a verifier and an autofix.
## Quick start
### Verify a HuggingFace template
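The page doesn't show the exact invocation, so the subcommand name below is a hypothetical sketch; the `--model` flag is the documented one (see the CLI table):

```bash
# Hypothetical entry-point name; --model is documented below.
miles verify-chat-template --model Qwen/Qwen3-8B
```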
### Apply Miles’s autofix
If a fixed template ships for that model, `--autofix` swaps it in and re-runs the
suite:
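Continuing the hypothetical invocation from above, with the documented `--autofix` flag:

```bash
# Hypothetical entry-point name; --autofix is documented below.
miles verify-chat-template --model Qwen/Qwen3-8B --autofix
```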
### Verify a local Jinja file
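Same hypothetical entry point, using the documented `--template` flag with a local path:

```bash
# Hypothetical entry-point name; --template takes a local .jinja file.
miles verify-chat-template --template ./my_chat_template.jinja
```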
### Include thinking-specific cases
For Qwen3.5, GLM-5, and other models that toggle `enable_thinking`, add `--thinking`
to also run thinking-specific trajectories.
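Sketch with the documented `--thinking` flag (same hypothetical entry point):

```bash
# Hypothetical entry-point name; --thinking adds the enable_thinking suite.
miles verify-chat-template --model Qwen/Qwen3-8B --thinking
```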
## CLI
| Flag | Description |
|---|---|
| `--template PATH` | Verify a local `.jinja` template. |
| `--model MODEL_ID` | Verify the chat template of a HuggingFace model ID. |
| `--autofix` | Apply Miles’s fixed template if one is available. |
| `--thinking` | Also run thinking-specific cases. |
The command exits 0 on pass, 1 on fail.
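Because of that exit code, the verifier can gate a training pipeline directly; a minimal CI sketch using the same hypothetical entry point:

```bash
# Abort before training if the template violates the append-only invariant.
miles verify-chat-template --model Qwen/Qwen3-8B || {
  echo "chat template is not append-only; fix it before multi-turn RL" >&2
  exit 1
}
```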
## How it works
For each test case (a list of messages), the verifier renders progressive prefixes and checks the invariant character by character; see `miles/utils/test_utils/chat_template_verify.py`. Standard cases cover
single-tool, multi-turn, parallel-tool, and long-chain trajectories; the thinking
suite adds variants that toggle `enable_thinking`.
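A minimal sketch of that check, assuming only HuggingFace's `tokenizer.apply_chat_template` as the renderer (this illustrates the invariant; it is not Miles's actual implementation):

```python
from transformers import AutoTokenizer

def verify_append_only(tokenizer, messages) -> bool:
    """Check that rendering messages[:n] is an exact string prefix
    of rendering messages[:n+1], for every prefix length n."""
    previous = ""
    for n in range(1, len(messages) + 1):
        current = tokenizer.apply_chat_template(
            messages[:n], tokenize=False, add_generation_prompt=False
        )
        if not current.startswith(previous):
            return False  # the template rewrote an earlier turn
        previous = current
    return True

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
conversation = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello!"},
    {"role": "user", "content": "now call a tool"},
]
print(verify_append_only(tokenizer, conversation))
```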
A break almost always comes from `loop.last`, conditional whitespace, or a closing
token that’s only emitted on the final turn.
## Using the fixed template at training time
Once you have the right template, point Miles at it. The fixed templates live in
`miles/utils/chat_template_utils/templates/` (e.g. `qwen3_fixed.jinja`,
`qwen3.5_fixed.jinja`, `qwen3_thinking_2507_and_next_fixed.jinja`).
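The exact training-time flag isn't shown here; one generic pattern (an assumption, not Miles's documented API) is to load the fixed file and install it as the tokenizer's `chat_template`:

```python
from pathlib import Path
from transformers import AutoTokenizer

# Assumption: override the tokenizer's chat_template with the fixed file.
# Miles may instead expose its own --template-style option for training.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")
fixed_path = Path("miles/utils/chat_template_utils/templates/qwen3_fixed.jinja")
tokenizer.chat_template = fixed_path.read_text()
```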
## What “append-only” buys you
| Without it | With it |
|---|---|
| Re-tokenize everything each turn | Tokenize only the new turn |
| O(N²) tokenization cost | O(N) tokenization cost |
| Subtle drift between turns | Bit-stable tokens |
| Multi-turn RL collapses after ~50 steps | Stable across thousands of steps |
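(For intuition on the asymptotics: with T turns of roughly L tokens each, re-tokenizing the full history every turn costs L·(1 + 2 + … + T) = L·T(T+1)/2 token passes over a rollout, i.e. O(T²), while tokenizing only the appended turn costs L·T, i.e. O(T).)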

