Tool call grammar derailment


The derailment unfolds in five stages: the grammar opens a string field, the model's context gets confused, the model thinks it exited, the model attempts re-entry, and the grammar forces termination.
Tool call begins
The tool call opens with:

```
<channel|><|tool_call>call:write{content:<|"|>
```

The grammar opens a tool call and enters a string field. The `<|"|>` token is the string delimiter: everything after it is captured as the string value. The grammar will keep consuming tokens until it sees another `<|"|>`.

The model begins writing its answer:

```python
import re

from csvq.types import Query, Operation, Comparison, ParseError


def parse(argv: list[str]) -> Query:
    # 1. Extract flags
    flags = {
        "format": "table",
        "headers": True,
        "no_headers": False,
        "count": False,
        "verbose": False,
        "limit": None,
    }
    remaining_args = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--format" and i + 1 < len(argv):
            ...
```
~120 lines of valid Python (parser body, omitted for brevity)
````
        else:
            raise ParseError(f"Operation {kind} not implemented")
# Wait, I'll rewrite the while loop to be cleaner.
```
````

The model emits triple backticks, believing it is closing a markdown code fence. But the grammar never opened a code fence; we are still inside the tool call string. These backticks are just three literal characters captured as string content. The model's "where am I?" tracking has diverged from the grammar's state machine.
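The capture behavior described above can be reproduced in miniature. This is a hypothetical sketch: the delimiter string and the tokenization are illustrative stand-ins, not the real grammar's reserved tokens.

```python
# Toy model of delimiter-driven string capture. The delimiter name is
# made up for illustration; real grammars use reserved token IDs.
STRING_DELIM = '<|"|>'


def capture_string(tokens):
    """Consume tokens until the closing delimiter; everything else is literal."""
    content = []
    for i, tok in enumerate(tokens):
        if tok == STRING_DELIM:
            # Closing delimiter found: the string ends here.
            return "".join(content), tokens[i + 1:]
        content.append(tok)
    # Ran out of tokens with the string still open.
    return "".join(content), None


# A "closing" markdown fence is just three literal characters to the capture:
value, rest = capture_string(["some code", "\n", "```", "\n", "prose"])
```

To this loop, `value` now contains the backticks as ordinary characters, and the string is still unterminated (`rest` is `None`).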
Model "exits" to reason — but the grammar doesn't care
```
Actually, let's follow the requirements for Task 3 more carefully. "Implement
`parse()` handling the original seven operation keywords, simple (non-compound)
`where`, single-key `sort`, and the `--format` flag." The seven are: `select`,
`where`, `sort`, `head`, `tail`, `agg`, `distinct`.

Let's look at the `Operation` args schema:

    groupby -> {"columns": ["dept"]}
    agg     -> {"functions": [("count", "*"), ...]}

If `groupby` is followed by `agg`, it's a fused operation. Let's try this:
```

The model now believes it is writing free prose: reasoning about the task, reconsidering its approach, planning a rewrite. It has no idea that all of this text is still being captured inside the tool call's string field. From the grammar's perspective, nothing has changed: we are still between the opening `<|"|>` and a closing delimiter that hasn't arrived yet.
Model re-enters "code mode"
````
```python
while idx < len(remaining_args):
    kind = remaining_args[idx]
    if kind not in keywords:
        raise ParseError(f"Unknown operation: {kind}")
    idx += 1
    if kind == "groupby":
        # groupby <cols> agg <funcs>
        if idx >= len(remaining_args):
            raise ParseError("groupby requires columns")
        cols_str = remaining_args[idx]
        args = {"columns": [c.strip() for c in cols_str.split(",")]}
        idx += 1
        ...
````

The model opens what it thinks is a new code fence. Again, these are just literal characters inside the string; the grammar doesn't parse markdown. It is waiting for one thing only: the closing string delimiter `<|"|>`. The model then attempts to write a second, cleaner version of the parser, treating this as a fresh code block. But to the grammar, this is just more string content appended to the same tool call that started hundreds of tokens ago.
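Setting the derailment aside for a moment, the fused `groupby`/`agg` parse the model is reaching for can be sketched as a runnable function. This is a hypothetical reconstruction: `parse_groupby`, the `name:column` function spec format, and the error messages are assumptions, since only fragments of the csvq schema appear above.

```python
def parse_groupby(remaining_args: list[str], idx: int):
    """Parse `groupby <cols> [agg <funcs>]` into one fused operation.

    Hypothetical sketch based on the schema fragments above:
    groupby -> {"columns": [...]}, agg -> {"functions": [(fn, col), ...]}.
    """
    if idx >= len(remaining_args):
        raise ValueError("groupby requires columns")
    args = {"columns": [c.strip() for c in remaining_args[idx].split(",")]}
    idx += 1
    # Fuse a trailing `agg` into the same operation if present.
    if idx < len(remaining_args) and remaining_args[idx] == "agg":
        idx += 1
        if idx >= len(remaining_args):
            raise ValueError("agg requires functions")
        funcs = []
        for spec in remaining_args[idx].split(","):
            name, _, col = spec.partition(":")   # assumed "count:*" spec format
            funcs.append((name, col or "*"))
        args["functions"] = funcs
        idx += 1
    return args, idx


args, idx = parse_groupby(["dept", "agg", "count:*"], 0)
```

The function returns the fused `args` dict plus the next index, matching the cursor-advancing style of the loop above.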
Grammar trap springs
```
<channel|><|tool_call>call:write{content:
```

The model emits what it thinks is a new tool call header. But it is still inside the original string field; this text is just more string content. The grammar doesn't recognize nested tool calls inside strings.

```
<|"|>
```

The model tries to start a brand-new tool call with its own string delimiter. But the grammar sees this `<|"|>` and matches it as the closing delimiter of the original string that opened at the very top. The tool call now terminates. Its "content" field contains the first Python attempt, the stray backticks, all the reasoning prose, the second Python attempt, and the text of a new tool call header. All of it is one malformed string.

```
}<tool_call|>
```

After the string closes, the grammar forces the tool call to terminate. Any remaining tokens the model wanted to generate are either cut off or emitted as malformed trailing content outside any valid structure.
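The entire failure can be replayed with the same toy capture loop. A minimal sketch, with short illustrative strings standing in for the real model output:

```python
STRING_DELIM = '<|"|>'  # illustrative stand-in for the real delimiter token

# The derailed episode, heavily abridged, as one token stream.
stream = [
    "first parser attempt",
    "```",                                        # "closing" fence: literal chars
    "reasoning prose about Task 3",
    "```python",                                  # "opening" fence: literal chars
    "second parser attempt",
    "<channel|><|tool_call>call:write{content:",  # nested header: literal chars
    STRING_DELIM,                                 # matched as the ORIGINAL closer
    "}",                                          # trailing tokens, now misframed
]

content = []
rest = []
for i, tok in enumerate(stream):
    if tok == STRING_DELIM:
        rest = stream[i + 1:]
        break
    content.append(tok)

captured = "\n".join(content)
```

Everything before the delimiter, both attempts, the prose, and the nested header, lands in one malformed string; only `rest` falls outside it.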
Root cause: the model has no direct access to the grammar's state machine. It tracks context through learned heuristics (code fences, indentation), but the grammar tracks context through a rigid FSM. When these diverge — typically when the model emits a delimiter it thinks is structural but the grammar treats as literal — all subsequent output is misframed.
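The divergence can be made concrete by asking both trackers the same question about the same prefix. A minimal sketch, assuming the model's learned heuristic behaves like simple fence parity (the real heuristic is of course not literally this):

```python
STRING_DELIM = '<|"|>'  # illustrative stand-in for the real delimiter token


def grammar_in_string(tokens):
    """FSM view: inside the string until the closing delimiter appears."""
    return STRING_DELIM not in tokens


def model_in_code(tokens, start_in_code=True):
    """Heuristic view: toggle on every markdown fence (parity tracking)."""
    in_code = start_in_code
    for tok in tokens:
        if tok.startswith("```"):
            in_code = not in_code
    return in_code


# Prefix of the derailed stream: code, then a "closing" fence.
prefix = ["parser attempt", "```"]
fsm_says = grammar_in_string(prefix)  # grammar: still inside the string
model_says = model_in_code(prefix)    # model: believes it left the code block
```

From the first stray fence onward the two state machines disagree, and every subsequent token is misframed.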