Two distinct problems.
The evaluation also examined prompt-injection attacks against agent orchestration, with n=200 per cell across all seven working models to yield narrow Wilson confidence intervals. Symbiont reduced pooled orchestrator-injection success from 59% (826/1,400) to 3.6% (50/1,400), a 55-percentage-point reduction produced by the agent_summary ToolClad parameter type, which routed 1,711 distinct attempts through the symbi-invis-strip injection-marker detector before any worker output could reach the downstream agent.
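The gating pattern described here can be sketched in a few lines. This is a hypothetical minimal sketch, not the actual symbi-invis-strip implementation: the marker list and the `screen_summary` name are illustrative, and the real detector's markers are not reproduced here.

```python
# Illustrative marker-based screening of worker output before it reaches a
# downstream agent. INJECTION_MARKERS and screen_summary are hypothetical
# names; the production symbi-invis-strip marker list is not shown.

INJECTION_MARKERS = (
    "ignore previous instructions",
    "you are now",
    "system prompt",
)

def screen_summary(worker_output: str) -> str:
    """Pass worker output through only if no known injection marker appears."""
    lowered = worker_output.lower()
    for marker in INJECTION_MARKERS:
        if marker in lowered:
            raise ValueError(f"injection marker detected: {marker!r}")
    return worker_output
```

Because matching is substring-based, any paraphrase that avoids the exact marker strings passes through unchecked, which is precisely the bypass route the next paragraph quantifies.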
The remaining 3.6% is not uniform. Six of seven models cluster at 1–4% bypass; excluding the outlier, Symbiont’s pooled rate is 1.5% (18/1,200). One frontier model, GPT-5, alone reaches 16% [12%, 22%] (Wilson 95% CI) against the content validator — a confidence interval that does not overlap any other model’s. ThirdKey characterizes this as the regex ceiling: frontier models possess paraphrase capabilities that exceed substring marker matching, regardless of how the marker list is expanded. The structural fixes that would close it — LLM-as-judge classification of worker output, or user-role data separation in the downstream agent — are explicitly identified in the paper.
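The [12%, 22%] interval is reproducible from the reported figures. Assuming the 16% bypass rate corresponds to 32 successes out of the n=200 attempts per cell, the standard Wilson score interval gives the stated bounds:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion at ~95% confidence."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - margin, center + margin

# 16% of 200 attempts = 32 bypasses (assumed count, derived from the rate)
lo, hi = wilson_ci(32, 200)  # ≈ (0.116, 0.217), i.e. [12%, 22%]
```

The same computation on 1–4% rates at n=200 yields intervals well below 12%, consistent with the claim that the outlier's interval overlaps no other model's.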
“The data is clear: structural controls entirely eliminate execution-layer risks, but content-level attacks like prompt injection remain a moving target. As models like GPT-5 become more adept at paraphrasing, the industry must move away from regex-based filtering and toward structural isolation of agent outputs.”
— Jascha Wanger, Founder, ThirdKey AI