April 29, 2026

The Substitution Problem

Holly Prole

Most discussions of AI failure in enterprise contexts focus on hallucination: the model generating confident, fluent text that is factually wrong. Hallucination is real, and it matters. But there is another failure mode that concerns us and isn’t discussed enough.

That failure mode is one you often cannot detect from the output itself. We call it the substitution problem.

What substitution looks like

Consider a compliance review workflow. A model is given a policy document and tasked with identifying every clause that conflicts with a new regulatory standard. The task is long, structured, and demanding. It requires systematic attention to every part of the document.

In the early output, the model performs precisely. Specific clauses are identified, and each conflict is cited at the clause level. The output looks exactly as requested.

Halfway through, something changes. The model begins to generalize. Clause-level findings become section-level summaries. By the final third, the output reads like a qualitative assessment: “the agreement generally addresses the requirements, though several sections may warrant further review.” The language is confident. The structure is clean. The output reads as complete.

But it is not complete. A systematic review has been replaced by a qualitative approximation. Three clauses have not been reviewed. The original objective has been replaced by something that resembles it.

This is not hallucination. The model has not fabricated citations or invented regulatory standards. It has changed the task it was performing partway through, without flagging that change, and produced output that is fluent, coherent, and structurally incomplete.
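The gap in the compliance example can be made concrete with a simple coverage check. The sketch below is illustrative only: it assumes clauses carry dotted numeric identifiers such as "4.2", that the full list of clause identifiers is available from the source document, and that a reviewed clause is always cited by its identifier. The function name and the sample text are hypothetical.

```python
import re

def uncovered_clauses(required: list[str], output: str) -> list[str]:
    """Return the required clause IDs that the review output never cites."""
    # Assumption: clause IDs look like "4.2" or "10.3.1" (digits joined by dots).
    mentioned = set(re.findall(r"\b\d+(?:\.\d+)+\b", output))
    return [clause for clause in required if clause not in mentioned]

# Hypothetical review that starts at clause level and ends at section level.
review_output = (
    "Clause 3.1 conflicts with the retention requirement. "
    "Clause 4.2 omits the notice period. "
    "Sections 7 through 9 generally address the requirements, "
    "though several may warrant further review."
)

print(uncovered_clauses(["3.1", "4.2", "7.4", "9.1"], review_output))
# prints ['7.4', '9.1']
```

Note that the check needs the clause list from the input document. The output alone reads as complete, which is why the failure is hard to spot by reading it.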

Why this happens

As a long-horizon generation progresses, the model’s own growing output increasingly dominates the context. The original task specification becomes proportionally less influential. The objective doesn’t disappear; it becomes distant.

This is not a flaw unique to any particular model. It is structural. The mechanism that would maintain objective persistence across extended generation simply does not exist in the standard architecture. Each token is predicted from everything that has come before, including the model’s previous output. Drift is not an aberration; it is what the architecture produces when a task runs long.
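A rough back-of-the-envelope calculation shows how quickly the task specification shrinks as a share of the context. The numbers below are hypothetical, and token share is only a crude proxy: a model does not weigh every token equally, so this illustrates the proportion, not the actual influence.

```python
def spec_share(spec_tokens: int, generated_tokens: int) -> float:
    """Fraction of the context occupied by the task specification."""
    return spec_tokens / (spec_tokens + generated_tokens)

# Hypothetical 500-token task specification, with the model's output growing.
for generated in (0, 500, 4500, 49500):
    share = spec_share(500, generated)
    print(f"{generated:>6} generated tokens -> spec share {share:.1%}")
```

Under these assumed numbers the specification falls from the whole context to half of it, then to 10%, then to 1%.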

Why standard solutions fall short

The two most common responses to objective drift are better prompting and larger models.

Prompting can help at the margins. Chain-of-thought prompting, explicit structural constraints, and objective restatement can reduce drift in shorter tasks. In extended generation, such as document-length outputs, multi-step agent workflows, and long-horizon research synthesis, the benefit degrades. An objective reminder at the top of a prompt becomes proportionally less influential as the context fills with the model’s own output.

Larger models are the more counterintuitive case.

Larger models drift more elegantly, and that makes detection harder. The output looks more complete. The missing coverage is less obvious. Capability and controllability do not scale together.

What a structural solution looks like

The problem is structural, so the solution should be too. That means not improving what the model knows or how it is prompted, but adding a mechanism that maintains a representation of the task objective separately from the generation process, measures deviation as output is produced, and applies correction before the substitution completes.

This is what Assiduity does. We operate at inference time, during generation, before drift becomes failure. The mechanism is model-agnostic and can be implemented alongside existing stacks without retraining the base model.
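The three parts named above (a separate objective representation, a deviation measure, and a correction step) can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not Assiduity's implementation: it works on whole chunks of output rather than during token generation, the objective is a plain checklist of clause identifiers, coverage is detected by naive substring matching, and `stub_model` is a hypothetical stand-in for a real model call.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectiveTracker:
    """Holds the task objective as a checklist, outside the generated text."""
    required: list[str]
    done: set[str] = field(default_factory=set)

    def observe(self, chunk: str) -> None:
        # Naive matching: an item counts as covered if its ID appears in the chunk.
        self.done.update(item for item in self.required if item in chunk)

    def pending(self) -> list[str]:
        return [item for item in self.required if item not in self.done]

    def deviation(self) -> float:
        """Share of the objective still unaddressed (0.0 means fully covered)."""
        return len(self.pending()) / len(self.required)

def run_with_tracking(generate, tracker: ObjectiveTracker, max_steps: int = 10) -> list[str]:
    """Call generate(pending) until the checklist is covered or max_steps is reached."""
    chunks = []
    for _ in range(max_steps):
        if not tracker.pending():
            break
        # Correction step: each call is told exactly what is still outstanding.
        chunk = generate(tracker.pending())
        tracker.observe(chunk)
        chunks.append(chunk)
    return chunks

def stub_model(pending: list[str]) -> str:
    # Hypothetical stand-in for a model call: covers at most two items per chunk.
    return " ".join(f"Reviewed clause {clause}." for clause in pending[:2])

tracker = ObjectiveTracker(required=["3.1", "4.2", "7.4", "9.1"])
chunks = run_with_tracking(stub_model, tracker)
print(len(chunks), tracker.deviation())
```

The design point is that the checklist lives outside the text being generated, so it does not become less prominent as the output grows.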

One distinction matters for enterprise deployment. The question is not whether a model can complete the task. It almost certainly can. The question is whether it will do so across the full length and scope of the original objective, consistently and without substitution.

That guarantee requires something the model itself cannot provide.

Move Fast
Build Reliable™
