Drift, Named

Assiduity AI

A generative system does not need to break in order to drift. It can remain fluent, coherent, and apparently useful while gradually moving away from the objective that was supposed to govern the task.

That is what makes drift difficult to detect. The failure does not always look like a hallucination. It does not always announce itself with a false citation, a nonsensical sentence, or an obvious contradiction. Often, it appears as a subtle change in emphasis. A threshold becomes a general concern. A binding exception becomes background context. A specific instruction becomes a stylistic preference. The output still reads well. The task has changed.

The previous pieces in this series established the mechanism: weights define a probability landscape, decoding selects a path, and each selected continuation becomes part of the next context. Drift is the failure pattern that emerges when that path remains locally plausible while gradually losing contact with the global objective.

The word matters because it names a distinct failure mode. Drift is not simply an error. It is not merely low quality. It is not the same as randomness, creativity, or verbosity. Drift is cumulative deviation from the governing task across a sequence. The system may be competent at every local step and still move toward an outcome that no longer serves the original purpose.

Return to the board memo on vendor concentration risk. The task is not to produce a polished discussion of supplier risk in general. The task is to preserve three operational facts: the concentration thresholds, the affected accounts, and the escalation triggers requiring committee review. In the opening, the model may do this well. It may identify the relevant exposures and frame the risk correctly. But later, the memo broadens. It adds language about resilience, diversification, and best practices. None of that language is obviously wrong. Some of it may even be useful. But the document’s center of gravity has shifted. The governing objective has become less binding.

This is the signature of drift: local acceptability with global movement.

A single sentence about resilience may not be a problem. A single omitted threshold may be repairable. A single vague summary of an exception may pass unnoticed. But long outputs are sequences, not isolated sentences. Each selection changes the state from which the next selection is made. Once the document begins to describe vendor risk as a general management issue rather than as a threshold-governed escalation problem, later paragraphs are more likely to continue in that broader frame. The output does not fall off a cliff. It follows a slope.
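The feedback described here can be illustrated with a deliberately simple toy model. It is not a description of how any real system behaves. It assumes, purely for illustration, that the chance of the next sentence being generic rises in proportion to the share of generic sentences already in the context. The function name and every parameter below are hypothetical.

```python
def simulate_generic_share(steps, base_rate=0.05, feedback=0.9, prompt_sentences=5):
    """Toy model: expected share of generic-frame sentences in the context.

    Assumption (not taken from any real model): the chance that the next
    sentence is generic equals base_rate + feedback * (current generic share).
    """
    generic = 0.0                    # expected number of generic sentences so far
    total = float(prompt_sentences)  # the prompt sentences are all on-task
    shares = []
    for _ in range(steps):
        share = generic / total
        p_generic = min(1.0, base_rate + feedback * share)
        generic += p_generic         # add the expected generic contribution
        total += 1.0                 # one more sentence enters the context
        shares.append(generic / total)
    return shares


# Same starting point, with and without the context feeding back on itself.
with_feedback = simulate_generic_share(steps=50, feedback=0.9)
without_feedback = simulate_generic_share(steps=50, feedback=0.0)

# Step-to-step change in the generic share when feedback is present.
increments = [b - a for a, b in zip(with_feedback, with_feedback[1:])]
```

In this toy setup the generic share rises by less than one percentage point per sentence, so no single step looks like a break. The run with feedback still ends with a larger generic share than the run without it, which is the slope rather than the cliff.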

That slope can be hard to see while generation is underway. The model is not moving from right to wrong in one visible step. It is moving from exact to approximate, from operational to thematic, from binding to advisory, from decision rule to discussion. These changes are often rhetorically smooth. They may make the output sound more polished even as they make it less useful.

This is why drift is especially dangerous in serious workflows. The highest-risk failures are not always the most spectacular ones. A compliance summary that invents a rule is easy to challenge if someone notices the invention. A compliance summary that softens an exception into a general concern may be harder to catch. A legal memo that fabricates authority is visibly defective. A legal memo that preserves the tone of caution while losing the operative qualification may look responsible. A risk report that gets the topic right but weakens the threshold may still sound professional. In each case, the output remains plausible enough to travel.

The practical problem is not only that the system can be wrong. It is that the system can be wrong in a way that borrows credibility from fluency.

This distinguishes drift from the common public image of AI failure. Much discussion of generative AI still centers on hallucination: fabricated facts, false citations, invented cases, or confidently wrong claims. Those are real problems. But they are not the whole reliability problem. Hallucination is often a content failure. Drift is a trajectory failure. The model may use accurate facts, cite real documents, follow the requested format, and maintain a professional tone while progressively weakening the objective that mattered most.

That distinction changes how evaluation should work. If the only question is whether the final document is grammatical, relevant, coherent, or professional, drift can pass through review. The better question is whether the sequence remained governed by the same objective over time. Did it preserve the thresholds, exceptions, definitions, and decision rules that mattered at the start? Or did it gradually substitute a more generic version of the task? Drift lives in the gap between surface quality and objective fidelity.

That gap is rarely binary. A model does not simply preserve the objective or abandon it. The more common pattern is partial degradation. Some constraints remain. Some details survive. Some sections stay faithful. Others begin to generalize, compress, soften, or omit. This is why drift should be understood as a measurable process rather than a yes-or-no label. The question is how fidelity changes across the generated sequence.
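One crude way to make that question concrete is to score each section separately against the constraints the task was supposed to preserve. The sketch below is only an illustration. The function names, the required terms, and the memo sentences are all hypothetical, and verbatim term matching is a stand-in for whatever fidelity measure a real evaluation would use.

```python
def constraint_coverage(section: str, required_terms: list[str]) -> float:
    """Fraction of required terms found verbatim (case-insensitive) in a section."""
    if not required_terms:
        return 1.0  # nothing to preserve, so nothing is missing
    text = section.lower()
    hits = sum(1 for term in required_terms if term.lower() in text)
    return hits / len(required_terms)


def fidelity_curve(sections: list[str], required_terms: list[str]) -> list[float]:
    """Coverage score for each section, in document order."""
    return [constraint_coverage(s, required_terms) for s in sections]


# Hypothetical constraints standing in for threshold, account, and trigger.
required = ["25%", "Account A", "committee review"]

# Hypothetical memo sections, ordered as they would appear in the output.
memo_sections = [
    "Vendor X exceeds the 25% concentration threshold on Account A, "
    "which triggers committee review.",
    "Exposure on Account A remains above the 25% threshold.",
    "Diversification and resilience remain best practices for supplier management.",
]

curve = fidelity_curve(memo_sections, required)
# Each entry is the share of required terms present in that section.
```

Here the first section carries all three constraints, the second drops the escalation trigger, and the third carries none. A single pass or fail label on the whole memo would hide that shape. The per-section curve shows where fidelity began to fall.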

The same logic applies beyond documents. In an agentic workflow, drift can occur as a system moves through actions rather than paragraphs. A research agent may begin with a narrow question and gradually broaden the search until the answer addresses a different problem. A coding agent may begin with a specific bug and end by refactoring adjacent code because that path seems locally useful. A customer-support agent may begin with a policy constraint and gradually steer toward resolution language that conflicts with the rule. In each case, the system may remain active, coherent, and apparently helpful while its behavior moves away from the objective.
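For the coding-agent case, one rough proxy is to compare the files an agent touches against the scope implied by the original task. The sketch below assumes a hypothetical log of touched file paths and a hypothetical allowed directory. Real agents expose their actions in different ways, and a path check is only one possible signal.

```python
from pathlib import PurePosixPath


def out_of_scope_share(touched_files: list[str], allowed_dirs: list[str]) -> list[float]:
    """Running share of touched files that fall outside the allowed directories."""
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    shares = []
    outside = 0
    for count, name in enumerate(touched_files, start=1):
        path = PurePosixPath(name)
        # A file is in scope if it sits somewhere under an allowed directory.
        in_scope = any(d in path.parents for d in allowed)
        if not in_scope:
            outside += 1
        shares.append(outside / count)
    return shares


# Hypothetical task: fix a bug in the billing module.
allowed_scope = ["src/billing"]

# Hypothetical sequence of files the agent edited, in order.
touched = [
    "src/billing/invoice.py",
    "src/billing/tax.py",
    "src/shared/utils.py",
    "src/auth/session.py",
]

scope_shares = out_of_scope_share(touched, allowed_scope)
```

In this example the first two edits stay inside the billing module, and the later edits move into shared and unrelated code. Each edit might be defensible on its own. The running share is what shows the agent moving away from the bug it was asked to fix.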

This is why drift becomes more important as systems become more capable. Weak systems fail in obvious ways. They misunderstand instructions, produce broken outputs, or quickly reveal their limitations. Stronger systems can travel farther before the divergence becomes visible. They can preserve style, tone, and surface coherence while drifting beneath the surface. The result is no less risky. It is a subtler kind of risk.

Organizations do not adopt generative systems for plausible text in the abstract. They adopt them to perform work under constraints: policy constraints, legal constraints, financial thresholds, operational definitions, procedural rules, and human instructions. If those constraints fade during generation, the system may still produce something that looks like work while no longer doing the work assigned.

This is the point at which reliability must go beyond output inspection. The issue is not only whether the final answer is good enough. The issue is whether the generation process stayed attached to the objective that made the answer useful. That requires observing the relationship between the evolving output and the governing task, not merely judging the final prose after the fact.
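A minimal sketch of what observing during generation could look like follows. It assumes the output arrives in chunks and that the governing task can be reduced to a short list of required terms, which is a strong simplification. The class name, the window size, the coverage floor, and the sample chunks are hypothetical, and nothing here describes a particular product or library.

```python
from dataclasses import dataclass, field


@dataclass
class DriftMonitor:
    """Checks recent output against required terms as each chunk arrives."""
    required_terms: list[str]
    window: int = 2            # how many of the latest chunks to inspect
    min_coverage: float = 0.5  # alert when recent coverage falls below this
    chunks: list[str] = field(default_factory=list)

    def observe(self, chunk: str) -> dict:
        self.chunks.append(chunk)
        recent = " ".join(self.chunks[-self.window:]).lower()
        missing = [t for t in self.required_terms if t.lower() not in recent]
        if self.required_terms:
            coverage = 1.0 - len(missing) / len(self.required_terms)
        else:
            coverage = 1.0  # nothing to track
        return {
            "step": len(self.chunks),
            "coverage": coverage,
            "missing": missing,
            "alert": coverage < self.min_coverage,
        }


# Hypothetical constraints and a hypothetical stream of generated chunks.
monitor = DriftMonitor(required_terms=["25%", "Account A", "committee review"])
stream = [
    "Vendor X exceeds the 25% concentration threshold on Account A.",
    "This breach triggers committee review under the escalation policy.",
    "More broadly, supplier resilience deserves attention.",
    "Diversification is a widely recommended best practice.",
]
reports = [monitor.observe(chunk) for chunk in stream]
alerts = [r["alert"] for r in reports]
```

With these inputs the monitor stays quiet for the first two chunks and raises an alert at the third, when the recent window no longer mentions the threshold or the affected account. The point of the design is timing: the signal appears while the output is still being produced, not after the finished memo has been read.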

Drift, then, is the name for a specific structural problem: the accumulation of locally plausible continuations that gradually reduce fidelity to the global objective. It is smooth enough to evade casual review, cumulative enough to matter in long tasks, and common enough to deserve its own category.

Once drift is named, the next question is why longer tasks expose it more severely. If every selected continuation can change the state from which the next one is generated, then length is not just more output. It is more opportunity for deviation to accumulate. The next article turns directly to that problem: why long tasks break more than short ones.

This is article IV of Losing the Thread: Autoregressive Drift in Generative AI and What Comes Next.
A series on autoregressive drift, objective fidelity, and the emerging control layer in AI.
