Research

Evidence for runtime control during generation.

Assiduity treats long-horizon generation as a control problem. A long output is not a single prediction; it is a trajectory that can drift from the task objective over time. Our research evaluates whether that drift can be measured and reduced during generation without retraining the model.

Proof signals

Selected public evidence.

The public summary below reports headline results and the controls behind them. Implementation details are reserved for controlled technical review.

199 / 200 documents improved in primary evaluation
d = 1.64 effect size versus greedy baseline
3 model families · 2 corpora tested without model-specific tuning

Controlled evaluations show that runtime selection against a semantic operating contract can reduce drift in long-form generation. Placebo controls further indicate that the effect depends on the semantic content of the contract, not merely on sampling, reranking, or applying an uninformative selection rule.
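For readers who want to see how an effect size of this kind is computed, here is a minimal sketch. It assumes that d denotes Cohen's d with a pooled standard deviation, which the public summary does not state. The scores are made-up illustrative values, not Assiduity data, and the names `cohens_d`, `controlled`, and `greedy` are hypothetical.

```python
import statistics

def cohens_d(treated, baseline):
    """Cohen's d with pooled standard deviation (an assumed variant, see lead-in)."""
    n1, n2 = len(treated), len(baseline)
    v1, v2 = statistics.variance(treated), statistics.variance(baseline)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treated) - statistics.mean(baseline)) / pooled_sd

# Made-up per-document objective-retention scores, for illustration only.
controlled = [0.8, 0.7, 0.9, 0.6]
greedy = [0.5, 0.4, 0.6, 0.3]

d = cohens_d(controlled, greedy)
improved = sum(c > g for c, g in zip(controlled, greedy))  # documents that improved
```

By the usual convention, values of d above 0.8 are read as large effects. The per-document improvement count is a separate statistic from d.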

Research thesis

Long-form generation should be evaluated as a path, not only as an answer.

Standard evaluation often treats a generated document as a final artifact. That misses an important failure pattern: the model can begin on task, gradually drift, and still produce fluent text. The relevant question is not only whether the final output reads well. It is whether the generation path remained aligned with the objective.

Assiduity’s research studies inference-time control: whether an external runtime layer can evaluate candidate continuations against a semantic operating contract and preserve objective fidelity as the output unfolds.

What the evidence shows

Runtime control has measurable effects.

Drift is measurable

Long outputs can be evaluated against a semantic operating contract over time, producing a trajectory of deviation rather than only a final pass/fail judgment.
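A minimal sketch of what a deviation trajectory could look like. It assumes a toy bag-of-words cosine measure in place of whatever semantic scoring the actual system uses, which is not public. The function names, the contract text, and the segments are all hypothetical.

```python
import math
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector; a stand-in for a real semantic encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def deviation_trajectory(contract, segments):
    """Deviation (1 - cosine similarity) of each output segment from the contract."""
    target = bow(contract)
    return [1.0 - cosine(target, bow(seg)) for seg in segments]

# Hypothetical contract and output segments, constructed so that drift grows.
contract = "summarize flood risk findings for the river basin report"
segments = [
    "the report finds flood risk rising across the river basin",
    "flood defenses in the basin need funding",
    "the author also enjoys fishing and local cuisine",
]
trajectory = deviation_trajectory(contract, segments)
```

The result is one deviation value per segment, so a reviewer sees where the output began to leave the task instead of a single final verdict.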

Drift can be reduced during generation

In controlled evaluations, candidate continuation selection improved objective retention relative to greedy baseline generation.
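The general idea of candidate continuation selection can be sketched as follows. This is an illustration under stated assumptions, not Assiduity's implementation: the contract is reduced to a hypothetical set of required terms, deviation is the share of those terms missing from the draft, and the first candidate stands in for the greedy choice.

```python
def deviation(contract_terms, text):
    """Toy deviation: share of contract terms missing from the text."""
    words = set(text.lower().split())
    return 1.0 - len(contract_terms & words) / len(contract_terms)

def select_continuation(contract_terms, draft, candidates):
    """Pick the candidate whose extended draft deviates least from the contract."""
    return min(candidates, key=lambda c: deviation(contract_terms, draft + " " + c))

# Hypothetical contract, draft, and candidates.
contract_terms = {"budget", "deficit", "forecast"}
draft = "the report reviews the budget"
candidates = [
    "and then describes the office decor",      # stands in for the greedy choice
    "and the deficit forecast for next year",
]

greedy = candidates[0]
chosen = select_continuation(contract_terms, draft, candidates)
```

In this toy case the greedy continuation leaves two of the three contract terms uncovered, while the selected continuation covers all of them.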

Semantic specificity matters

Placebo-style controls reduce the observed effect when the contract no longer carries task-relevant semantic content.

Model-agnostic behavior is plausible

Results across three model families and two corpora, obtained without model-specific tuning, suggest that the control layer is not merely tuned to one model or one benchmark.

Stronger models still have headroom

Larger models may begin from a better baseline, but long-horizon objective retention can still benefit from runtime control.

Sparse intervention supports practicality

Control does not need to imply maximum branching at every step. Candidate evaluation can be concentrated where drift risk is higher.
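One way to picture sparse intervention is a loop that accepts the default continuation unless its deviation crosses a threshold, and compares alternatives only then. The sketch below rests on assumptions: the threshold value, the term-overlap deviation measure, and every name in it are hypothetical and are not drawn from Assiduity's method.

```python
def deviation(contract_terms, text):
    """Toy deviation: share of contract terms missing from the text."""
    words = set(text.lower().split())
    return 1.0 - len(contract_terms & words) / len(contract_terms)

def generate(contract_terms, steps, threshold=0.5):
    """steps holds candidate lists; index 0 plays the role of the default continuation."""
    output, branched_at = [], []
    for i, candidates in enumerate(steps):
        choice = candidates[0]
        if deviation(contract_terms, choice) > threshold:
            # Drift risk is high at this step, so evaluate the alternatives too.
            choice = min(candidates, key=lambda c: deviation(contract_terms, c))
            branched_at.append(i)
        output.append(choice)
    return output, branched_at

# Hypothetical contract and candidates.
contract_terms = {"revenue", "margin"}
steps = [
    ["revenue rose and margin held", "the ceo likes golf"],
    ["the ceo likes golf", "margin guidance and revenue outlook"],
    ["revenue mix shifted", "margin fell"],
]
output, branched_at = generate(contract_terms, steps)
```

Only the second step triggers candidate evaluation here, so the extra scoring cost is paid once, not three times.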

Controls

Why placebo tests matter.

A control method is not credible if it only appears to work because it samples more text, applies a generic reranking rule, or benefits from an uninformative selection process. Placebo tests help separate genuine semantic control from mechanical selection effects.

In Assiduity’s evaluations, replacing the task-relevant operating contract with less informative substitutes materially reduces the observed benefit. That result supports the core claim: the control layer is responding to semantic contract content, not simply exploiting randomness or a generic scoring preference.
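The logic of a placebo comparison can be sketched in a few lines. The example assumes a toy term-overlap score, and its inputs were built to make the contrast visible. It shows the shape of the test, not a result. All names, terms, and candidate texts are hypothetical.

```python
def retained(terms, text):
    """Fraction of the given terms that appear in the text."""
    words = set(text.lower().split())
    return len(terms & words) / len(terms)

def controlled_output(selection_terms, steps):
    """At each step keep the candidate that best matches selection_terms."""
    return " ".join(max(c, key=lambda t: retained(selection_terms, t)) for c in steps)

task_terms = {"dosage", "trial", "outcome"}   # hypothetical task-relevant contract
placebo_terms = {"the", "and", "of"}          # hypothetical uninformative substitute
steps = [
    ["the weather and the mood of the city", "the trial tested a new dosage"],
    ["the history of the journal and of the field", "each outcome was recorded"],
]
greedy_output = " ".join(c[0] for c in steps)  # first candidate stands in for greedy

# Both runs perform the same amount of selection; only the contract content differs.
baseline = retained(task_terms, greedy_output)
real_benefit = retained(task_terms, controlled_output(task_terms, steps)) - baseline
placebo_benefit = retained(task_terms, controlled_output(placebo_terms, steps)) - baseline
```

Because both conditions select among the same candidates, any gap between `real_benefit` and `placebo_benefit` is attributable to the contract's content and not to the act of selecting.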

Governance telemetry

The same signal used for control can support review.

Runtime control produces evidence about the path of generation. The ε trajectory, the running measure of deviation from the semantic contract, records how the output behaved relative to that contract over time. That gives reviewers a structured artifact for checking whether the system stayed aligned while it worked.

Path-level evidence

Reviewers can inspect behavior across the generation path rather than relying only on a final answer.

Constraint visibility

The system records whether required concepts, prohibited terms, and task constraints were respected.
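A constraint record of this kind could be as simple as the sketch below. It is an assumption-laden illustration: it handles only single-word terms, uses a word limit as the example task constraint, and the function and field names are hypothetical.

```python
import re

def check_constraints(text, required, prohibited, max_words=None):
    """Report required terms that are absent, prohibited terms present, and length status."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    vocab = set(words)
    return {
        "missing_required": sorted(t for t in required if t not in vocab),
        "prohibited_found": sorted(t for t in prohibited if t in vocab),
        "within_length": max_words is None or len(words) <= max_words,
    }

# Hypothetical output text and constraints.
report = check_constraints(
    "The audit found no material weakness. Results are guaranteed.",
    required={"audit", "weakness", "remediation"},
    prohibited={"guaranteed"},
    max_words=20,
)
```

The returned dictionary flags one missing required concept and one prohibited term, which is the kind of per-run fact a reviewer can check without rereading the whole output.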

Run-level records

Each controlled run can preserve summary status, branching behavior, stop reasons, and telemetry signals.
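As an illustration of what a run-level record might contain, the sketch below defines a small serializable structure. The schema is an assumption: the field names, status values, and stop reasons are hypothetical and do not describe Assiduity's actual telemetry format.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class RunRecord:
    """Hypothetical per-run record; every field name here is illustrative."""
    run_id: str
    status: str                                        # e.g. "completed"
    stop_reason: str                                   # e.g. "length_limit"
    branch_steps: list = field(default_factory=list)   # steps where candidates were compared
    epsilon: list = field(default_factory=list)        # deviation value per step

    def summary(self):
        """Compact view for dashboards or audit logs."""
        return {
            "run_id": self.run_id,
            "status": self.status,
            "stop_reason": self.stop_reason,
            "branch_count": len(self.branch_steps),
            "max_epsilon": max(self.epsilon, default=None),
        }

record = RunRecord("run-001", "completed", "length_limit", [4, 9], [0.10, 0.32, 0.18])
serialized = json.dumps(asdict(record))  # plain JSON, suitable for storage
```

Storing the full trajectory alongside a compact summary lets monitoring tools read the summary while auditors keep access to the step-level detail.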

Governance bridge

The research connects directly to enterprise review, monitoring, audit support, and model-governance workflows.

Validation path

Continuing evaluation across domains and models.

Current research has focused on long-form summarization and objective retention, evaluated on government reports and scientific papers and in large-model settings. Ongoing validation is extending the same control framework to additional model providers and task families.

Assiduity is also summarizing results across OpenAI and Claude models on additional evaluation sets, including biomedical literature and earnings-call material. Those results will be published once they have been validated and reconciled with the existing research record.

What remains controlled

Public evidence without public implementation disclosure.

Assiduity is publishing selected research findings while keeping implementation details, benchmark tables, parameter choices, and patent-sensitive methods under controlled review. This protects the invention while allowing serious reviewers to evaluate the empirical basis for runtime control.

Public summary

High-level findings, proof signals, research thesis, and selected interpretation are suitable for public pages.

Controlled technical review

Full tables, methods, additional controls, implementation details, and evaluation artifacts are available through selected review conversations.

Technical review

Review the evidence behind runtime control.

Assiduity provides additional research summaries, controlled evaluation materials, and protected demo access for qualified reviewers.