Evidence for runtime control during generation.
Assiduity treats long-horizon generation as a control problem. A long output is not a single prediction; it is a trajectory that can drift from the task objective over time. Our research evaluates whether that drift can be measured and reduced during generation without retraining the model.
Selected public evidence.
The public summary below is designed to establish the empirical direction without exposing implementation details reserved for controlled technical review.
Controlled evaluations show that runtime selection against a semantic operating contract can reduce drift in long-form generation. Placebo controls further indicate that the effect depends on the semantic content of the contract, not merely on sampling, reranking, or applying an uninformative selection rule.
Long-form generation should be evaluated as a path, not only as an answer.
Standard evaluation often treats a generated document as a final artifact. That misses an important failure pattern: the model can begin on task, gradually drift, and still produce fluent text. The relevant question is not only whether the final output reads well. It is whether the generation path remained aligned with the objective.
Assiduity’s research studies inference-time control: whether an external runtime layer can evaluate candidate continuations against a semantic operating contract and preserve objective fidelity as the output unfolds.
Runtime control has measurable effects.
Drift is measurable
Long outputs can be evaluated against a semantic operating contract over time, producing a trajectory of deviation rather than only a final pass/fail judgment.
Drift can be reduced during generation
In controlled evaluations, candidate continuation selection improved objective retention relative to a greedy-decoding baseline.
Semantic specificity matters
Placebo-style controls reduce the observed effect when the contract no longer carries task-relevant semantic content.
Model-agnostic behavior is plausible
Results across multiple model families and corpora suggest that the control layer is not merely tuned to one model or one benchmark.
Stronger models still have headroom
Larger models may begin from a better baseline, but long-horizon objective retention can still benefit from runtime control.
Sparse intervention supports practicality
Control does not require maximal branching at every step. Candidate evaluation can be concentrated at points where drift risk is higher.
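The findings above combine four ideas: measuring deviation from a contract over time, selecting among candidate continuations, comparing against a greedy baseline, and intervening only sparsely. The Python sketch below shows one generic way such a loop could be structured. It is not Assiduity's method, which remains under controlled review. The `Contract` format, the keyword-overlap `deviation` score, the `threshold` value, and the fixed candidate lists standing in for a model are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Contract:
    """Hypothetical semantic operating contract: concepts the output should
    keep covering and terms it must avoid."""
    required: set[str]
    prohibited: set[str] = field(default_factory=set)


def deviation(text: str, contract: Contract) -> float:
    """Toy deviation score in [0, 1]: the share of required concepts missing
    from the text, forced to 1.0 if any prohibited term appears. Keyword
    matching stands in for a real semantic scorer."""
    words = set(text.lower().split())
    if words & contract.prohibited:
        return 1.0
    if not contract.required:
        return 0.0
    return len(contract.required - words) / len(contract.required)


def controlled_generate(candidates_per_step, contract, threshold=0.5):
    """Build an output step by step. candidates_per_step[t][0] is treated as
    the greedy continuation; alternatives are scored only when the greedy
    choice deviates more than `threshold` (sparse intervention).
    Returns the chosen segments, the deviation trajectory, and the number
    of steps that branched."""
    chosen, trajectory, branched = [], [], 0
    for candidates in candidates_per_step:
        pick = candidates[0]
        eps = deviation(pick, contract)
        if eps > threshold and len(candidates) > 1:
            branched += 1
            # min() keeps the first candidate among ties, so greedy wins ties.
            pick = min(candidates, key=lambda c: deviation(c, contract))
            eps = deviation(pick, contract)
        chosen.append(pick)
        trajectory.append(eps)
    return chosen, trajectory, branched


# Usage: fixed candidate lists stand in for a model's proposals.
contract = Contract(required={"budget", "deficit"}, prohibited={"rumor"})
steps = [
    ["the budget deficit grew", "weather was mild"],
    ["a rumor spread online", "the deficit narrowed later"],
    ["officials met briefly", "budget talks resumed"],
]
text, eps, n_branched = controlled_generate(steps, contract)
greedy_eps = [deviation(c[0], contract) for c in steps]
print(eps, greedy_eps, n_branched)  # [0.0, 0.5, 0.5] [0.0, 1.0, 1.0] 2
```

In this toy run the controlled trajectory stays at or below 0.5 while the greedy baseline drifts to 1.0, and only two of the three steps trigger candidate evaluation.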
Why placebo tests matter.
A control method is not credible if it only appears to work because it samples more text, applies a generic reranking rule, or benefits from an uninformative selection process. Placebo tests help separate genuine semantic control from mechanical selection effects.
In Assiduity’s evaluations, replacing the task-relevant operating contract with less informative substitutes materially reduces the observed benefit. That result supports the core claim: the control layer is responding to semantic contract content, not simply exploiting randomness or a generic scoring preference.
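The logic of a placebo comparison can be sketched by running the same selection rule twice, once guided by the task contract and once by an uninformative substitute, and then scoring both outputs against the real task. This Python example is hypothetical: the `coverage` metric, the placebo built from unrelated terms, and the candidate lists are illustrative assumptions, not Assiduity's evaluation protocol.

```python
def coverage(text: str, concepts: set[str]) -> float:
    """Hypothetical retention metric: fraction of concepts present in text."""
    words = set(text.lower().split())
    return len(concepts & words) / len(concepts) if concepts else 0.0


def select(candidates_per_step, concepts):
    """At each step, pick the candidate with the highest coverage of
    `concepts`; max() keeps the first (greedy) candidate among ties."""
    return [max(c, key=lambda s: coverage(s, concepts)) for c in candidates_per_step]


def retention(segments, task_concepts):
    """Mean task-concept coverage across the selected segments."""
    return sum(coverage(s, task_concepts) for s in segments) / len(segments)


task = {"budget", "deficit"}
placebo = {"tulip", "glacier"}  # same size as the task contract, no task content
steps = [
    ["weather was mild", "the budget deficit grew"],
    ["officials met briefly", "deficit talks resumed"],
]

# Both runs are scored against the real task, only the selection guide differs.
real_score = retention(select(steps, task), task)
placebo_score = retention(select(steps, placebo), task)
print(real_score, placebo_score)  # 0.75 0.0
```

Because the placebo terms never match any candidate, placebo-guided selection collapses to the greedy choice and the benefit disappears. A real evaluation would measure this gap over many documents and could use several kinds of uninformative substitutes.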
The same signal used for control can support review.
Runtime control produces evidence about the path of generation. The ε trajectory, the recorded deviation from the semantic contract over time, shows how the output behaved as it was generated. That gives reviewers a structured artifact for judging whether the system stayed aligned while it worked.
Path-level evidence
Reviewers can inspect behavior across the generation path rather than relying only on a final answer.
Constraint visibility
The system records whether required concepts, prohibited terms, and task constraints were respected.
Run-level records
Each controlled run can preserve summary status, branching behavior, stop reasons, and telemetry signals.
Governance bridge
The research connects directly to enterprise review, monitoring, audit support, and model-governance workflows.
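A run-level record of the kind listed above could be represented as a plain structured object that serializes to JSON for review or audit tooling. The Python sketch below is hypothetical: the `RunRecord` field names, the `status` and `stop_reason` values, and the keyword-based `check_constraints` function are illustrative choices, not Assiduity's record schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical per-run record for a controlled generation."""
    run_id: str
    status: str            # summary status, e.g. "constraints_met"
    stop_reason: str       # why generation ended, e.g. "max_steps"
    branched_steps: int    # steps where alternative candidates were scored
    epsilon: list[float] = field(default_factory=list)  # deviation trajectory
    missing_required: list[str] = field(default_factory=list)
    prohibited_found: list[str] = field(default_factory=list)


def check_constraints(text, required, prohibited):
    """Return (missing required concepts, prohibited terms present), sorted
    for stable output. Keyword matching stands in for semantic checks."""
    words = set(text.lower().split())
    return sorted(set(required) - words), sorted(set(prohibited) & words)


def build_record(run_id, text, required, prohibited, epsilon, branched, stop_reason):
    missing, found = check_constraints(text, required, prohibited)
    status = "constraints_met" if not (missing or found) else "constraints_violated"
    return RunRecord(
        run_id=run_id,
        status=status,
        stop_reason=stop_reason,
        branched_steps=branched,
        epsilon=list(epsilon),
        missing_required=missing,
        prohibited_found=found,
    )


record = build_record(
    run_id="demo-001",
    text="The budget deficit narrowed as budget talks resumed",
    required={"budget", "deficit", "revenue"},
    prohibited={"rumor"},
    epsilon=[0.0, 0.5, 0.5],
    branched=2,
    stop_reason="max_steps",
)
print(json.dumps(asdict(record), indent=2))
```

Keeping the record as flat, JSON-serializable fields makes it straightforward to store alongside the generated output and to query later in monitoring or review workflows.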
Continuing evaluation across domains and models.
Current research has focused on long-form summarization and objective retention across government reports, scientific papers, and large-model settings. Ongoing validation is extending the same control framework to additional model providers and task families.
Assiduity is also summarizing results across OpenAI and Claude models on additional evaluation sets, including biomedical literature and earnings-call material. Those results will be published only after they have been validated and reconciled with the existing research record.
Public evidence without public implementation disclosure.
Assiduity is publishing selected research findings while keeping implementation details, benchmark tables, parameter choices, and patent-sensitive methods under controlled review. This protects the invention while allowing serious reviewers to evaluate the empirical basis for runtime control.
Public summary
High-level findings, proof signals, the research thesis, and selected interpretation are shared on public pages.
Controlled technical review
Full tables, methods, additional controls, implementation details, and evaluation artifacts are available through selected review conversations.
Review the evidence behind runtime control.
Assiduity provides additional research summaries, controlled evaluation materials, and protected demo access for qualified reviewers.