Human-in-the-Loop Won’t Save You – Systems Designed for Accuracy and Transparency Will

Why the legal industry needs to stop relying on human oversight as a safety net and start demanding better systems. 

Last week, a top-tier law firm submitted a court filing with misquoted statutes, citations to cases that don’t exist, and references to decisions from the wrong jurisdiction. Senior partners reviewed it. AI tools were involved. It went out anyway. And unfortunately, it won’t be the last time. 

When challenged, the firm’s response followed a now-familiar script: reinforce verification policies, review internal training, remind lawyers that AI output needs to be verified. 

But this framing misses the point. 

This wasn’t just a failure of AI. And it wasn’t simply a failure of process. It was a failure of the assumption that human oversight is a reliable safety net. Only once that assumption is retired can the legal industry build systems it can actually trust. 

The Verification Myth 

In theory, requiring lawyers to verify AI-generated outputs makes sense. In practice, it breaks down. 

Legal work is time-pressured and cognitively demanding. Verification doesn’t happen in ideal conditions; it happens at the tail end of long workflows, across stacks of documents, against hard deadlines. And the more routine verification becomes, the less reliable it gets. 

This isn’t a criticism of lawyers. It’s a description of how people work under pressure, in any profession. 

The conclusion is straightforward: accuracy cannot be enforced through policy alone. Training people to verify more carefully doesn’t change the conditions under which verification actually happens. At scale, policy and process are no substitutes for system design. 

Accuracy Is Not a Feature. It’s a System Property. 

If oversight is inconsistent by nature, accuracy cannot depend on it. Accuracy has to be built into the system itself, not added as a layer on top. 

But what does accuracy actually mean in this new era of legal AI? It’s no longer about whether a model produces something that sounds plausible, but whether its output is grounded in the right sources, relevant to the specific contract and context, and traceable enough to defend if challenged. 

Accuracy, in that sense, isn’t just about being correct. It’s about being reliable under real working conditions, when the stakes are high, the deadlines are tight, and there’s minimal room to go back and check. Human review is vulnerable to the real world. 

That’s a systems problem. And it needs a systems answer. 

What Most Legal AI Gets Wrong 

The majority of legal AI tools share the same foundational assumption: that generating a confident-sounding answer is the hard part, and that verification can be handled downstream, by the user, outside the system. 

That assumption is the problem. 

When outputs aren’t grounded in the specific contract being reviewed, aren’t connected to authoritative legal sources, and can’t be traced or challenged without leaving the workflow, accuracy becomes the user’s responsibility, not the system’s. The tool does the easy part. The lawyer carries the risk. 

That’s not a gap that better training or stricter policies can close. It’s a design choice. And it’s the wrong one. 

Accuracy by Design 

Hallucinations aren’t random. They’re what happens when a system isn’t grounded in the right data and applied in the right context. In most industries, that’s an inconvenience. In legal, it’s catastrophic. 

Designing accuracy into a system requires working at three levels. 

The first is understanding. Contracts are processed both iteratively and holistically: decomposed clause by clause, provision by provision, so the system deeply understands each stipulation. But contracts are incomplete in isolation; their interpretation depends on prior agreements, business context, and previous legal transactions. This way, the legal and commercial context are fully discovered before an answer is even considered. 
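
In code, that step might look something like the sketch below: a minimal illustration, not the actual pipeline. Every name here (Clause, ContractContext, decompose, assemble_context) is hypothetical; the point is only that decomposition and context-gathering happen before any answer is generated.

```python
from dataclasses import dataclass

# A minimal sketch of the "understanding" step. All names here are
# illustrative stand-ins, not a real product API.

@dataclass
class Clause:
    heading: str
    text: str

@dataclass
class ContractContext:
    clauses: list[Clause]            # the contract, provision by provision
    prior_agreements: list[str]      # related contracts that shape interpretation
    business_context: list[str]      # deal history, positions, commercial facts

def decompose(contract_text: str) -> list[Clause]:
    """Naive clause-by-clause split; a real system would use a
    layout-aware parser rather than blank-line splitting."""
    blocks = [b.strip() for b in contract_text.split("\n\n") if b.strip()]
    return [Clause(heading=b.splitlines()[0], text=b) for b in blocks]

def assemble_context(contract_text: str,
                     prior_agreements: list[str],
                     business_context: list[str]) -> ContractContext:
    """Gather everything needed *before* any answer is even drafted."""
    return ContractContext(decompose(contract_text),
                           prior_agreements, business_context)
```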

From there, multiple models run in parallel, each generating a response built on the right combination of data. On one side, how contracts are actually negotiated: the positions organisations take, the risks they accept, the patterns that emerge over time. On the other, the law itself: case history, statutes, precedent. Our Panel of Judges then determines the most reliable interpretation. 
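
As a rough sketch of the shape of this stage, here is parallel drafting plus a judging step, assuming a generic model(question, context) callable and a simple scoring heuristic standing in for the real Panel of Judges:

```python
import concurrent.futures

# Sketch only: the model callables and the scoring heuristic are
# stand-ins, not the actual Panel of Judges.

def draft_answers(models, question, context):
    """Run every model over the same question and context in parallel."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(m, question, context) for m in models]
        return [f.result() for f in futures]

def panel_of_judges(candidates):
    """Choose the most reliable interpretation. Here: highest score;
    a real panel would compare candidates against each other and
    against the underlying sources."""
    return max(candidates, key=lambda c: c[1])

# Toy models: one grounded in negotiation history, one in the law itself.
models = [
    lambda q, ctx: ("Based on negotiation patterns: cap is market-standard.", 0.72),
    lambda q, ctx: ("Based on statute and precedent: cap is enforceable.", 0.81),
]
best, score = panel_of_judges(
    draft_answers(models, "Is the liability cap enforceable?", {}))
print(best, score)
```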

The second is validation. Most systems generate an answer and trust it. We don’t. 

Our verifier independently audits the answer before anything reaches the user: claims are examined, citations verified, and sources cross-referenced. 

Where confidence is low, outputs are flagged rather than presented as fact. The system checks its own homework, so the lawyer isn’t starting from scratch. 
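
A minimal sketch of what that audit might look like, assuming a hypothetical source_index mapping citation references to source text. A real verifier would also check that each claim is supported by the cited passage, not merely that the source exists:

```python
from dataclasses import dataclass, field

# Sketch of the validation step; not the actual verifier.

@dataclass
class VerifiedAnswer:
    text: str
    citations: list[str]
    flags: list[str] = field(default_factory=list)  # problems surfaced, never hidden

def verify(text: str, citations: list[str],
           source_index: dict[str, str]) -> VerifiedAnswer:
    answer = VerifiedAnswer(text, citations)
    for cite in citations:
        if cite not in source_index:
            # The hallucination case: a citation with no underlying source.
            answer.flags.append(f"citation not found: {cite}")
        # A fuller verifier would also cross-reference each claim
        # against the text of the cited passage itself.
    return answer

answer = verify(
    "The cap does not limit liability for fraud [Clause 12.3].",
    citations=["Clause 12.3", "Smith v. Jones (2019)"],  # second one is invented
    source_index={"Clause 12.3": "12.3 Nothing in this Agreement limits..."},
)
if answer.flags:
    print("Flagged for review:", answer.flags)  # flagged, not presented as fact
```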

The third is transparency. A lawyer will always want to verify. That instinct is right, and systems should make it as easy as possible. 

Every answer is presented with the system’s legal reasoning, surfaced from its internal deliberation, and traceable citations to the underlying material. The goal isn’t to replace human judgment. It’s to make human review fast, focused, and grounded. 
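
To make that concrete, here is one possible shape for a transparent answer payload. The field names are illustrative, not an actual output schema; the principle is that the conclusion never travels without its reasoning and its sources.

```python
import json

# Illustrative payload shape: conclusion, reasoning, and citations
# ship together, so review starts from the evidence, not from scratch.

answer_payload = {
    "question": "Is the liability cap enforceable?",
    "conclusion": "Likely enforceable, subject to the fraud carve-out.",
    "reasoning": [
        "Clause 12.1 caps aggregate liability at 12 months of fees.",
        "Clause 12.3 carves fraud out of the cap.",
    ],
    "citations": [
        {"source": "contract", "ref": "Clause 12.1"},
        {"source": "contract", "ref": "Clause 12.3"},
    ],
    "confidence": "high",
}
print(json.dumps(answer_payload, indent=2))
```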

Taken together, this approach puts design for accuracy at its core: answers grounded in real contract behaviour, connected to authoritative legal sources, tested before they are shown, and transparent when they are. 

That’s the difference between generating answers and building systems you can actually rely on. 

Human oversight will always matter, but it was never meant to carry the full weight of accuracy. 

We need to design systems that are accurate by design.