By an AI Systems Architect
I spent the first half of 2024 cleaning up messes.
My engineering team was using Copilot to write database migrations. One day, the AI hallucinated a column name that didn’t exist (user_id_v2). The migration ran in production. It wiped the primary key index. We were down for six hours.
The problem wasn’t that the AI was “dumb.” The problem was that we trusted a Single Shot.
In human organizations, we don’t let the intern publish a press release without an editor checking it. We don’t let a junior engineer merge code without a code review.
Yet, for two years, we asked ChatGPT to do critical work and just hoped it was right on the first try.
That is insane.
In 2026, we don’t do “Single Shot” anymore. We use “Verification Agents.”
In our deployments, this architecture cut hallucination rates by roughly 90%. It is becoming the standard for high-reliability systems, from legal tech to medical diagnosis.
Here is how to build a “Grounding Workflow” that forces your AI to check its own homework, even if you are running it locally on a laptop.
1. The Theory: Judge and Jury
The core principle of Grounding is Adversarial Processing.
LLMs are “Yes Men.” They want to please you. If you ask them for a fact that doesn’t exist, they will invent it to make you happy.
To fix this, you cannot ask the same AI to “double check.” It will just hallucinate the confirmation.
You need a second, distinct persona. You need a Verifier.
- Agent A (The Generator): Optimistic, creative, fast. (System 1 Thinking).
- Agent B (The Verifier): Pessimistic, pedantic, slow. (System 2 Thinking).
Agent A generates the answer. Agent B reads the answer, assumes it is wrong, and tries to tear it apart. Only if Agent B fails to find an error is the answer shown to the user.
2. The Setup: Running it Locally (Ollama)
You don’t need a complex cloud setup for this. You can do it with Ollama on your MacBook.
We are going to use two models:
- Llama 3 (8B) as the Generator (Fast).
- Mistral (7B) or, if your hardware can handle it, Llama 3 (70B) as the Verifier (Rigorous). A different model brings different blind spots.
Open two terminal windows.
In Window 1, run ollama run llama3.
In Window 2, run ollama run mistral.
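If you would rather drive both models from a script than juggle two terminals, here is a minimal helper sketched against Ollama's local REST API, which listens on localhost:11434 by default. The ask helper is my own naming, not part of Ollama:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint

def ask(model: str, system: str, user: str) -> str:
    """Send one system + user exchange to a local Ollama model and return its reply."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": system},
                {"role": "user", "content": user},
            ],
            "stream": False,  # ask for the full reply as a single JSON object
        },
        timeout=300,
    )
    response.raise_for_status()
    return response.json()["message"]["content"]
```

Every later snippet in this post builds on this one helper.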
3. The Prompts: Copy/Paste These
Here are the exact system prompts I use in my production environment.
Step 1: The Generator Prompt
This is standard. You just want the answer.
System Prompt: “You are a helpful assistant. Answer the user’s question to the best of your ability. Provide citations where possible.”
User Input: “Write a Python script to scan a PDF for the phrase ‘Confidential’ using the pdf_scanner_x library.”
(Note: pdf_scanner_x does not exist. This is a trap).
Generator Output:
“Here is the code using import pdf_scanner_x…”
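Wired up with the ask helper from the previous section, Step 1 looks roughly like this. The prompts are the ones above; GENERATOR_MODEL and the other names are mine:

```python
GENERATOR_MODEL = "llama3"

GENERATOR_SYSTEM = (
    "You are a helpful assistant. Answer the user's question to the best of "
    "your ability. Provide citations where possible."
)

question = (
    "Write a Python script to scan a PDF for the phrase 'Confidential' "
    "using the pdf_scanner_x library."  # deliberate trap: this library does not exist
)

draft = ask(GENERATOR_MODEL, GENERATOR_SYSTEM, question)
print(draft)  # typically includes "import pdf_scanner_x", a confident hallucination
```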
Step 2: The Verifier Prompt (The Secret Sauce)
This is where the magic happens. You take the output from Step 1 and feed it into Step 2. But you must prime the Verifier to be a jerk.
System Prompt (The Critic):
“You are a Senior Code Reviewer and Fact Checker.
Your ONLY job is to find errors, hallucinations, and security vulnerabilities in the text provided below.
RULES:
- Assume the text is wrong.
- Check every library import. Do they actually exist?
- Check every legal citation. Is the case real?
- If you find an error, output: ‘FAIL: [Reason]’.
- If you find NO errors, output: ‘PASS’.
Do not rewrite the code. Just judge it.”
Input to Verifier:
“Here is the code generated by the previous model: [Insert Generator Output]”
Verifier Output:
“FAIL: The library pdf_scanner_x does not appear to exist in standard Python repositories (PyPI). Widely used libraries for this task are PyPDF2 or pdfminer. The generated code is a hallucination.”
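Here is one way to wire Step 2, again reusing the ask helper (a sketch, not production code). The one structural requirement is that the Verifier's reply starts with PASS or FAIL so the calling code can branch on it:

```python
VERIFIER_MODEL = "mistral"

VERIFIER_SYSTEM = (
    "You are a Senior Code Reviewer and Fact Checker.\n"
    "Your ONLY job is to find errors, hallucinations, and security "
    "vulnerabilities in the text provided below.\n"
    "RULES:\n"
    "- Assume the text is wrong.\n"
    "- Check every library import. Do they actually exist?\n"
    "- Check every legal citation. Is the case real?\n"
    "- If you find an error, output: 'FAIL: [Reason]'.\n"
    "- If you find NO errors, output: 'PASS'.\n"
    "Do not rewrite the code. Just judge it."
)

def verify(text: str) -> str:
    """Return the Verifier's verdict ('PASS' or 'FAIL: ...') for a draft answer."""
    return ask(
        VERIFIER_MODEL,
        VERIFIER_SYSTEM,
        f"Here is the code generated by the previous model:\n\n{text}",
    ).strip()

verdict = verify(draft)
print(verdict)  # e.g. "FAIL: The library pdf_scanner_x does not appear to exist..."
```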
4. The Loop: Self-Correction
Now, you close the loop. You take the FAIL message from the Verifier and send it back to the Generator.
Input to Generator:
“Your previous answer failed verification. The Critic said: ‘The library pdf_scanner_x does not exist.’ Please fix the code.”
Generator Output (Corrected):
“Apologies. You are correct. Here is the revised code using the widely used PyPDF2 library…”
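Put together, the whole loop fits in a short function. This sketch reuses the helpers above; the MAX_ROUNDS cap is an extra guardrail of mine so a stubborn Generator cannot loop forever:

```python
MAX_ROUNDS = 3  # give the Generator a few chances to correct itself, then give up

def grounded_answer(question: str) -> str:
    """Generate, verify, and self-correct until the Verifier passes or we give up."""
    prompt = question
    for _ in range(MAX_ROUNDS):
        draft = ask(GENERATOR_MODEL, GENERATOR_SYSTEM, prompt)
        verdict = verify(draft)
        if verdict.startswith("PASS"):
            return draft  # the Verifier found nothing to tear apart
        # Feed the critique back so the Generator can fix its own mistake
        prompt = (
            f"{question}\n\n"
            f"Your previous answer failed verification. The Critic said: '{verdict}'. "
            "Please fix it."
        )
    raise RuntimeError(f"No answer survived verification after {MAX_ROUNDS} rounds")
```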
5. Why “Verification” beats “Reasoning”
You might ask: “Why not just use OpenAI o1? It does reasoning.”
Because o1 is a black box. You don’t know how it reasoned.
By separating the Generator and the Verifier, you get Control.
- You can swap out the Verifier. (Use a specialized “Legal Verifier” for contracts, and a “Security Verifier” for code; see the sketch after this list).
- You can adjust the strictness. (Tell the Verifier to be nitpicky or lenient).
- You can see the logs. You can see exactly what the Verifier caught.
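To make the “swap out the Verifier” point concrete, one simple way to sketch it is a dictionary of critic personas keyed by domain, reusing the ask helper and VERIFIER_MODEL from earlier. The personas below are illustrative, not a fixed catalogue:

```python
# Illustrative critic personas; each one is just a different system prompt.
VERIFIER_PERSONAS = {
    "code": (
        "You are a Senior Code Reviewer. Assume the code is wrong. Check every "
        "import and every API call. Output 'PASS' or 'FAIL: [Reason]'."
    ),
    "legal": (
        "You are a sceptical paralegal. Assume every citation is fabricated "
        "until proven otherwise. Output 'PASS' or 'FAIL: [Reason]'."
    ),
    "security": (
        "You are a security auditor. Hunt for injection risks, hard-coded "
        "secrets, and unsafe defaults. Output 'PASS' or 'FAIL: [Reason]'."
    ),
}

def verify_with(domain: str, text: str) -> str:
    """Run the Verifier using the persona registered for this domain."""
    return ask(
        VERIFIER_MODEL,
        VERIFIER_PERSONAS[domain],
        f"Review the following output:\n\n{text}",
    ).strip()
```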
6. The “Human-in-the-Loop” Dashboard
In my company, we don’t show the user the raw output. We show them the Verified Status.
- Green Checkmark: The Verifier passed it.
- Yellow Warning: The Verifier found a potential issue (“This citation looks real but I cannot confirm the page number”).
- Red Alert: The Verifier flagged a dangerous hallucination.
If it’s Red, the AI refuses to answer. It says: “I attempted to generate an answer, but it failed my internal fact-check. I cannot answer safely.”
Users hate this at first. They want the answer.
But they hate downtime more.
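The dashboard logic behind those three colors can be tiny. Below is a sketch of how I would map verdicts to statuses; the UNSURE prefix for the yellow case is my own convention, so you would have to add it to the Verifier's rules alongside PASS and FAIL:

```python
def status_for(verdict: str) -> str:
    """Map a Verifier verdict onto the traffic-light status shown to the user."""
    if verdict.startswith("PASS"):
        return "GREEN"   # verified: show the answer
    if verdict.startswith("UNSURE"):
        return "YELLOW"  # show the answer behind a warning banner
    return "RED"         # suppress the answer entirely

def respond(draft: str, verdict: str) -> str:
    """Decide what the user actually sees, based on the Verifier's verdict."""
    status = status_for(verdict)
    if status == "RED":
        return ("I attempted to generate an answer, but it failed my internal "
                "fact-check. I cannot answer safely.")
    if status == "YELLOW":
        return f"[WARNING] {verdict}\n\n{draft}"
    return draft
```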
Conclusion: Slow Down to Speed Up
This workflow is slower. It takes at least 2x the compute. It takes at least 2x the time (more when the loop has to retry).
But in a critical workflow, speed is not the metric; trust is.
If you are building an AI app in 2026, and you don’t have a Verifier loop, you are negligent.
Don’t trust the machine. Trust the process of checking the machine.
Start grounding your AI today. Because the only thing worse than a slow answer is a confident lie.
