Shamim Shams

Understanding AI Hallucinations and How to Mitigate Them in Production


Your AI model will fabricate things. Not sometimes — regularly. And it will do it with confidence.

The question isn't whether to trust it. The question is how to build a system that works even when it lies.

What This Covers

What hallucinations are, why they happen at a mechanical level, and six practical mitigation techniques you can apply in production today. Code examples use the Anthropic Python SDK and claude-sonnet-4-6.

Prerequisites

  • Working knowledge of LLM APIs — calling the API, reading responses
  • Basic Python
  • Some production use case in mind — this is more useful when you have something specific to protect

What Actually Causes Hallucinations

LLMs don't look things up. They predict tokens.

Each token is sampled from a probability distribution conditioned on all the tokens before it. The model was trained to produce text that looks like the training data: fluent, coherent, grammatically correct. Accuracy is a separate property, and it wasn't what the training objective directly optimized for.

When a model doesn't have strong signal in its training data for a specific fact, it doesn't pause and say it's not sure. It generates something plausible. Names, dates, URLs, API method signatures, academic citations — all of these can be fabricated convincingly because the model has learned the shape of the answer without necessarily knowing the content.

This is not a bug in any specific model. It's a property of how these systems work.
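To make the mechanism concrete, here is a toy sketch of next-token sampling. It is not how any real model is implemented: the country is fictional, and the candidate tokens and their probabilities are made up for illustration. What it shows is that the sampling step only weighs plausibility and never checks truth.

```python
import random

# Hypothetical next-token probabilities for the prompt
# "The capital of Freedonia is". Freedonia is fictional and the
# numbers are invented for illustration only.
NEXT_TOKEN_PROBS = {
    "Paris": 0.40,
    "Fredville": 0.35,
    "London": 0.20,
    "unknown": 0.05,
}

def sample_next_token(probs: dict[str, float], rng: random.Random) -> str:
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = list(probs.values())
    return rng.choices(tokens, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]

# Share of samples where the toy model admits it does not know.
# Nothing in the loop above checks whether a chosen token is true.
share_unknown = samples.count("unknown") / len(samples)
```

In this toy distribution a plausible-looking name wins almost every time, because "unknown" is just another low-probability token.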

The Common Types

The most common is factual hallucination: the model states something false with full confidence. "The Python requests library supports async operations natively" — it doesn't. The model has seen enough related content to produce a confident-sounding wrong claim.

Source hallucinations are particularly dangerous for applications where traceability matters. Ask the model to cite sources and it may return paper titles, authors, and DOIs for papers that don't exist. The references look real until you check.

Code hallucinations are the most immediately frustrating: methods that don't exist, parameters in the wrong order, deprecated syntax stated as current. I've seen Claude use anthropic.Client() initialization syntax from several months ago that no longer works in the current SDK. Confident, wrong, and it reads fine until you run it.

Instruction hallucinations are the hardest to catch. The model reports having followed an instruction it didn't follow, or produces output that looks compliant but subtly isn't. These look like success until you read carefully.

Mitigation Strategies That Work

1. Ground the Model with Context (RAG)

The most effective mitigation for factual hallucinations is Retrieval-Augmented Generation. You provide the relevant facts directly in the prompt instead of relying on what the model learned during training.

import anthropic

client = anthropic.Anthropic()

def answer_with_context(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n".join(retrieved_docs)

    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Answer the question using ONLY the information provided below.
If the answer is not in the provided information, say "I don't have enough information to answer this."

Information:
{context}

Question: {question}"""
            }
        ]
    )

    return response.content[0].text

The phrase "ONLY the information provided" does the heavy lifting. It's not magic, and it doesn't eliminate hallucination entirely. But it cuts fabrication sharply when the relevant information is in the context — which is what you control.
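The function above assumes something has already produced retrieved_docs. As a minimal sketch of that step, here is a naive keyword-overlap retriever. The retrieve and tokenize helpers and the sample documents are hypothetical. A production system would more likely use embeddings or a search index, so treat this as a stand-in, not a recommendation.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents that share the most words with the question."""
    question_words = tokenize(question)
    scored = [(len(question_words & tokenize(doc)), doc) for doc in documents]
    # Drop documents with no overlap so irrelevant text never reaches the prompt.
    relevant = [(score, doc) for score, doc in scored if score > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in relevant[:top_k]]

# Hypothetical knowledge base for illustration.
docs = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Shipping to Canada takes 5 to 7 business days.",
]
retrieved_docs = retrieve("How long do refunds take?", docs)
```

If retrieve returns an empty list, it is usually safer to return a fixed "no information" message than to call the model with an empty context.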

2. Constrain the Output Format

Free-form generation gives the model room to invent. Structured output leaves less room: the model knows which fields to fill, and you can instruct it to return null for missing values instead of inventing them.

import anthropic
import json

client = anthropic.Anthropic()

def extract_structured_data(text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""Extract the following fields from the text below.
Return valid JSON only. Use null for any field not present in the text.

Required fields:
- name (string)
- email (string)
- company (string)
- phone (string)

Text: {text}"""
            }
        ]
    )

    return json.loads(response.content[0].text)

json.loads() tells you if the format broke. It doesn't tell you if the values are correct. Add schema validation — Pydantic works well here — for anything downstream that depends on the data.
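Here is one way the validation step could look, sketched with the standard library only so it runs without extra dependencies. A Pydantic model would express the same checks more compactly. The parse_contact and ungrounded_fields helpers are hypothetical, and the "value must appear verbatim in the source text" check is a heuristic assumption, not a guarantee of correctness.

```python
import json

REQUIRED_FIELDS = ("name", "email", "company", "phone")
FENCE = "`" * 3  # Markdown code fence marker that models sometimes wrap JSON in

def parse_contact(raw: str) -> dict:
    """Parse model output as JSON and check it against the expected fields."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Remove the surrounding fence and an optional "json" language tag.
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    data = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [field for field in REQUIRED_FIELDS if field not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for field in REQUIRED_FIELDS:
        value = data[field]
        if value is not None and not isinstance(value, str):
            raise ValueError(f"{field} must be a string or null")
    if data["email"] is not None and "@" not in data["email"]:
        raise ValueError("email does not look like an address")
    return {field: data[field] for field in REQUIRED_FIELDS}

def ungrounded_fields(data: dict, source_text: str) -> list[str]:
    """List fields whose non-null value does not appear verbatim in the source text."""
    lowered = source_text.lower()
    return [
        field for field, value in data.items()
        if value is not None and value.lower() not in lowered
    ]
```

For extraction tasks, any field returned by ungrounded_fields is a candidate for a fabricated value and worth routing to review.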

3. Require Citations

For applications that need traceable output, build citation requirements into the system prompt:

SYSTEM_PROMPT = """You are a research assistant. When answering questions:
1. Only state facts that appear in the provided documents
2. After each factual claim, include a citation: [Doc N, paragraph M]
3. If you cannot find support for a claim in the documents, say so explicitly
4. Never infer facts not directly stated"""

This makes hallucination detectable rather than invisible. If a citation points to a paragraph that doesn't say what the model claims, you can catch it by checking the cited paragraph. It also constrains generation in a useful way: each claim has to be tied to a specific passage, which leaves less room for unsupported statements.
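A cheap first check on cited output is whether each citation points at something that exists. The sketch below assumes the [Doc N, paragraph M] format from the system prompt and assumes you hold each document as a list of paragraphs. The find_invalid_citations helper is hypothetical. It only confirms that the cited location exists, not that the paragraph supports the claim.

```python
import re

CITATION_PATTERN = re.compile(r"\[Doc (\d+), paragraph (\d+)\]")

def find_invalid_citations(
    answer: str, documents: list[list[str]]
) -> list[tuple[int, int]]:
    """Return (doc, paragraph) citations that point outside the provided documents.

    documents[i] holds the paragraphs of Doc i + 1, since citations are 1-based.
    """
    invalid = []
    for doc_str, para_str in CITATION_PATTERN.findall(answer):
        doc_num, para_num = int(doc_str), int(para_str)
        doc_exists = 1 <= doc_num <= len(documents)
        if not doc_exists or not (1 <= para_num <= len(documents[doc_num - 1])):
            invalid.append((doc_num, para_num))
    return invalid

# Hypothetical input: Doc 1 has two paragraphs, Doc 2 has one.
documents = [["First paragraph.", "Second paragraph."], ["Only paragraph."]]
answer = (
    "The policy changed in 2021 [Doc 1, paragraph 2]. "
    "It applies to all regions [Doc 2, paragraph 3]. "
    "Exceptions are rare [Doc 4, paragraph 1]."
)
invalid = find_invalid_citations(answer, documents)
```

Citations that pass this check still need a content check, by a human or by a second model call, against the paragraph they name.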

4. Set Temperature to 0 for Factual Tasks

Lower temperature means less randomness in token selection. For extraction, summarization, and Q&A — anything where accuracy matters more than variety — use temperature 0.

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    temperature=0,  # Minimizes sampling randomness; output may still vary slightly between calls
    messages=[
        {
            "role": "user",
            "content": "Extract the invoice total from this text: ..."
        }
    ]
)

Temperature 0 doesn't eliminate hallucination. The model's knowledge is still limited. But it reduces variance: you get far more consistent output across calls (though not guaranteed to be identical), which at least makes testing meaningful.
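For intuition about what the temperature parameter does, here is the textbook softmax-with-temperature calculation on made-up scores. This is a generic illustration, not a description of how the API implements temperature internally. The logits and the softmax_with_temperature helper are assumptions for the example.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing each by the temperature."""
    if temperature <= 0:
        # Dividing by zero is undefined; temperature 0 is usually treated
        # as "always pick the highest-scoring token".
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
warm = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.1)
```

As the temperature drops, probability mass concentrates on the top-scoring token. If that token is wrong, a low temperature just makes the model wrong more consistently.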

5. Build a Verification Layer

For high-stakes output, run a second call to verify the first:

import anthropic

client = anthropic.Anthropic()

def generate_and_verify(question: str, context: str) -> dict[str, str]:
    answer_response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"Context: {context}\n\nQuestion: {question}"
            }
        ]
    )
    answer = answer_response.content[0].text

    verify_response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=256,
        messages=[
            {
                "role": "user",
                "content": f"""Does this answer accurately reflect the information in the context?
Reply with SUPPORTED, UNSUPPORTED, or PARTIALLY_SUPPORTED, followed by one sentence of explanation.

Context: {context}

Answer: {answer}"""
            }
        ]
    )

    return {
        "answer": answer,
        "verification": verify_response.content[0].text
    }

This adds latency and roughly doubles token cost for the verified calls. Use it where wrong answers have real consequences: legal summaries, medical context, financial data, anything client-facing you can't quickly review manually.
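The verification text is only useful if your code acts on it. A minimal sketch of parsing the verdict follows, assuming the model leads with one of the three labels requested in the prompt. The parse_verdict and should_release helpers are hypothetical. Note that a plain substring check for "SUPPORTED" would also match "UNSUPPORTED", so the pattern anchors on the first word.

```python
import re

# Longer labels come first; the pattern is anchored at the start of the reply.
VERDICT_PATTERN = re.compile(r"^\W*(PARTIALLY_SUPPORTED|UNSUPPORTED|SUPPORTED)\b")

def parse_verdict(verification: str) -> str:
    """Return the leading verdict label, or "UNKNOWN" if none is found."""
    match = VERDICT_PATTERN.match(verification.strip().upper())
    return match.group(1) if match else "UNKNOWN"

def should_release(verification: str) -> bool:
    """Release only fully supported answers; anything else goes to fallback or review."""
    return parse_verdict(verification) == "SUPPORTED"
```

Treating "UNKNOWN" the same as "UNSUPPORTED" keeps the gate closed by default when the verifier itself goes off-format.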

6. Set Honest Expectations in the System Prompt

Modern Claude models are reasonably good at expressing uncertainty — if you ask them to. Build it in:

SYSTEM_PROMPT = """You are a helpful assistant.
When you're uncertain about a fact, say so.
Phrases like "I believe," "I'm not certain, but," or "you should verify this"
are appropriate when your confidence is low.
It is better to say you don't know than to state something incorrect."""

Test this against questions at the edges of the model's knowledge. You'll find cases where it hedges appropriately, and cases where it doesn't. That tells you where in your application you need the verification layer instead.
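One rough way to run that test is to collect responses to edge-case questions and measure how often they hedge. The sketch below uses simple phrase matching. The phrase list starts from the system prompt above and adds a couple of assumed variants, and the sample responses are invented. Phrase matching misses hedges worded differently, so read the number as a lower bound.

```python
HEDGE_PHRASES = (
    "i believe",
    "i'm not certain",
    "you should verify",
    "i don't know",   # assumed variant, not in the system prompt
    "i'm not sure",   # assumed variant, not in the system prompt
)

def contains_hedge(response: str) -> bool:
    """Return True if the response contains any known hedging phrase."""
    # Normalize curly apostrophes so "I'm" matches either way.
    text = response.lower().replace("\u2019", "'")
    return any(phrase in text for phrase in HEDGE_PHRASES)

def hedge_rate(responses: list[str]) -> float:
    """Fraction of responses that contain at least one hedging phrase."""
    if not responses:
        return 0.0
    return sum(contains_hedge(r) for r in responses) / len(responses)

# Invented responses for illustration.
responses = [
    "I believe the library was released in 2015, but you should verify this.",
    "The function returns a list of tuples.",
    "I\u2019m not certain, but the limit appears to be 100 requests.",
    "The API key never expires.",
]
rate = hedge_rate(responses)
```

Unhedged answers to questions you know are at the edge of the model's knowledge are the ones to read first.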

What Doesn't Work

Prompts that say "don't hallucinate." There's no flag to flip. "Be accurate" instructions help marginally at best.

Self-review in the same turn. "Check your answer for errors" occasionally catches simple mistakes, but the model reviews its own work with the same weights that produced the errors. It's not a reliable safety net.

Trusting newer models more. Newer models hallucinate less on average. They still hallucinate. The failure mode shifts rather than disappearing. Don't skip validation because you upgraded.

Wrapping Up

Hallucinations are a property of language models, not a defect in any specific one. Every major model hallucinates. The gap between models is real, but no model's hallucination rate is zero.

RAG and structured outputs will handle the majority of hallucination problems in most production applications. For anything with real consequences attached — legal, medical, financial, client-facing — add a verification layer and explicit uncertainty language in your system prompt.

Start with RAG and structured outputs. You'll cover most of what you'll actually hit in production. The remaining cases mostly require knowing your domain well enough to recognize a wrong answer when you see one. No prompt technique substitutes for that.