Shamim Shams

Build a Slack Bot That Answers Questions from Your Internal Docs [Part 1]


The onboarding doc says the API key lives in the shared vault. The runbook says check the Confluence page. The Confluence page links to a Notion doc that was last edited two years ago. Every new team member goes through this. It's annoying, and it's solvable.

What we're building: a Slack slash command — /ask — that takes a question in plain English and returns an answer grounded in your actual internal documentation. The bot reads the docs, finds the relevant sections, and generates a response using Claude. You'll need Python 3.10+, an Anthropic API key, a Slack app with a slash command configured, and these packages:

pip install anthropic slack-bolt chromadb pypdf

The Architecture

Three moving parts: ingestion, retrieval, and response generation.

On startup, the bot reads your docs — plain text, Markdown, PDFs — and stores them as vector embeddings in a local ChromaDB collection. When someone types /ask why does the staging deploy fail on Thursdays, the query gets embedded, the most relevant document chunks get retrieved, and Claude uses them as context to write the answer.

It's RAG: retrieval-augmented generation.

The difference from a generic chatbot is that the document store is your docs. Claude doesn't try to answer from general knowledge. If the answer isn't in the retrieved chunks, it says so — and that constraint is what makes this actually useful in a work context. A bot that confidently fabricates policy details is worse than no bot at all.

For teams with under a few hundred documents, local ChromaDB is plenty. You don't need a hosted vector service, and you don't need to manage infrastructure. The collection lives in memory by default, which means the bot re-reads and re-embeds the docs from disk on each start. That's fine for a bot that restarts infrequently.

Load and Chunk the Docs

Start by ingesting the documentation. The chunking strategy matters more than most tutorials acknowledge — I've seen systems fall apart because they split on fixed character counts and cut code blocks in half, or worse, separated a header from the content it introduced.

from pathlib import Path
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from environment

def load_text_file(path: str) -> str:
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

def chunk_by_section(text: str, max_chunk_size: int = 1500) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""

    for para in paragraphs:
        if len(current) + len(para) > max_chunk_size and current:
            chunks.append(current.strip())
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para

    if current:
        chunks.append(current.strip())

    return chunks

Splitting on double newlines keeps semantic units together: one chunk might be the runbook section on rotating credentials, another the deployment checklist. The max_chunk_size of 1500 characters is a starting point, and it's a soft limit, since a single paragraph longer than that is kept whole rather than split. Dense technical runbooks benefit from smaller chunks; narrative explainers can go larger.
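One gap worth knowing about: a Markdown heading is its own paragraph, so it can end up as the last line of one chunk while its content starts the next. A minimal sketch of one way to avoid that, assuming ATX-style headings (`#` through `######`). The helper name attach_headings is hypothetical, not part of any library:

```python
import re

# Assumption: headings are single-line ATX headings such as "## Rollback".
HEADING = re.compile(r"^#{1,6}\s+\S")

def attach_headings(paragraphs: list[str]) -> list[str]:
    """Merge each heading paragraph into the paragraph that follows it."""
    merged: list[str] = []
    pending: list[str] = []  # headings waiting for their body paragraph

    for para in paragraphs:
        is_heading = "\n" not in para and HEADING.match(para) is not None
        if is_heading:
            pending.append(para)
        else:
            merged.append("\n\n".join(pending + [para]))
            pending = []

    if pending:  # trailing heading with no body under it
        merged.append("\n\n".join(pending))
    return merged
```

Calling this on the paragraphs list inside chunk_by_section, before the loop, means a heading always travels with the first paragraph it introduces. A line that starts with `# ` inside a code sample would be misread as a heading, so treat this as a starting point.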

PDFs need a bit more work:

from pypdf import PdfReader

def load_pdf(path: str) -> str:
    reader = PdfReader(path)
    # Extract each page once; skip pages with no extractable text
    pages = [page.extract_text() for page in reader.pages]
    return "\n\n".join(text for text in pages if text)

The quality of PDF extraction depends on how the PDF was made. Scanned documents give you garbage. If your internal docs are scanned PDFs, you'll need OCR before this — pytesseract is the standard tool, but that's outside scope here.

Store Chunks in ChromaDB

The setup is minimal:

import chromadb

def build_doc_store(docs_dir: str) -> chromadb.Collection:
    db = chromadb.Client()
    collection = db.create_collection("internal_docs")

    docs_path = Path(docs_dir)
    all_chunks: list[str] = []
    ids: list[str] = []

    for i, file_path in enumerate(docs_path.rglob("*.md")):
        text = load_text_file(str(file_path))
        chunks = chunk_by_section(text)
        for j, chunk in enumerate(chunks):
            all_chunks.append(chunk)
            ids.append(f"{file_path.stem}-{i}-{j}")

    for k, file_path in enumerate(docs_path.rglob("*.pdf")):
        text = load_pdf(str(file_path))
        chunks = chunk_by_section(text)
        for j, chunk in enumerate(chunks):
            all_chunks.append(chunk)
            # Include the file index so PDFs with the same stem don't collide
            ids.append(f"{file_path.stem}-pdf-{k}-{j}")

    # Skip the add when no docs were found; ChromaDB raises on empty input
    if all_chunks:
        collection.add(documents=all_chunks, ids=ids)
    return collection

rglob walks subdirectories, so nested folder structures work without extra handling. IDs need to be unique across the entire collection. The {stem}-{i}-{j} pattern handles that for Markdown, and the PDF loop does the same with its own file index k, so two files that share a stem in different folders still get distinct IDs.

chromadb.Client() creates an ephemeral in-memory store. If you want embeddings to persist between restarts without re-indexing every time, use chromadb.PersistentClient(path="./chroma_data") instead, and swap create_collection for get_or_create_collection, since create_collection raises if the collection already exists. For a small doc collection it doesn't matter: indexing 200 Markdown files takes a few seconds. For larger corpora, persistence saves meaningful startup time.
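IDs built from an enumerate index shift whenever files are added or removed, which starts to matter once the collection persists between runs. One alternative is to derive each ID from the file's path relative to the docs root. A minimal sketch; make_chunk_id is a hypothetical helper, not a ChromaDB function:

```python
import hashlib
from pathlib import Path

def make_chunk_id(docs_root: Path, file_path: Path, chunk_index: int) -> str:
    """Build an ID that stays the same across runs for the same file and chunk."""
    # The relative path distinguishes docs/api/README.md from docs/ops/README.md
    rel = file_path.relative_to(docs_root).as_posix()
    digest = hashlib.sha1(rel.encode("utf-8")).hexdigest()[:12]
    # Keep the stem in the ID so it is still readable in logs
    return f"{file_path.stem}-{digest}-{chunk_index}"
```

With stable IDs, re-indexing can use collection.upsert(documents=..., ids=...) in place of collection.add, so an edited file overwrites its old chunks. Chunks that no longer exist after an edit would still need to be deleted separately.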

Wire Up the Slash Command

Slack's Bolt framework handles the connection and request-routing plumbing. The simplest setup for internal use is socket mode: it doesn't require a public URL, which matters if you're running this on a dev machine or a private server behind a firewall.

import os
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

app = App(token=os.environ["SLACK_BOT_TOKEN"])
collection: chromadb.Collection | None = None

@app.command("/ask")
def handle_ask(ack, say, command):
    ack()
    question = command["text"].strip()

    if not question:
        say("Ask me something. Example: `/ask how do I rotate the staging API key?`")
        return

    answer = answer_question(question, collection)
    say(answer)

The ack() call is mandatory: Slack shows the user a timeout error unless the command is acknowledged within 3 seconds. Because ack() runs before the retrieval and Claude calls, the slow work happens after Slack already has its acknowledgment, so this simple version works. The catch is that the user sees nothing until say(answer) posts. For longer responses you can send a "searching docs..." message immediately and follow up using respond in a background thread.
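The acknowledge-first, answer-later pattern described above can be sketched without touching the rest of the bot. This is an illustration under assumptions, not the handler Part 2 ends up with: make_ask_handler and the worker pool are hypothetical names, and answer_fn stands in for a function like answer_question with the collection already bound.

```python
from concurrent.futures import ThreadPoolExecutor

# Small pool for slow retrieval + Claude calls (size is an arbitrary choice)
executor = ThreadPoolExecutor(max_workers=4)

def make_ask_handler(answer_fn, executor):
    """Return a Bolt-style listener that acks at once and answers later."""
    def handle_ask(ack, respond, command):
        question = command["text"].strip()
        if not question:
            ack("Ask me something. Example: `/ask how do I rotate the staging API key?`")
            return

        # Acknowledge immediately so Slack's 3-second window is satisfied
        ack("Searching docs...")

        def work():
            try:
                respond(answer_fn(question))
            except Exception:
                respond("Something went wrong while searching the docs.")

        # The slow part runs off the listener thread
        executor.submit(work)

    return handle_ask
```

Registering it would look like app.command("/ask")(make_ask_handler(my_answer_fn, executor)). Bolt passes ack, respond, and command by argument name, and text passed to ack is shown only to the user who ran the command.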

You'll need two tokens: SLACK_BOT_TOKEN (the xoxb-... bot token) and SLACK_APP_TOKEN (the xapp-... app-level token with connections:write scope). Both come from your Slack app's configuration page in the API dashboard.

The Answer Gets Generated Here

The retrieved chunks become the context. The user's question becomes the prompt.

def answer_question(question: str, collection: chromadb.Collection) -> str:
    results = collection.query(query_texts=[question], n_results=5)
    context_chunks = results["documents"][0]
    context = "\n\n---\n\n".join(context_chunks)

    message = client.messages.create(
        model="claude-sonnet-4-6",  # or claude-opus-4-8 for longer, more complex docs
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""You are a helpful assistant answering questions about internal company documentation.

Use only the context provided below. If the answer isn't in the context, say so directly — don't guess.

Context:
{context}

Question: {question}"""
            }
        ]
    )

    return message.content[0].text

The "don't guess" instruction matters more than it looks. Without it, Claude will be helpful in the general sense — it'll pull in knowledge about how authentication systems typically work, or how Kubernetes clusters usually behave — and that answer will sound authoritative and be wrong for your specific setup. Grounding the response strictly to retrieved context is the difference between a useful bot and an expensive hallucination machine.

Five retrieved chunks (n_results=5) is a reasonable default. Fewer and you risk missing the relevant section; more and you dilute the context with loosely related content. If answers seem off, log context_chunks temporarily to see what the retrieval step is actually returning before tweaking the count.
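For that temporary logging, it helps to see distances and IDs next to the text. A minimal sketch, assuming a ChromaDB collection like the one built earlier; debug_retrieval is a hypothetical helper name:

```python
def debug_retrieval(collection, question: str, n_results: int = 5):
    """Print what the retrieval step returns for a question, closest first."""
    results = collection.query(
        query_texts=[question],
        n_results=n_results,
        include=["documents", "distances"],
    )
    # Each result field holds one list per query; there is a single query here
    rows = list(zip(
        results["ids"][0],
        results["distances"][0],
        results["documents"][0],
    ))
    for chunk_id, distance, doc in rows:
        preview = doc[:80].replace("\n", " ")
        print(f"{distance:.3f}  {chunk_id}  {preview}")
    return rows
```

Lower distance means a closer match. If the top rows all come from the wrong document, the problem is in chunking or wording, and changing n_results won't fix it.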

When Does the Bot Get It Wrong?

Retrieval failures are the most common problem, and they're rarely obvious. The bot gives a confident-sounding answer that pulled from the wrong chunk.

One pattern: your docs use different terminology than your team uses in conversation. The runbook says "deployment pipeline" but everyone types /ask how do I push to prod. Those phrases don't match well in embedding space. The quick fix is auditing which questions get bad answers and rewriting the relevant doc sections to use the language your team actually uses — less elegant than query expansion, but more reliable and easier to maintain.
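For comparison, the query-expansion route mentioned above could be as simple as a hand-maintained glossary that appends the docs' wording to the question before retrieval. A rough sketch; TEAM_GLOSSARY and expand_query are hypothetical, and the entries are placeholders:

```python
# Hypothetical mapping from how the team talks to how the docs are written
TEAM_GLOSSARY = {
    "push to prod": "deployment pipeline",
    "creds": "credentials",
}

def expand_query(question: str, glossary: dict[str, str] = TEAM_GLOSSARY) -> str:
    """Append doc terminology for any team slang found in the question."""
    lowered = question.lower()
    extras = [
        doc_term
        for slang, doc_term in glossary.items()
        if slang in lowered and doc_term not in lowered
    ]
    if not extras:
        return question
    return f"{question} ({', '.join(extras)})"
```

The expanded string would go to collection.query, while the original question still goes to Claude. The matching is naive substring search, and the glossary is one more thing to keep current, which is part of why rewriting the docs tends to hold up better.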

Three-sentence documentation stubs that say "see the Kubernetes guide for details" are useless as retrieved chunks. They surface with high relevance scores but contain nothing actionable. If your docs are full of forward references, you'll either need to expand them inline or accept that those topics won't answer well.

Stale documentation is the worst case. The bot confidently answers with information that was accurate eight months ago. There's no retrieval trick that fixes this — keeping docs current is a process problem. One partial mitigation is storing the last-modified date for each chunk and including it in the context prompt, so Claude can mention when a section hasn't been updated recently.

I'm genuinely not sure that approach is worth the added complexity for smaller teams. It might just teach people to distrust the bot.

Putting It Together

def main():
    global collection
    docs_dir = os.environ.get("DOCS_DIR", "./docs")
    collection = build_doc_store(docs_dir)
    print(f"Loaded docs from {docs_dir}")

    handler = SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"])
    handler.start()

if __name__ == "__main__":
    main()

Set your environment variables and run:

export ANTHROPIC_API_KEY="your-key"
export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
export DOCS_DIR="./docs"
python bot.py

Invite the bot to a channel with /invite @YourBotName, then try /ask with a real question from your docs. If the answer is wrong, check what chunks were actually retrieved before adjusting anything else.

Wrapping Up

The RAG core is done: chunking, embedding, retrieval, and generation all work, and a minimal /ask handler connects them to Slack. What this part glossed over is the Slack side: creating the app in the dashboard, getting the two tokens, and keeping the command responsive when Claude takes longer than Slack's 3-second acknowledgment window.

Part 2 covers exactly that. It walks through creating the Slack app from scratch, fixes the timeout with background threading, replaces the plain-text say(answer) with a formatted Block Kit response that includes source citations, and adds a /reindex command so docs stay current without restarting the process. The code from this article drops in unchanged — Part 2 only touches the Slack layer around it.

>> Part 2