Slack Bot for Internal Docs, Part 2: Real Slack Integration

If you ran the bot from Part 1, you probably hit the timeout within the first few minutes. Someone typed /ask, got "This app didn't respond in time" in red, and trust died a little. Claude was still thinking.

Part 1 treated the Slack app as a prerequisite — configured and ready. This part actually builds it. We'll create the app in the Slack dashboard, wire both tokens, fix the timeout with background threading, format answers with Block Kit so they look like a real bot response instead of a text dump, and add a /reindex command so the docs stay current without restarting the process. You'll need the code from Part 1 and a Slack workspace where you have permission to install apps.

Create the Slack App

Head to api.slack.com/apps and click Create New App → From Scratch. Name it something obvious — DocBot, AskBot, whatever your team will recognize — and select your workspace.

Three things to configure after the app is created:

Socket Mode. Under Settings → Socket Mode, toggle it on. Generate an App-Level Token with the connections:write scope. Copy it; it starts with xapp-. This is SLACK_APP_TOKEN.

Slash Command. Under Features → Slash Commands → Create New Command: set the command to /ask, the request URL to any valid URL (socket mode ignores it entirely — the field is just required by the form), and add a short description like "Ask a question from internal docs."

Bot Token Scopes. Under OAuth & Permissions → Scopes → Bot Token Scopes, add commands and chat:write. Then click Install to Workspace, confirm the permissions, and copy the Bot User OAuth Token — it starts with xoxb-. This is SLACK_BOT_TOKEN.

export SLACK_BOT_TOKEN="xoxb-..."
export SLACK_APP_TOKEN="xapp-..."
export ANTHROPIC_API_KEY="your-key"
export DOCS_DIR="./docs"

Invite the bot to your target channel with /invite @DocBot once it's running.
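Mixing up the two Slack tokens is an easy mistake, and the resulting error at startup is not always obvious. A minimal sketch of a pre-flight check, assuming the variable names from the exports above; check_env and REQUIRED_PREFIXES are hypothetical names, not part of Bolt or the Part 1 code:

```python
import os

# Expected prefix per variable. The Anthropic key has no prefix check here,
# it only needs to be present.
REQUIRED_PREFIXES = {
    "SLACK_BOT_TOKEN": "xoxb-",
    "SLACK_APP_TOKEN": "xapp-",
    "ANTHROPIC_API_KEY": "",
}


def check_env(env) -> list[str]:
    """Return a list of human-readable problems; empty means the env looks sane."""
    problems = []
    for name, prefix in REQUIRED_PREFIXES.items():
        value = env.get(name, "")
        if not value:
            problems.append(f"{name} is not set")
        elif not value.startswith(prefix):
            problems.append(f"{name} should start with {prefix!r}")
    return problems


if __name__ == "__main__":
    for problem in check_env(os.environ):
        print(f"config problem: {problem}")
```

Calling check_env(os.environ) at the top of main() and exiting on a non-empty result turns a confusing connection failure into a one-line message.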

Why Does the 3-Second Timeout Fire?

Slack's rule with slash commands: your server must call ack() within 3 seconds. Miss the window and the user gets an error. The Part 1 code called ack() first — that's correct — but the respond or say call after it sometimes exceeded the window because the ChromaDB query and the Claude API call ran synchronously before posting anything back.

It doesn't fail silently.

The user sees a visible Slack error message, not a loading state. I've watched people try a bot three times after seeing that error, decide it doesn't work, and go back to searching Confluence manually. Once that happens, you've lost them.

The fix is threading: acknowledge immediately, post an ephemeral "searching..." message right away, then run the slow work in a background thread and post the real answer when it's ready.

import os
import threading
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
from anthropic import Anthropic, APIError
import chromadb

client = Anthropic()  # reads ANTHROPIC_API_KEY from environment
app = App(token=os.environ["SLACK_BOT_TOKEN"])
collection: chromadb.Collection | None = None

@app.command("/ask")
def handle_ask(ack, respond, command):
    ack()
    question = command["text"].strip()

    if not question:
        respond(
            text="Ask me something. Example: `/ask how do I rotate the staging API key?`",
            response_type="ephemeral"
        )
        return

    respond(text=":mag: Searching docs...", response_type="ephemeral")

    def do_search():
        try:
            answer, sources = answer_question(question, collection)
            blocks = format_answer_blocks(question, answer, sources)
            # text is the fallback Slack uses for notifications and for
            # clients that cannot render blocks.
            respond(text=answer, blocks=blocks, response_type="in_channel")
        except APIError as e:
            respond(
                text=f":warning: API error. Try again in a moment.\n`{e.message}`",
                response_type="ephemeral"
            )
        except Exception:
            # Log the traceback so "check the bot logs" has something to show.
            app.logger.exception("/ask failed")
            respond(
                text=":warning: Something went wrong. Check the bot logs.",
                response_type="ephemeral"
            )

    threading.Thread(target=do_search, daemon=True).start()

respond is Bolt's wrapper around the slash command's response_url. It works both for the immediate reply and for follow-up messages after ack(), which is why we use it instead of say here. Slack accepts up to five responses per command within 30 minutes of the invocation, so the two calls in this handler fit comfortably. The "searching..." message uses response_type="ephemeral" so only the person who ran the command sees it. The final answer posts in_channel so the whole team benefits.
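The ack-then-thread pattern can be checked without a Slack workspace. The sketch below swaps in fake ack and respond callables and a sleep in place of the ChromaDB and Claude calls; make_handler, demo, and the fakes are hypothetical stand-ins, and the handler returns the worker thread only so the demo can wait on it:

```python
import threading
import time


def make_handler(slow_answer):
    """Build a slash-command-style handler around a slow function."""
    def handle(ack, respond, command):
        ack()  # acknowledge first, before any slow work
        respond(text="Searching docs...", response_type="ephemeral")

        def work():
            # Runs in the background; the handler has already returned.
            respond(text=slow_answer(command["text"]), response_type="in_channel")

        worker = threading.Thread(target=work, daemon=True)
        worker.start()
        return worker  # returned only so this demo can join() it

    return handle


def demo():
    events = []

    def fake_ack():
        events.append("ack")

    def fake_respond(text, response_type):
        events.append(text)

    def slow(question):
        time.sleep(0.2)  # stands in for retrieval plus the model call
        return f"answer to {question}"

    handler = make_handler(slow)
    started = time.monotonic()
    worker = handler(fake_ack, fake_respond, {"text": "how do I deploy?"})
    returned_after = time.monotonic() - started
    worker.join()
    return events, returned_after


if __name__ == "__main__":
    print(demo())
```

The handler returns well before the 0.2 second sleep finishes, and the recorded events arrive in the order ack, placeholder, answer, which is the ordering the real handler relies on.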

Format Answers with Block Kit

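say(answer) or respond(text=answer) posts the answer as one undifferentiated message. Slack's text format, mrkdwn, is not standard Markdown: **double-asterisk bold** shows up with literal asterisks, # headers render as plain lines, and lists flatten. Claude's formatted response turns into a clunky wall of text.

Block Kit gives the message structure. A section block whose text object has "type": "mrkdwn" holds the question and answer, a divider separates it from the footer, and a context block at the bottom shows which doc files the answer came from, which matters more than it sounds. Block Kit does not translate standard Markdown for you, so anything Claude emits outside the mrkdwn dialect still needs converting, or a prompt instruction to write Slack-style formatting.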

def format_answer_blocks(question: str, answer: str, sources: list[str]) -> list[dict]:
    blocks = [
        {
            "type": "section",
            "text": {
                "type": "mrkdwn",
                "text": f"*Q: {question}*\n\n{answer}"
            }
        },
        {"type": "divider"}
    ]

    if sources:
        source_text = "  |  ".join(f"`{s}`" for s in sources[:5])
        blocks.append({
            "type": "context",
            "elements": [
                {"type": "mrkdwn", "text": f":books: Sources: {source_text}"}
            ]
        })

    return blocks

When the footer says :books: Sources: deployment-runbook.md, people know the bot is citing a real document. When there are no sources — because the answer wasn't found — they know something's off before they act on it.
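Two practical limits apply to the section block: Slack's mrkdwn dialect differs from the Markdown Claude tends to write, and a section's text field is capped at 3000 characters, which a 1024-token answer can exceed. A rough sketch of handling both, assuming regex-level conversion is good enough for your docs; to_mrkdwn, split_for_sections, and answer_sections are hypothetical helpers, and the conversion does not protect text inside code fences:

```python
import re

SECTION_TEXT_LIMIT = 3000  # Slack's cap on a section block's text field


def to_mrkdwn(text: str) -> str:
    """Convert the most common Markdown constructs to Slack mrkdwn."""
    text = re.sub(r"\*\*(.+?)\*\*", r"*\1*", text)  # **bold** -> *bold*
    text = re.sub(r"^#{1,6}[ \t]+(.+)$", r"*\1*", text, flags=re.MULTILINE)  # headers -> bold line
    text = re.sub(r"\[([^\]]+)\]\((https?://[^)\s]+)\)", r"<\2|\1>", text)  # [t](url) -> <url|t>
    return text


def split_for_sections(text: str, limit: int = SECTION_TEXT_LIMIT) -> list[str]:
    """Split text into pieces of at most `limit` chars, preferring line breaks."""
    chunks = []
    remaining = text
    while len(remaining) > limit:
        cut = remaining.rfind("\n", 0, limit)
        if cut <= 0:
            cut = limit  # no usable line break, hard cut
        chunks.append(remaining[:cut])
        remaining = remaining[cut:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks


def answer_sections(answer: str) -> list[dict]:
    """Build one section block per chunk of the converted answer."""
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": chunk}}
        for chunk in split_for_sections(to_mrkdwn(answer))
    ]
```

In format_answer_blocks, the single section could be replaced by a short question section followed by answer_sections(answer), with the divider and sources footer unchanged.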

The Source Tracking Gap

The Part 1 answer_question function returned just the answer string. It also never stored metadata with the chunks, so there was no way to know which files were retrieved. Two small additions close both gaps.

First, update build_doc_store to pass a metadatas list when adding documents:

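def build_doc_store(docs_dir: str) -> chromadb.Collection:
    db = chromadb.Client()
    # In-memory clients in the same process can share state, so a second
    # build (for example from /reindex) may find the old collection still
    # registered and create_collection would raise. Drop it first.
    # Trade-off: /ask calls that arrive during the rebuild will fail.
    try:
        db.delete_collection("internal_docs")
    except Exception:
        pass  # first run: nothing to delete
    collection = db.create_collection("internal_docs")

    docs_path = Path(docs_dir)
    all_chunks: list[str] = []
    ids: list[str] = []
    metadatas: list[dict] = []

    for i, file_path in enumerate(docs_path.rglob("*.md")):
        text = load_text_file(str(file_path))
        chunks = chunk_by_section(text)
        for j, chunk in enumerate(chunks):
            all_chunks.append(chunk)
            ids.append(f"{file_path.stem}-{i}-{j}")
            metadatas.append({"source": file_path.name})

    # The file index keeps IDs unique when two PDFs in different
    # subdirectories share a name.
    for i, file_path in enumerate(docs_path.rglob("*.pdf")):
        text = load_pdf(str(file_path))
        chunks = chunk_by_section(text)
        for j, chunk in enumerate(chunks):
            all_chunks.append(chunk)
            ids.append(f"{file_path.stem}-pdf-{i}-{j}")
            metadatas.append({"source": file_path.name})

    # Guard against an empty docs directory: add() expects non-empty lists.
    if all_chunks:
        collection.add(documents=all_chunks, ids=ids, metadatas=metadatas)
    return collection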

Then update answer_question to read that metadata back and return both the answer and the source list:

def answer_question(
    question: str,
    coll: chromadb.Collection | None
) -> tuple[str, list[str]]:
    if coll is None:
        return "The doc store isn't loaded. Run `/reindex` to index your docs.", []

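    results = coll.query(query_texts=[question], n_results=5)
    context_chunks = results["documents"][0]
    # metadatas can be missing or None, so fall back to an empty result.
    raw_metadata = (results.get("metadatas") or [[]])[0]

    context = "\n\n---\n\n".join(context_chunks)
    # The set comprehension dedupes; sorted() keeps the footer order stable.
    sources = sorted({m.get("source", "unknown") for m in raw_metadata if m})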

    message = client.messages.create(
        model="claude-sonnet-4-6",  # or claude-opus-4-8 for longer, more complex docs
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": f"""You are a helpful assistant answering questions about internal company documentation.

Use only the context provided below. If the answer isn't in the context, say so directly — don't guess.

Context:
{context}

Question: {question}"""
            }
        ]
    )

    return message.content[0].text, sources

The set-based deduplication on sources matters: the same file can contribute multiple retrieved chunks, and showing deployment-runbook.md | deployment-runbook.md | deployment-runbook.md in the footer looks broken.
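A set drops duplicates but also drops the retrieval order, so the most relevant file is not guaranteed to appear first in the footer. If that ordering matters to you, one alternative is sketched below; unique_sources is a hypothetical helper that takes the same list of metadata dicts the query returns:

```python
def unique_sources(metadatas) -> list[str]:
    """Deduplicate source names while keeping first-seen (retrieval) order."""
    names = [m.get("source", "unknown") for m in metadatas if m]
    # dict keys are unique and preserve insertion order.
    return list(dict.fromkeys(names))
```

For example, metadata for a.md, b.md, then a.md again yields ["a.md", "b.md"], with the best-ranked file first.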

Add a /reindex Command

Restarting the process to pick up doc changes is fine for development. It's annoying in production. /reindex lets you trigger a fresh index without touching the running process.

@app.command("/reindex")
def handle_reindex(ack, respond, command):
    ack()
    respond(text=":hourglass_flowing_sand: Reindexing docs...", response_type="ephemeral")

    def do_reindex():
        global collection
        try:
            docs_dir = os.environ.get("DOCS_DIR", "./docs")
            collection = build_doc_store(docs_dir)
            respond(
                text=":white_check_mark: Docs reindexed.",
                response_type="ephemeral"
            )
        except Exception as e:
            respond(
                text=f":warning: Reindex failed: {e}",
                response_type="ephemeral"
            )

    threading.Thread(target=do_reindex, daemon=True).start()

A note on thread safety: rebinding collection is a single name assignment, so under CPython an /ask that overlaps a /reindex reads either the old collection or the new one, never a half-assigned value. The real hazards are two /reindex runs overlapping and /ask requests that arrive while a rebuild is still in progress. I'd add a threading.Lock around the reads and writes before putting this in a high-traffic environment. For most internal doc bots handling a few dozen queries a day, it's fine as written.
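One way to add that lock is to wrap the shared reference in a small holder object instead of a bare global. This is a sketch under assumptions, not the Part 1 design: CollectionHolder is a hypothetical class, and build_fn stands for any zero-argument callable that returns a fresh collection.

```python
import threading


class CollectionHolder:
    """Guards a shared collection reference and serializes rebuilds."""

    def __init__(self):
        self._lock = threading.Lock()          # protects the reference itself
        self._rebuild_lock = threading.Lock()  # allows one rebuild at a time
        self._collection = None

    def get(self):
        with self._lock:
            return self._collection

    def rebuild(self, build_fn) -> bool:
        """Run build_fn and swap in its result.

        Returns False without building if another rebuild is already running.
        """
        if not self._rebuild_lock.acquire(blocking=False):
            return False
        try:
            new_collection = build_fn()  # slow work happens outside _lock
            with self._lock:
                self._collection = new_collection
            return True
        finally:
            self._rebuild_lock.release()
```

In the handlers, holder.get() would replace the read of the global, and holder.rebuild(lambda: build_doc_store(docs_dir)) would replace the assignment; a False return is a natural place to tell the user a reindex is already underway.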

Add the /reindex slash command in the Slack dashboard the same way you added /ask — same URL placeholder, different command name.

The Complete main()

from pathlib import Path
from anthropic import Anthropic, APIError
from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler
import chromadb
import os
import threading

# Import chunk_by_section, load_text_file, load_pdf from Part 1 (unchanged)

client = Anthropic()
app = App(token=os.environ["SLACK_BOT_TOKEN"])
collection: chromadb.Collection | None = None

# ... handler functions above ...

def main():
    global collection
    docs_dir = os.environ.get("DOCS_DIR", "./docs")
    print(f"Indexing docs from {docs_dir}...")
    collection = build_doc_store(docs_dir)
    print("Done. Starting bot.")

    handler = SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"])
    handler.start()

if __name__ == "__main__":
    main()

The chunking and loading functions from Part 1 drop in unchanged. Apart from the metadata additions to build_doc_store and answer_question, the RAG core is the same; the work was in the Slack layer around it.

Wrapping Up

Socket mode works fine for internal deployments. If you want the bot to survive machine restarts, a systemd service unit or a launchd plist handles that cleanly — set the four environment variables in the service definition and point it at your bot.py.

The other obvious next step is persistent ChromaDB. Switching from chromadb.Client() to chromadb.PersistentClient(path="./chroma_data") means the collection survives restarts. To benefit, startup has to reuse the stored collection rather than rebuild it: open it with get_or_create_collection and index only when it is empty. Combined with the /reindex command, you only pay the indexing cost when docs actually change, not every time the process starts. It's a small change, and it makes a noticeable difference as your doc collection grows.
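The reuse-or-index decision at startup can be sketched independently of the rest of the bot. The version below assumes a client object exposing get_or_create_collection and a collection exposing count(), which is the shape of ChromaDB's client and collection; load_or_build and the index_docs callback are hypothetical names for illustration:

```python
def load_or_build(client, name: str, index_docs):
    """Reuse a stored collection, indexing only when it is empty.

    client     -- e.g. chromadb.PersistentClient(path="./chroma_data")
    index_docs -- hypothetical callback that adds chunks to the collection
    """
    coll = client.get_or_create_collection(name)
    if coll.count() == 0:
        # Nothing stored yet (first run, or the data directory was wiped).
        index_docs(coll)
    return coll
```

With this in place, main() would call load_or_build instead of build_doc_store, and /reindex remains the explicit way to refresh the stored chunks after docs change.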
