Build a Laravel AI Writing Assistant with Streaming Responses

You're demoing the writing tool you just built. You paste in a paragraph, click "Adjust Tone," and wait. The output field stays blank. Six seconds pass. Eight. The client glances at you. "Did it freeze?" Then 380 words appear at once, already formatted, already correct — but the damage is done. The wait felt like an error.

Streaming fixes this. Instead of waiting for the model to finish before sending anything, you send tokens as they're generated. The user sees text appearing word by word within a fraction of a second of hitting submit. Same total latency, completely different feel.

laravel/ai's stream() method makes this a one-line change from a synchronous call. The rest of this article builds a writing assistant that handles three tasks — tone adjustment, draft continuation, and SEO meta description generation — all streamed live to the browser using Anthropic's Claude. No WebSockets. No queues. Just Server-Sent Events and about 60 lines of PHP.

What the stream() Method Actually Sends

SSE (Server-Sent Events) is a plain HTTP connection that stays open. The server pushes data in data: ... lines; the browser reads them via the native EventSource API. No polling, no WebSockets, no additional infrastructure.

When you call stream() on a laravel/ai agent, it returns a streamable response object that you can return straight from a route or controller. Laravel keeps the connection open while the model generates, flushing each chunk to the client as it arrives. The client's EventSource listener fires for each chunk, and you append it to the DOM.

The simplest possible route shows the shape of it:

use App\Ai\Agents\WritingAssistantAgent;

Route::get('/stream-test', function () {
    return (new WritingAssistantAgent)->stream('Explain SSE in two sentences.');
});

Open that in a browser and you'll see text appear incrementally. That's the whole mechanism.
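
To make the wire format concrete, here is a minimal sketch of the parsing that EventSource does for you. The function name parseSseFrames is my own, and it only handles data: lines; the event:, id:, and retry: fields that the SSE format also defines are ignored.

```javascript
// Minimal sketch: split a raw SSE body into message payloads.
// Only `data:` fields are handled; `event:`, `id:`, `retry:` and comment
// lines are ignored for brevity.
function parseSseFrames(raw) {
    const messages = [];
    // Events are separated by a blank line; line endings may be \n or \r\n.
    for (const frame of raw.split(/\r?\n\r?\n/)) {
        const dataLines = [];
        for (const line of frame.split(/\r?\n/)) {
            if (line.startsWith('data:')) {
                // A single leading space after the colon is not part of the value.
                const value = line.slice(5);
                dataLines.push(value.startsWith(' ') ? value.slice(1) : value);
            }
        }
        // Multiple data lines in one event are joined with newlines.
        if (dataLines.length > 0) messages.push(dataLines.join('\n'));
    }
    return messages;
}

const wire = 'data: Server-Sent\n\ndata:  Events\n\ndata: [DONE]\n\n';
console.log(parseSseFrames(wire)); // [ 'Server-Sent', ' Events', '[DONE]' ]
```

A real stream arrives in pieces that can end mid-frame, so a production parser would also buffer incomplete input between reads. In the browser you get all of this for free from EventSource.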

Set Up the Agent

composer require laravel/ai
php artisan vendor:publish --provider="Laravel\Ai\AiServiceProvider"
php artisan migrate

Add your Anthropic key to .env:

ANTHROPIC_API_KEY=sk-ant-...

The migration creates agent_conversations and agent_conversation_messages tables for built-in conversation memory. You don't need them for this assistant, but they're there if you want session-based context later.

Generate the agent class:

php artisan make:agent WritingAssistantAgent

That creates app/Ai/Agents/WritingAssistantAgent.php. Replace its contents:

<?php

namespace App\Ai\Agents;

use Laravel\Ai\Attributes\MaxTokens;
use Laravel\Ai\Attributes\Model;
use Laravel\Ai\Attributes\Provider;
use Laravel\Ai\Attributes\Temperature;
use Laravel\Ai\Contracts\Agent;
use Laravel\Ai\Enums\Lab;
use Laravel\Ai\Promptable;

#[Provider(Lab::Anthropic)]
#[Model('claude-sonnet-4-6')]  // or claude-opus-4-8 for complex docs
#[Temperature(0.8)]
#[MaxTokens(1024)]
class WritingAssistantAgent implements Agent
{
    use Promptable;

    public function instructions(): string
    {
        return <<<INSTRUCTIONS
        You are a professional writing assistant. Your output is always clean prose —
        no markdown formatting, no headers, no bullet points unless the user explicitly asks.
        Write directly. Never explain what you're doing before you do it.
        Never summarize what you just wrote at the end.
        INSTRUCTIONS;
    }
}

The instructions() method is the system prompt. The attributes above it are the provider configuration. Temperature at 0.8 gives Claude enough latitude to vary phrasing — drop it to 0.3 if you want more conservative rewrites.

Three Routes, Three Tasks

Add these to routes/web.php:

use App\Http\Controllers\WritingController;

Route::post('/writing/tone', [WritingController::class, 'adjustTone']);
Route::post('/writing/draft', [WritingController::class, 'continueDraft']);
Route::post('/writing/meta', [WritingController::class, 'generateMeta']);

Create the controller:

php artisan make:controller WritingController

<?php

namespace App\Http\Controllers;

use App\Ai\Agents\WritingAssistantAgent;
use Illuminate\Http\Request;

class WritingController extends Controller
{
    public function adjustTone(Request $request)
    {
        $request->validate([
            'text' => 'required|string|max:3000',
            'tone' => 'required|in:formal,casual,persuasive',
        ]);

        $prompt = sprintf(
            // Double quotes so that \n\n is interpreted as two newlines.
            "Rewrite the following text in a %s tone. Preserve all factual content. Output only the rewritten text:\n\n%s",
            $request->tone,
            $request->text
        );

        return (new WritingAssistantAgent)->stream($prompt);
    }

    public function continueDraft(Request $request)
    {
        $request->validate([
            'draft' => 'required|string|max:3000',
        ]);

        $prompt = sprintf(
            // Double quotes so that \n\n is interpreted as two newlines.
            "Continue the following draft. Match the existing voice, sentence length, and tone exactly. Write 2–3 paragraphs:\n\n%s",
            $request->draft
        );

        return (new WritingAssistantAgent)->stream($prompt);
    }

    public function generateMeta(Request $request)
    {
        $request->validate([
            'title' => 'required|string|max:200',
            'excerpt' => 'required|string|max:1000',
        ]);

        $prompt = sprintf(
            // Double quotes so that \n\n is interpreted as two newlines.
            "Write an SEO meta description for this article. Requirements: under 160 characters, include the primary keyword from the title in the first half, write as a compelling one-sentence pitch, no trailing period.\n\nTitle: %s\n\nExcerpt: %s",
            $request->title,
            $request->excerpt
        );

        return (new WritingAssistantAgent)->stream($prompt);
    }
}

Each method validates its input, builds a prompt, and returns the stream directly. No manual response wrapping. The object returned by stream() is already something Laravel knows how to send as an HTTP response, so the controller just passes it through.

Consuming the Stream in the Browser

EventSource only supports GET requests, and the three routes above expect POST. The clean fix is a two-step handoff: POST to a controller that stores the prompt in the session and returns a URL, then open that URL as an EventSource. For a demo or internal tool, a simpler approach works: send the prompt as a query parameter on a GET route. Keep in mind that query strings end up in server logs and that browsers and servers cap URL length, so this is not the approach to ship for long or sensitive text.

For the version here, I'll use the GET approach to keep the frontend code straightforward:

Route::get('/writing/stream', [WritingController::class, 'streamResponse']);

public function streamResponse(Request $request)
{
    $request->validate([
        'task' => 'required|in:tone,draft,meta',
        'text' => 'required|string|max:3000',
        'tone' => 'nullable|in:formal,casual,persuasive',
    ]);

    $prompt = match ($request->task) {
        'tone' => sprintf(
            // Double quotes throughout so that \n\n becomes two real newlines.
            "Rewrite the following text in a %s tone. Output only the rewritten text:\n\n%s",
            $request->tone ?? 'formal',
            $request->text
        ),
        'draft' => sprintf(
            "Continue this draft for 2–3 paragraphs. Match the existing voice exactly:\n\n%s",
            $request->text
        ),
        'meta' => sprintf(
            "Write an SEO meta description under 160 chars. Include the primary keyword in the first half:\n\n%s",
            $request->text
        ),
    };

    return (new WritingAssistantAgent)->stream($prompt);
}

The Blade view:

<div x-data="writer()" class="max-w-2xl mx-auto p-6">
    <select x-model="task" class="mb-4 w-full border rounded p-2">
        <option value="draft">Continue draft</option>
        <option value="tone">Adjust tone</option>
        <option value="meta">Generate meta description</option>
    </select>

    <div x-show="task === 'tone'" class="mb-4">
        <select x-model="tone" class="w-full border rounded p-2">
            <option value="formal">Formal</option>
            <option value="casual">Casual</option>
            <option value="persuasive">Persuasive</option>
        </select>
    </div>

    <textarea
        x-model="input"
        placeholder="Paste your text here..."
        class="w-full h-40 border rounded p-2 mb-4"
    ></textarea>

    <button @click="generate()" class="bg-blue-600 text-white px-4 py-2 rounded">
        Generate
    </button>

    <div x-show="output" class="mt-6 p-4 bg-gray-50 rounded whitespace-pre-wrap" x-text="output"></div>
</div>

<script>
function writer() {
    return {
        task: 'draft',
        tone: 'formal',
        input: '',
        output: '',
        source: null,

        generate() {
            this.output = '';
            if (this.source) this.source.close();

            const params = new URLSearchParams({
                task: this.task,
                text: this.input,
                tone: this.tone,
            });

            this.source = new EventSource(`/writing/stream?${params}`);

            this.source.onmessage = (e) => {
                if (e.data === '[DONE]') {
                    this.source.close();
                    return;
                }
                this.output += e.data;
            };

            this.source.onerror = () => {
                this.source.close();
            };
        }
    }
}
</script>

The example uses Alpine.js for the reactivity layer, but the EventSource portion is vanilla. onmessage fires for every data: line the server sends. When laravel/ai finishes, it sends data: [DONE] — you check for that string and close the connection.
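
The POST-then-EventSource handoff described at the top of this section could look roughly like the sketch below. The /writing/prepare endpoint and its { url } JSON response are hypothetical; this article does not build them. fetch and EventSource are passed in as parameters only so the flow can be exercised outside a browser, and the plain-text data payload is the same assumption the Alpine component above makes.

```javascript
// Sketch of the two-step handoff: POST the form data, then stream over GET.
// ASSUMPTION: '/writing/prepare' is a hypothetical endpoint that stores the
// prompt server-side and responds with JSON like { "url": "/writing/stream/abc" }.
function startStream(payload, { fetchFn, EventSourceCtor, csrfToken, onChunk, onDone }) {
    return fetchFn('/writing/prepare', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            'Accept': 'application/json',
            // Laravel's web middleware rejects POSTs without a CSRF token.
            'X-CSRF-TOKEN': csrfToken,
        },
        body: JSON.stringify(payload),
    })
        .then((res) => {
            if (!res.ok) throw new Error(`Prepare request failed: ${res.status}`);
            return res.json();
        })
        .then(({ url }) => {
            const source = new EventSourceCtor(url);
            source.onmessage = (e) => {
                if (e.data === '[DONE]') {
                    source.close();
                    onDone();
                    return;
                }
                onChunk(e.data);
            };
            // Close on error so the browser does not reconnect and re-run the prompt.
            source.onerror = () => source.close();
            return source;
        });
}
```

In the browser you would pass fetchFn: (...args) => fetch(...args), EventSourceCtor: EventSource, and the token from your layout's csrf-token meta tag. The prompt never appears in a URL, which is the main reason to prefer this over the query-parameter shortcut.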

Does It Work Across All Three Tasks?

Yes, but the experience varies. Tone adjustment and draft continuation stream naturally: you see sentences form. A meta description is capped at 160 characters, roughly 40 tokens, so the stream opens, a handful of chunks arrive, and it's done. Streaming a one-liner is technically correct but anticlimactic. For very short outputs, a standard prompt() call and a loading spinner would feel more natural.

The distinction worth building into your UI: show a streaming cursor for tasks that produce long output, skip it for tasks that produce short output. You can make that decision at the controller level by routing short tasks through prompt() instead of stream(), or by letting the client detect when the stream closes unusually fast and switch rendering mode.

The [DONE] sentinel is what makes that possible — you know exactly when the stream ends, so you can measure total generation time and decide how to handle future requests of the same type.
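
One way to act on that measurement, sketched under assumptions: record how long each task's stream took and fall back to a spinner once a task type is consistently fast. createStreamTracker and the 1,500 ms cutoff are illustrative choices of mine, not part of laravel/ai.

```javascript
// Sketch: remember how long each task's stream lasted and pick a render mode.
// ASSUMPTION: the 1500 ms cutoff is an arbitrary illustrative value.
function createStreamTracker(now = () => Date.now(), fastThresholdMs = 1500) {
    const durations = {}; // task name -> array of past durations in ms
    const started = {};   // task name -> start timestamp of the in-flight stream

    return {
        // Call right before opening the EventSource.
        start(task) {
            started[task] = now();
        },
        // Call when the [DONE] sentinel arrives.
        finish(task) {
            if (started[task] === undefined) return;
            if (!durations[task]) durations[task] = [];
            durations[task].push(now() - started[task]);
            delete started[task];
        },
        // 'spinner' if past streams for this task averaged under the cutoff,
        // 'stream' otherwise (including when there is no history yet).
        modeFor(task) {
            const past = durations[task];
            if (!past || past.length === 0) return 'stream';
            const avg = past.reduce((sum, ms) => sum + ms, 0) / past.length;
            return avg < fastThresholdMs ? 'spinner' : 'stream';
        },
    };
}
```

In the Alpine component you would call tracker.start(this.task) at the top of generate() and tracker.finish(this.task) inside the [DONE] branch, then consult tracker.modeFor(this.task) before the next request.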

The Edge Case with Long Input

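So far this tutorial has glossed over what happens when someone pastes a 3,000-word article and asks to continue it. (The max:3000 rule above counts characters, not words, so you'd have to raise it before input that long even reaches the model.)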

The stream opens. Text starts appearing. Then it stops mid-sentence. No error event fires. The EventSource stays open. The user waits, assuming more is coming. It isn't.

What happened: #[MaxTokens(1024)] capped the response at roughly 750–800 words of output. The model generated tokens until it hit the ceiling and stopped cleanly — cleanly from the API's perspective. From the client's perspective, silence.
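
Since the stream simply stops when the cap is hit, the client has to guess that the text was cut off. Here is a rough heuristic, not something laravel/ai provides: once the stream has finished, flag the output if it ends without sentence-final punctuation and its length is close to the word budget. The 0.75 words-per-token ratio is a common rule of thumb for English and lines up with the 750–800 word estimate above; looksTruncated and the 0.9 ratio are my own illustrative choices.

```javascript
// Rule of thumb for English prose: one token is about 0.75 words.
function estimateMaxWords(maxTokens, wordsPerToken = 0.75) {
    return Math.floor(maxTokens * wordsPerToken);
}

// Heuristic: output is probably truncated if it ends mid-sentence AND its
// length is near the token budget. This can misfire, for example on long
// text that legitimately ends without punctuation.
function looksTruncated(output, maxTokens, nearBudgetRatio = 0.9) {
    const trimmed = output.trim();
    if (trimmed === '') return false;
    // Sentence-final punctuation, optionally followed by closing quotes or brackets.
    const endsCleanly = /[.!?…]["'”’)\]]*$/.test(trimmed);
    const wordCount = trimmed.split(/\s+/).length;
    const nearBudget = wordCount >= estimateMaxWords(maxTokens) * nearBudgetRatio;
    return !endsCleanly && nearBudget;
}

console.log(estimateMaxWords(1024)); // 768
```

Requiring both conditions keeps short outputs like the meta description, which is told to drop its trailing period, from being flagged. When the check fires, showing a "Continue" button is friendlier than leaving the user staring at half a sentence.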

Bumping MaxTokens to 4096 solves the truncation but introduces a different problem: the default HTTP timeout for most PHP-FPM setups is 30–60 seconds. A 4096-token response at normal generation speed runs between 20 and 40 seconds. Whether that completes before PHP-FPM kills the connection depends on your server config, not your code.
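
The arithmetic behind that range, as a sketch: 20 to 40 seconds for 4,096 tokens implies a generation speed of roughly 100–200 tokens per second. Those speeds are back-calculated from the numbers above rather than measured, and the 5-second margin is an arbitrary allowance, so substitute what you observe in your own logs.

```javascript
// Seconds needed to generate `tokens` at a given speed.
function generationSeconds(tokens, tokensPerSecond) {
    return tokens / tokensPerSecond;
}

// True if the worst-case generation time fits inside the server timeout,
// leaving `marginSeconds` for time-to-first-token and network overhead.
function fitsTimeout(maxTokens, slowestTokensPerSecond, timeoutSeconds, marginSeconds = 5) {
    return generationSeconds(maxTokens, slowestTokensPerSecond) + marginSeconds <= timeoutSeconds;
}

console.log(generationSeconds(4096, 200).toFixed(1)); // 20.5
console.log(generationSeconds(4096, 100).toFixed(1)); // 41.0
console.log(fitsTimeout(4096, 100, 30)); // false
console.log(fitsTimeout(4096, 100, 60)); // true
```

At the slow end of that range a 30-second limit cuts the response off, while a 60-second limit leaves some headroom.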

The #[Timeout(120)] attribute on the agent class tells the underlying HTTP client to wait up to 120 seconds for the provider. It doesn't override PHP's own execution limit. You'd also need to set max_execution_time = 120 in php.ini (or in a .htaccess rule if you run mod_php rather than PHP-FPM) and raise the matching timeouts in nginx or Apache.

Chunked input is a more reliable path for genuinely long documents: split the input client-side into sections of 800–1000 words, stream each one sequentially, concatenate the results. More complex, but it doesn't fight the infrastructure you're running on.
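
A minimal sketch of the client-side split, assuming paragraphs are separated by blank lines and that a 900-word target suits your content. splitIntoSections is a hypothetical helper: it packs whole paragraphs into sections and only exceeds the limit when a single paragraph is longer than the limit on its own.

```javascript
function countWords(text) {
    const trimmed = text.trim();
    return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

// Greedily pack paragraphs into sections of at most `maxWords` words.
// A single paragraph longer than `maxWords` becomes its own oversized section.
function splitIntoSections(text, maxWords = 900) {
    const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
    const sections = [];
    let current = [];
    let currentWords = 0;

    for (const paragraph of paragraphs) {
        const words = countWords(paragraph);
        // Start a new section when adding this paragraph would exceed the limit.
        if (current.length > 0 && currentWords + words > maxWords) {
            sections.push(current.join('\n\n'));
            current = [];
            currentWords = 0;
        }
        current.push(paragraph);
        currentWords += words;
    }
    if (current.length > 0) sections.push(current.join('\n\n'));
    return sections;
}
```

To stream sequentially, open the next EventSource only after the previous one has delivered [DONE], appending each result to the output. Each section also has to pass the route's max:3000 validation, which counts characters, so either raise that limit or lower maxWords to match.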

I don't have a clean answer for which tradeoff is worth making in your specific setup. Both directions have costs.
