Prompt-Driven React Apps: streaming UI, tool calling, and agent loops

A chatbot is easy. A chatbot that can actually change the state of your app (filter a table, open a panel, create a record, undo its own mistakes) is a different beast.

That second thing is what people mean by "prompt-driven" or "agentic" UI in 2026. Let's look at what changes in your React code when the source of state updates shifts from clicks and keystrokes to natural-language intent.

What Prompt-Driven Actually Means

A chatbot UI is a thin shell. User types, model replies, UI appends a message. The model never changes what the app is.

A prompt-driven UI is different. The model can:

Pick a tool and run it.
Update real state in your app (filter a table, open a panel, create a record).
Compose a response from many tool results.
Hand control back to the user mid-flow.

The user types intent. The model translates that intent into state changes. Your React app renders those state changes the same way it renders any other state. The difference is the source of the change.

That shift has practical consequences for how you design components, where you put state, and what you trust.

Who's Already Doing This?

A few apps you've probably touched:

Vercel's v0: type "make me a landing page" and the chat composes a React tree you can edit.
Cursor and Windsurf: chat with the codebase, the model edits files, the IDE renders the diff.
Notion AI and Linear's AI features: not full-tree generation, but targeted state changes (summarise, rewrite, classify) driven by prompts.
ChatGPT canvas and Claude artifacts: side-by-side editable surfaces driven by a conversation.
Replit Agent: an agent that builds an entire app from a prompt and runs it.

The common shape: a conversation on one side, app state changing on the other, with tool calls binding the two.

The Update Loop, Stretched Out

In a normal React app the update loop is short. In a prompt-driven app it's much longer, and every arrow is async.

The Update Loop, Stretched Out diagram

Every arrow on the right side can fail. Every arrow can stream. Every arrow can be cancelled mid-flight when the user types something new.

Trying to express this with a single useState is how you lose your weekend. You want a small state machine, not a flat boolean soup.

A State Machine for Agent Status

type AgentStatus =
  | { kind: "idle" }
  | { kind: "thinking" }
  | { kind: "streaming"; partial: string }
  | { kind: "tool"; name: string; args: unknown }
  | { kind: "tool-result"; name: string; result: unknown }
  | { kind: "error"; message: string };

Pattern-match on it in your render and the code stays sane. Every state has its own visual representation. Transitions are explicit. Error is not a forgotten branch.

function AgentSurface({ status }: { status: AgentStatus }) {
  switch (status.kind) {
    case "idle":
      return <PromptInput />;
    case "thinking":
      return <ThinkingIndicator />;
    case "streaming":
      return <StreamedMarkdown text={status.partial} />;
    case "tool":
      return <ToolPanel name={status.name} args={status.args} />;
    case "tool-result":
      return <ToolResult name={status.name} result={status.result} />;
    case "error":
      return <ErrorBanner message={status.message} />;
  }
}

If you reach for a state machine library (XState, Robot, Zag), this is exactly the shape they shine at. For something small, a typed discriminated union and a switch are enough.

Think of It Like a Concierge

A hotel concierge. The guest says what they want, in plain English. The concierge picks the right staff member, asks them to do the thing, reports back, and asks if there's anything else.

The guest is your user.
The concierge is the LLM.
The staff are your tools.
The hotel front desk is your React tree.

The guest never goes into the kitchen, never operates the elevator, never opens the safe. The concierge interprets intent. The staff actually execute. If something goes wrong, the concierge owns the apology and the rollback.

Two things the concierge does not do: take payment without confirmation, or do something irreversible without checking back. That's the trust boundary, and it's the same in software.

Streaming, Not Buffering

The temptation is to wait for the full response and render it once. Don't. Token streaming is one of the few patterns that changes the feel of an AI app, and React handles it well if you set it up right.

Use the platform: fetch with a streamed body, or the official SDK's streaming client. Push tokens into state, render markdown progressively.

const [text, setText] = useState("");

async function run(prompt: string) {
  setText("");
  const stream = await client.messages.stream({
    model: "claude-sonnet-4-6",
    messages: [{ role: "user", content: prompt }],
  });
  for await (const event of stream) {
    if (event.type === "content_block_delta") {
      setText((prev) => prev + event.delta.text);
    }
  }
}

Two traps to avoid.

Don't call setState per token. Tokens fly in faster than the display refreshes. Setting state on every one fires 100+ renders per second for zero user-visible benefit. Coalesce with requestAnimationFrame:

let pending = "";
let scheduled = false;

function append(delta: string) {
  pending += delta;
  if (scheduled) return;
  scheduled = true;
  requestAnimationFrame(() => {
    const chunk = pending;
    pending = "";
    scheduled = false;
    setText((prev) => prev + chunk);
  });
}

for await (const event of stream) {
  if (event.type === "content_block_delta") {
    append(event.delta.text);
  }
}

Now state updates cap at the display refresh rate, no matter how fast the model streams.

Markdown rendering is O(n²) if you re-parse a growing string per token. Use a streaming-friendly renderer (the ones built for AI surfaces, like react-markdown with a careful memo boundary), or hold the raw text and only re-parse on a frame boundary.

const Streamed = memo(function Streamed({ text }: { text: string }) {
  return <ReactMarkdown>{text}</ReactMarkdown>;
});

The memo plus the RAF coalescing means a 2,000-token response renders in 30 to 40 frames, not 2,000.

Tool Calling, In React Terms

A tool call is the model saying: "here's a function I want to run, with these arguments." Your job:

Recognise the tool call in the stream.
Validate the arguments against a schema.
Run the tool.
Feed the result back to the model.
Reflect any UI side effects.

The flow looks like this:

Tool Calling, In React Terms diagram

The third and fifth steps are where React design gets interesting. A tool can be pure (look up data, return JSON). A tool can also be destructive in your UI (open a dialog, navigate to a page, delete a record).

Keep the mapping from tool name to handler in one place:

import { z } from "zod";

const filterTable = {
  schema: z.object({
    column: z.string(),
    value: z.string(),
  }),
  handler: async (args: z.infer<typeof filterTable.schema>) => {
    tableStore.setFilter(args.column, args.value);
    return { ok: true };
  },
};

const openRecord = {
  schema: z.object({ id: z.string().uuid() }),
  handler: async (args: z.infer<typeof openRecord.schema>) => {
    navigate(`/records/${args.id}`);
    return { ok: true };
  },
};

const tools = { filterTable, openRecord } as const;

async function dispatch(name: keyof typeof tools, rawArgs: unknown) {
  const tool = tools[name];
  if (!tool) return { ok: false, error: `Unknown tool: ${name}` };
  const parsed = tool.schema.safeParse(rawArgs);
  if (!parsed.success) {
    return { ok: false, error: parsed.error.message };
  }
  audit.log({ name, args: parsed.data, at: Date.now() });
  return tool.handler(parsed.data);
}

Two non-negotiables:

Validate every tool argument with a schema. Zod, Valibot, ArkType, whichever. The model probably sends the shape you asked for. It only takes one bad JSON to crash your app, and that one will happen in the demo.
Make every destructive tool reversible. Push an undo entry. Show what changed. Let the user roll back. If you can't undo cleanly, the tool isn't an action: it's a confirmation dialog.

Parallel tool calls

Modern models can request multiple tools in a single turn. The naive shape (run them serially) wastes wall-clock time. Run them in parallel and gather:

const results = await Promise.all(
  toolCalls.map((call) => dispatch(call.name, call.args)),
);
sendToolResults(results);

A small caveat: if two tools mutate overlapping state (both modify the table filter), order matters and you need to serialise those. The simplest rule is to mark each tool as pure or mutating and run mutating tools sequentially in declaration order.

Optimistic and Reversible

The lag between intent and effect is the single biggest UX problem in agentic UIs. The model takes time. The tool takes time. The user sits there.

The fix is the same as any optimistic UI: render the intended state immediately, reconcile when the real result lands.

Optimistic and Reversible diagram

A minimal implementation, including the rollback:

type Patch = { id: string; revert: () => void };
const [optimistic, setOptimistic] = useState<Patch[]>([]);

async function applyToolCall(call: ToolCall) {
  const id = crypto.randomUUID();
  const revert = predictAndApply(call); // mutates UI, returns a revert fn
  const patch: Patch = { id, revert };
  setOptimistic((p) => [...p, patch]);

  try {
    await dispatch(call.name, call.args);
    // Tool succeeded. The optimistic prediction is now the real state.
  } catch (err) {
    patch.revert(); // Tool failed. Roll back.
    showError(err);
  } finally {
    setOptimistic((p) => p.filter((q) => q.id !== id));
  }
}

The UI moves instantly. If the tool succeeds, the change becomes real. If it fails, it rolls back. The user feels speed, even when the model is taking a moment.

Suspense Fits, With Caveats

React Suspense is a natural fit for the "thinking" state. You suspend on a promise, you render a fallback, you resume when the promise resolves.

<Suspense fallback={<ThinkingIndicator />}>
  <AgentTurn promise={agent.next(prompt)} />
</Suspense>

It works less well for streaming state, because Suspense flips a binary (pending or resolved) and a stream is neither. For streams, hold the partial text in state and render it directly. Use Suspense only for the steps you can express as "fetch and then show":

A tool that returns a record.
A search that returns a list.
A planning step that returns a multi-step intent.

A layered pattern that holds up:

Suspense around the next tool result.
Plain state around the current streamed thought.
An error boundary fallback wrapping everything.

<ErrorBoundary fallback={<ErrorBanner />}>
  <Suspense fallback={<ThinkingIndicator />}>
    <ToolResultPanel id={pendingToolId} />
  </Suspense>
  <StreamedThought text={partial} />
</ErrorBoundary>

The Trust Boundary

Anything the model produces is input you didn't control. Treat it that way.

The Trust Boundary diagram

The contract is:

Render text safely. Run model markdown through a hardened renderer. No dangerouslySetInnerHTML, no inline scripts, no raw HTML pass-through unless you have a sanitiser you trust. rehype-sanitize is the conventional choice for react-markdown.
Schema-validate every tool argument. The model will, sometimes, send strings where you expected numbers and arrays where you expected objects. Reject confidently.
Confirm destructive actions. Deleting a record, sending a message, changing a setting. Show the predicted effect, get a click, then commit. The model may suggest. The user should approve.
Audit-log every tool call. When something goes wrong (and it will), you want to know which call did it, with what arguments, and what the model was thinking.

audit.log({
  ts: Date.now(),
  user: currentUser.id,
  tool: call.name,
  args: call.args,
  result: { ok: true },
});

Trust the model to suggest. Don't trust it to commit.

Prompt injection at the UI layer

Even the model's input can be hostile. If you let your app paste user-fetched content (a webpage, a PDF, an email) into the prompt, you've widened the attack surface. A clever attacker can write content that looks like a tool call to your model.

Two practical guards:

Never let the model run a destructive tool autonomously. Always require a UI confirmation for anything irreversible.
Treat fetched content as data, not instructions. Wrap external text in markers (<user_content>...</user_content>) and instruct the model to treat that block as untrusted.

The model layer can't fully prevent injection. The UI layer can keep the blast radius small.

Generative Components, Briefly

The frontier shape: components rendered by the model itself. The model picks a layout, picks children, picks props. Your runtime renders a tree it has never seen before.

Two practical patterns work today:

JSON layout plus a finite registry. The model emits a structured layout, your renderer maps each node to a known component. Safe, predictable, easy to lint. The model can compose, not invent.

type LayoutNode =
  | { type: "card"; props: { title: string }; children?: LayoutNode[] }
  | { type: "list"; props: { items: string[] } }
  | { type: "chart"; props: { series: number[]; label: string } };

const registry = {
  card: CardComponent,
  list: ListComponent,
  chart: ChartComponent,
} as const;

function renderNode(node: LayoutNode): React.ReactNode {
  const Comp = registry[node.type];
  if (!Comp) return null;
  return (
    <Comp {...(node.props as any)}>
      {node.children?.map((c, i) => (
        <Fragment key={i}>{renderNode(c)}</Fragment>
      ))}
    </Comp>
  );
}

Combine with Zod validation on the incoming layout and the model literally can't conjure unknown components. New surface? Add to the registry, ship the next build.

Server-rendered React, streamed back. The model lives on a server with the React runtime, composes a tree, streams the rendered output, your client mounts it. More powerful, more attack surface. Worth it only when the surface is genuinely model-defined.

Start with the registry. Almost every interesting generative UI today fits inside ten components, and the JSON contract is much easier to evolve.

Observability: Log Every Turn

A prompt-driven app behaves differently for every user, every session, sometimes every turn. Without logs, you have no idea why something went wrong.

Three things to log per turn:

The full message history sent to the model (prompt, system, tools).
The full model response (text, tool calls, finish reason).
Every tool dispatch (name, args, result, duration).

function instrument(turn: Turn) {
  return {
    prompt: turn.messages,
    response: turn.modelOutput,
    tools: turn.toolCalls.map((t) => ({
      name: t.name,
      args: t.args,
      durationMs: t.endedAt - t.startedAt,
      ok: t.result.ok,
    })),
    tokens: { input: turn.inputTokens, output: turn.outputTokens },
    latencyMs: turn.endedAt - turn.startedAt,
  };
}

await analytics.track("agent_turn", instrument(turn));

Aggregate the result by user, by tool, by failure mode. You'll find within a week which tool is brittle, which prompt template is misleading, which path the model keeps fumbling.

Cost and Latency Awareness

Two cost dials that web apps never had to think about:

Per-call cost in dollars. Model calls aren't free. A heavy agentic loop can cost cents per turn. Show users a live token meter for transparency, or cap the number of tool calls per turn.
Latency budget. A planning step that takes 8 seconds feels broken in an interactive UI. Either stream intermediate progress ("Looking up the record... checking permissions...") or pre-compute the slow steps and pass them as context.

Both are UX problems wearing engineering clothes. Solve them in the UI, not just in the model layer.

Should You Make Your App Prompt-Driven?

Not every app needs this.

If users know what they want and the UI is direct, don't add a prompt. Prompts are slower than clicks for known tasks.
If failure is expensive, you'll need a confirmation layer. Anything irreversible must be confirmed by a human.
If sub-second latency matters, the model is a tax. Use it where a couple of seconds of "thinking" feels appropriate, not for hotkeys.
If the surface is small, a structured form still beats a chat input.

The right test: would a power user click faster than they'd type? If yes, keep the click. If no, the prompt earns its place.

Wrapping Up

Prompt-driven UIs are not chatbots with extra steps. They're React apps where the source of state changes has shifted from clicks to natural-language intent translated into tool calls.

The patterns that hold up aren't new:

Small state machines, not flat boolean soup.
Coalesced streams instead of token-per-render.
Schema-validated tools at every boundary.
Optimistic and reversible changes.
A clear, enforced trust boundary.
Logs for every turn.

Get those right and "AI in your app" stops being a demo and starts being something you can ship.

#reactjs #ai #llm #frontend #javascript

Prompt-Driven React Apps

What Prompt-Driven Actually Means

Who's Already Doing This?

The Update Loop, Stretched Out

A State Machine for Agent Status

Think of It Like a Concierge

Streaming, Not Buffering

Tool Calling, In React Terms

Parallel tool calls

Optimistic and Reversible

Suspense Fits, With Caveats

The Trust Boundary

Prompt injection at the UI layer

Generative Components, Briefly

Observability: Log Every Turn

Cost and Latency Awareness

Should You Make Your App Prompt-Driven?

Wrapping Up

Comments

More from this blog

Removed

Removed

smoke test

How I Built an AI Coding Agent for My Design System (in 589 Lines)

How I Built a Figma-Like Canvas Editor in React (and What I Learned)

Command Palette

What Prompt-Driven Actually Means

Who's Already Doing This?

The Update Loop, Stretched Out

A State Machine for Agent Status

Think of It Like a Concierge

Streaming, Not Buffering

Tool Calling, In React Terms

Parallel tool calls

Optimistic and Reversible

Suspense Fits, With Caveats

The Trust Boundary

Prompt injection at the UI layer

Generative Components, Briefly

Observability: Log Every Turn

Cost and Latency Awareness

Should You Make Your App Prompt-Driven?

Wrapping Up

Comments

More from this blog