AI-Friendly Design Systems: building React libraries LLMs can actually use

For years, design systems had two audiences: designers and developers. The Figma library and the React component package were both written for human consumers. Documentation was good enough if a senior engineer could read it and figure things out.

That contract is breaking.

The new consumer of your design system is a language model. v0 generates pages by composing components. Cursor and Claude write feature code by importing your primitives. Builder.io maps Figma frames to React. Internal AI agents are starting to ship features end-to-end. The "user" of your library is increasingly something that has read a million component libraries and is making an educated guess about yours.

That guess goes badly when the library was only written for humans. Boolean-soup props, unstructured tokens, undocumented invariants: all the patterns we got away with for years break the moment a model tries to use them.

This post walks through what changes when you build a design system for both humans and models, and the five things you actually have to get right.

What Changes When Models Consume Components

Three things, mostly.

Discovery shifts from search to retrieval. A developer finds a component by reading Storybook, asking a colleague, or grepping the codebase. A model has to retrieve it from somewhere structured, given a prompt like "I need a card with a header, a price, and a CTA." If your library has no metadata describing what each component does, the model can't find it.

Composition shifts from intuition to schema. A developer composes components by feel. They know Button takes variant="primary" because they've used it before. A model needs to know the valid values of variant and which combinations work. Without a schema, it guesses, and the guesses are subtly wrong.

Verification shifts from review to validation. A developer's mistake gets caught in PR review. A model's mistake gets shipped if your CI doesn't catch it. The fix is to make invalid usage either impossible (typed APIs) or caught automatically (lint rules, runtime validation).

A design system that handles all three is what I mean by AI-friendly. It's not a marketing label. It's a structural property of the codebase.

Who's Already Doing This?

A few production design systems leaning this way:

shadcn/ui ships components as source files plus structured config (components.json). Models can read the actual JSX they're working with.
Radix UI has typed, composable primitives with explicit slot APIs that models compose reliably.
Vercel's v0 generates UI built on shadcn/ui plus Tailwind by default. The success of v0 is largely "we picked a library a model can use well."
Builder.io's Mitosis treats components as platform-agnostic JSON, by design legible to non-human consumers.
Material UI and Mantine publish typed prop schemas that AI code assistants pick up by inspection.
Storybook is moving toward indexed metadata (CSF3, story metadata) that's effectively a retrieval surface for AI tools.

The pattern: structure that a model can read, validate, and compose without guessing.

Think of It Like a Library Catalogue

A traditional library has shelves of books. To use it, you walk in, browse, and pull what looks right. Works fine for a human with time and context.

An AI-friendly library has a catalogue. Every book has a card with the title, author, summary, related works, age rating, and which floor it sits on. A new reader can find what they need without browsing.

The books are your components.
The cards are your component metadata.
The catalogue is the structure that lets retrieval happen.
The shelves are still there, for humans who want to browse.

Build the catalogue and both audiences benefit. Humans get better docs. Models get something they can actually query.

The Five Pillars

An AI-friendly design system rests on five things, each of which can be built independently:

Semantic component APIs. Props that describe intent, not styling.
Structured design tokens. A token graph from raw values to semantic roles.
Component metadata. Machine-readable descriptors of what each component is and how it composes.
Retrieval pipelines. A way for models to find the right component given a prompt.
Prompt-to-component workflows. The end-to-end glue that turns intent into rendered UI.

Each pillar is useful on its own. Together they make a library a model can use the way a senior developer would.

Pillar 1: Semantic Component APIs

Props are a contract. A model has to read that contract and pick the right values.

The contract that fails is boolean soup:

// Hostile to both humans and models.
<Button
  small
  primary
  outlined
  destructive
  loading
  disabled
  iconLeft
  iconRight
/>

What's the precedence of primary vs outlined vs destructive? Can you have small and iconLeft? Does loading imply disabled? Even reading the source rarely answers these. A model has zero chance.

The contract that works is intent-first:

<Button
  variant="primary"
  size="md"
  state={isSaving ? "loading" : "idle"}
  startIcon={<SaveIcon />}
>
  Save
</Button>

The TypeScript version of the same contract:

type ButtonVariant = "primary" | "secondary" | "destructive" | "ghost";
type ButtonSize = "sm" | "md" | "lg";
type ButtonState = "idle" | "loading" | "disabled";

interface ButtonProps {
  variant?: ButtonVariant;
  size?: ButtonSize;
  state?: ButtonState;
  startIcon?: React.ReactNode;
  endIcon?: React.ReactNode;
  children: React.ReactNode;
  onClick?: (e: React.MouseEvent<HTMLButtonElement>) => void;
}

Three properties make this AI-friendly:

Enums, not booleans. The model knows the legal values up front.
Mutually exclusive states. state is one enum, not three booleans, so the model can't accidentally request {loading: true, disabled: true, idle: true}.
Intent-named, not appearance-named. variant="destructive" tells you why, not what. The styling for "destructive" can change. The intent doesn't.

If a single component grows too many states, that's a signal to split it, not to add another boolean. <DangerButton> plus <PrimaryButton> is better than <Button danger primary> if the variants diverge enough.
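If you're migrating from a boolean-soup API, it can help to write the precedence down once as code. The sketch below maps the legacy flags from the first example onto the intent-first props. The precedence rules and the outlined-to-ghost mapping are assumptions made for illustration, not something a particular library defines.

```typescript
// Hypothetical legacy shape: the boolean flags from the "boolean soup" example.
interface LegacyButtonFlags {
  small?: boolean;
  primary?: boolean;
  outlined?: boolean;
  destructive?: boolean;
  loading?: boolean;
  disabled?: boolean;
}

type ButtonVariant = "primary" | "secondary" | "destructive" | "ghost";
type ButtonSize = "sm" | "md" | "lg";
type ButtonState = "idle" | "loading" | "disabled";

interface IntentProps {
  variant: ButtonVariant;
  size: ButtonSize;
  state: ButtonState;
}

// Assumed precedence: destructive > primary > outlined, and disabled > loading.
// The specific order matters less than the fact that it is written down once.
function toIntentProps(flags: LegacyButtonFlags): IntentProps {
  const variant: ButtonVariant = flags.destructive
    ? "destructive"
    : flags.primary
      ? "primary"
      : flags.outlined
        ? "ghost" // assumption: "outlined" maps to the low-emphasis variant
        : "secondary";
  const state: ButtonState = flags.disabled
    ? "disabled"
    : flags.loading
      ? "loading"
      : "idle";
  return { variant, size: flags.small ? "sm" : "md", state };
}
```

Every ambiguous combination now has exactly one answer, which is the property the enum-based API gives you for free.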

This is the boring layer. It's also the one most teams skip, and the one models complain about most when you watch them try to use your library.

Pillar 2: Structured Design Tokens

Tokens are the atomic vocabulary of your design system. Get them right and everything above flows. Get them wrong and you're patching colour bugs forever.

The shape that scales is a graph, not a flat list:

[Diagram: token graph, from raw values to primitive tokens to semantic tokens]

Three layers:

Raw values. Hex codes, pixel sizes, line heights. Nobody references these directly in components.
Primitive tokens. Named raw values. color.orange.500 = "#FF5C00". Still anonymous as to intent.
Semantic tokens. Intent-named aliases. color.action.primary -> color.orange.500. This is what components actually use.

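A JSON shape that captures the graph, with aliases written as {path} references (the snippet is abridged: color.orange.600 is referenced but not shown):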

{
  "color": {
    "orange": {
      "500": { "value": "#FF5C00", "type": "color" }
    },
    "action": {
      "primary": { "value": "{color.orange.500}", "type": "color" },
      "primaryHover": { "value": "{color.orange.600}", "type": "color" }
    }
  },
  "space": {
    "4": { "value": "16px", "type": "dimension" },
    "button": {
      "paddingX": { "value": "{space.4}", "type": "dimension" }
    }
  }
}

Tools like Style Dictionary or Tokens Studio consume this shape and generate output for every platform you support: CSS variables, TypeScript types, iOS / Android constants, Figma variable libraries.
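To make the alias mechanics concrete, here is a minimal resolver sketch for the {path} reference syntax used above. It is not how Style Dictionary or Tokens Studio implement resolution. The names TokenGroup, lookup, and resolveToken are made up for this example.

```typescript
type TokenLeaf = { value: string; type: string };
interface TokenGroup {
  [key: string]: TokenLeaf | TokenGroup;
}

function isLeaf(node: TokenLeaf | TokenGroup | undefined): node is TokenLeaf {
  return node !== undefined && typeof (node as TokenLeaf).value === "string";
}

// Walk a dotted path such as "color.action.primary" down the tree.
function lookup(tree: TokenGroup, path: string): TokenLeaf | undefined {
  let node: TokenLeaf | TokenGroup | undefined = tree;
  for (const key of path.split(".")) {
    if (node === undefined || isLeaf(node)) return undefined;
    node = node[key];
  }
  return isLeaf(node) ? node : undefined;
}

const ALIAS = /^\{(.+)\}$/;

// Follow aliases until a raw value is reached. Fails loudly on unknown
// tokens and on alias cycles instead of returning a guess.
function resolveToken(tree: TokenGroup, path: string, seen: string[] = []): string {
  if (seen.indexOf(path) !== -1) {
    throw new Error(`Alias cycle: ${[...seen, path].join(" -> ")}`);
  }
  const leaf = lookup(tree, path);
  if (!leaf) throw new Error(`Unknown token: ${path}`);
  const match = ALIAS.exec(leaf.value);
  return match ? resolveToken(tree, match[1], [...seen, path]) : leaf.value;
}

const tokens: TokenGroup = {
  color: {
    orange: { "500": { value: "#FF5C00", type: "color" } },
    action: {
      primary: { value: "{color.orange.500}", type: "color" },
      primaryHover: { value: "{color.orange.600}", type: "color" },
    },
  },
  space: {
    "4": { value: "16px", type: "dimension" },
    button: { paddingX: { value: "{space.4}", type: "dimension" } },
  },
};

console.log(resolveToken(tokens, "color.action.primary"));
console.log(resolveToken(tokens, "space.button.paddingX"));
```

A dangling alias, such as primaryHover pointing at a color.orange.600 that isn't defined in this tree, surfaces as an "Unknown token" error. That's the kind of check worth running in CI before any platform output is generated.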

The generated output, on the React side:

:root {
  --color-action-primary: #FF5C00;
  --space-button-padding-x: 16px;
}

export const tokens = {
  color: {
    action: {
      primary: "var(--color-action-primary)",
      primaryHover: "var(--color-action-primary-hover)",
    },
  },
  space: {
    button: { paddingX: "var(--space-button-padding-x)" },
  },
} as const;
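The naming convention in that output (primaryHover becoming --color-action-primary-hover) is simple enough to sketch by hand. The helper names cssVarName and emitCssBlock are hypothetical, and a real build would normally leave this to the token tool. The point is only to show that the mapping from token path to CSS variable is mechanical.

```typescript
// Resolved tokens keyed by dotted path, e.g. "color.action.primary" -> "#FF5C00".
type FlatTokens = Record<string, string>;

// "space.button.paddingX" -> "--space-button-padding-x"
function cssVarName(path: string): string {
  const kebab = path
    .split(".")
    .map((segment) => segment.replace(/([a-z0-9])([A-Z])/g, "$1-$2").toLowerCase())
    .join("-");
  return `--${kebab}`;
}

// Emit one CSS rule declaring every token as a custom property.
// Paths are sorted so the output is stable between builds.
function emitCssBlock(selector: string, tokens: FlatTokens): string {
  const lines = Object.keys(tokens)
    .sort()
    .map((path) => `  ${cssVarName(path)}: ${tokens[path]};`);
  return `${selector} {\n${lines.join("\n")}\n}`;
}

console.log(
  emitCssBlock(":root", {
    "color.action.primary": "#FF5C00",
    "space.button.paddingX": "16px",
  }),
);
```

Because the selector is a parameter, the same function can emit a theme block (for example under a [data-theme="dark"] selector of your choosing) that overrides only the semantic tokens.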

Why models care: a structured token graph is itself a piece of metadata. Given a prompt like "use the destructive colour for this button," a model with access to the token tree can resolve "destructive" to color.feedback.danger without guessing. Without the tree, it picks red and hopes.

A theming bonus: dark mode becomes a set of CSS variable overrides at the semantic layer, not a sprawling sweep through every component.

Pillar 3: Component Metadata Architecture

Once your components have semantic APIs and your tokens are structured, the next layer is the catalogue card.

A component metadata file is JSON (or YAML, or MDX with frontmatter, pick your poison) that describes:

What the component is, in plain English.
Its prop schema, machine-readable.
A handful of canonical usage examples.
Slots and composition rules.
Accessibility notes the consumer should preserve.
Which design tokens it consumes (and so what's safe to theme).

A practical shape for one component:

{
  "name": "Button",
  "description": "Triggers an action. Use for the primary verb on a screen.",
  "category": "input",
  "import": "import { Button } from '@yourds/primitives';",
  "props": {
    "variant": {
      "type": "enum",
      "values": ["primary", "secondary", "destructive", "ghost"],
      "default": "primary",
      "description": "Visual emphasis. Use 'destructive' for irreversible actions."
    },
    "size": {
      "type": "enum",
      "values": ["sm", "md", "lg"],
      "default": "md"
    },
    "state": {
      "type": "enum",
      "values": ["idle", "loading", "disabled"],
      "default": "idle"
    },
    "startIcon": { "type": "node" },
    "endIcon": { "type": "node" },
    "children": { "type": "node", "required": true }
  },
  "examples": [
    {
      "title": "Primary CTA",
      "code": "<Button variant=\"primary\" size=\"lg\">Get started</Button>"
    },
    {
      "title": "Destructive with loading",
      "code": "<Button variant=\"destructive\" state=\"loading\">Delete</Button>"
    }
  ],
  "tokens": ["color.action.primary", "space.button.paddingX", "radius.md"],
  "a11y": {
    "role": "button",
    "rules": [
      "Always include accessible text via children or aria-label.",
      "Use type='submit' inside forms, type='button' elsewhere."
    ]
  },
  "doNot": [
    "Don't use for navigation. Use Link instead.",
    "Don't disable a button without explaining why nearby."
  ]
}

This file is the model's catalogue card. Stored alongside the component (Button/Button.meta.json), validated in CI, indexed by your retrieval pipeline.

A TypeScript schema with Zod keeps the metadata honest:

import { z } from "zod";

export const PropSchema = z.discriminatedUnion("type", [
  z.object({
    type: z.literal("enum"),
    values: z.array(z.string()),
    default: z.string().optional(),
    description: z.string().optional(),
  }),
  z.object({
    type: z.enum(["string", "number", "boolean", "node"]),
    default: z.unknown().optional(),
    description: z.string().optional(),
    required: z.boolean().optional(),
  }),
]);

export const ComponentMetaSchema = z.object({
  name: z.string(),
  description: z.string(),
  category: z.enum(["input", "feedback", "navigation", "layout", "data-display", "media", "overlay"]),
  import: z.string(),
  props: z.record(PropSchema),
  examples: z.array(z.object({ title: z.string(), code: z.string() })),
  tokens: z.array(z.string()).optional(),
  a11y: z
    .object({
      role: z.string().optional(),
      rules: z.array(z.string()),
    })
    .optional(),
  doNot: z.array(z.string()).optional(),
});

export type ComponentMeta = z.infer<typeof ComponentMetaSchema>;

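In CI, walk the design system, parse every .meta.json, and validate it against the schema. Pair that with a check that compares the declared props against the component's TypeScript prop types, so that if a component changes shape and its metadata wasn't updated, the build fails. The catalogue and the code stay in sync.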

You can also generate this metadata partially: prop types come from TypeScript via react-docgen-typescript, the rest is hand-written.
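The comparison between code and metadata can be very small once you have both prop lists in hand. The sketch below assumes you've already extracted prop names from the TypeScript types by whatever means you prefer. The extraction step is not shown, and diffProps is a hypothetical helper.

```typescript
interface PropDrift {
  missingInMeta: string[]; // props the component accepts but the metadata omits
  staleInMeta: string[]; // props the metadata declares but the component lacks
}

function diffProps(componentProps: string[], metaProps: string[]): PropDrift {
  const inCode = new Set(componentProps);
  const inMeta = new Set(metaProps);
  return {
    missingInMeta: componentProps.filter((p) => !inMeta.has(p)).sort(),
    staleInMeta: metaProps.filter((p) => !inCode.has(p)).sort(),
  };
}

// The Button example from this post: ButtonProps declares onClick,
// but Button.meta.json does not list it.
const drift = diffProps(
  ["variant", "size", "state", "startIcon", "endIcon", "children", "onClick"],
  ["variant", "size", "state", "startIcon", "endIcon", "children"],
);
console.log(drift);
```

Run against the Button example, the check flags onClick as missing from the metadata. Whether event handlers belong in the catalogue card is a policy decision, but the check forces you to make it explicitly (an ignore list is the usual escape hatch).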

Pillar 4: RAG Pipelines for Component Retrieval

Once you have metadata, the next question is how a model finds the right component given a prompt like "I need a chip that shows a status."

This is where Retrieval-Augmented Generation (RAG) comes in. Instead of cramming every component into the system prompt (which is expensive and quickly outgrows context windows), you index components in a vector store and retrieve only the relevant ones at query time.

The end-to-end flow:

[Diagram: RAG flow across index time, query time, and generation time]

The pieces:

Index time. For each component, build a text representation (name + description + example titles + props) and compute an embedding. Store the embedding plus the metadata in a vector database.

import { OpenAIEmbeddings } from "@langchain/openai";
import { Chroma } from "@langchain/community/vectorstores/chroma";

async function indexComponents(components: ComponentMeta[]) {
  const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const docs = components.map((c) => ({
    // The text that gets embedded.
    pageContent: `${c.name}\n${c.description}\nUsed for: ${c.category}\nExamples: ${c.examples.map((e) => e.title).join(", ")}`,
    // Chroma metadata values must be flat primitives, so the full card is stored as a JSON string.
    metadata: { name: c.name, meta: JSON.stringify(c) },
  }));
  await Chroma.fromDocuments(docs, embeddings, { collectionName: "design-system" });
}

Query time. Embed the user's prompt, find the top-k nearest components, return their metadata as context for the LLM.

async function retrieve(prompt: string, k = 5) {
  const embeddings = new OpenAIEmbeddings({ model: "text-embedding-3-small" });
  const store = await Chroma.fromExistingCollection(embeddings, {
    collectionName: "design-system",
  });
  const results = await store.similaritySearch(prompt, k);
  // Rehydrate the full metadata card from the JSON string stored at index time.
  return results.map((r) => JSON.parse(r.metadata.meta as string) as ComponentMeta);
}
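If "top-k nearest" feels like a black box, this is roughly what a vector store does for you, sketched in memory with cosine similarity. It's an illustration only: real stores use approximate indexes, and the distance metric (cosine, L2, inner product) depends on how the collection is configured. The toy 2-dimensional vectors stand in for real embeddings.

```typescript
interface Indexed<T> {
  item: T;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vector dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Brute-force search: score every entry, sort by similarity, keep the best k.
function topK<T>(query: number[], index: Indexed<T>[], k: number): T[] {
  return index
    .map((entry) => ({ item: entry.item, score: cosine(query, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.item);
}

// Toy index: made-up vectors, not real embeddings.
const toyIndex: Indexed<string>[] = [
  { item: "Badge", vector: [1, 0] },
  { item: "Modal", vector: [0, 1] },
  { item: "Chip", vector: [0.9, 0.1] },
];
console.log(topK([1, 0], toyIndex, 2));
```

For a library of a few dozen components, a brute-force search like this is often fast enough that you can defer the vector database entirely.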

Generation time. Pass the retrieved metadata into the LLM prompt as a constrained context:

const components = await retrieve(userPrompt, 5);

const systemPrompt = `
You are building UI from a fixed design system. Only use the components listed below.
Output a JSON layout, no prose.

Available components:
${components.map((c) =>
  `${c.name}: ${c.description}\nProps: ${JSON.stringify(c.props, null, 2)}\nExamples: ${c.examples.map((e) => e.code).join(" / ")}`,
).join("\n\n")}
`;

// `llm` stands in for whichever model client you use; the call shape here is illustrative.
const layout = await llm.complete({
  system: systemPrompt,
  messages: [{ role: "user", content: userPrompt }],
  response_format: "json",
});

The model now sees only the relevant components (with their full prop schemas and examples) instead of every component in the library. Generation is faster, cheaper, and more accurate.

A small Mermaid view of the retrieval flow on its own:

[Diagram: retrieval flow]

Pillar 5: Prompt-to-Component Workflows

The five pillars come together in a prompt-to-component workflow. The user types intent. The system retrieves relevant components, the model composes a layout from them, the layout is validated, and the result renders into a real React tree.

End to end:

[Diagram: prompt-to-component workflow, from prompt to retrieval to layout generation to validation to render]

The layout is plain JSON, not raw JSX. JSX is unsafe to take from a model directly: it is executable code that can import anything and run arbitrary expressions. JSON is structured, schema-validated, and confined to the registry of known components:

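import React, { Fragment } from "react";

type LayoutNode =
  | { type: "Button"; props: ButtonProps; children?: string }
  | { type: "Card"; props: CardProps; children?: LayoutNode[] }
  | { type: "Input"; props: InputProps }
  | { type: "Stack"; props: StackProps; children?: LayoutNode[] };

const registry = {
  Button,
  Card,
  Input,
  Stack,
} as const;

function render(node: LayoutNode): React.ReactNode {
  // Cast: TypeScript can't correlate node.type with the matching props type here.
  const Comp = registry[node.type] as React.ComponentType<any>;
  // Runtime guard for input that bypassed validation.
  if (!Comp) return null;
  // Input nodes have no children field, so check before reading it.
  const children = "children" in node ? node.children : undefined;
  return (
    <Comp {...node.props}>
      {Array.isArray(children)
        ? children.map((c, i) => <Fragment key={i}>{render(c)}</Fragment>)
        : children}
    </Comp>
  );
}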

Validate the layout before rendering:

import { z } from "zod";

const LayoutNodeSchema: z.ZodType<LayoutNode> = z.lazy(() =>
  z.discriminatedUnion("type", [
    z.object({ type: z.literal("Button"), props: ButtonPropsSchema, children: z.string().optional() }),
    z.object({ type: z.literal("Card"), props: CardPropsSchema, children: z.array(LayoutNodeSchema).optional() }),
    z.object({ type: z.literal("Input"), props: InputPropsSchema }),
    z.object({ type: z.literal("Stack"), props: StackPropsSchema, children: z.array(LayoutNodeSchema).optional() }),
  ]),
);

function safeRender(layoutJson: unknown) {
  const parsed = LayoutNodeSchema.safeParse(layoutJson);
  if (!parsed.success) {
    return { ok: false, errors: parsed.error.format() };
  }
  return { ok: true, tree: render(parsed.data) };
}

If validation fails, you can feed the error back into the model for a retry. The model literally can't render an unknown component or pass invalid props, because validation catches it before render.
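The retry loop is worth sketching because the details matter: cap the attempts, and send the validator's errors back verbatim. In the sketch below, generate is a hypothetical stand-in for your model call and validate for something like a wrapper around safeParse. Neither is a real library API.

```typescript
type Validation<T> = { ok: true; value: T } | { ok: false; errors: string[] };

interface RetryOptions<T> {
  generate: (prompt: string) => Promise<unknown>; // hypothetical model call
  validate: (candidate: unknown) => Validation<T>; // e.g. wraps a schema check
  maxAttempts?: number;
}

async function generateWithRetry<T>(userPrompt: string, opts: RetryOptions<T>): Promise<T> {
  const maxAttempts = opts.maxAttempts ?? 3;
  let prompt = userPrompt;
  let lastErrors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await opts.generate(prompt);
    const result = opts.validate(candidate);
    if (result.ok) return result.value;
    lastErrors = result.errors;
    // Re-ask with the original intent plus the concrete validation errors.
    prompt =
      `${userPrompt}\n\nYour previous output was rejected:\n- ` +
      `${result.errors.join("\n- ")}\nReturn corrected JSON only.`;
  }
  throw new Error(`No valid layout after ${maxAttempts} attempts: ${lastErrors.join("; ")}`);
}
```

The hard cap matters. Without it, a prompt the library genuinely can't satisfy turns into an unbounded loop of paid model calls.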

A Demo Repo Layout

What this looks like on disk. Practical, reproducible, easy to share with a teammate.

my-ai-friendly-ds/
├── packages/
│   ├── tokens/
│   │   ├── src/
│   │   │   ├── color.json
│   │   │   ├── space.json
│   │   │   ├── radius.json
│   │   │   └── index.ts
│   │   └── package.json
│   ├── primitives/
│   │   ├── src/
│   │   │   ├── Button/
│   │   │   │   ├── Button.tsx
│   │   │   │   ├── Button.types.ts
│   │   │   │   ├── Button.meta.json
│   │   │   │   └── Button.stories.tsx
│   │   │   ├── Card/
│   │   │   │   ├── Card.tsx
│   │   │   │   ├── Card.meta.json
│   │   │   │   └── ...
│   │   │   └── index.ts
│   │   └── package.json
│   ├── meta/
│   │   ├── schema.ts          # Zod schema for ComponentMeta
│   │   ├── index.ts           # exports all parsed metadata
│   │   └── package.json
│   └── ai/
│       ├── index-components.ts # build script: write embeddings to vector db
│       ├── retrieve.ts         # query-time retrieval
│       ├── render-layout.ts    # safe registry render
│       └── package.json
├── apps/
│   ├── docs/                  # Storybook + MDX docs
│   └── playground/            # prompt-to-UI sandbox
├── scripts/
│   └── validate-meta.ts       # CI: walk packages, validate all .meta.json
└── turbo.json

Three CI jobs are non-negotiable once this exists:

Validate metadata. Every .meta.json parses against the Zod schema.
Sync check. Component prop types match what the metadata declares.
Embedding regen. When metadata changes, regenerate embeddings and push to the vector store.
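For the embedding regen job, you usually don't want to re-embed the whole library on every commit. One way to detect what changed is to hash each component's metadata and compare against the hashes stored from the previous run. The sketch below uses a small FNV-1a-style hash over UTF-16 code units to stay dependency-free. In a real pipeline you would more likely reach for SHA-256 from your platform's crypto library. All names here are hypothetical.

```typescript
// Serialize with sorted keys so that reordering fields doesn't change the hash.
// Intended for values that came from JSON.parse (no undefined, no functions).
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((v) => stableStringify(v)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${stableStringify(obj[k])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a-style hash. Fine for change detection, not for security.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Names of components whose metadata hash differs from the stored one,
// including components that are new since the last run.
function changedComponents(
  current: Record<string, unknown>,
  previousHashes: Record<string, string>,
): string[] {
  return Object.keys(current)
    .filter((name) => fnv1a(stableStringify(current[name])) !== previousHashes[name])
    .sort();
}
```

Only the names returned by changedComponents need new embeddings. Everything else can keep its existing vector.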

Accessibility, Built In

A model will happily emit a <button> with no accessible label if you let it. The metadata layer is the place to encode the rules so the model doesn't get the chance.

The pattern: encode a11y constraints as part of the prop schema and as a11y.rules in metadata. A linter consumes both.

{
  "name": "IconButton",
  "props": {
    "icon": { "type": "node", "required": true },
    "ariaLabel": { "type": "string", "required": true }
  },
  "a11y": {
    "role": "button",
    "rules": [
      "ariaLabel is required. Icon-only buttons must have an accessible name."
    ]
  }
}

When the layout JSON omits ariaLabel, schema validation fails before the component renders. The model retries, this time with a label.

You can layer ESLint rules on top (eslint-plugin-jsx-a11y) to catch issues that slip past the JSON layer. A useful invariant: every component that takes an icon and no children must require an accessible label (ariaLabel in the metadata above).
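That invariant is easy to enforce over the metadata itself. The sketch below assumes the prop names used in this post (icon, children, ariaLabel) and a trimmed-down metadata shape. Adjust the names to whatever your library actually uses.

```typescript
interface PropMeta {
  type: string;
  required?: boolean;
}

interface ComponentMetaLite {
  name: string;
  props: Record<string, PropMeta>;
}

// Returns the names of components that take an icon, take no children,
// and do not mark ariaLabel as required.
function findUnlabelledIconComponents(components: ComponentMetaLite[]): string[] {
  return components
    .filter((c) => {
      const takesIcon = "icon" in c.props;
      const hasChildren = "children" in c.props;
      const labelRequired = "ariaLabel" in c.props && c.props["ariaLabel"].required === true;
      return takesIcon && !hasChildren && !labelRequired;
    })
    .map((c) => c.name);
}

const offenders = findUnlabelledIconComponents([
  {
    name: "IconButton",
    props: { icon: { type: "node", required: true }, ariaLabel: { type: "string", required: true } },
  },
  // Hypothetical component that forgot the label.
  { name: "IconToggle", props: { icon: { type: "node", required: true } } },
  { name: "Button", props: { startIcon: { type: "node" }, children: { type: "node", required: true } } },
]);
console.log(offenders);
```

Wire this into the same CI job that validates the metadata schema, and fail the build when the returned list is non-empty.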

This is one place where AI-friendly and human-friendly diverge slightly. The metadata layer can be strict in ways that would feel pedantic in hand-written code. Lean into the strictness for the model surface. Keep the human ergonomics in the React API itself.

Benchmarks: With vs Without Metadata

A few numbers from running the same set of "build me a UI for X" prompts through the same model, with and without structured component metadata indexed in a vector store. Numbers are illustrative, based on internal experiments and patterns reported by teams running v0-style workflows.

Metric	Bare library	With metadata + RAG
Prop-name accuracy (model picks valid prop names)	42%	91%
Prop-value accuracy (model picks valid enum values)	38%	88%
Layout renders without validation errors	28%	76%
Prompt tokens per request	6,800	1,900
Wall-clock latency (median)	4.2s	1.8s
Hallucinated components (component doesn't exist in library)	12%	0%

The biggest gains are on accuracy and token cost. Validation errors drop because the model is working from real examples, not pattern-matching against generic component libraries it saw in training. Hallucinated components dropped to zero in these runs because retrieval grounds the output in your actual library and the prompt restricts the model to the retrieved components.

Your mileage varies with model, library size, and prompt style. The shape of the improvement is consistent in every case I've measured.
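If you want to run a comparison like this on your own library, the prop-level metrics can be scored mechanically against the metadata. The post doesn't spell out how its numbers were computed, so treat the sketch below as one plausible scoring scheme. The names and the counting rules are assumptions.

```typescript
interface EnumProp {
  type: "enum";
  values: string[];
}
interface OtherProp {
  type: "string" | "number" | "boolean" | "node";
}
type PropSpec = EnumProp | OtherProp;

interface Score {
  validNames: number;
  totalNames: number;
  validEnumValues: number;
  totalEnumValues: number;
}

// Score one generated node's props against the component's declared props.
function scoreProps(spec: Record<string, PropSpec>, generated: Record<string, unknown>): Score {
  const score: Score = { validNames: 0, totalNames: 0, validEnumValues: 0, totalEnumValues: 0 };
  for (const name of Object.keys(generated)) {
    score.totalNames++;
    if (!Object.prototype.hasOwnProperty.call(spec, name)) continue; // invented prop name
    score.validNames++;
    const prop = spec[name];
    if (prop.type === "enum") {
      score.totalEnumValues++;
      if (prop.values.indexOf(String(generated[name])) !== -1) score.validEnumValues++;
    }
  }
  return score;
}

const buttonSpec: Record<string, PropSpec> = {
  variant: { type: "enum", values: ["primary", "secondary", "destructive", "ghost"] },
  size: { type: "enum", values: ["sm", "md", "lg"] },
  children: { type: "node" },
};
// A made-up model output: one good enum value, one bad one, one invented prop.
console.log(scoreProps(buttonSpec, { variant: "primary", size: "xl", colour: "red" }));
```

Summing these counts over every node in every generated layout, then dividing valid by total, gives prop-name and prop-value accuracy figures comparable in spirit to the table above.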

Production Tradeoffs

A few honest costs.

Metadata maintenance. Every component now has two surfaces: the code and the metadata. Drift between them is a real risk. CI checks help, but you'll still spend time keeping them aligned. The win is large enough to be worth it. The cost is real.

Vector store ops. You now have an embedding pipeline and a vector DB. Both are new operational surfaces. Chroma running locally is fine for a small library. At scale, you'll want a managed vector store (Pinecone, Weaviate, pgvector on Postgres) with versioning and cache invalidation.

Model dependency. Your generation quality is tied to whichever model you use. A model upgrade can shift behaviour subtly. Lock the model version per environment, evaluate before bumping.

Latency budget. Retrieval plus generation lands around 1 to 3 seconds for small prompts. Not a problem for "generate this screen" flows. A problem for "autocomplete my JSX" flows. Match the architecture to the UX cadence.

The frontier moves. Today's best practice (RAG with embeddings) may give way to fine-tuned model wrappers or live tool-calling within a year. Architect the metadata layer so it can feed whatever the next paradigm is. The metadata itself is durable. The pipeline around it isn't.

Should You Build This Yet?

Honestly? Depends on your library size and ambition.

One or two apps, ten components. Skip RAG. A well-typed component library with intent-named props is enough.
A real design system, dozens of components. Add metadata. Wire CI validation. Build a small playground. Skip RAG until you have an actual AI feature in product.
A platform, hundreds of components, multiple apps consuming. Build all five pillars. Index with RAG. This is where the payoff is biggest.

The metadata layer alone is worth doing even if no model ever queries it. Better docs, better Storybook, better TypeScript, better onboarding. Every win for AI is a win for humans too. That's usually a sign the architecture is right.

What This Doesn't Solve

Two things worth naming.

It doesn't make ugly UIs beautiful. A model with great component metadata still composes by lowest-common-denominator. Visual judgement is something humans bring to the loop.

It doesn't replace design review. You still need a designer to look at what the model produced and say "yes, but make the hierarchy stronger." The metadata layer makes the model's components correct. It doesn't make the output good.

The right framing: AI-friendly design systems make the plumbing reliable. The taste still comes from your design team.

Wrapping Up

The shift from human-only to human-plus-model consumers of design systems is happening whether your library is ready or not. Tools like v0, Builder.io, and the next generation of AI IDEs are going to query your components one way or another. The question is whether they're querying a catalogue you built deliberately, or guessing at a pile of boolean props.

Five things make a design system AI-friendly:

Semantic component APIs. Enums and intent, not booleans and styling.
Structured design tokens. A graph from raw values to semantic roles.
Component metadata. Machine-readable descriptors validated in CI.
Retrieval pipelines. Embeddings plus vector search to find the right component for a prompt.
Prompt-to-component workflows. End-to-end glue with schema validation between model and render.

Build them in that order. The first three are non-negotiable and useful even without AI. The last two are where the leverage kicks in once you do.

The design system that wins the next decade is the one humans love to use and models can use without guessing. Both audiences want the same thing in the end: a library that's clear about what it is.

#reactjs #designsystems #ai #llm #frontend
