LLM Structured JSON Outputs in 2026: OpenAI, Anthropic & Gemini Complete Guide
Getting reliable, parseable JSON from a language model used to mean prayer and regex. In 2026 it means choosing the right structured output method and writing 10 lines of code. OpenAI's Structured Outputs, Anthropic's Tool Use, and Gemini's Response Schema have fundamentally changed AI development — but each works differently, each has hidden limitations, and picking the wrong method costs you reliability, latency, and money. This guide covers every method for every major provider with production-ready code, real tradeoffs, and a clear decision framework.
99.9%
parse reliability with strict structured outputs vs ~70% with prompt-only
3
fundamentally different methods: JSON mode, function calling, strict schema
2x
typical first-request latency overhead for constrained decoding; near-zero once the schema is cached
2026
all major providers now support schema-constrained JSON — no excuses
Why Prompt-Based JSON Is Not Enough
Prompting for JSON gives you ~70% reliability — production needs 99.9%
The naive approach — “Respond only in valid JSON format” — works in demos and fails in production. LLMs trained on internet text have learned to follow instructions, but they also learned to add explanatory text, use code fences, insert comments, add trailing commas, and deviate from schemas when they “think” a different format is more helpful. Structured output modes use constrained decoding — the model is mathematically prevented from generating tokens that would violate the schema. That is the difference between ~70% and ~99.9% reliability.
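To make "mathematically prevented" concrete, here is a toy sketch of the masking step at the heart of constrained decoding. This is not a real tokenizer or inference loop, and the candidate tokens and scores are invented for illustration; the point is only that candidates which cannot extend to a schema-valid output are removed before sampling.

```typescript
// Toy constrained decoding: the "schema" allows only
// {"sentiment":"positive"} | {"sentiment":"neutral"} | {"sentiment":"negative"}.
type TokenScore = { token: string; score: number };

const ALLOWED = ['positive', 'neutral', 'negative'];
const VALID_OUTPUTS = ALLOWED.map(v => `{"sentiment":"${v}"}`);

function maskInvalid(prefix: string, candidates: TokenScore[]): TokenScore[] {
  // A candidate survives only if prefix + token is still a prefix
  // of at least one schema-valid output string.
  return candidates.filter(c =>
    VALID_OUTPUTS.some(out => out.startsWith(prefix + c.token))
  );
}

// The model "wants" to chat, but invalid continuations are masked out:
const step = maskInvalid('{"sentiment":"', [
  { token: 'Sure', score: 0.4 },     // masked: cannot lead to valid JSON here
  { token: 'positive', score: 0.3 }, // survives: extends a valid output
  { token: 'great', score: 0.3 },    // masked: not in the enum
]);
// step contains only the 'positive' candidate
```

Real implementations (OpenAI strict mode, Outlines) compile the JSON Schema into a grammar or state machine so this check is cheap per token, but the effect is the same: invalid tokens get zero probability.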
Prompt-only — unpredictable, ~70% reliable
// ❌ Prompt-only approach — 70% reliability in production
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    {
      role: 'system',
      content: 'You must respond ONLY with valid JSON. No explanation, no code fences.',
    },
    { role: 'user', content: 'Extract the name and age from: "Alice is 30 years old"' },
  ],
});
// What you hope to get: {"name":"Alice","age":30}
// What you sometimes get:
// '{"name": "Alice", "age": 30}' — fine
// '```json\n{"name":"Alice","age":30}\n```' — extra fences
// 'Here is the JSON: {"name":"Alice","age":30}' — extra text
// '{"name": "Alice", "age": "30"}' — age is a string
// '{"name":"Alice","age":30,"note":"extracted from the sentence"}' — extra key
Strict structured output — mathematically guaranteed
// ✅ Structured Outputs — 99.9% reliability, guaranteed schema match
const response = await openai.chat.completions.create({
  model: 'gpt-4o-2024-08-06',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'person_extraction',
      strict: true, // ← constrained decoding — cannot deviate
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          age: { type: 'integer' },
        },
        required: ['name', 'age'],
        additionalProperties: false, // ← no extra keys allowed
      },
    },
  },
  messages: [{ role: 'user', content: 'Extract: "Alice is 30 years old"' }],
});
// What you ALWAYS get: {"name":"Alice","age":30}
// No fences, no extra text, no wrong types, no extra keys — guaranteed
OpenAI — Three Methods Compared
Method 1: JSON Mode
response_format: { type: "json_object" }. Guarantees syntactically valid JSON — no trailing commas, no code fences. Does NOT enforce a specific schema. Available on gpt-4o, gpt-4-turbo, gpt-3.5-turbo-1106+. Best for: when you need valid JSON but the exact shape is flexible.
Method 2: Function Calling
Define a tool with parameters schema. Ask the model to "call the tool". The tool call arguments are always valid JSON matching your schema. Available on all GPT-4 and GPT-3.5 models. Best for: agents that trigger actions, when you want the model to decide whether to call a function.
Method 3: Structured Outputs (strict)
response_format: { type: "json_schema", json_schema: { strict: true, schema: {...} } }. Uses constrained decoding — mathematically cannot produce output that violates the schema. Available on gpt-4o-2024-08-06 and later. Best for: data extraction, classification, any case where the schema must be exactly followed.
Which to choose?
For new projects: always start with Structured Outputs (strict: true) if available on your model. For agents: use function calling. For simple JSON without a strict schema: use JSON mode. Never use prompt-only for production code — it is unreliable at scale.
import OpenAI from 'openai';

const openai = new OpenAI();

const PROMPT = 'Extract the person\'s name and age from: "Alice Chen, 30 years old, software engineer"';

// ── Method 1: JSON Mode ───────────────────────────────────────────────────
const jsonModeResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  response_format: { type: 'json_object' }, // valid JSON, any shape
  messages: [
    { role: 'system', content: 'Extract person info as JSON.' },
    { role: 'user', content: PROMPT },
  ],
});
const jsonModeResult = JSON.parse(jsonModeResponse.choices[0].message.content);
// Result: { "name": "Alice Chen", "age": 30, "occupation": "software engineer" }
// Note: model may add extra fields — no schema enforcement

// ── Method 2: Function Calling ────────────────────────────────────────────
const functionCallResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  tools: [{
    type: 'function',
    function: {
      name: 'extract_person',
      description: 'Extract person information from text',
      parameters: {
        type: 'object',
        properties: {
          name: { type: 'string', description: 'Full name' },
          age: { type: 'integer', description: 'Age in years' },
        },
        required: ['name', 'age'],
      },
    },
  }],
  tool_choice: { type: 'function', function: { name: 'extract_person' } },
  messages: [{ role: 'user', content: PROMPT }],
});
const toolCall = functionCallResponse.choices[0].message.tool_calls?.[0];
const functionResult = JSON.parse(toolCall?.function.arguments ?? '{}');
// Result: { "name": "Alice Chen", "age": 30 }
// Schema-enforced — no fields outside the schema

// ── Method 3: Structured Outputs (strict) — recommended ──────────────────
const structuredResponse = await openai.chat.completions.create({
  model: 'gpt-4o-2024-08-06',
  response_format: {
    type: 'json_schema',
    json_schema: {
      name: 'person_extraction',
      strict: true,
      schema: {
        type: 'object',
        properties: {
          name: { type: 'string' },
          age: { type: 'integer' },
        },
        required: ['name', 'age'],
        additionalProperties: false,
      },
    },
  },
  messages: [{ role: 'user', content: PROMPT }],
});
const structuredResult = JSON.parse(structuredResponse.choices[0].message.content);
// Result: { "name": "Alice Chen", "age": 30 }
// Mathematically guaranteed to match the schema — cannot deviate
import OpenAI from 'openai';
import { zodResponseFormat } from 'openai/helpers/zod';
import { z } from 'zod';

const openai = new OpenAI();

// Define once — TypeScript type AND JSON Schema in one place
const PersonSchema = z.object({
  name: z.string(),
  age: z.number().int().min(0).max(150),
  occupation: z.string().optional(),
  skills: z.array(z.string()).optional(),
});
type Person = z.infer<typeof PersonSchema>;

async function extractPerson(text: string): Promise<Person> {
  const response = await openai.beta.chat.completions.parse({
    model: 'gpt-4o-2024-08-06',
    messages: [{ role: 'user', content: `Extract person info from: "${text}"` }],
    response_format: zodResponseFormat(PersonSchema, 'person'),
    // ↑ converts the Zod schema to JSON Schema automatically
  });
  // response.choices[0].message.parsed is already typed as Person
  const parsed = response.choices[0].message.parsed;
  if (!parsed) throw new Error('Structured output parsing failed');
  return parsed; // TypeScript knows this is Person — fully typed
}

// Usage:
const person = await extractPerson('Alice Chen, 30, software engineer at Google');
console.log(person.name);       // TypeScript: string ✅
console.log(person.age);        // TypeScript: number ✅
console.log(person.occupation); // TypeScript: string | undefined ✅
Anthropic Claude — Tool Use for Structured Output
Claude does not have a dedicated “JSON mode” button. Instead, the recommended pattern is tool use: define a tool with an input schema, and ask Claude to “call” it with the extracted data. Claude's tool use output is always valid JSON matching the schema you provided. This is semantically equivalent to OpenAI's function calling.
import Anthropic from '@anthropic-ai/sdk';

const anthropic = new Anthropic();

// ── Method: Tool use with forced tool_choice ──────────────────────────────
async function extractWithClaude(text) {
  const response = await anthropic.messages.create({
    model: 'claude-opus-4-5',
    max_tokens: 1024,
    tools: [{
      name: 'extract_product',
      description: 'Extract structured product information from unstructured text',
      input_schema: {
        type: 'object',
        properties: {
          name: {
            type: 'string',
            description: 'Product name',
          },
          price: {
            type: 'number',
            description: 'Price in USD, positive number',
          },
          category: {
            type: 'string',
            enum: ['electronics', 'clothing', 'books', 'food', 'other'],
            description: 'Product category',
          },
          inStock: {
            type: 'boolean',
            description: 'Whether the product is currently in stock',
          },
          features: {
            type: 'array',
            items: { type: 'string' },
            description: 'List of key product features',
          },
        },
        required: ['name', 'price', 'category', 'inStock'],
      },
    }],
    // Force Claude to use the tool — it MUST respond with structured JSON
    tool_choice: { type: 'tool', name: 'extract_product' },
    messages: [{
      role: 'user',
      content: `Extract product information from this text: "${text}"`,
    }],
  });

  // Find the tool use block in the response
  const toolUse = response.content.find(block => block.type === 'tool_use');
  if (!toolUse || toolUse.type !== 'tool_use') {
    throw new Error('Claude did not return a tool use block');
  }
  // toolUse.input is already a parsed JavaScript object — not a string!
  // Claude's API returns tool inputs as parsed objects, not JSON strings
  return toolUse.input;
}

// Usage:
const product = await extractWithClaude(
  'The Sony WH-1000XM5 wireless headphones are on sale for $279.99, normally $399. In stock. ' +
  'Features include 30-hour battery, noise cancellation, and multipoint connection.'
);
console.log(product);
// {
//   name: "Sony WH-1000XM5",
//   price: 279.99,
//   category: "electronics",
//   inStock: true,
//   features: ["30-hour battery", "noise cancellation", "multipoint connection"]
// }
import anthropic
from pydantic import BaseModel, Field
from typing import Optional

client = anthropic.Anthropic()

# Define output schema with Pydantic — Python-native validation
class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(gt=0, description="Price in USD")
    category: str = Field(description="Product category")
    in_stock: bool = Field(description="Availability status")
    features: list[str] = Field(default_factory=list, description="Key features")
    discount_percent: Optional[float] = Field(None, ge=0, le=100)

def extract_product(text: str) -> Product:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=[{
            "name": "extract_product",
            "description": "Extract structured product data from text",
            "input_schema": Product.model_json_schema(),  # Pydantic → JSON Schema
        }],
        tool_choice={"type": "tool", "name": "extract_product"},
        messages=[{"role": "user", "content": f'Extract product info: "{text}"'}],
    )
    tool_use = next(
        (block for block in response.content if block.type == "tool_use"),
        None,
    )
    if not tool_use:
        raise ValueError("Claude did not return a tool use block")
    # Validate with Pydantic — catches type errors from Claude
    return Product.model_validate(tool_use.input)

product = extract_product("MacBook Pro 14-inch, $1,999, in stock. M3 Pro chip, 18GB RAM.")
print(f"{product.name}: ${product.price:.2f}")
Google Gemini — Response Schema
import { GoogleGenerativeAI, SchemaType } from '@google/generative-ai';

const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY);

// ── Method: responseMimeType + responseSchema ─────────────────────────────
const model = genAI.getGenerativeModel({
  model: 'gemini-1.5-pro',
  generationConfig: {
    responseMimeType: 'application/json', // force JSON output
    responseSchema: {                     // constrain to this schema
      type: SchemaType.OBJECT,
      properties: {
        name: { type: SchemaType.STRING },
        age: { type: SchemaType.INTEGER },
        email: { type: SchemaType.STRING },
        skills: { type: SchemaType.ARRAY, items: { type: SchemaType.STRING } },
        isActive: { type: SchemaType.BOOLEAN },
      },
      required: ['name', 'age', 'email', 'isActive'],
    },
  },
});

const result = await model.generateContent(
  'Extract info from: "Alice Chen, 30, alice@example.com, active. Skills: Python, JavaScript, SQL"'
);
const jsonText = result.response.text();
const data = JSON.parse(jsonText);
// Result:
// {
//   "name": "Alice Chen",
//   "age": 30,
//   "email": "alice@example.com",
//   "isActive": true,
//   "skills": ["Python", "JavaScript", "SQL"]
// }

// ── Gemini function calling (alternative approach) ─────────────────────────
const modelWithTools = genAI.getGenerativeModel({
  model: 'gemini-1.5-pro',
  tools: [{
    functionDeclarations: [{
      name: 'save_person',
      description: 'Save extracted person information to the database',
      parameters: {
        type: SchemaType.OBJECT,
        properties: {
          name: { type: SchemaType.STRING, description: 'Full name' },
          age: { type: SchemaType.INTEGER, description: 'Age' },
          email: { type: SchemaType.STRING, description: 'Email address' },
        },
        required: ['name', 'age', 'email'],
      },
    }],
  }],
});
Open Source LLMs — Structured Output Without an API
// Ollama supports structured output via the format parameter
// Run: ollama serve && ollama pull llama3.1
const response = await fetch('http://localhost:11434/api/generate', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.1',
    prompt: 'Extract the name and age from: "Bob is 25 years old". Return JSON only.',
    format: { // ← JSON Schema constraint
      type: 'object',
      properties: {
        name: { type: 'string' },
        age: { type: 'integer' },
      },
      required: ['name', 'age'],
    },
    stream: false,
  }),
});
const result = await response.json();
const data = JSON.parse(result.response);
// { name: "Bob", age: 25 }

// ── Outlines library (Python) — constrained decoding for any local model ──
// pip install outlines
import outlines
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.1")
generator = outlines.generate.json(model, Person)
person = generator("Extract: 'Charlie is 28 years old'")
# Returns: Person(name='Charlie', age=28)
# Guaranteed to match the Pydantic schema — uses constrained token sampling
The Structured Output Decision Framework
Is your output schema fixed and strict? → OpenAI Structured Outputs
If you know exactly what fields and types you need, use OpenAI Structured Outputs with strict: true on gpt-4o-2024-08-06+. Combine with zodResponseFormat for TypeScript type safety. This is the most reliable method available — mathematically constrained output that cannot deviate from your schema.
Are you building an AI agent that takes actions? → Function Calling
Use function calling (OpenAI tools, Anthropic tool_use, Gemini functionDeclarations) when you want the model to decide which action to take. Function calling lets the model choose between multiple tools, call none, or call several. This is the right pattern for agents — the model has agency. For pure data extraction, strict structured outputs are more appropriate.
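The difference from forced extraction comes down to the tools array plus tool_choice: 'auto'. A minimal sketch of an agent-style tool set in the OpenAI tools format (the tool names, fields, and scenario here are illustrative, not from this guide's earlier examples):

```typescript
// Two tools the model can choose between; with tool_choice: 'auto' it may
// call one, several, or none (replying in plain text instead).
const agentTools = [
  {
    type: 'function' as const,
    function: {
      name: 'search_orders',
      description: 'Look up recent orders for a customer',
      parameters: {
        type: 'object',
        properties: { customerId: { type: 'string' } },
        required: ['customerId'],
      },
    },
  },
  {
    type: 'function' as const,
    function: {
      name: 'issue_refund',
      description: 'Refund a specific order',
      parameters: {
        type: 'object',
        properties: {
          orderId: { type: 'string' },
          amountUsd: { type: 'number' },
        },
        required: ['orderId', 'amountUsd'],
      },
    },
  },
];

// Passed to the API as:
// { model, messages, tools: agentTools, tool_choice: 'auto' }
```

Compare this with the extraction examples, where tool_choice is pinned to a single named tool: pinning guarantees structured output, 'auto' gives the model agency.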
Are you using Claude and need schema enforcement? → Tool Use with tool_choice forced
Set tool_choice: { type: "tool", name: "your_tool_name" } to force Claude to always use your tool. Without this, Claude may respond with text instead of using the tool. The forced tool choice pattern gives you schema-enforced JSON output from Claude every time.
Are you using a local or open-source model? → Outlines or Ollama format parameter
For local models via Ollama, use the format parameter with a JSON Schema. For Python-based inference, use the Outlines library which implements constrained decoding for any HuggingFace model. Both give you the same mathematical guarantee as commercial API structured outputs.
Do you need to validate the output beyond JSON syntax? → Always add schema validation
Structured outputs guarantee the JSON shape, not semantic correctness. Strict mode prevents type errors (the model cannot emit "thirty" for a field typed as integer), but it can still emit an implausible value like 999 for age. Add business logic validation after parsing: Zod refinements, Pydantic validators, or custom checks for values that must be in specific ranges.
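As a dependency-free sketch of such post-parse checks, here is a custom validator for the guide's person example (the Person shape and the 0–150 age bound are illustrative assumptions, not API behavior):

```typescript
// Semantic validation after JSON.parse: strict mode guarantees the shape,
// these checks guard the meaning.
interface Person {
  name: string;
  age: number;
}

function validatePerson(raw: unknown): Person {
  const p = raw as Person;
  const errors: string[] = [];
  if (typeof p.name !== 'string' || p.name.trim().length === 0) {
    errors.push('name is empty');
  }
  if (!Number.isInteger(p.age) || p.age < 0 || p.age > 150) {
    errors.push(`age ${p.age} is outside the plausible range 0..150`);
  }
  if (errors.length > 0) {
    throw new Error(`Semantic validation failed: ${errors.join('; ')}`);
  }
  return p;
}

// Usage after any structured-output call:
// const person = validatePerson(JSON.parse(response.choices[0].message.content));
```

With Zod the same checks become `z.number().int().min(0).max(150)` plus `.refine()` for cross-field rules; the point is that they run after parsing, where strict mode can no longer help you.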
Complex Schema Patterns — Nested Objects, Unions, and Recursion
// ── Nested objects ────────────────────────────────────────────────────────
const addressSchema = {
  type: 'object',
  properties: {
    street: { type: 'string' },
    city: { type: 'string' },
    country: { type: 'string' },
    postalCode: { type: 'string' },
  },
  required: ['street', 'city', 'country'],
  additionalProperties: false,
};

const personWithAddressSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    address: addressSchema, // ← nested object — works in structured outputs
  },
  required: ['name', 'address'],
  additionalProperties: false,
};

// ── Enum (classification) ─────────────────────────────────────────────────
const classificationSchema = {
  type: 'object',
  properties: {
    sentiment: {
      type: 'string',
      enum: ['positive', 'neutral', 'negative'],
      // Model MUST output one of these three values — no hallucinated options
    },
    confidence: {
      type: 'number',
      // Note: minimum/maximum are NOT supported in strict mode (OpenAI)
      // Use them with Zod validation after parsing instead
    },
  },
  required: ['sentiment', 'confidence'],
  additionalProperties: false,
};

// ── Arrays of objects ─────────────────────────────────────────────────────
const extractedEntitiesSchema = {
  type: 'object',
  properties: {
    entities: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          text: { type: 'string' },
          label: { type: 'string', enum: ['PERSON', 'ORG', 'LOCATION', 'DATE'] },
          score: { type: 'number' },
        },
        required: ['text', 'label', 'score'],
        additionalProperties: false,
      },
    },
    totalCount: { type: 'integer' },
  },
  required: ['entities', 'totalCount'],
  additionalProperties: false,
};
// ── OpenAI strict mode limitations (as of 2026) ──────────────────────────
// ✅ Supported: object, string, number, integer, boolean, array, null, enum
// ✅ Supported: required, additionalProperties: false (mandatory with strict)
// ✅ Supported: nested objects and arrays
// ✅ Supported: anyOf (e.g. for nullable fields), recursive schemas via $defs/$ref
// ❌ NOT supported in strict mode: oneOf/allOf, minimum, maximum, pattern
// ❌ NOT supported: truly optional properties — every key must be listed in
//    required; model optionality as anyOf with { type: 'null' } instead
// ❌ Wrong: optional properties via not-in-required array break strict mode
const badSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    nickname: { type: 'string' }, // ← not in required — BREAKS strict mode
  },
  required: ['name'],
  additionalProperties: false,
};

// ✅ Correct: optional fields via anyOf with null
const correctSchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    nickname: {
      anyOf: [
        { type: 'string' },
        { type: 'null' }, // ← null represents "not present"
      ],
    },
  },
  required: ['name', 'nickname'], // ← MUST be in required, even if nullable
  additionalProperties: false,
};
// Now the model will output null when nickname is unknown:
// { "name": "Alice", "nickname": null }

// ── With Zod (handles this automatically) ────────────────────────────────
import { z } from 'zod';
import { zodResponseFormat } from 'openai/helpers/zod';

const PersonSchema = z.object({
  name: z.string(),
  nickname: z.string().nullable(), // ← Zod handles optional fields correctly
  age: z.number().int(),
});
// zodResponseFormat(PersonSchema, 'person') generates the correct anyOf schema
// automatically — you don't have to think about strict mode limitations
Latency and Cost Tradeoffs
Structured outputs add latency
Constrained decoding (strict structured outputs) adds 10–50ms of latency per request because the model must check schema validity at each token. For high-throughput pipelines with strict latency SLAs, measure this overhead. JSON mode (without strict schema) has negligible overhead.
Output tokens cost the same
Structured outputs do not reduce token usage — the model still generates the same JSON text as tokens. If your schema produces {"name":"Alice","age":30}, that is the same token cost as prompt-only JSON output. The benefit is reliability, not cost savings.
Caching the schema definition
OpenAI caches the JSON schema definition across requests. If you use the same schema repeatedly (which you should), the schema itself is not re-evaluated each time — only the prompt and output change. This reduces the latency overhead to negligible for steady-state production use.
Batch processing for high volume
For offline batch extraction (thousands of documents), use the Batch API (OpenAI, Anthropic). Batch requests are 50% cheaper and processed within 24 hours. Structured outputs work with batch APIs — same schemas, lower cost.
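As a sketch of how structured outputs slot into a batch job, the helper below builds the JSONL input the OpenAI Batch API expects (one request object per line with custom_id, method, url, and body), reusing one strict schema on every line. The schema and prompts mirror this guide's person example; treat the exact upload flow as an assumption to check against the Batch API docs.

```typescript
// One strict schema, reused across every request line in the batch file.
const personFormat = {
  type: 'json_schema',
  json_schema: {
    name: 'person_extraction',
    strict: true,
    schema: {
      type: 'object',
      properties: { name: { type: 'string' }, age: { type: 'integer' } },
      required: ['name', 'age'],
      additionalProperties: false,
    },
  },
};

function buildBatchLines(documents: string[]): string {
  return documents
    .map((text, i) =>
      JSON.stringify({
        custom_id: `doc-${i}`,           // used to match results back to inputs
        method: 'POST',
        url: '/v1/chat/completions',
        body: {
          model: 'gpt-4o-2024-08-06',
          response_format: personFormat, // same schema on every line
          messages: [{ role: 'user', content: `Extract person info: "${text}"` }],
        },
      })
    )
    .join('\n');
}
```

Write the result to a .jsonl file, upload it with purpose 'batch', and create the batch job; each result line carries the matching custom_id, and every output is schema-constrained exactly as in the synchronous API.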
Always validate output even with structured outputs
🔍 AI JSON Error Explainer
When your LLM output still has JSON errors despite your best efforts — trailing commas, Python True/False/None, undefined/NaN — paste it into our free AI JSON Error Explainer. Detects all errors simultaneously with plain-English explanations and one-click auto-fix.
Fix My LLM JSON →