
AI Prompt Chunker — Split Long Prompts for ChatGPT, Claude & Gemini

Chunk by words/characters with overlap + consolidation instructions. 100% browser-based, no signup.


How chunking works

  • Paste your long prompt — set chunk size and overlap above
  • Click Split into Chunks to generate pieces
  • First chunk tells the AI to wait; last chunk triggers the final response
  • Copy individual chunks or download as .txt / .md / JSON
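
The steps above amount to a sliding-window split: each chunk repeats a slice of the previous one so context survives the boundary. A minimal sketch in Python (the function name and parameters are illustrative, not the tool's actual code):

```python
def chunk_words(text, chunk_size=500, overlap_pct=30):
    """Split text into word-based chunks, repeating overlap_pct of each
    chunk at the start of the next to preserve context across boundaries."""
    words = text.split()
    # Advance by less than a full chunk so consecutive chunks share words.
    step = max(1, int(chunk_size * (1 - overlap_pct / 100)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already reached the end of the text
    return chunks
```

With `chunk_size=40` and `overlap_pct=25`, each chunk advances 30 words, so every pair of neighboring chunks shares its last/first 10 words.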

What Is a Prompt Chunker?

A prompt chunker splits large text or documents into smaller pieces that fit inside an AI model's context window — the maximum number of tokens a model can process at once. Every LLM has a hard limit: GPT-4 caps at 8k–32k tokens, GPT-4o at 128k, Claude 3.5 Sonnet and Claude 3 Opus at 200k, and Gemini 1.5 Pro at 1M. When your input exceeds that limit, the model silently truncates content or returns an error, and you lose critical context.

Chunking matters because many real-world tasks involve long inputs: entire codebases, research papers, legal documents, or multi-turn conversation histories. By splitting text into overlapping chunks and sending them sequentially with consolidation instructions, you work within any model's token budget while preserving the full context of your document.

How it works

Split Prompts in Seconds

1. Paste Your Text

Paste any long document, prompt, or codebase into the input area. The tool shows a live token count as you type.

2. Set Chunk Size & Overlap

Choose a chunk size (in words or characters) and an overlap percentage (30–50% for narrative text, 10–20% for code blocks).

3. See Chunks With Token Counts

Instantly view each numbered chunk with its token count, so you know it fits your target model's context window.

4. Copy Each Chunk

Copy individual chunks or download all at once. Send them to the AI in order — consolidation instructions are added automatically.

Context Window Limits by Model

Use this reference to choose the right chunk size for your target model. Always leave 20–30% headroom for system prompts, overlap, and the model's response.

| Model             | Context Window   | ≈ Words             | Safe Chunk Size |
| ----------------- | ---------------- | ------------------- | --------------- |
| GPT-4o            | 128k tokens      | ~96,000             | 80k tokens      |
| GPT-4             | 8k / 32k tokens  | ~6,000 / ~24,000    | 6k / 24k tokens |
| Claude 3.5 Sonnet | 200k tokens      | ~150,000            | 150k tokens     |
| Claude 3 Opus     | 200k tokens      | ~150,000            | 150k tokens     |
| Gemini 1.5 Pro    | 1M tokens        | ~750,000            | 800k tokens     |
| Llama 3           | 8k / 128k tokens | ~6,000 / ~96,000    | 6k / 90k tokens |
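
The Safe Chunk Size column is simply the context window minus headroom. A quick helper to compute your own ceiling (the function name is illustrative; note the table rows use slightly different headroom percentages, some more conservative than 25%):

```python
def safe_chunk_size(context_window_tokens, headroom_pct=25):
    """Largest chunk that leaves headroom_pct of the window free
    for system prompts, overlap, and the model's response."""
    return int(context_window_tokens * (1 - headroom_pct / 100))
```

For example, a 200k-token window with 25% headroom yields a 150k-token ceiling, matching the Claude rows above.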
Use cases

When Developers Chunk Prompts

📄 Long Document Analysis

Send entire PDFs, research papers, or legal documents to an LLM for summarization or Q&A without truncation.

💻 Codebase Summarization

Chunk large source files or multi-file repos to ask an AI to explain architecture, find bugs, or write tests.

📚 Book & Article Processing

Process full books or long-form articles for translation, rewriting, or extracting structured data.

⚙️ Batch API Calls

Pre-split large inputs into model-safe chunks before sending them to the OpenAI or Anthropic API, avoiding 413 (payload too large) and context-length errors.

🔍 RAG Preparation

Generate evenly sized, overlapping chunks for embedding in vector databases as part of a RAG pipeline.

💬 Multi-Turn Conversations

Compress long conversation histories into chunks and re-inject them as context when the window fills up.
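
For the multi-turn case, a common pattern is a sliding window that keeps only the most recent messages fitting a token budget. A sketch using a rough 4-characters-per-token estimate (the function and message shape are illustrative assumptions, not a specific chat API):

```python
def trim_history(messages, budget_tokens=8000):
    """Keep the newest messages whose estimated token total fits the budget.
    Each message is a dict like {"role": "user", "content": "..."}."""
    kept, used = [], 0
    for msg in reversed(messages):              # walk newest-first
        est = max(1, len(msg["content"]) // 4)  # ~4 chars per token (rough)
        if used + est > budget_tokens:
            break  # older messages no longer fit
        kept.append(msg)
        used += est
    return list(reversed(kept))                 # restore chronological order
```

In practice you would summarize the dropped prefix into a single chunk and re-inject that summary as the first message, rather than discarding it outright.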

FAQ

Frequently Asked Questions

1. What is a token in AI models?
A token is the smallest unit of text an LLM processes. Tokens are not the same as words — they are sub-word pieces determined by the model's tokenizer. In English, 1 token ≈ 0.75 words on average. Common words like "the" are one token; rare or long words may be split into 2–4 tokens. Punctuation and whitespace also consume tokens.
2. How many tokens are in a word?
On average, 1,000 tokens ≈ 750 words in English. So 1 word ≈ 1.33 tokens. This varies by language — non-Latin scripts like Chinese or Arabic are often less efficient and use more tokens per word. Use the live token counter in the tool to get an exact count for your specific text.
3. What is overlap and why does it help?
Overlap repeats a portion of the previous chunk at the start of the next chunk, preserving context at the boundary. Without overlap, the AI loses continuity between chunks — it might not know that a sentence cut off at the end of chunk 2 continues in chunk 3. Use 30–50% overlap for narrative or research text, and 10–20% for self-contained sections like code blocks or lists.
4. What chunk size should I use?
As a rule, leave 20–30% of the context window for system prompts, overlap, and the model's response. For GPT-4 (8k), target chunks under 6,000 tokens. For GPT-4o (128k), 80,000 tokens is a safe ceiling. For Claude models (200k), chunks of 150,000 tokens work well. Always verify with the live token counter before sending.
5. What is the difference between tiktoken and cl100k_base?
tiktoken is OpenAI's open-source tokenizer library. cl100k_base is the specific encoding used by GPT-4, GPT-4o, and GPT-3.5-turbo. It defines exactly how text is split into tokens. Claude and Gemini use different tokenizers, so the same text may have a slightly different token count on each model. The tool provides an approximate count — always add headroom when targeting a specific model.