
AI Prompt Chunker — Split Long Prompts for ChatGPT, Claude & Gemini

Chunk by words/characters with overlap + consolidation instructions. 100% browser-based, no signup.


How chunking works

  • Paste your long prompt — set chunk size and overlap above
  • Click Split into Chunks to generate pieces
  • First chunk tells the AI to wait; last chunk triggers the final response
  • Copy individual chunks or download as .txt / .md / JSON
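
The steps above amount to a sliding-window split: each chunk repeats a slice of the previous one so context survives the boundary. A minimal sketch in Python (the function name and parameters are illustrative, not the tool's actual code):

```python
def chunk_words(text, chunk_size=500, overlap_pct=30):
    """Split text into word-based chunks, repeating overlap_pct of each
    chunk at the start of the next to preserve context across boundaries."""
    words = text.split()
    # Advance by less than a full chunk so consecutive chunks share words.
    step = max(1, int(chunk_size * (1 - overlap_pct / 100)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk already reached the end of the text
    return chunks
```

With `chunk_size=40` and `overlap_pct=25`, each chunk advances 30 words, so every pair of neighboring chunks shares its last/first 10 words.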

What Is a Prompt Chunker?

A prompt chunker splits large text or documents into smaller pieces that fit inside an AI model's context window — the maximum number of tokens a model can process at once. Every LLM has a hard limit: GPT-4 caps at 8k–32k tokens, GPT-4o at 128k, Claude 3.5 Sonnet and Claude 3 Opus at 200k, and Gemini 1.5 Pro at 1M. When your input exceeds that limit, the model silently truncates content or returns an error, and you lose critical context.

Chunking matters because many real-world tasks involve long inputs: entire codebases, research papers, legal documents, or multi-turn conversation histories. By splitting text into overlapping chunks and sending them sequentially with consolidation instructions, you work within any model's token budget while preserving the full context of your document.

How it works

Split Prompts in Seconds

1. Paste Your Text

Paste any long document, prompt, or codebase into the input area. The tool shows a live token count as you type.

2. Set Chunk Size & Overlap

Choose a chunk size (in words or characters) and an overlap percentage (30–50% for narrative text, 10–20% for code blocks).

3. See Chunks With Token Counts

Instantly view each numbered chunk with its token count, so you know it fits your target model's context window.

4. Copy Each Chunk

Copy individual chunks or download all at once. Send them to the AI in order — consolidation instructions are added automatically.

Context Window Limits by Model

Use this reference to choose the right chunk size for your target model. Always leave 20–30% headroom for system prompts, overlap, and the model's response.

| Model             | Context Window   | ≈ Words             | Safe Chunk Size |
| ----------------- | ---------------- | ------------------- | --------------- |
| GPT-4o            | 128k tokens      | ~96,000             | 80k tokens      |
| GPT-4             | 8k / 32k tokens  | ~6,000 / ~24,000    | 6k / 24k tokens |
| Claude 3.5 Sonnet | 200k tokens      | ~150,000            | 150k tokens     |
| Claude 3 Opus     | 200k tokens      | ~150,000            | 150k tokens     |
| Gemini 1.5 Pro    | 1M tokens        | ~750,000            | 800k tokens     |
| Llama 3           | 8k / 128k tokens | ~6,000 / ~96,000    | 6k / 90k tokens |
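
The Safe Chunk Size column is simply the context window minus headroom. A quick helper to compute your own ceiling (the function name is illustrative; note the table rows use slightly different headroom percentages, some more conservative than 25%):

```python
def safe_chunk_size(context_window_tokens, headroom_pct=25):
    """Largest chunk that leaves headroom_pct of the window free
    for system prompts, overlap, and the model's response."""
    return int(context_window_tokens * (1 - headroom_pct / 100))
```

For example, a 200k-token window with 25% headroom yields a 150k-token ceiling, matching the Claude rows above.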
Use cases

When Developers Chunk Prompts

📄 Long Document Analysis

Send entire PDFs, research papers, or legal documents to an LLM for summarization or Q&A without truncation.

💻 Codebase Summarization

Chunk large source files or multi-file repos to ask an AI to explain architecture, find bugs, or write tests.

📚 Book & Article Processing

Process full books or long-form articles for translation, rewriting, or extracting structured data.

⚙️ Batch API Calls

Pre-split large inputs into model-safe chunks before sending them to the OpenAI or Anthropic API, avoiding 413 (payload too large) and context-length errors.

🔍 RAG Preparation

Generate evenly sized, overlapping chunks for embedding in vector databases as part of a RAG pipeline.

💬 Multi-Turn Conversations

Compress long conversation histories into chunks and re-inject them as context when the window fills up.
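
For the multi-turn case, a common pattern is a sliding window that keeps only the most recent messages fitting a token budget. A sketch using a rough 4-characters-per-token estimate (the function and message shape are illustrative assumptions, not a specific chat API):

```python
def trim_history(messages, budget_tokens=8000):
    """Keep the newest messages whose estimated token total fits the budget.
    Each message is a dict like {"role": "user", "content": "..."}."""
    kept, used = [], 0
    for msg in reversed(messages):              # walk newest-first
        est = max(1, len(msg["content"]) // 4)  # ~4 chars per token (rough)
        if used + est > budget_tokens:
            break  # older messages no longer fit
        kept.append(msg)
        used += est
    return list(reversed(kept))                 # restore chronological order
```

In practice you would summarize the dropped prefix into a single chunk and re-inject that summary as the first message, rather than discarding it outright.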

FAQ

Frequently Asked Questions

1. What is a token in AI models?
A token is the smallest unit of text an LLM processes. Tokens are not the same as words — they are sub-word pieces determined by the model's tokenizer. In English, 1 token ≈ 0.75 words on average. Common words like "the" are one token; rare or long words may be split into 2–4 tokens. Punctuation and whitespace also consume tokens.
2. How many tokens are in a word?
On average, 1,000 tokens ≈ 750 words in English. So 1 word ≈ 1.33 tokens. This varies by language — non-Latin scripts like Chinese or Arabic are often less efficient and use more tokens per word. Use the live token counter in the tool to get an exact count for your specific text.
3. What is overlap and why does it help?
Overlap repeats a portion of the previous chunk at the start of the next chunk, preserving context at the boundary. Without overlap, the AI loses continuity between chunks — it might not know that a sentence cut off at the end of chunk 2 continues in chunk 3. Use 30–50% overlap for narrative or research text, and 10–20% for self-contained sections like code blocks or lists.
4. What chunk size should I use?
As a rule, leave 20–30% of the context window for system prompts, overlap, and the model's response. For GPT-4 (8k), target chunks under 6,000 tokens. For GPT-4o (128k), 80,000 tokens is a safe ceiling. For Claude models (200k), chunks of 150,000 tokens work well. Always verify with the live token counter before sending.
5. What is the difference between tiktoken and cl100k_base?
tiktoken is OpenAI's open-source tokenizer library. cl100k_base is the specific encoding used by GPT-4, GPT-4o, and GPT-3.5-turbo. It defines exactly how text is split into tokens. Claude and Gemini use different tokenizers, so the same text may have a slightly different token count on each model. The tool provides an approximate count — always add headroom when targeting a specific model.