Chunk
A smaller unit produced by dividing a large body of data or text into manageable pieces. Chunking is used in AI token limit management, streaming delivery, and file transfer, among other applications.
A chunk is a fragment created by splitting a large body of data either into pieces of a fixed size or into meaningful units. The English word "chunk," meaning a thick piece or lump, has become a standard technical term. It carries different nuances in text processing, network communication, and AI contexts, but the core idea is always the same: breaking something large into units that are easier to handle.
In the context of AI and large language models (LLMs), chunking refers to splitting text into segments that fit within the model's context window. For example, GPT-4 Turbo offers a 128,000-token context window and Claude offers 200,000 tokens, but lengthy documents can still exceed these limits. The standard approach is to divide the document into appropriately sized chunks, process each one individually, and then merge the results.
The quality of chunking depends on the granularity and the choice of boundaries. Fixed-length chunking (for example, splitting every 1,000 characters) is simple to implement but can cut through the middle of a sentence or paragraph, losing context. Semantic chunking splits at structural boundaries such as paragraphs, sections, or headings, so that each chunk forms a coherent unit of meaning. Adding overlap (duplicating a portion of text between adjacent chunks) helps reduce information loss near boundaries.
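The fixed-length-with-overlap strategy described above can be sketched in a few lines. This is a minimal illustration, not a production splitter; the function name and parameter defaults are chosen for the example:

```python
def chunk_text(text: str, size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks, repeating `overlap` characters
    between adjacent chunks to soften hard cuts at boundaries."""
    if size <= overlap:
        raise ValueError("size must be greater than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, leaving `overlap` duplicated
    return chunks
```

Because each chunk repeats the last 100 characters of its predecessor, a sentence cut at one boundary is likely to appear whole in the neighboring chunk.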
HTTP chunked transfer encoding (Transfer-Encoding: chunked) is a mechanism for sending a response in small chunks when the total size is not known in advance. Streaming responses from services like ChatGPT use this technique: generated text is transmitted chunk by chunk in real time, allowing users to start reading before the full response is complete.
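On the wire, a chunked body frames each chunk as its length in hexadecimal, a CRLF, the data, and another CRLF, with a zero-length chunk marking the end (per the HTTP/1.1 specification). A small sketch of the framing, assuming the caller supplies the payload pieces as byte strings:

```python
def encode_chunked(parts: list[bytes]) -> bytes:
    """Frame byte strings as an HTTP/1.1 chunked transfer body:
    hex length, CRLF, data, CRLF per chunk; "0" chunk terminates."""
    body = b""
    for part in parts:
        body += f"{len(part):x}".encode("ascii") + b"\r\n" + part + b"\r\n"
    return body + b"0\r\n\r\n"  # zero-length chunk ends the stream
```

A server streaming an LLM response would emit each frame as soon as the corresponding text is generated, rather than buffering the whole body first.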
In natural language processing (NLP), chunking refers to extracting phrases such as noun phrases and verb phrases from a sequence of part-of-speech-tagged words. For instance, a chunker groups "the large park in Tokyo" into phrases rather than treating each word independently.
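A toy version of this idea can be written as a greedy pass over (word, tag) pairs that groups an optional determiner, any adjectives, and one or more nouns (the classic DT? JJ* NN+ pattern for base noun phrases). This is an illustrative sketch only; real chunkers (e.g. NLTK's RegexpParser or statistical models) handle far richer grammars, and attaching the prepositional phrase "in Tokyo" to the noun phrase would need additional rules:

```python
def np_chunk(tagged: list[tuple[str, str]]) -> list[str]:
    """Greedy base noun-phrase chunker over (word, POS-tag) pairs.
    Groups DT? JJ* NN+ into one phrase; other words pass through alone."""
    chunks, i = [], 0
    while i < len(tagged):
        j = i
        if j < len(tagged) and tagged[j][1] == "DT":   # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1].startswith("NN"):  # nouns
            k += 1
        if k > j:  # matched at least one noun: emit the span as an NP
            chunks.append(" ".join(w for w, _ in tagged[i:k]))
            i = k
        else:      # no noun phrase starts here: emit the single word
            chunks.append(tagged[i][0])
            i += 1
    return chunks
```

On the tagged input for "the large park in Tokyo", this yields the base chunks "the large park", "in", and "Tokyo".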
Chunk size design comes down to counting characters or tokens. In RAG (Retrieval-Augmented Generation) systems, documents are split into chunks and stored in a vector database. Chunks that are too small lose context, while chunks that are too large reduce retrieval precision. A range of 200 to 1,000 characters (or 100 to 500 tokens) is commonly recommended, though the optimal size depends on the nature and purpose of the document.
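One common compromise between fixed-length and semantic chunking is to split at paragraph boundaries while enforcing a character budget. A hedged sketch, assuming paragraphs are separated by blank lines and that the 1,000-character upper bound from above is the target:

```python
def split_for_rag(document: str, max_chars: int = 1000) -> list[str]:
    """Split a document at paragraph boundaries (blank lines), packing
    consecutive paragraphs into chunks of at most `max_chars` characters.
    A single paragraph longer than max_chars becomes its own chunk."""
    chunks, current = [], ""
    for para in document.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # +2 accounts for the blank-line separator re-inserted below
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Each resulting chunk stays within the budget yet ends on a paragraph boundary, which tends to improve embedding quality and retrieval precision compared with cutting mid-sentence.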