Chunking - Agno

Chunking divides content into smaller pieces before it is embedded and stored in a vector database. The strategy you choose determines where those pieces begin and end, which directly affects search quality and retrieval accuracy.

from agno.knowledge.chunking.semantic_chunking import SemanticChunking
from agno.knowledge.reader.pdf_reader import PDFReader

reader = PDFReader(
    chunking_strategy=SemanticChunking(),
)

Why Chunking Matters

Consider processing a recipe book with different strategies:

Strategy	Result
Fixed Size (5000 chars)	May split recipes mid-instruction
Semantic	Keeps complete recipes together based on meaning
Document	Each page becomes a chunk

The right strategy returns complete, relevant results. The wrong one returns fragments.
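To make the recipe example concrete, here is a minimal sketch in plain Python. It does not use Agno's chunkers: `split_fixed` and `split_on_blank_lines` are hypothetical helpers, and splitting on blank lines is only a crude stand-in for a meaning-aware strategy.

```python
# Illustrative only: plain Python, not Agno's chunking classes.
recipes = (
    "Pancakes\nMix flour, milk, and eggs. Fry until golden.\n\n"
    "Omelette\nBeat eggs with salt. Cook in butter and fold."
)


def split_fixed(text: str, size: int) -> list[str]:
    """Cut the text every `size` characters, ignoring its content."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def split_on_blank_lines(text: str) -> list[str]:
    """Treat each blank-line-separated block as one chunk."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]


fixed_chunks = split_fixed(recipes, 40)          # cuts inside the pancake instructions
recipe_chunks = split_on_blank_lines(recipes)    # one chunk per recipe
```

The fixed-size split loses nothing, but its first chunk ends partway through the pancake instructions, so a search hit on that chunk returns an incomplete recipe.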

Available Strategies

Fixed Size

Split into uniform chunks by character count

Semantic

Split at natural breakpoints based on meaning

Recursive

Split using multiple separators hierarchically

Document

Preserve document structure (sections, pages)

Markdown

Split by heading structure

CSV Row

Each row becomes a chunk

Agentic

AI determines optimal boundaries

Code

Split at function and class boundaries using AST analysis

Custom

Build your own strategy
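The hierarchical idea behind the Recursive strategy listed above can be sketched in a few lines of plain Python. This is an illustration of the general technique, not Agno's implementation: `recursive_split` is a hypothetical function, it drops the separators it splits on, and it does not merge small neighbouring pieces back together.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the coarsest separator first, then recurse with finer
    separators on any piece that is still longer than chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, *rest = separators
    chunks: list[str] = []
    for piece in text.split(head):
        chunks.extend(recursive_split(piece, rest, chunk_size))
    return chunks


text = "First paragraph is short.\n\nSecond paragraph is longer. It has two sentences."
chunks = recursive_split(text, ["\n\n", ". "], chunk_size=30)
```

The short first paragraph survives as one chunk, while the longer second paragraph is split again at the sentence boundary.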

Using with Readers

Pass a chunking strategy to any reader:

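from agno.knowledge.knowledge import Knowledge
from agno.knowledge.chunking.fixed_size_chunking import FixedSizeChunking
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

# Example connection string; replace with your own database URL
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# The reader chunks each PDF into 3000-character pieces
reader = PDFReader(
    chunking_strategy=FixedSizeChunking(chunk_size=3000),
)

knowledge = Knowledge(
    vector_db=PgVector(table_name="docs", db_url=db_url),
)

# Read, chunk, embed, and store every document in the directory
knowledge.insert(path="documents/", reader=reader)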

Choosing a Strategy

Content Type	Recommended Strategy	Why
General text	Semantic	Maintains meaning and context
Structured docs	Document	Preserves sections and hierarchy
Markdown files	Markdown	Respects heading structure
CSV/tabular data	CSV Row	Each row is a logical unit
Source code	Code	Splits at function and class boundaries
Mixed content	Recursive	Handles multiple separator types
Any content needing uniform chunks	Fixed Size	Predictable chunk dimensions

Each reader has a sensible default, but you can override it based on your content and retrieval needs.
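If you select strategies programmatically, the table above can be encoded as a small lookup. The sketch below is hypothetical and is not an Agno API: the mapping from file extensions to the table's content types (for example `.pdf` as "Structured docs" and `.txt` as "General text") is an assumption made for illustration, and the function returns strategy names as plain strings.

```python
from pathlib import Path

# Hypothetical lookup for illustration only. The extension-to-strategy
# pairs are assumptions that paraphrase the recommendation table.
STRATEGY_BY_EXTENSION = {
    ".md": "Markdown",
    ".csv": "CSV Row",
    ".py": "Code",
    ".pdf": "Document",
    ".txt": "Semantic",
}


def recommend_strategy(path: str, default: str = "Recursive") -> str:
    """Return the suggested strategy name for a file path.

    Unknown extensions fall back to Recursive, the table's choice
    for mixed content.
    """
    return STRATEGY_BY_EXTENSION.get(Path(path).suffix.lower(), default)
```

For example, `recommend_strategy("notes/README.MD")` gives `"Markdown"`, and an unrecognized extension falls back to `"Recursive"`.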

Configuration

Most strategies accept configuration options:

# Fixed size with overlap
FixedSizeChunking(
    chunk_size=5000,       # Characters per chunk
    overlap=200,           # Overlap between chunks
)

# Semantic with threshold
SemanticChunking(
    similarity_threshold=0.7,  # Lower = more splits
)

# Recursive with custom separators
RecursiveChunking(
    separators=["\n\n", "\n", ". ", " "],
    chunk_size=4000,
)
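To see how `chunk_size` and `overlap` interact, the following plain-Python sketch implements the generic sliding-window idea: each new chunk starts `chunk_size - overlap` characters after the previous one. `fixed_size_chunks` is a hypothetical helper, and Agno's own `FixedSizeChunking` may handle boundaries differently.

```python
def fixed_size_chunks(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Slide a window of chunk_size characters over the text so that
    consecutive chunks share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


chunks = fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1)
# chunks == ["abcd", "defg", "ghij"]
```

The repeated characters at each boundary ("d" and "g") are the overlap: text near a cut appears in both neighbouring chunks, so it can still be retrieved with some surrounding context.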

Chunk Size Guidelines

Chunk Size	Trade-off
Small (1000-3000 chars)	More precise retrieval, may lose context
Default (5000 chars)	Balanced precision and context
Large (8000+ chars)	More context, less targeted results

Smaller chunks work better for specific questions. Larger chunks work better when context matters.
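The trade-off also shows up in how many chunks you store and search. A rough estimate for fixed-size chunking, assuming a sliding window with optional overlap (`estimated_chunk_count` and the 60,000-character document are hypothetical):

```python
import math


def estimated_chunk_count(document_chars: int, chunk_size: int, overlap: int = 0) -> int:
    """Estimate how many fixed-size chunks a document of the given length yields."""
    if document_chars <= 0:
        return 0
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if document_chars <= chunk_size:
        return 1
    step = chunk_size - overlap
    return 1 + math.ceil((document_chars - chunk_size) / step)


# A hypothetical 60,000-character document at three chunk sizes
counts = {size: estimated_chunk_count(60_000, size) for size in (1000, 5000, 8000)}
# counts == {1000: 60, 5000: 12, 8000: 8}
```

The same document produces 60 small chunks or 8 large ones, so smaller chunks mean more embeddings to compute and store in exchange for finer-grained matches.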

Next Steps

Semantic Chunking

Split content by meaning

Fixed Size Chunking

Uniform chunk sizes

Readers

Configure readers with chunking

Search & Retrieval

How chunking affects search
