Chunking divides content into smaller pieces before it is embedded and stored in a vector database. The strategy you choose affects search quality and retrieval accuracy. For example, to use semantic chunking when reading PDFs:
from agno.knowledge.chunking.semantic_chunking import SemanticChunking
from agno.knowledge.reader.pdf_reader import PDFReader

reader = PDFReader(
    chunking_strategy=SemanticChunking(),
)

Why Chunking Matters

Consider processing a recipe book with different strategies:
| Strategy | Result |
| --- | --- |
| Fixed Size (5000 chars) | May split recipes mid-instruction |
| Semantic | Keeps complete recipes together based on meaning |
| Document | Each page becomes a chunk |
The right strategy returns complete, relevant results. The wrong one returns fragments.
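To make the recipe example concrete, here is a minimal sketch in plain Python rather than Agno's own chunkers. The sample text, the `fixed_size_chunks` and `paragraph_chunks` helpers, and the 40-character limit are all illustrative assumptions. It only shows how a character limit can cut through an instruction while a boundary-aware split keeps each recipe whole.

```python
RECIPES = (
    "Pancakes: whisk flour, milk and eggs, then fry until golden.\n\n"
    "Omelette: beat the eggs, pour into a hot pan, fold and serve."
)


def fixed_size_chunks(text: str, chunk_size: int) -> list[str]:
    """Cut text every chunk_size characters, ignoring content boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so each recipe stays in one chunk."""
    return [part.strip() for part in text.split("\n\n") if part.strip()]


fixed = fixed_size_chunks(RECIPES, chunk_size=40)  # hypothetical tiny limit
by_recipe = paragraph_chunks(RECIPES)

print("Fixed size:")
for chunk in fixed:
    print("  ", repr(chunk))

print("Boundary-aware:")
for chunk in by_recipe:
    print("  ", repr(chunk))
```

The fixed-size output contains pieces that begin or end in the middle of a sentence, while the boundary-aware output has exactly one chunk per recipe.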

Available Strategies

Fixed Size: Split into uniform chunks by character count
Semantic: Split at natural breakpoints based on meaning
Recursive: Split using multiple separators hierarchically
Document: Preserve document structure (sections, pages)
Markdown: Split by heading structure
CSV Row: Each row becomes a chunk
Agentic: AI determines optimal boundaries
Code: Split at function and class boundaries using AST analysis
Custom: Build your own strategy
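As an illustration of the idea behind the Code strategy, the sketch below splits Python source at top-level function and class boundaries using the standard library's `ast` module. It is not Agno's implementation: the `split_by_definitions` helper and the sample source are hypothetical, it handles only Python, and it skips imports and other module-level statements.

```python
import ast

SOURCE = '''\
import math


def area(radius):
    return math.pi * radius ** 2


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return area(self.radius)
'''


def split_by_definitions(source: str) -> list[str]:
    """Return one chunk per top-level function or class in the source."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text spanned by the node.
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append(segment)
    return chunks


chunks = split_by_definitions(SOURCE)
for chunk in chunks:
    print(chunk)
    print("-" * 20)
```

Each chunk is a complete definition, so a search hit returns a whole function or class rather than an arbitrary slice of it.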

Using with Readers

Pass a chunking strategy to any reader:
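from agno.knowledge.knowledge import Knowledge
from agno.knowledge.chunking.fixed_size_chunking import FixedSizeChunking
from agno.knowledge.reader.pdf_reader import PDFReader
from agno.vectordb.pgvector import PgVector

# Example connection string; replace with your own PostgreSQL URL.
db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

# Split each PDF into chunks of up to 3000 characters.
reader = PDFReader(
    chunking_strategy=FixedSizeChunking(chunk_size=3000),
)

knowledge = Knowledge(
    vector_db=PgVector(table_name="docs", db_url=db_url),
)

# Read, chunk, embed, and store every document under documents/.
knowledge.insert(path="documents/", reader=reader)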

Choosing a Strategy

| Content Type | Recommended Strategy | Why |
| --- | --- | --- |
| General text | Semantic | Maintains meaning and context |
| Structured docs | Document | Preserves sections and hierarchy |
| Markdown files | Markdown | Respects heading structure |
| CSV/tabular data | CSV Row | Each row is a logical unit |
| Source code | Code | Splits at function and class boundaries |
| Mixed content | Recursive | Handles multiple separator types |
| Need consistency | Fixed Size | Predictable chunk dimensions |
Each reader has a sensible default, but you can override it based on your content and retrieval needs.
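If you ingest many file types, the recommendations can be encoded as a small lookup. The following is a sketch only: `recommend_strategy` and the extension mapping are hypothetical and not part of Agno, and the returned values are just the strategy names used on this page.

```python
from pathlib import Path

# Strategy names come from the recommendations on this page;
# the file-extension mapping itself is an illustrative assumption.
STRATEGY_BY_EXTENSION = {
    ".md": "Markdown",
    ".csv": "CSV Row",
    ".py": "Code",
    ".txt": "Semantic",
}


def recommend_strategy(path: str, default: str = "Recursive") -> str:
    """Suggest a chunking strategy name from a file's extension."""
    return STRATEGY_BY_EXTENSION.get(Path(path).suffix.lower(), default)


for name in ("notes/README.md", "data/sales.csv", "src/app.py", "misc/archive.bin"):
    print(name, "->", recommend_strategy(name))
```

Falling back to Recursive for unknown extensions follows the "mixed content" recommendation; you would then construct the matching strategy object and pass it to the reader.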

Configuration

Most strategies accept configuration options:
# Fixed size with overlap
FixedSizeChunking(
    chunk_size=5000,       # Characters per chunk
    overlap=200,           # Overlap between chunks
)

# Semantic with threshold
SemanticChunking(
    similarity_threshold=0.7,  # Lower = more splits
)

# Recursive with custom separators
RecursiveChunking(
    separators=["\n\n", "\n", ". ", " "],
    chunk_size=4000,
)
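To show what a hierarchical list of separators means in practice, here is a simplified recursive splitter in plain Python. It is a sketch of the general technique, not Agno's `RecursiveChunking`: the `recursive_split` function is hypothetical, and it drops the separator wherever it makes a cut.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the coarsest separator, recursing with finer ones for oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard cuts by character count.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    head, *rest = separators
    chunks: list[str] = []
    current = ""
    for piece in text.split(head):
        # Try to merge this piece into the chunk being built.
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_split(piece, rest, chunk_size))
    if current:
        chunks.append(current)
    return chunks


text = (
    "First paragraph is short.\n\n"
    "Second paragraph is much longer. It has two sentences."
)
chunks = recursive_split(text, ["\n\n", "\n", ". ", " "], chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

The first paragraph fits within the limit and stays whole. The second is too long, so it is split again at the sentence separator instead of at an arbitrary character.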

Chunk Size Guidelines

| Chunk Size | Trade-off |
| --- | --- |
| Small (1000-3000 chars) | More precise retrieval, may lose context |
| Default (5000 chars) | Balanced precision and context |
| Large (8000+ chars) | More context, less targeted results |
Smaller chunks work better for specific questions. Larger chunks work better when context matters.
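To get a feel for these sizes, the sketch below counts how many chunks a document produces at each setting. It uses a plain-Python sliding window, not Agno's `FixedSizeChunking`; the `chunk_with_overlap` helper, the 20,000-character stand-in document, and the 200-character overlap are assumptions for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Slide a chunk_size window over text, stepping by chunk_size - overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


document = "x" * 20_000  # stand-in for a 20,000-character document

for size in (1000, 5000, 8000):
    pieces = chunk_with_overlap(document, chunk_size=size, overlap=200)
    print(f"chunk_size={size}: {len(pieces)} chunks")
```

Smaller sizes produce many more chunks to embed and search, each carrying less surrounding text. The overlap repeats the tail of one chunk at the start of the next, so a sentence that straddles a boundary still appears intact in at least one chunk.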

Next Steps

Semantic Chunking: Split content by meaning
Fixed Size Chunking: Uniform chunk sizes
Readers: Configure readers with chunking
Search & Retrieval: How chunking affects search