CSV row chunking is a method of splitting CSV files based on the number of rows, rather than character count. This approach treats each row (or group of rows) as a semantic unit, preserving the integrity of individual records while enabling efficient processing of tabular data.
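The idea can be shown with a few lines of plain Python (a minimal sketch using only the standard library, not agno's implementation): parse the CSV, then group the data rows into fixed-size slices so no record is ever split mid-row.

```python
import csv
import io

# A tiny CSV sample (illustrative data, not the IMDB dataset used below).
csv_text = "title,year\nGuardians of the Galaxy,2014\nPrometheus,2012\nSing,2016\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, records = rows[0], rows[1:]

# Group records into chunks of N rows; each record stays intact.
rows_per_chunk = 2  # illustrative value
chunks = [records[i:i + rows_per_chunk]
          for i in range(0, len(records), rows_per_chunk)]

print(chunks)  # two chunks: 2 records, then the remaining 1
```

Character-based splitters can cut a record in half; row-based chunking cannot, which is why it suits tabular data.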

Code

import asyncio
from agno.agent import Agent
from agno.knowledge.chunking.row import RowChunking
from agno.knowledge.knowledge import Knowledge
from agno.knowledge.reader.csv_reader import CSVReader
from agno.vectordb.pgvector import PgVector

db_url = "postgresql+psycopg://ai:ai@localhost:5532/ai"

knowledge_base = Knowledge(
    vector_db=PgVector(table_name="imdb_movies_row_chunking", db_url=db_url),
)

asyncio.run(knowledge_base.add_content_async(
    url="https://agno-public.s3.amazonaws.com/demo_data/IMDB-Movie-Data.csv",
    reader=CSVReader(
        chunking_strategy=RowChunking(),
    ),
))  

# Initialize the Agent with the knowledge_base
agent = Agent(
    knowledge=knowledge_base,
    search_knowledge=True,
)

# Use the agent 
agent.print_response("Tell me about the movie Guardians of the Galaxy", markdown=True)

Usage

1. Create a virtual environment

Open the Terminal and create a Python virtual environment.

python3 -m venv .venv
source .venv/bin/activate

2. Install libraries

pip install -U sqlalchemy psycopg pgvector agno

3. Run PgVector

docker run -d \
  -e POSTGRES_DB=ai \
  -e POSTGRES_USER=ai \
  -e POSTGRES_PASSWORD=ai \
  -e PGDATA=/var/lib/postgresql/data/pgdata \
  -v pgvolume:/var/lib/postgresql/data \
  -p 5532:5432 \
  --name pgvector \
  agno/pgvector:16

4. Run the Agent

python cookbook/knowledge/chunking/csv_row_chunking.py

CSV Row Chunking Params

Parameter                | Type | Default | Description
rows_per_chunk           | int  | 100     | The number of rows to include in each chunk.
skip_header              | bool | False   | Whether to skip the header row when chunking.
clean_rows               | bool | True    | Whether to clean and normalize row data.
include_header_in_chunks | bool | False   | Whether to include the header row in each chunk.
max_chunk_size           | int  | 5000    | Maximum character size for each chunk (fallback limit).
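The behavior these parameters describe can be sketched in plain Python. This is an illustrative guess at the semantics implied by the table, not agno's actual code (the `clean_rows` normalization step is omitted); the defaults mirror the table above.

```python
def row_chunk(rows, rows_per_chunk=100, skip_header=False,
              include_header_in_chunks=False, max_chunk_size=5000):
    """Group CSV rows (as strings) into chunks of rows_per_chunk rows."""
    header, data = rows[0], rows[1:]
    if not skip_header and not include_header_in_chunks:
        # Default: the header is simply the first row of the first chunk.
        data = rows
    chunks = []
    for i in range(0, len(data), rows_per_chunk):
        body = ([header] if include_header_in_chunks else []) + data[i:i + rows_per_chunk]
        # max_chunk_size acts as a fallback character limit per chunk.
        chunks.append("\n".join(body)[:max_chunk_size])
    return chunks

# With include_header_in_chunks=True, every chunk repeats the header row,
# so each chunk is independently interpretable by the retriever.
print(row_chunk(["title,year", "A,2014", "B,2012", "C,2016"],
                rows_per_chunk=2, include_header_in_chunks=True))
```

Repeating the header per chunk trades a little storage for chunks that carry their own column context at retrieval time.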