Readers convert raw content into Document objects that can be chunked, embedded, and stored in your knowledge base. Each reader handles a specific format (PDF, CSV, Markdown, etc.) and extracts text and metadata.
How Readers Work
- Parse: Read the raw content using format-specific logic
- Extract: Pull out text and metadata (page numbers, authors, etc.)
- Chunk: Split large content into smaller pieces (if enabled)
- Return: Provide a list of Document objects ready for embedding
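The four steps above can be sketched with a toy reader. This is a minimal illustration, not Agno's implementation: the Document dataclass, the ToyTextReader class, and the fixed-size chunking are all hypothetical stand-ins chosen to keep the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a reader's output type (not Agno's class)."""
    content: str
    meta_data: dict = field(default_factory=dict)


class ToyTextReader:
    """Illustrative reader: parse -> extract -> chunk -> return."""

    def __init__(self, chunk: bool = True, chunk_size: int = 40):
        self.chunk = chunk
        self.chunk_size = chunk_size

    def read(self, raw: bytes, name: str = "untitled") -> list[Document]:
        # Parse: decode the raw bytes using format-specific logic.
        text = raw.decode("utf-8")
        # Extract: collect the text and some metadata.
        meta = {"name": name, "chars": len(text)}
        if not self.chunk:
            return [Document(content=text, meta_data=meta)]
        # Chunk: split into fixed-size pieces (a deliberately naive strategy).
        pieces = [
            text[i:i + self.chunk_size]
            for i in range(0, len(text), self.chunk_size)
        ]
        # Return: one Document per chunk, each tagged with its position.
        return [
            Document(content=piece, meta_data={**meta, "chunk": n})
            for n, piece in enumerate(pieces, start=1)
        ]


raw = b"Readers parse raw content. " * 4
docs = ToyTextReader(chunk_size=40).read(raw, name="notes.txt")
```

Each returned Document carries both the chunk text and the metadata of the source it came from, which is what makes the chunks traceable after embedding.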
Supported Readers
| Reader | Description |
|---|---|
| PDFReader | Extract text from PDF files |
| DoclingReader | Process multiple formats via Docling |
| TextReader | Plain text files |
| MarkdownReader | Markdown files |
| CSVReader | CSV files (rows become documents) |
| FieldLabeledCSVReader | CSV rows as field-labeled text |
| JSONReader | JSON files |
| PPTXReader | PowerPoint presentations |
| ArxivReader | Academic papers from arXiv |
| WikipediaReader | Wikipedia articles |
| YouTubeReader | YouTube transcripts |
| WebsiteReader | Crawl websites recursively |
| WebSearchReader | Web search results |
| FirecrawlReader | Web scraping via Firecrawl API |
Using Readers with Knowledge
Pass a reader to knowledge.insert() to override automatic format detection.
Auto-Selection
Agno automatically selects the right reader based on the file extension or URL. With knowledge.insert(), this happens automatically.
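The idea behind extension- and URL-based selection can be sketched as a simple lookup. This is an illustration under stated assumptions rather than Agno's actual logic: the reader names come from the Supported Readers table, but the mapping tables, the select_reader function, and the fallback choices are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical lookup tables; the mappings are assumptions for illustration.
READERS_BY_EXTENSION = {
    ".pdf": "PDFReader",
    ".csv": "CSVReader",
    ".md": "MarkdownReader",
    ".json": "JSONReader",
    ".pptx": "PPTXReader",
    ".txt": "TextReader",
}
READERS_BY_HOST = {
    "arxiv.org": "ArxivReader",
    "youtube.com": "YouTubeReader",
    "wikipedia.org": "WikipediaReader",
}


def select_reader(source: str, reader: Optional[str] = None) -> str:
    """Pick a reader name for a path or URL; an explicit reader wins."""
    if reader is not None:
        return reader
    parsed = urlparse(source)
    is_url = parsed.scheme in ("http", "https")
    if is_url:
        # Match the host itself or any subdomain of a known site.
        host = parsed.netloc.lower()
        for domain, name in READERS_BY_HOST.items():
            if host == domain or host.endswith("." + domain):
                return name
    # Fall back to the file extension of the path (or URL path).
    path = parsed.path if is_url else source
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in READERS_BY_EXTENSION:
        return READERS_BY_EXTENSION[suffix]
    # Assumed defaults: crawl unknown URLs, treat unknown files as text.
    return "WebsiteReader" if is_url else "TextReader"


chosen = select_reader("https://example.com/files/data.csv")
```

Passing reader explicitly short-circuits the lookup, which mirrors the override behavior described under Using Readers with Knowledge.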
Configuration
Chunking
Format-Specific Options
Runtime Options
Override settings when calling read().
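A per-call override usually means that arguments passed to read() take precedence over the values set on the reader, for that call only. The sketch below shows the pattern with a hypothetical ToyReader; the parameter names chunk and chunk_size are assumptions, since the settings that Agno's read() accepts are not listed here.

```python
from typing import Optional


class ToyReader:
    """Hypothetical reader with constructor defaults and per-call overrides."""

    def __init__(self, chunk: bool = True, chunk_size: int = 20):
        self.chunk = chunk
        self.chunk_size = chunk_size

    def read(
        self,
        text: str,
        chunk: Optional[bool] = None,
        chunk_size: Optional[int] = None,
    ) -> list[str]:
        # None means "use the value configured on the reader".
        do_chunk = self.chunk if chunk is None else chunk
        size = self.chunk_size if chunk_size is None else chunk_size
        if not do_chunk:
            return [text]
        return [text[i:i + size] for i in range(0, len(text), size)]


reader = ToyReader(chunk_size=20)
text = "x" * 50
default_chunks = reader.read(text)               # uses the configured size
small_chunks = reader.read(text, chunk_size=10)  # override for this call only
whole = reader.read(text, chunk=False)           # chunking off for this call
```

The reader's own configuration is left untouched, so later calls without arguments fall back to the original defaults.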
Async Processing
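Reading several sources is mostly I/O-bound, so running the reads concurrently can shorten total wall time. The sketch below uses only asyncio from the standard library; read_blocking and async_read are hypothetical stand-ins, not Agno methods, and the sleep merely simulates I/O latency.

```python
import asyncio
import time


def read_blocking(source: str) -> list[str]:
    """Hypothetical blocking read that simulates I/O latency."""
    time.sleep(0.05)
    return [f"document from {source}"]


async def async_read(source: str) -> list[str]:
    # Run the blocking read in a worker thread so the event loop stays free.
    return await asyncio.to_thread(read_blocking, source)


async def read_all(sources: list[str]) -> list[str]:
    # gather() runs the reads concurrently and preserves input order.
    results = await asyncio.gather(*(async_read(s) for s in sources))
    return [doc for docs in results for doc in docs]


docs = asyncio.run(read_all(["a.pdf", "b.pdf", "c.pdf"]))
```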
All readers support async for better performance with I/O operations.
Custom Chunking Strategy
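A chunking strategy can be modeled as a function that the reader calls in place of its default. The following is a rough sketch of that pattern only: ToyReader, the chunking_strategy parameter, and both strategy functions are hypothetical names, not Agno's API.

```python
from typing import Callable

ChunkFn = Callable[[str], list[str]]


def fixed_size(text: str, size: int = 30) -> list[str]:
    """Default strategy in this sketch: fixed-size character windows."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def by_paragraph(text: str) -> list[str]:
    """Alternative strategy: one chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


class ToyReader:
    """Hypothetical reader that accepts a pluggable chunking function."""

    def __init__(self, chunking_strategy: ChunkFn = fixed_size):
        self.chunking_strategy = chunking_strategy

    def read(self, text: str) -> list[str]:
        # Delegate splitting entirely to the configured strategy.
        return self.chunking_strategy(text)


text = "First paragraph.\n\nSecond paragraph, a bit longer.\n\nThird."
para_chunks = ToyReader(chunking_strategy=by_paragraph).read(text)
default_chunks = ToyReader().read(text)
```

Paragraph-based splitting keeps each chunk semantically whole, while fixed-size windows give predictable chunk lengths; which trade-off fits depends on the content.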
Override the default chunking behavior.
Error Handling
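When a failed read produces an empty list instead of raising, the caller has to check the result explicitly. A minimal sketch of that contract, assuming a hypothetical read_safely function and logger name (neither is part of Agno):

```python
import logging

logger = logging.getLogger("toy_reader")


def read_safely(path: str) -> list[str]:
    """Hypothetical reader call that logs failures and returns []."""
    try:
        with open(path, encoding="utf-8") as fh:
            return [fh.read()]
    except (OSError, UnicodeDecodeError) as exc:
        # Swallow the error, but leave a trace for debugging.
        logger.error("Failed to read %s: %s", path, exc)
        return []


docs = read_safely("/nonexistent-dir/missing.txt")
if not docs:
    # In this sketch an empty list means the read failed; details are logged.
    logger.warning("No documents produced; skipping embedding step")
```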
Readers return an empty list when processing fails. Check logs for debugging information.
Next Steps
- PDF Reader: Extract text from PDFs
- Website Reader: Crawl and index websites
- Chunking: Control how content is split
- Vector DB: Store processed documents