ScrapeGraph

ScrapeGraphTools enable an Agent to extract structured data from webpages, convert content to markdown, retrieve raw HTML content, and automate browser actions using the ScrapeGraphAI API. The toolkit provides six core capabilities:

smartscraper: Extract structured data using natural language prompts
markdownify: Convert web pages to markdown format
searchscraper: Search the web and extract information
crawl: Crawl websites with structured data extraction
agentic_crawler: Perform automated browser actions with optional AI extraction
scrape: Get raw HTML content from websites

The scrape method is particularly useful when you need:

Complete HTML source code
Raw content for further processing
HTML structure analysis
Content you want to parse with your own tools

All methods support heavy JavaScript rendering, enabled with the render_heavy_js parameter, for single-page apps and other dynamic content.

Prerequisites

The following examples require the scrapegraph-py library.

pip install -U scrapegraph-py

If you do not pass api_key to ScrapeGraphTools, the toolkit reads your ScrapeGraph API key from the SGAI_API_KEY environment variable:

export SGAI_API_KEY="YOUR_SGAI_API_KEY"
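The api_key parameter falls back to SGAI_API_KEY when it is omitted. If you want to fail fast before building an agent, a minimal sketch of that documented precedence looks like this; resolve_sgai_api_key is a hypothetical helper, not part of Agno or scrapegraph-py.

```python
import os
from typing import Optional


def resolve_sgai_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else read SGAI_API_KEY.

    Mirrors the documented fallback of ScrapeGraphTools(api_key=...);
    it is not an Agno or scrapegraph-py function.
    """
    key = api_key or os.environ.get("SGAI_API_KEY")
    if not key:
        # Neither source provided a key, so stop before any API call is made.
        raise RuntimeError("No ScrapeGraph API key: pass api_key or set SGAI_API_KEY")
    return key
```

Call it once at startup, for example resolve_sgai_api_key(), so a missing key surfaces as a clear error instead of a failed tool call mid-conversation.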

Example

The following agent will extract structured data from a website using the smartscraper tool:

cookbook/tools/scrapegraph_tools.py

from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIChat(id="gpt-4.1")
scrapegraph_smartscraper = ScrapeGraphTools(enable_smartscraper=True)

agent = Agent(
    tools=[scrapegraph_smartscraper], model=agent_model, markdown=True, stream=True
)

agent.print_response("""
Use smartscraper to extract the following from https://www.wired.com/category/science/:
- News articles
- Headlines
- Images
- Links
- Author
""")

Raw HTML Scraping

Get complete HTML content from websites for custom processing:

cookbook/tools/scrapegraph_tools.py

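from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIChat(id="gpt-4.1")

# Enable only the scrape method for raw HTML content
# (smartscraper is enabled by default, so turn it off explicitly)
scrapegraph_scrape = ScrapeGraphTools(enable_scrape=True, enable_smartscraper=False)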

scrape_agent = Agent(
    tools=[scrapegraph_scrape],
    model=agent_model,
    markdown=True,
    stream=True,
)

scrape_agent.print_response(
    "Use the scrape tool to get the complete raw HTML content from https://en.wikipedia.org/wiki/2025_FIFA_Club_World_Cup"
)
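Once the scrape tool has returned raw HTML, you can process it with any parser you like. As a minimal sketch, the standard library's html.parser can pull out the page title and links. The LinkAndTitleParser class and extract_title_and_links function are hypothetical, and the sample HTML string is made up to stand in for a real scrape result.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every <a href> value from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href
                self.links.append(href)

    def handle_endtag(self, tag: str) -> None:
        if tag == "title":
            self._in_title = False

    def handle_data(self, data: str) -> None:
        if self._in_title:
            # Title text may arrive in several chunks, so concatenate them.
            self.title = (self.title or "") + data


def extract_title_and_links(raw_html: str) -> Tuple[Optional[str], List[str]]:
    parser = LinkAndTitleParser()
    parser.feed(raw_html)
    parser.close()
    title = parser.title.strip() if parser.title else None
    return title, parser.links


# Made-up sample standing in for HTML returned by the scrape tool.
sample_html = """
<html><head><title> 2025 FIFA Club World Cup </title></head>
<body>
  <a href="/wiki/FIFA">FIFA</a>
  <a href="https://example.com/teams">Teams</a>
  <a>no href</a>
</body></html>
"""

title, links = extract_title_and_links(sample_html)
```

For larger pages or messier markup, a dedicated HTML library may be more robust; the standard-library parser is used here only to keep the sketch dependency-free.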

All Functions with JavaScript Rendering

Enable all ScrapeGraph functions with heavy JavaScript support:

cookbook/tools/scrapegraph_tools.py

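from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIChat(id="gpt-4.1")

# Enable all ScrapeGraph functions; render_heavy_js=True renders
# JavaScript-heavy pages before their content is scraped
scrapegraph_all = Agent(
    tools=[ScrapeGraphTools(all=True, render_heavy_js=True)],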
    model=agent_model,
    markdown=True,
    stream=True,
)

scrapegraph_all.print_response("""
Use any appropriate scraping method to extract comprehensive information from https://www.wired.com/category/science/:
- News articles and headlines
- Convert to markdown if needed  
- Search for specific information
""")

View the Startup Analyst example

Toolkit Params

Parameter	Type	Default	Description
`api_key`	`Optional[str]`	`None`	ScrapeGraph API key. If not provided, the SGAI_API_KEY environment variable is used.
`enable_smartscraper`	`bool`	`True`	Enable the smartscraper function for LLM-powered data extraction.
`enable_markdownify`	`bool`	`False`	Enable the markdownify function for webpage to markdown conversion.
`enable_crawl`	`bool`	`False`	Enable the crawl function for website crawling and data extraction.
`enable_searchscraper`	`bool`	`False`	Enable the searchscraper function for web search and information extraction.
`enable_agentic_crawler`	`bool`	`False`	Enable the agentic_crawler function for automated browser actions and AI extraction.
`enable_scrape`	`bool`	`False`	Enable the scrape function for retrieving raw HTML content from websites.
`render_heavy_js`	`bool`	`False`	Enable heavy JavaScript rendering for all scraping functions. Useful for SPAs and dynamic content.
`all`	`bool`	`False`	Enable all available functions. When True, all enable flags are ignored.
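To make the interaction between the enable flags and all concrete, here is a plain-Python model of the behavior the table describes. The enabled_functions helper is hypothetical: it only mirrors the documented defaults and the "all overrides every flag" rule, and is not how Agno implements the toolkit.

```python
from typing import List

# Function names in the order listed under Toolkit Functions.
ALL_FUNCTIONS = [
    "smartscraper",
    "markdownify",
    "crawl",
    "searchscraper",
    "agentic_crawler",
    "scrape",
]


def enabled_functions(
    enable_smartscraper: bool = True,
    enable_markdownify: bool = False,
    enable_crawl: bool = False,
    enable_searchscraper: bool = False,
    enable_agentic_crawler: bool = False,
    enable_scrape: bool = False,
    all: bool = False,  # shadows the builtin only to match the documented name
) -> List[str]:
    """Hypothetical model of the documented flag logic."""
    if all:
        # Per the table, all=True ignores every individual enable flag.
        return list(ALL_FUNCTIONS)
    flags = {
        "smartscraper": enable_smartscraper,
        "markdownify": enable_markdownify,
        "crawl": enable_crawl,
        "searchscraper": enable_searchscraper,
        "agentic_crawler": enable_agentic_crawler,
        "scrape": enable_scrape,
    }
    return [name for name in ALL_FUNCTIONS if flags[name]]
```

This is why the raw HTML example passes enable_smartscraper=False: with defaults alone, enabling scrape would leave smartscraper enabled as well.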

Toolkit Functions

Function	Description
`smartscraper`	Extract structured data from a webpage using an LLM and a natural-language prompt. Parameters: url (str), prompt (str).
`markdownify`	Convert a webpage to markdown format. Parameters: url (str).
`crawl`	Crawl a website and extract structured data. Parameters: url (str), prompt (str), data_schema (dict), cache_website (bool), depth (int), max_pages (int), same_domain_only (bool), batch_size (int).
`searchscraper`	Search the web and extract information. Parameters: user_prompt (str).
`agentic_crawler`	Perform automated browser actions with optional AI extraction. Parameters: url (str), steps (List[str]), use_session (bool), user_prompt (Optional[str]), output_schema (Optional[dict]), ai_extraction (bool).
`scrape`	Get raw HTML content from a website. Useful for complete source code retrieval and custom processing. Parameters: website_url (str), headers (Optional[dict]).
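crawl takes a data_schema dict, and agentic_crawler an optional output_schema. The table does not spell out the expected format; the sketch below assumes a JSON Schema (draft-07 style) object, which you should confirm against the ScrapeGraphAI API documentation. The schema's field names and the missing_required helper are hypothetical.

```python
import json
from typing import Any, Dict, List

# Hypothetical JSON Schema describing articles collected by a crawl.
article_schema: Dict[str, Any] = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "ArticleList",
    "type": "object",
    "properties": {
        "articles": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "headline": {"type": "string"},
                    "url": {"type": "string"},
                    "author": {"type": "string"},
                },
                "required": ["headline", "url"],
            },
        }
    },
    "required": ["articles"],
}


def missing_required(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return the schema's required keys absent from record (a shallow check only)."""
    return [key for key in schema.get("required", []) if key not in record]


# A schema sent to a web API must be JSON-serializable.
payload = json.dumps(article_schema)
```

The shallow missing_required check is only a quick sanity test on returned records; a full JSON Schema validator would also check types and nested structure.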

Developer Resources

View Tools
View Cookbook
View Tests
