ScrapeGraphTools enables an Agent to extract structured data from webpages, convert content to markdown, and retrieve raw HTML content using the ScrapeGraphAI API. The toolkit provides six core capabilities:
  1. smartscraper: Extract structured data using natural language prompts
  2. markdownify: Convert web pages to markdown format
  3. searchscraper: Search the web and extract information
  4. crawl: Crawl websites with structured data extraction
  5. agentic_crawler: Perform automated browser actions with optional AI extraction
  6. scrape: Get raw HTML content from websites (NEW!)
The scrape method is particularly useful when you need:
  • Complete HTML source code
  • Raw content for further processing
  • HTML structure analysis
  • Content that needs to be parsed differently
All methods support heavy JavaScript rendering when needed, via the render_heavy_js flag sketched below.
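For example, a minimal configuration that enables only raw HTML scraping with JavaScript rendering looks like this (the constructor flags are documented under Toolkit Params below):

from agno.tools.scrapegraph import ScrapeGraphTools

# Enable only the scrape function, with rendering for JavaScript-heavy pages
tools = ScrapeGraphTools(
    enable_scrape=True,
    enable_smartscraper=False,  # smartscraper is enabled by default
    render_heavy_js=True,
)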

Prerequisites

The following examples require the scrapegraph-py library.
pip install -U scrapegraph-py
Set your ScrapeGraph API key, either via the api_key parameter or the SGAI_API_KEY environment variable:
export SGAI_API_KEY="YOUR_SGAI_API_KEY"
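To verify the installation and key outside of an Agent, you can call scrapegraph-py directly. This is a minimal sketch assuming the library's Client interface (an api_key constructor argument and a smartscraper method); consult the scrapegraph-py docs for the current signature:

import os

from scrapegraph_py import Client

# Assumes SGAI_API_KEY is set in the environment
client = Client(api_key=os.environ["SGAI_API_KEY"])
response = client.smartscraper(
    website_url="https://example.com",
    user_prompt="Extract the main heading",
)
print(response)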

Example

The following agent will extract structured data from a website using the smartscraper tool:
cookbook/tools/scrapegraph_tools.py
from agno.agent import Agent
from agno.models.openai import OpenAIChat
from agno.tools.scrapegraph import ScrapeGraphTools

agent_model = OpenAIChat(id="gpt-4.1")
scrapegraph_smartscraper = ScrapeGraphTools(enable_smartscraper=True)

agent = Agent(
    tools=[scrapegraph_smartscraper], model=agent_model, markdown=True, stream=True
)

agent.print_response("""
Use smartscraper to extract the following from https://www.wired.com/category/science/:
- News articles
- Headlines
- Images
- Links
- Author
""")

Raw HTML Scraping

Get complete HTML content from websites for custom processing:
cookbook/tools/scrapegraph_tools.py
# Enable scrape method for raw HTML content
scrapegraph_scrape = ScrapeGraphTools(enable_scrape=True, enable_smartscraper=False)

scrape_agent = Agent(
    tools=[scrapegraph_scrape],
    model=agent_model,
    markdown=True,
    stream=True,
)

scrape_agent.print_response(
    "Use the scrape tool to get the complete raw HTML content from https://en.wikipedia.org/wiki/2025_FIFA_Club_World_Cup"
)
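To post-process the HTML yourself instead of printing it, capture the agent's output with run(). A hedged sketch, assuming the run response exposes the result text in its content attribute, and using BeautifulSoup purely as an example parser:

from bs4 import BeautifulSoup

# stream=False so run() returns a single response object rather than events
result = scrape_agent.run(
    "Use the scrape tool to get the raw HTML from https://example.com",
    stream=False,
)

# Parse the returned HTML for custom processing, e.g. counting links
soup = BeautifulSoup(result.content, "html.parser")
print(f"Found {len(soup.find_all('a'))} links")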

All Functions with JavaScript Rendering

Enable all ScrapeGraph functions with heavy JavaScript support:
cookbook/tools/scrapegraph_tools.py
# Enable all ScrapeGraph functions
scrapegraph_all = Agent(
    tools=[
        ScrapeGraphTools(all=True, render_heavy_js=True)
    ],  # render_heavy_js=True renders JavaScript-heavy pages before scraping
    model=agent_model,
    markdown=True,
    stream=True,
)

scrapegraph_all.print_response("""
Use any appropriate scraping method to extract comprehensive information from https://www.wired.com/category/science/:
- News articles and headlines
- Convert to markdown if needed  
- Search for specific information
""")

Toolkit Params

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| api_key | Optional[str] | None | ScrapeGraph API key. If not provided, uses the SGAI_API_KEY environment variable. |
| enable_smartscraper | bool | True | Enable the smartscraper function for LLM-powered data extraction. |
| enable_markdownify | bool | False | Enable the markdownify function for webpage-to-markdown conversion. |
| enable_crawl | bool | False | Enable the crawl function for website crawling and data extraction. |
| enable_searchscraper | bool | False | Enable the searchscraper function for web search and information extraction. |
| enable_agentic_crawler | bool | False | Enable the agentic_crawler function for automated browser actions and AI extraction. |
| enable_scrape | bool | False | Enable the scrape function for retrieving raw HTML content from websites. |
| render_heavy_js | bool | False | Enable heavy JavaScript rendering for all scraping functions. Useful for SPAs and dynamic content. |
| all | bool | False | Enable all available functions. When True, individual enable flags are ignored. |
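For example, to pass the key explicitly instead of relying on the SGAI_API_KEY environment variable:

tools = ScrapeGraphTools(api_key="YOUR_SGAI_API_KEY", enable_scrape=True)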

Toolkit Functions

| Function | Description |
| --- | --- |
| smartscraper | Extract structured data from a webpage using an LLM and a natural language prompt. Parameters: url (str), prompt (str). |
| markdownify | Convert a webpage to markdown format. Parameters: url (str). |
| crawl | Crawl a website and extract structured data. Parameters: url (str), prompt (str), data_schema (dict), cache_website (bool), depth (int), max_pages (int), same_domain_only (bool), batch_size (int). |
| searchscraper | Search the web and extract information. Parameters: user_prompt (str). |
| agentic_crawler | Perform automated browser actions with optional AI extraction. Parameters: url (str), steps (List[str]), use_session (bool), user_prompt (Optional[str]), output_schema (Optional[dict]), ai_extraction (bool). |
| scrape | Get raw HTML content from a website. Useful for complete source-code retrieval and custom processing. Parameters: website_url (str), headers (Optional[dict]). |
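crawl takes the richest parameter set; the agent passes these values when it calls the tool, so you can surface them in your request. A minimal sketch mirroring the examples above (the URL, depth, and page limit are illustrative):

# Enable crawl for multi-page structured extraction
scrapegraph_crawl = ScrapeGraphTools(enable_crawl=True, enable_smartscraper=False)

crawl_agent = Agent(
    tools=[scrapegraph_crawl],
    model=agent_model,
    markdown=True,
    stream=True,
)

crawl_agent.print_response(
    "Use crawl on https://www.wired.com/category/science/ with depth 1, "
    "max_pages 5, and same_domain_only true, extracting article headlines"
)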

Developer Resources
