Scrape any site.
Feed your AI.
WebReaper is an AI-native web scraper for .NET. One ~12 MB binary turns any site into clean Markdown or structured data, with an LLM layer when you need it. No Docker, no signup, MIT licensed.
brew install pavlovtech/webreaper/webreaper
Works with the tools you already use
Everything a modern scraper needs
Batteries included, nothing locked in. Compose exactly the pipeline you want.
Drop on PATH, run
A single ~12 MB native binary. No Docker, no Postgres, no signup. Install and you're scraping in seconds.
AI-native by composition
Markdown by default. Add schema extraction, an LLM fallback, self-healing selectors, or an autonomous agent with one .With… call.
Bring any LLM
OpenAI, Anthropic, Ollama, Azure OpenAI, llamafile: any IChatClient via Microsoft.Extensions.AI. Never locked in.
Bot-checks handled automatically
Detects Cloudflare, DataDome, and PerimeterX challenges and escalates from plain HTTP to a browser to a stealth browser. Escalation is decided per page and sticks per host. Blocked pages are dropped, never returned as data.
Distributed when needed
Swap the scheduler, tracker, and sink to Redis, MongoDB, SQLite, Azure Service Bus, or Cosmos. Same code, many workers.
MIT, not AGPL
Embed it in commercial software, fork it, redistribute it. No copyleft, no service-source obligations, no license tax.
Deterministic where you can, AI where you must
Start with fast, free selectors. Reach for an LLM only when the page fights back.
Any page to clean, LLM-ready Markdown
No schema, no selectors. Point WebReaper at a URL and get back tidy Markdown you can pipe straight into a prompt or a vector store.
using WebReaper.Builders;

var engine = await ScraperEngineBuilder
    .Crawl("https://news.ycombinator.com")
    .AsMarkdown()
    .WriteToConsole()
    .BuildAsync();

await engine.RunAsync();

Structured data with compile-time schemas
Declare fields once on a POCO. A Roslyn source generator emits a static schema and a reflection-free materializer that is AOT-clean, with no runtime guessing.
[ScrapeSchema]
public partial class Article
{
    [ScrapeField("h1")]
    public string? Title { get; set; }

    [ScrapeField(".score", Type = SchemaFieldType.Integer)]
    public int Points { get; set; }

    [ScrapeField(".tag", IsList = true)]
    public List<string> Tags { get; set; } = new();
}

// Article.Schema is the static schema emitted by the source generator.
var engine = await ScraperEngineBuilder
    .Crawl("https://example.com/post")
    .Extract(Article.Schema)
    .BuildAsync();

await engine.RunAsync();

Self-healing extraction that costs nothing when it works
Cheap CSS selectors run first. If a field comes back empty, the LLM fills it and caches the fix. Stable pages cost zero LLM calls.
using WebReaper.AI;

// chatClient is any IChatClient from Microsoft.Extensions.AI.
var engine = await ScraperEngineBuilder
    .Crawl("https://example.com")
    .Extract(Article.Schema)
    .WithLlmFallback(chatClient) // OpenAI, Anthropic, Ollama…
    .WriteToJsonFile("articles.jsonl")
    .BuildAsync();

await engine.RunAsync();

The whole toolkit, one command away
Scrape a page, map a site, or crawl everything to JSON Lines. The CLI is Native-AOT, bot-check aware, and ships a Claude Code skill.
- scrape: one page to Markdown or JSON
- map: discover the URLs on a site
- crawl: every on-domain page to JSON Lines
- init: wire the Claude Code skill
# One page as Markdown
webreaper scrape https://example.com
# Discover URLs on a site
webreaper map https://example.com --search /blog/ --max-urls 50
# Crawl a whole site to JSON Lines
webreaper crawl https://example.com > pages.jsonl
# Bot-protected? A plain scrape auto-climbs to a browser; --stealth starts at the top tier
webreaper scrape https://example.com --stealth

How WebReaper compares
Local-first and MIT licensed, with the AI features people reach for the cloud to get.
| Feature | WebReaper | Firecrawl | Crawl4AI | Crawlee |
|---|---|---|---|---|
| Single self-contained binary | Supported | Not supported | Not supported | Not supported |
| Permissive license (MIT or Apache-2.0) | Supported | Not supported | Supported | Supported |
| LLM extraction + autonomous agent | Supported | Supported | Partial | Not supported |
| Auto bot-check stealth | Supported | Partial | Partial | Partial |
| Pluggable distributed backends | Supported | Supported | Not supported | Supported |
| Runs natively in .NET / C# | Supported | Not supported | Not supported | Not supported |
Built for real work
From LLM data pipelines to price monitoring and autonomous agents.
LLM context pipelines
Turn whole sites into clean Markdown to feed prompts and vector stores.
Price & change monitoring
Schedule crawls, store to a database, fire only when a page actually changes.
Autonomous research agents
Give a goal; the agent decides which links to follow until it's met.
Bot-protected catalogs
Scrape Cloudflare and DataDome sites with auto-escalating stealth.
Free to run. Pay only to scale.
The open-source core does everything locally. Hosted tiers add scheduling, managed infrastructure, and a team UI.
Cloud
Early access
Hosted scheduled crawls, managed proxies and stealth, a team dashboard.
Frequently asked questions
Is WebReaper really free?
Yes. The library, the CLI, and the Claude Code skill are MIT licensed and free forever. You only pay if you later choose the optional hosted Cloud or Enterprise tiers.
Do I have to use an LLM?
No. WebReaper is deterministic by default: CSS/XPath selectors and clean Markdown need no model. The AI features are opt-in and bring-your-own LLM, so you only pay for tokens when you ask for them.
How is it different from Firecrawl?
Firecrawl is a hosted, AGPL-licensed cloud service. WebReaper is a local-first, MIT-licensed binary and .NET library. You run it yourself, embed it in commercial code, and bring any LLM.
Can it handle JavaScript and bot protection?
Yes. Swap the HTTP transport for Playwright or raw CDP for JS rendering, and pass --auto-stealth to escalate to a stealth Chromium backend on Cloudflare, DataDome, or PerimeterX challenges.
Does it scale to large crawls?
The crawl loop is parallel by design. Swap the scheduler, visited-link tracker, and result sink to Redis, MongoDB, SQLite, Azure Service Bus, or Cosmos and run many workers against shared state.
Start scraping in 30 seconds
Install the binary, run one command, and pipe clean data into whatever comes next.
brew install pavlovtech/webreaper/webreaper
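As a rough sketch of the "pipe clean data into whatever comes next" step: JSON Lines stores one JSON object per line, so a file like the pages.jsonl written by `webreaper crawl https://example.com > pages.jsonl` can be counted or sampled with ordinary line tools, no JSON parser required. The helper names `count_records` and `sample_records` below are illustrative only and are not part of WebReaper.

```shell
#!/bin/sh
# Hypothetical helpers for a JSON Lines file, for example the pages.jsonl
# written by: webreaper crawl https://example.com > pages.jsonl
# These functions are illustrative and are not shipped with WebReaper.

# Print the number of records: one JSON object per non-empty line.
count_records() {
  grep -c . "$1"
}

# Print the first N records for a quick look at the shape of the data.
# Usage: sample_records FILE N
sample_records() {
  head -n "$2" "$1"
}
```

After sourcing the snippet, `count_records pages.jsonl` prints the number of crawled pages and `sample_records pages.jsonl 1` prints the first record, which is a quick way to check the output before loading it into a prompt or a vector store.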