v11.0.0: AI-native scraping for .NET is here

Scrape any site.
Feed your AI.

WebReaper is an AI-native web scraper for .NET. A single ~12 MB binary turns any site, even a bot-protected one, into clean Markdown or structured data, escalating from plain HTTP to a headless browser to a stealth browser when it has to. Bring your own LLM when you need one. No Docker, no signup, MIT licensed.


$ brew install alex-on-ai/webreaper/webreaper

❯ webreaper scrape https://acme.example/pricing

HTTP               queued
Headless browser   queued
Stealth browser    queued

Works with the tools you already use

.NET · OpenAI · Anthropic · Ollama · Azure OpenAI · Playwright · Redis · MongoDB

Everything a modern scraper needs

Batteries included, nothing locked in. Compose exactly the pipeline you want.

Drop on PATH, run

A single ~12 MB native binary. No Docker, no Postgres, no signup. Install and you're scraping in seconds.

AI-native by composition

Markdown by default. Add schema extraction, an LLM fallback, self-healing selectors, or an autonomous agent with one .With… call.

Bring any LLM

OpenAI, Anthropic, Ollama, Azure OpenAI, llamafile, or any other IChatClient via Microsoft.Extensions.AI. Never locked in.

Bot-checks handled automatically

Detects Cloudflare, DataDome, and PerimeterX challenges and escalates from plain HTTP to a headless browser to a stealth browser, page by page, remembering the tier each host needs (host-sticky). Blocked pages are dropped, never returned as data.
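The escalation behaviour can be pictured as a small per-host state machine. The C# sketch below is an illustration under assumptions, not WebReaper's code: `Tier`, `FetchResult`, `EscalatingFetcher`, the injected `fetch` delegate, and the string markers in `IsChallenge` are all hypothetical, and real challenge detection is considerably more involved than a substring check.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical tiers, cheapest first. Not WebReaper's API.
public enum Tier { Http = 0, HeadlessBrowser = 1, StealthBrowser = 2 }

// Hypothetical result: the page HTML plus the tier that fetched it.
public sealed record FetchResult(string Html, Tier UsedTier);

public sealed class EscalatingFetcher
{
    // Host -> lowest tier known to get through (the "host-sticky" memory).
    private readonly ConcurrentDictionary<string, Tier> _startTier = new();

    // Hypothetical transport: fetches a URL at the given tier and returns the HTML.
    private readonly Func<Uri, Tier, Task<string>> _fetch;

    public EscalatingFetcher(Func<Uri, Tier, Task<string>> fetch) => _fetch = fetch;

    // Illustrative markers only; real detection also inspects status codes, headers, and cookies.
    public static bool IsChallenge(string html) =>
        html.Contains("/cdn-cgi/challenge-platform", StringComparison.OrdinalIgnoreCase) // Cloudflare-style
        || html.Contains("captcha-delivery.com", StringComparison.OrdinalIgnoreCase)    // DataDome-style
        || html.Contains("px-captcha", StringComparison.OrdinalIgnoreCase);             // PerimeterX-style

    // Returns null if every tier is blocked, so a challenge page is never returned as data.
    public async Task<FetchResult?> FetchAsync(Uri url)
    {
        var start = _startTier.TryGetValue(url.Host, out var known) ? known : Tier.Http;
        for (var tier = start; tier <= Tier.StealthBrowser; tier++)
        {
            var html = await _fetch(url, tier);
            if (IsChallenge(html))
                continue;                 // blocked: climb to the next tier
            _startTier[url.Host] = tier;  // later pages on this host start at this tier
            return new FetchResult(html, tier);
        }
        return null;
    }
}
```

Because the lowest working tier is remembered per host, a site that needed a browser once skips the doomed HTTP attempt on every later page.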

Distributed when needed

Swap the scheduler, tracker, and sink to Redis, MongoDB, SQLite, Azure Service Bus, or Cosmos. Same code, many workers.

MIT, not AGPL

Embed it in commercial software, fork it, redistribute it. No copyleft, no obligation to publish your source when you run it as a network service, no license tax.

AI-native

Deterministic where you can, AI where you must

Start with fast, free selectors. Reach for an LLM only when the page fights back.

Markdown by default

Any page to clean, LLM-ready Markdown

No schema, no selectors. Point WebReaper at a URL and get back tidy Markdown you can pipe straight into a prompt or a vector store.

Program.cs

using WebReaper.Builders;

var engine = await ScraperEngineBuilder
    .Crawl("https://news.ycombinator.com")
    .AsMarkdown()
    .WriteToConsole()
    .BuildAsync();

await engine.RunAsync();

Typed extraction

Structured data with compile-time schemas

Declare fields once on a POCO. A Roslyn source generator emits a static schema and a reflection-free materializer that is AOT-clean, with no runtime guessing.

Program.cs

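using System.Collections.Generic;
using WebReaper.Builders;
// ...plus the WebReaper namespace that defines [ScrapeSchema], [ScrapeField], and SchemaFieldType.

// In a single C# file, top-level statements must come before type declarations.
var engine = await ScraperEngineBuilder
    .Crawl("https://example.com/post")
    .Extract(Article.Schema)   // Schema is emitted by the source generator
    .WriteToConsole()
    .BuildAsync();

await engine.RunAsync();

[ScrapeSchema]
public partial class Article
{
    [ScrapeField("h1")] public string? Title { get; set; }
    [ScrapeField(".score", Type = SchemaFieldType.Integer)]
    public int Points { get; set; }
    [ScrapeField(".tag", IsList = true)]
    public List<string> Tags { get; set; } = new();
}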
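To show what "reflection-free" means in practice, here is a hand-written stand-in for the kind of materializer a source generator can emit. It is hypothetical: `ArticleData`, `ArticleDataMaterializer`, and the assumption that raw field values arrive as a dictionary of string lists are illustrative only, and WebReaper's real generated code may look quite different.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical plain model, mirroring the Article fields above.
public sealed class ArticleData
{
    public string? Title { get; set; }
    public int Points { get; set; }
    public List<string> Tags { get; set; } = new();
}

// Hypothetical generator-style output: every property is assigned directly,
// so no PropertyInfo lookups happen at run time.
public static class ArticleDataMaterializer
{
    public static ArticleData Materialize(IReadOnlyDictionary<string, IReadOnlyList<string>> raw)
    {
        var result = new ArticleData();

        if (raw.TryGetValue("Title", out var title) && title.Count > 0)
            result.Title = title[0];

        // Integer field: parsed with the invariant culture; unparseable text leaves the default.
        if (raw.TryGetValue("Points", out var points) && points.Count > 0
            && int.TryParse(points[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out var n))
            result.Points = n;

        // List field: every matched value is kept.
        if (raw.TryGetValue("Tags", out var tags))
            result.Tags = new List<string>(tags);

        return result;
    }
}
```

Since every member access is visible at compile time, there is nothing for the trimmer to remove by mistake, which is what keeps this style of code Native AOT-friendly.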

Deterministic first, LLM as rescue

Self-healing extraction that costs nothing when it works

Cheap CSS selectors run first. If a field comes back empty, the LLM fills it and caches the fix. Stable pages cost zero LLM calls.

Program.cs

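using WebReaper.AI;
using WebReaper.Builders;

// chatClient is any Microsoft.Extensions.AI IChatClient: OpenAI, Anthropic, Ollama…
var engine = await ScraperEngineBuilder
    .Crawl("https://example.com")
    .Extract(Article.Schema)
    .WithLlmFallback(chatClient)   // consulted only for fields the selectors leave empty
    .WriteToJsonFile("articles.jsonl")
    .BuildAsync();

await engine.RunAsync();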
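The selector-first, LLM-second flow can be sketched in plain C#. This is an illustration under assumptions, not WebReaper's implementation: `LlmFix`, `SelfHealingExtractor`, and the `select` and `askLlm` delegates (standing in for a CSS engine and an IChatClient call) are hypothetical, and the cached "fix" is assumed to be a replacement selector keyed by host and field.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical LLM answer: the missing value plus a selector it proposes for next time.
public sealed record LlmFix(string? Value, string? Selector);

public sealed class SelfHealingExtractor
{
    private readonly Func<string, string, string?> _select;       // (html, cssSelector) -> text, or null if no match
    private readonly Func<string, string, Task<LlmFix>> _askLlm;  // (html, fieldName) -> the LLM's fix
    private readonly ConcurrentDictionary<(string Host, string Field), string> _healed = new();
    private int _llmCalls;

    public int LlmCalls => _llmCalls;

    public SelfHealingExtractor(Func<string, string, string?> select, Func<string, string, Task<LlmFix>> askLlm)
    {
        _select = select;
        _askLlm = askLlm;
    }

    public async Task<string?> ExtractAsync(string host, string html, string field, string selector)
    {
        // Cheap path 1: a selector the LLM proposed earlier for this host and field.
        if (_healed.TryGetValue((host, field), out var healed))
        {
            var fromHealed = _select(html, healed);
            if (!string.IsNullOrEmpty(fromHealed)) return fromHealed;
        }

        // Cheap path 2: the declared selector.
        var value = _select(html, selector);
        if (!string.IsNullOrEmpty(value)) return value;

        // Rescue path: one LLM call, then cache its selector so later pages stay cheap.
        Interlocked.Increment(ref _llmCalls);
        var fix = await _askLlm(html, field);
        if (!string.IsNullOrEmpty(fix.Selector))
            _healed[(host, field)] = fix.Selector;
        return fix.Value;
    }
}
```

On a page whose markup still matches, the method returns before `askLlm` is ever invoked; after one rescue, the cached selector keeps later pages from the same host off the LLM path too.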

Command line

The whole toolkit, one command away

Scrape a page, map a site, or crawl everything to JSON Lines. The CLI is Native-AOT, bot-check aware, and ships a Claude Code skill.

scrape: one page to Markdown or JSON
map: discover the URLs on a site
crawl: every on-domain page to JSON Lines
init: wire the Claude Code skill

Terminal

# One page as Markdown
webreaper scrape https://example.com

# Discover URLs on a site
webreaper map https://example.com --search /blog/ --max-urls 50

# Crawl a whole site to JSON Lines
webreaper crawl https://example.com > pages.jsonl

# Bot-protected? A plain scrape auto-climbs to a browser; --stealth starts at the top tier
webreaper scrape https://example.com --stealth

How WebReaper compares

Local-first and MIT licensed, with the AI features people usually turn to cloud services for.

Feature                              WebReaper  Firecrawl  Crawl4AI  Crawlee
Single self-contained binary         Yes        No         No        No
Permissive license (MIT/Apache-2.0)  Yes        No         Yes       Yes
LLM extraction + autonomous agent    Yes        Yes        Partial   No
Auto bot-check stealth               Yes        Partial    Partial   Partial
Pluggable distributed backends       Yes        Yes        No        Yes
Runs natively in .NET / C#           Yes        No         No        No

Built for real work

From LLM data pipelines to price monitoring and autonomous agents.


LLM context pipelines

Turn whole sites into clean Markdown to feed prompts and vector stores.


Price & change monitoring

Schedule crawls, store to a database, fire only when a page actually changes.
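"Fire only when a page actually changes" usually comes down to comparing a fingerprint of the content with the one stored on the previous run. A minimal sketch, assuming a SHA-256 hash over whitespace-normalized text and an in-memory store; `ChangeDetector` is a hypothetical name, and in practice the last-seen hashes would live in the database the crawl writes to.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public sealed class ChangeDetector
{
    // url -> hash of the content seen on the previous crawl (in-memory stand-in for a database table).
    private readonly Dictionary<string, string> _lastHash = new();

    // Collapse whitespace so reformatting alone does not count as a change.
    private static string Normalize(string content) => Regex.Replace(content, @"\s+", " ").Trim();

    private static string Hash(string content) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(Normalize(content))));

    // True on the first sighting and whenever the normalized content differs from last time.
    public bool HasChanged(string url, string content)
    {
        var hash = Hash(content);
        if (_lastHash.TryGetValue(url, out var previous) && previous == hash)
            return false;
        _lastHash[url] = hash;
        return true;
    }
}
```

Whether a first sighting counts as a change is a policy choice; here it does, so a newly discovered page triggers a notification.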


Autonomous research agents

Give a goal; the agent decides which links to follow until it's met.
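One common way to let a goal decide which links to follow is a best-first search in which a model scores each link against the goal. The sketch below illustrates that general technique and is not WebReaper's agent: `GoalDirectedCrawl` and the `getLinks`, `scoreLink`, and `goalMet` delegates are hypothetical (the last two would typically be backed by LLM calls), and `maxPages` is an assumed safety budget.

```csharp
using System;
using System.Collections.Generic;

public static class GoalDirectedCrawl
{
    // Best-first crawl: always visit the most promising unvisited link next.
    // getLinks:  url -> outgoing links
    // scoreLink: (goal, link) -> relevance, higher is better
    // goalMet:   (goal, pages visited so far) -> stop?
    public static List<string> Run(
        string goal, string startUrl, int maxPages,
        Func<string, IEnumerable<string>> getLinks,
        Func<string, string, double> scoreLink,
        Func<string, IReadOnlyList<string>, bool> goalMet)
    {
        var visited = new List<string>();
        var seen = new HashSet<string> { startUrl };
        var frontier = new PriorityQueue<string, double>();
        frontier.Enqueue(startUrl, 0);

        while (visited.Count < maxPages && frontier.TryDequeue(out var url, out _))
        {
            visited.Add(url);
            if (goalMet(goal, visited)) break;

            foreach (var link in getLinks(url))
                if (seen.Add(link))
                    frontier.Enqueue(link, -scoreLink(goal, link)); // PriorityQueue is a min-heap, so negate
        }
        return visited;
    }
}
```

The page budget matters: without it, a goal the model never judges as met would turn the agent into an ordinary full-site crawl.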


Bot-protected catalogs

Scrape Cloudflare and DataDome sites with auto-escalating stealth.


Free to run. Pay only to scale.

The open-source core does everything locally. Hosted tiers add scheduling, managed infrastructure, and a team UI.

Open Source

Free

The library, CLI, and Claude Code skill. MIT, self-hosted, forever.

Install now


Cloud

Early access

Hosted scheduled crawls, managed proxies and stealth, and a team dashboard.

Join the waitlist

Enterprise

Custom

SSO, SLAs, on-prem, private satellites, and dedicated support.

Contact sales

Frequently asked questions

Is WebReaper really free?

Yes. The library, the CLI, and the Claude Code skill are MIT licensed and free forever. You only pay if you later choose the optional hosted Cloud or Enterprise tiers.

Do I have to use an LLM?

No. WebReaper is deterministic by default: CSS/XPath selectors and clean Markdown need no model. The AI features are opt-in and bring-your-own LLM, so you only pay for tokens when you ask for them.

How is it different from Firecrawl?

Firecrawl is primarily a hosted cloud service, and its open-source code is AGPL licensed. WebReaper is a local-first, MIT-licensed binary and .NET library. You run it yourself, embed it in commercial code, and bring any LLM.

Can it handle JavaScript and bot protection?

Yes. For JavaScript rendering, swap the HTTP transport for Playwright or raw CDP (Chrome DevTools Protocol). Pass --auto-stealth to escalate to a stealth Chromium backend when a Cloudflare, DataDome, or PerimeterX challenge is detected.

Does it scale to large crawls?

The crawl loop is parallel by design. Swap the scheduler, visited-link tracker, and result sink to Redis, MongoDB, SQLite, Azure Service Bus, or Cosmos and run many workers against shared state.
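The pattern behind "many workers against shared state" is a shared queue plus a shared visited set, where only the worker that first claims a URL enqueues it. A minimal in-process sketch, assuming System.Threading.Channels for the queue and a ConcurrentDictionary for the tracker; presumably these two pieces are what the Redis, MongoDB, SQLite, Azure Service Bus, or Cosmos backends replace. `ParallelCrawl` and `getLinks` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ParallelCrawl
{
    public static async Task<IReadOnlyCollection<string>> RunAsync(
        string startUrl, int workers, Func<string, Task<IEnumerable<string>>> getLinks)
    {
        var queue = Channel.CreateUnbounded<string>();           // stand-in for a shared scheduler
        var visited = new ConcurrentDictionary<string, bool>();  // stand-in for a shared visited-link tracker
        var pending = 1;                                         // URLs enqueued but not yet fully processed

        visited.TryAdd(startUrl, true);
        queue.Writer.TryWrite(startUrl);

        async Task Worker()
        {
            await foreach (var url in queue.Reader.ReadAllAsync())
            {
                try
                {
                    foreach (var link in await getLinks(url))
                        if (visited.TryAdd(link, true))          // only the first claimant enqueues a URL
                        {
                            Interlocked.Increment(ref pending);
                            queue.Writer.TryWrite(link);
                        }
                }
                finally
                {
                    // Nothing queued and nothing in flight: close the channel so every worker exits.
                    if (Interlocked.Decrement(ref pending) == 0)
                        queue.Writer.Complete();
                }
            }
        }

        await Task.WhenAll(Enumerable.Range(0, workers).Select(_ => Worker()));
        return visited.Keys.ToList();
    }
}
```

The `pending` counter is incremented before a child URL is written and decremented only after its parent finishes, so it can reach zero only when the whole crawl is done.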

Start scraping in 30 seconds

Install the binary, run one command, and pipe clean data into whatever comes next.

$ brew install alex-on-ai/webreaper/webreaper

