API Docs
A REST API to turn any public URL into clean markdown or structured JSON. AI extraction runs on our own GPU— it's included on every page, with no per-call surcharge. You pay per page; AI is free.
Authentication
Every request needs your API key as a bearer token. Buy any plan on the pricing page — the key is emailed to you and shown on the success screen. No account or login required; one key draws on your page balance.
Authorization: Bearer mai_live_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Base URL: https://www.miaibot.ai/api/v1
Scrape a page
POST /v1/scrape — fetch one URL and return its content. Costs 1 page. Set markdown, metadata, and/or brand to choose what comes back.
curl -X POST https://www.miaibot.ai/api/v1/scrape \
-H "Authorization: Bearer $CRAWLER_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com",
"markdown": true,
"metadata": true
}'Response:
{
"ok": true,
"sourceUrl": "https://example.com/",
"item": {
"url": "https://example.com/",
"status": "success",
"statusCode": 200,
"engine": "cheerio",
"metadata": { "title": "Example Domain", "language": "en", ... },
"markdown": "# Example Domain\n..."
}
}Extract structured data (managed GPU)
POST /v1/extract — pass one or more URLs plus either an extractPrompt or an extractJsonSchema, and get structured JSON back. Extraction runs on our on-prem GPU, so it costs the same as a plain page fetch: 1 page per URL, AI included. Up to 25 URLs per call.
Because it runs on our hardware, this path is asynchronous: the call returns 202 with a job id per URL, which you then poll.
curl -X POST https://www.miaibot.ai/api/v1/extract \
-H "Authorization: Bearer $CRAWLER_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"],
"extractPrompt": "Extract the title and price"
}'Response (202 Accepted):
{
"ok": true,
"mode": "async",
"engine": "thecrawler-managed-qwen",
"creditsHeld": 1,
"jobs": [
{ "url": "https://books.toscrape.com/...", "id": "job_abc123",
"poll": "/api/v1/extract/result/job_abc123" }
]
}Then poll GET /v1/extract/result/{id} until it returns 200:
curl https://www.miaibot.ai/api/v1/extract/result/job_abc123 \
-H "Authorization: Bearer $CRAWLER_KEY"
# 202 while running: { "ok": true, "status": "processing", "pending": true }
# 200 when finished:
{
"ok": true,
"status": "success",
"data": { "title": "A Light in the Attic", "price": "£51.77" }
}Bring your own LLM (synchronous)
Prefer your own model? Add llmBaseUrl + llmModel (any public OpenAI-compatible endpoint) and /v1/extract runs synchronously, returning results in the same response instead of a job to poll.
curl -X POST https://www.miaibot.ai/api/v1/extract \
-H "Authorization: Bearer $CRAWLER_KEY" \
-H "Content-Type: application/json" \
-d '{
"urls": ["https://example.com"],
"extractPrompt": "Extract the title",
"llmBaseUrl": "https://api.openai.com/v1",
"llmModel": "gpt-4o-mini"
}'Search the web
POST /v1/search — give a query and we resolve the top Google results, then scrape each one and return its content. Costs 1 page per result returned (failed scrapes are refunded). limit is 1–10 (default 5).
curl -X POST https://www.miaibot.ai/api/v1/search \
-H "Authorization: Bearer $CRAWLER_KEY" \
-H "Content-Type: application/json" \
-d '{
"query": "best open source vector database 2026",
"limit": 3
}'Response:
{
"ok": true,
"query": "best open source vector database 2026",
"resultCount": 3,
"creditsCharged": 3,
"results": [
{ "url": "https://...", "status": "success",
"metadata": { "title": "..." }, "markdown": "# ..." }
]
}Want structured data from the results instead of markdown? Run /v1/crawl with a searchQuery plus an extract schema.
Monitor a page for changes
POST /v1/monitoring — register a watchlist: we re-check the URL on a schedule, and when its content (or extracted data) changes we POST a watch.changed event to your webhookUrl. Each scheduled check costs 1 page. Add a prompt or jsonSchema to diff structured fields instead of raw content.
Watch count + minimum interval are set by your plan — Starter 5 (daily), Pro 25 (6-hourly), Scale 100 (hourly). List with GET /v1/monitoring; pause/resume/delete with /v1/monitoring/{id}.
curl -X POST https://www.miaibot.ai/api/v1/monitoring \
-H "Authorization: Bearer $CRAWLER_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://competitor.com/pricing",
"intervalSecs": 86400,
"webhookUrl": "https://your-app.com/hooks/crawler"
}'Response (201 Created):
{ "ok": true,
"watch": { "id": "w_abc123", "url": "https://competitor.com/pricing",
"intervalSecs": 86400, "active": true } }What you pay for
One page = one credit. AI extraction is included on every page — there is no separate extraction charge. Search bills 1 page per scraped result and monitoring bills 1 page per scheduled check — same page model, no add-on SKU. We do notcharge for fetch failures (DNS / timeout / 4xx / 5xx / blocked), empty pages, documents we can't parse, or requests rejected by our guards — those credits are returned to your balance. Credits don't expire while your account is active. Full terms on the Terms page.
Errors
| status | code | meaning |
|---|---|---|
| 400 | ssrf-blocked | Target URL resolves to a private/internal host |
| 401 | — | Missing or invalid API key |
| 402 | insufficient-credits | Not enough pages on your balance |
| 403 | — | Polling a job that belongs to a different key |
| 404 | job-not-found | Unknown or expired job id |
| 400 | — | Bad body: missing urls[], extractPrompt, or schema |
More
Crawl (multi-page link following), sitemap discovery, the validated extraction contracts, and the MCP server for Claude / Cursor are documented in the open-source repo: github.com/manchittlab/TheCrawler. Questions: hello@miaibot.ai.