Feeding AI Agents: Unlocking Amazon, Google, and LLM Data with Scraper API Actors
Senior Cybersecurity Analyst
Key Takeaways:
- One HTTP request per actor. The Scrapeless Scraper API turns a target (an Amazon product, a Google search, an AI answer) into a single POST against a named scraper.* actor. No browser to drive, no parser to maintain.
- Two endpoints, by actor family. Site and SERP actors (scraper.amazon, scraper.google.search, scraper.shopeev2) use POST /api/v1/scraper/request and return parsed JSON whose exact shape is actor-specific. The AI-answer actors (scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode) use POST /api/v2/scraper/execute and return a { status, task_id, task_result } envelope.
- One auth header everywhere. Every call carries x-api-token: <your key>. One account key covers every actor.
- Structured output, not raw HTML. Site actors return parsed structured JSON: scraper.amazon includes both a parsed result and the rendered html, while scraper.google.search returns the SERP fields at the top level. The v2 actors return the answer body plus citations and links as JSON fields.
- Async when a render is slow. Some site actors return a taskId; submit, then poll GET /api/v1/scraper/result/{taskId} until the payload is ready.
- Free to start. New Scrapeless accounts include free Scraper API credits. Sign up at app.scrapeless.com.
Introduction: the actor model
A traditional scraper is three jobs glued together: get past the anti-bot layer, render the page, and parse the fields you want. The Scrapeless Scraper API collapses all three into one call. You name an actor — a pre-built extractor for a specific target — hand it an input, and get structured data back. Proxy rotation, rendering, and parsing run server-side.
The actor catalog spans three groups: e-commerce (scraper.amazon, scraper.shopeev2), search (scraper.google.search), and AI answers (scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode). This guide covers authentication, the two request shapes, a worked example from each family, the async pattern, and the errors you will actually see.
What you can do with it
- Pull structured product data — title, price, rating, availability, ASIN — from a marketplace URL in one request.
- Read a search result page as JSON instead of scraping the SERP markup yourself.
- Capture an AI answer with its citations — the exact text a model returned for a prompt, plus the sources it cited, for GEO and brand-visibility tracking.
- Run it from anywhere: it is plain HTTP, so curl, Python requests, Node fetch, or any language with an HTTP client works unchanged.
Why the Scraper API
- No browser, no parser to maintain. The actor renders and parses server-side; you receive fields, not a DOM to walk.
- One key, one shape per family. A single x-api-token authenticates every actor and every request uses the same { actor, input } body. The AI-answer family also returns a uniform envelope, so a client wrapper written once is reused across targets.
- Residential egress and rendering are built in. The actor handles geo-routing and JavaScript rendering; you send the input and read the result.
Get your API key on the free plan at app.scrapeless.com. The Scraper API sits alongside the Universal Scraping API and Scraping Browser in the pricing catalogue.
Prerequisites
- A Scrapeless account and API key — sign up at app.scrapeless.com.
- curl for the quick test, or Python 3.10+ / Node.js 18+ for the clients below.
- Basic familiarity with HTTP and JSON.
Store your key in the environment so it never lands in code:
bash
export SCRAPELESS_API_KEY=your_api_token_here
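On the Python side, the exported key can be read back with a small guard. This is a minimal sketch, assuming the variable name from the export above; the helper name `auth_headers` is hypothetical and not part of any Scrapeless SDK.

```python
import os


def auth_headers(env_var: str = "SCRAPELESS_API_KEY") -> dict:
    """Build the x-api-token header from the environment, failing loudly if unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Surface a clear error instead of sending an empty token.
        raise RuntimeError(f"{env_var} is not set; export it before running")
    return {"x-api-token": key}
```

Failing at startup keeps a missing key from showing up later as an opaque authentication error.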
The two request shapes
The Scraper API has two endpoints. Which one an actor uses depends on what it returns.
| Family | Endpoint | Actors | Returns |
|---|---|---|---|
| Site / SERP | POST https://api.scrapeless.com/api/v1/scraper/request | scraper.amazon, scraper.google.search, scraper.shopeev2 | actor-specific parsed JSON (e.g. scraper.amazon → { html, metadata, result }; scraper.google.search → organic_results, … top-level) |
| AI answers | POST https://api.scrapeless.com/api/v2/scraper/execute | scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode | { status, task_id, task_result } |
Both take a JSON body of { "actor": "<name>", "input": { … } } and the header x-api-token. The input fields differ per actor (see each example).
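The routing rule in the table can be captured in a small helper. The following is a minimal sketch rather than an official client: the names `endpoint_for` and `build_request` are hypothetical, and the actor sets simply mirror the table above.

```python
V1_URL = "https://api.scrapeless.com/api/v1/scraper/request"
V2_URL = "https://api.scrapeless.com/api/v2/scraper/execute"

# Actor names copied from the table above; extend as needed.
SITE_ACTORS = {"scraper.amazon", "scraper.google.search", "scraper.shopeev2"}
AI_ACTORS = {
    "scraper.chatgpt", "scraper.gemini", "scraper.copilot",
    "scraper.grok", "scraper.perplexity", "scraper.aimode",
}


def endpoint_for(actor: str) -> str:
    """Return the endpoint URL for a known actor (hypothetical helper)."""
    if actor in SITE_ACTORS:
        return V1_URL
    if actor in AI_ACTORS:
        return V2_URL
    raise ValueError(f"unknown actor: {actor}")


def build_request(actor: str, actor_input: dict, api_key: str) -> dict:
    """Assemble URL, headers, and JSON body without sending anything."""
    return {
        "url": endpoint_for(actor),
        "headers": {"x-api-token": api_key},
        "json": {"actor": actor, "input": actor_input},
    }
```

The returned dictionary uses keyword names that `requests.post` accepts, so it could be sent with `requests.post(**build_request(...), timeout=120)`.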
Example 1 — Amazon product (v1)
bash
curl -X POST https://api.scrapeless.com/api/v1/scraper/request \
  -H "Content-Type: application/json" \
  -H "x-api-token: $SCRAPELESS_API_KEY" \
  -d '{
    "actor": "scraper.amazon",
    "input": { "action": "product", "url": "https://www.amazon.com/dp/B09B8V1LZ3" }
  }'
The response carries the rendered html, a metadata block, and a parsed result object. The result is what most pipelines use directly:
json
// result (abridged) — schema is normative, values from a live run
{
  "asin": "B09B8V1LZ3",
  "title": "Amazon Echo Dot (newest model) …",
  "final_price": "$49.99",
  "availability": "In Stock",
  "reviews_count": "193514",
  "seller_name": "Amazon.com"
}
In Python:
python
import os

import requests

# Submit the Amazon product request; the key is read from the environment.
resp = requests.post(
    "https://api.scrapeless.com/api/v1/scraper/request",
    headers={"x-api-token": os.environ["SCRAPELESS_API_KEY"]},
    json={
        "actor": "scraper.amazon",
        "input": {"action": "product", "url": "https://www.amazon.com/dp/B09B8V1LZ3"},
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["result"])  # parsed object; resp.json()["html"] is the full page
scraper.google.search uses the same v1 endpoint with input { "q": "web scraping" }, but its parsed SERP comes back at the top level — organic_results, search_information, pagination, related_searches — alongside metadata, with no result wrapper.
Example 2 — an AI answer (v2)
The AI-answer actors take a prompt and a country and return the model's reply with its citations:
bash
curl -X POST https://api.scrapeless.com/api/v2/scraper/execute \
  -H "Content-Type: application/json" \
  -H "x-api-token: $SCRAPELESS_API_KEY" \
  -d '{
    "actor": "scraper.chatgpt",
    "input": { "prompt": "What are the best web scraping tools?", "country": "US" }
  }'
Every successful call returns the same envelope:
json
// schema is normative, values from a live run
{
  "status": "success",
  "task_id": "…",
  "task_result": {
    "model": "gpt-5-5",
    "result_text": "…the model's answer…",
    "content_references": [ { "title": "…", "url": "https://…" } ],
    "links": [ "https://…" ]
  }
}
task_result holds the answer (result_text), the cited sources (content_references), and the extracted links — share-of-citation analysis is a field read, not a parse.
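As an illustration of that field read, the sketch below counts cited domains from `content_references`. The function name `citation_domains` is hypothetical, and the sample envelope is made up to match the documented shape; it is not output from a real run.

```python
from collections import Counter
from urllib.parse import urlparse


def citation_domains(envelope: dict) -> Counter:
    """Count cited domains in task_result.content_references."""
    task_result = envelope.get("task_result") or {}
    refs = task_result.get("content_references") or []
    domains = Counter()
    for ref in refs:
        host = urlparse(ref.get("url", "")).netloc.lower()
        if host:  # skip entries with a missing or malformed URL
            domains[host] += 1
    return domains


# Made-up envelope for illustration only.
sample = {
    "status": "success",
    "task_id": "example",
    "task_result": {
        "result_text": "...",
        "content_references": [
            {"title": "A", "url": "https://example.com/a"},
            {"title": "B", "url": "https://example.com/b"},
            {"title": "C", "url": "https://example.org/c"},
        ],
        "links": [],
    },
}
print(citation_domains(sample).most_common())
# prints [('example.com', 2), ('example.org', 1)]
```

Dividing each count by the total gives a per-domain share of citations for a prompt.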
Some of these actors take one extra input field. scraper.copilot requires "mode": "smart" and scraper.grok requires "mode": "MODEL_MODE_AUTO"; if you omit a required field, the API names it in its validation message. scraper.perplexity accepts "web_search": true to ground the answer. scraper.gemini and scraper.aimode take just { prompt, country }.
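One way to keep these per-actor differences out of calling code is a lookup table. This is a sketch under the assumption that the extra fields listed above are the only differences between the AI-answer actors; `EXTRA_FIELDS` and `ai_input` are hypothetical names.

```python
# Extra input fields named in the text above; other actors take only prompt and country.
EXTRA_FIELDS = {
    "scraper.copilot": {"mode": "smart"},
    "scraper.grok": {"mode": "MODEL_MODE_AUTO"},
    "scraper.perplexity": {"web_search": True},
}


def ai_input(actor: str, prompt: str, country: str = "US") -> dict:
    """Build the { actor, input } body for an AI-answer actor."""
    body = {"prompt": prompt, "country": country}
    body.update(EXTRA_FIELDS.get(actor, {}))
    return {"actor": actor, "input": body}
```

For example, `ai_input("scraper.grok", "What is an ASIN?")` yields an input with `prompt`, `country`, and `mode` set, while `ai_input("scraper.gemini", ...)` yields only `prompt` and `country`.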
For the full per-actor field list, see the LLM Chat Scraper docs. For a worked end-to-end build on one of these actors, the Google AI Overview scraper guide walks through the citation-level capture.
Async actors: submit, then poll
Some site actors render a heavy page and answer asynchronously. The POST returns a taskId instead of the payload:
json
{ "taskId": "ef2f7cef-…", "message": "task in progress" }
Poll the result endpoint until the task is done, then read the same payload shape:
bash
curl "https://api.scrapeless.com/api/v1/scraper/result/$TASK_ID" \
-H "x-api-token: $SCRAPELESS_API_KEY"
# while still running: { "state": "processing", "taskId": "…" }
scraper.shopeev2 follows this pattern. Submit a shopee.sg product URL ({ "url": "https://shopee.sg/<name>-i.<shopid>.<itemid>" }), then poll until the product JSON arrives. Storefronts are region-gated, so an unsupported domain returns area not supported.
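The submit-then-poll loop might look like the sketch below. It assumes that any response whose state is not "processing" is the finished payload, since only the in-progress shape is shown above; failure states are not covered. The names `poll_result` and `fetch_result`, the interval, and the attempt cap are all illustrative choices.

```python
import json
import os
import time
import urllib.request
from typing import Callable


def fetch_result(task_id: str) -> dict:
    """GET the result endpoint once and decode the JSON body."""
    req = urllib.request.Request(
        f"https://api.scrapeless.com/api/v1/scraper/result/{task_id}",
        headers={"x-api-token": os.environ["SCRAPELESS_API_KEY"]},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def poll_result(fetch: Callable[[str], dict], task_id: str,
                interval: float = 2.0, max_attempts: int = 30,
                sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call fetch(task_id) until the body stops reporting 'processing'."""
    for _ in range(max_attempts):
        body = fetch(task_id)
        # Assumption: anything other than state == "processing" is the final payload.
        if body.get("state") != "processing":
            return body
        sleep(interval)
    raise TimeoutError(f"task {task_id} still processing after {max_attempts} polls")
```

A call such as `poll_result(fetch_result, task_id)` would block until the payload arrives or the attempt cap is reached. Passing `fetch` and `sleep` as arguments keeps the loop testable without network access.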
What you get back
| Actor family | Top-level keys | Where the data is |
|---|---|---|
| Site / SERP (v1) | actor-specific (amazon: html, metadata, result; google: organic_results, … top-level) | parsed structured fields; scraper.amazon also returns the full rendered html |
| AI answers (v2) | status, task_id, task_result | task_result holds the answer text, citations, and links |
Treat absent fields as nullable — modules vary by product, query, region, and model. Read what the actor surfaces first (scraper.amazon's result, the top-level SERP fields for scraper.google.search, or task_result for the v2 actors), and fall back to scraper.amazon's html only when you need a field the actor does not parse.
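The read order described above can be sketched as a small dispatcher. The helper names `primary_payload` and `field` are hypothetical, and the branches only encode the shapes listed in the table; other actors may need their own cases.

```python
def primary_payload(actor: str, body: dict):
    """Return the parsed data for an actor, following the table above."""
    if actor == "scraper.amazon":
        return body.get("result")
    if actor == "scraper.google.search":
        # SERP fields sit at the top level; drop metadata to keep only the data.
        return {k: v for k, v in body.items() if k != "metadata"}
    if "task_result" in body:
        # v2 envelope: { status, task_id, task_result }
        return body.get("task_result")
    return body


def field(payload, name: str, default=None):
    """Read a field that may be absent, treating it as nullable."""
    if not isinstance(payload, dict):
        return default
    return payload.get(name, default)
```

Reading through `field` means a missing module, such as a price on an unavailable product, comes back as `None` instead of raising a `KeyError`.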
FAQ
Q: How do I authenticate?
Every request carries the header x-api-token: <your key>. One account key works across all actors. Create a key on the free plan at app.scrapeless.com.
Q: Which endpoint does an actor use — v1 or v2?
Site and SERP actors (scraper.amazon, scraper.google.search, scraper.shopeev2) use /api/v1/scraper/request. The AI-answer actors (scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode) use /api/v2/scraper/execute.
Q: How do I find an actor's required input fields?
Send the request; if a field is missing, the API responds with a validation message naming it (for example, scraper.copilot reports that mode is required). The per-actor reference is in the Scrapeless API docs.
Q: Is scraping these sites legal?
These actors collect publicly visible data. Rules vary by jurisdiction and by each site's terms of service, so review the relevant ToS and consult counsel for your use case before running at scale. Never collect personal data protected under GDPR or CCPA.
Q: Do I need a proxy?
No. Residential egress and geo-routing are built into the actor — you send the input, the actor handles the network layer. Region-gated targets accept only supported storefront domains.
Q: Can I run this without an AI agent or SDK?
Yes. It is plain HTTP — curl, requests, fetch, or any HTTP client works directly. No SDK is required.
Conclusion
The Scraper API reduces a scrape to one decision and one request: pick the actor, send { actor, input } with your x-api-token, and read structured fields back. Site and SERP actors answer on /api/v1/scraper/request with parsed JSON whose shape is actor-specific; AI-answer actors answer on /api/v2/scraper/execute with a uniform { status, task_id, task_result } envelope; slow renders answer asynchronously via a taskId you poll. Write the client wrapper once and point it at whichever actor the pipeline needs.
Ready to Build Your AI-Powered Data Pipeline?
Join our community to claim a free plan and connect with developers building Scraper API pipelines: Discord · Telegram.
Sign up at app.scrapeless.com for free Scraper API credits, and point an actor at the sites, queries, or AI answers your pipeline needs.
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.



