Feeding AI Agents: Unlocking Amazon, Google, and LLM Data with Scraper API Actors

Olivia Patel

Senior Cybersecurity Analyst

08-Jun-2026

Key Takeaways:

One HTTP request per actor. The Scrapeless Scraper API turns a target — an Amazon product, a Google search, an AI answer — into a single POST against a named scraper.* actor. No browser to drive, no parser to maintain.
Two endpoints, by actor family. Site and SERP actors (scraper.amazon, scraper.google.search, scraper.shopeev2) use POST /api/v1/scraper/request and return parsed JSON whose exact shape is actor-specific. The AI-answer actors (scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode) use POST /api/v2/scraper/execute and return a { status, task_id, task_result } envelope.
One auth header everywhere. Every call carries x-api-token: <your key>. One account key covers every actor.
Structured output, not raw HTML. Site actors return parsed structured JSON — scraper.amazon includes both a parsed result and the rendered html, while scraper.google.search returns the SERP fields at the top level — and the v2 actors return the answer body plus citations and links as JSON fields.
Async when a render is slow. Some site actors return a taskId; submit, then poll GET /api/v1/scraper/result/{taskId} until the payload is ready.
Free to start. New Scrapeless accounts include free Scraper API credits — sign up at app.scrapeless.com.

Introduction: the actor model

A traditional scraper is three jobs glued together: get past the anti-bot layer, render the page, and parse the fields you want. The Scrapeless Scraper API collapses all three into one call. You name an actor — a pre-built extractor for a specific target — hand it an input, and get structured data back. Proxy rotation, rendering, and parsing run server-side.

The actor catalog spans three groups: e-commerce (scraper.amazon, scraper.shopeev2), search (scraper.google.search), and AI answers (scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode). This guide covers authentication, the two request shapes, a worked example from each family, the async pattern, and the errors you are most likely to hit (missing-field validation and unsupported regions).

What you can do with it

Pull structured product data — title, price, rating, availability, ASIN — from a marketplace URL in one request.
Read a search result page as JSON instead of scraping the SERP markup yourself.
Capture an AI answer with its citations: the exact text a model returned for a prompt, plus the sources it cited, for GEO (generative engine optimization) and brand-visibility tracking.
Run it from anywhere — it is plain HTTP, so curl, Python requests, Node fetch, or any language with an HTTP client works unchanged.

Why the Scraper API

No browser, no parser to maintain. The actor renders and parses server-side; you receive fields, not a DOM to walk.
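One key, predictable shapes. A single x-api-token authenticates every actor. The AI-answer actors share one { status, task_id, task_result } envelope, and each site or SERP actor returns its own actor-specific shape. A thin client wrapper written once can therefore be reused across targets.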
Residential egress and rendering are built in. The actor handles geo-routing and JavaScript rendering; you send the input and read the result.

Get your API key on the free plan at app.scrapeless.com. The Scraper API sits alongside the Universal Scraping API and Scraping Browser in the pricing catalogue.

Prerequisites

A Scrapeless account and API key — sign up at app.scrapeless.com.
curl for the quick test, or Python 3.10+ for the Python client examples. Node.js 18+ (with its built-in fetch) or any other HTTP client works the same way.
Basic familiarity with HTTP and JSON.

Store your key in the environment so it never lands in code:

```bash
export SCRAPELESS_API_KEY=your_api_token_here
```
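A small Python helper can read that variable and fail loudly when it is missing, so you never send an unauthenticated request. The function name `auth_headers` is hypothetical and not part of any Scrapeless SDK. Only the `x-api-token` header name comes from the API itself.

```python
import os


def auth_headers(env_var: str = "SCRAPELESS_API_KEY") -> dict[str, str]:
    """Build request headers for a Scraper API call (hypothetical helper).

    Reads the key from the environment so it never appears in source code,
    and raises early if the variable is unset or empty.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"{env_var} is not set; export your Scrapeless API key first")
    return {
        "Content-Type": "application/json",
        "x-api-token": token,  # the one auth header every actor expects
    }
```

With requests, for example: `requests.post(url, headers=auth_headers(), json=body, timeout=120)`.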

The two request shapes

The Scraper API has two endpoints. Which one an actor uses depends on what it returns.

| Family | Endpoint | Actors | Returns |
|---|---|---|---|
| Site / SERP | `POST https://api.scrapeless.com/api/v1/scraper/request` | `scraper.amazon`, `scraper.google.search`, `scraper.shopeev2` | actor-specific parsed JSON (e.g. `scraper.amazon` → `{ html, metadata, result }`; `scraper.google.search` → `organic_results`, … top-level) |
| AI answers | `POST https://api.scrapeless.com/api/v2/scraper/execute` | `scraper.chatgpt`, `scraper.gemini`, `scraper.copilot`, `scraper.grok`, `scraper.perplexity`, `scraper.aimode` | `{ status, task_id, task_result }` |

Both take a JSON body of { "actor": "<name>", "input": { … } } and the header x-api-token. The input fields differ per actor (see each example).
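The actor family decides the endpoint, so a client can choose it with a lookup instead of hard-coding a URL per call. The sketch below hard-codes the actor lists from this guide. The function and constant names are hypothetical, and an actor added to the catalog later would have to be added to one of the sets by hand.

```python
# Endpoints and actor families as listed in this guide.
V1_URL = "https://api.scrapeless.com/api/v1/scraper/request"
V2_URL = "https://api.scrapeless.com/api/v2/scraper/execute"

AI_ANSWER_ACTORS = frozenset({
    "scraper.chatgpt", "scraper.gemini", "scraper.copilot",
    "scraper.grok", "scraper.perplexity", "scraper.aimode",
})
SITE_SERP_ACTORS = frozenset({
    "scraper.amazon", "scraper.google.search", "scraper.shopeev2",
})


def build_request(actor: str, actor_input: dict) -> tuple[str, dict]:
    """Return (endpoint URL, JSON body) for an actor call (hypothetical helper).

    AI-answer actors go to v2, site/SERP actors go to v1; both endpoints
    take the same {"actor": ..., "input": ...} body.
    """
    if actor in AI_ANSWER_ACTORS:
        url = V2_URL
    elif actor in SITE_SERP_ACTORS:
        url = V1_URL
    else:
        raise ValueError(f"unknown actor {actor!r}; add it to one of the family sets")
    return url, {"actor": actor, "input": actor_input}
```

Failing on an unknown name is a deliberate choice here: a wrong guess would send the request to the wrong endpoint rather than produce a clear error.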

Example 1 — Amazon product (v1)

```bash
curl -X POST https://api.scrapeless.com/api/v1/scraper/request \
  -H "Content-Type: application/json" \
  -H "x-api-token: $SCRAPELESS_API_KEY" \
  -d '{
    "actor": "scraper.amazon",
    "input": { "action": "product", "url": "https://www.amazon.com/dp/B09B8V1LZ3" }
  }'
```

The response carries the rendered html, a metadata block, and a parsed result object. The result is what most pipelines use directly:

```json
// result (abridged): schema is normative, values from a live run
{
  "asin": "B09B8V1LZ3",
  "title": "Amazon Echo Dot (newest model) …",
  "final_price": "$49.99",
  "availability": "In Stock",
  "reviews_count": "193514",
  "seller_name": "Amazon.com"
}
```

In Python:

```python
import os, requests

resp = requests.post(
    "https://api.scrapeless.com/api/v1/scraper/request",
    headers={"x-api-token": os.environ["SCRAPELESS_API_KEY"]},
    json={"actor": "scraper.amazon",
          "input": {"action": "product", "url": "https://www.amazon.com/dp/B09B8V1LZ3"}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["result"])   # parsed object; resp.json()["html"] is the full page
```
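In the sample `result`, numeric fields such as `final_price` and `reviews_count` arrive as display strings. Before you load them into a database or hand them to an agent, you may want typed values. The sketch below assumes a dot-decimal price string like `"$49.99"` and a digits-only or comma-grouped review count. Other locales would need their own parsing rules. Both helper names are hypothetical.

```python
from __future__ import annotations

import re
from decimal import Decimal, InvalidOperation


def parse_price(raw: str | None) -> Decimal | None:
    """Turn a display price such as "$49.99" or "$1,299.00" into a Decimal.

    Assumes "." is the decimal separator and "," groups thousands;
    returns None when no number is found.
    """
    if not raw:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    if match is None:
        return None
    try:
        return Decimal(match.group(0).replace(",", ""))
    except InvalidOperation:
        return None


def normalize_amazon(result: dict) -> dict:
    """Copy an actor `result`, converting price and review count to numbers."""
    reviews = (result.get("reviews_count") or "").replace(",", "")
    return {
        **result,
        "final_price": parse_price(result.get("final_price")),
        "reviews_count": int(reviews) if reviews.isdigit() else None,
    }
```

`Decimal` is used instead of `float` so prices compare and sum exactly. Fields the actor did not return come back as `None` rather than raising.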

scraper.google.search uses the same v1 endpoint with input { "q": "web scraping" }, but its parsed SERP comes back at the top level — organic_results, search_information, pagination, related_searches — alongside metadata, with no result wrapper.
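Because the SERP fields sit at the top level, reading them is a dictionary lookup with no `result` to unwrap. The sketch below pulls `(title, link)` pairs from `organic_results`. The per-item keys `title` and `link` are an assumption about the item shape, not something this guide confirms, so check them against a live response. Items missing either key are skipped instead of raising.

```python
def organic_links(serp: dict) -> list[tuple[str, str]]:
    """Return (title, link) pairs from a scraper.google.search response.

    Reads the top-level `organic_results` list. The `title`/`link` item
    keys are assumed, not confirmed by this guide.
    """
    pairs = []
    for item in serp.get("organic_results") or []:
        title, link = item.get("title"), item.get("link")
        if title and link:
            pairs.append((title, link))
    return pairs
```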

Example 2 — an AI answer (v2)

The AI-answer actors take a prompt and a country and return the model's reply with its citations:

```bash
curl -X POST https://api.scrapeless.com/api/v2/scraper/execute \
  -H "Content-Type: application/json" \
  -H "x-api-token: $SCRAPELESS_API_KEY" \
  -d '{
    "actor": "scraper.chatgpt",
    "input": { "prompt": "What are the best web scraping tools?", "country": "US" }
  }'
```

Every successful call returns the same envelope:

```json
// schema is normative, values from a live run
{
  "status": "success",
  "task_id": "…",
  "task_result": {
    "model": "gpt-5-5",
    "result_text": "…the model's answer…",
    "content_references": [ { "title": "…", "url": "https://…" } ],
    "links": [ "https://…" ]
  }
}
```

task_result holds the answer (result_text), the cited sources (content_references), and the extracted links — share-of-citation analysis is a field read, not a parse.
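As an illustration of that field read, the sketch below counts cited domains in one answer's `content_references`. Dividing a domain's count by the total gives one simple share-of-citation measure for that prompt. The metric choice and the function name are assumptions for illustration, and the helper ignores the separate `links` list.

```python
from collections import Counter
from urllib.parse import urlparse


def cited_domains(task_result: dict) -> Counter:
    """Count how often each domain appears in an AI answer's citations.

    Reads `content_references[*].url` from a v2 `task_result` (hypothetical
    helper); references without a parseable URL are skipped.
    """
    counts: Counter = Counter()
    for ref in task_result.get("content_references") or []:
        host = urlparse(ref.get("url", "")).hostname
        if host:
            # Fold "www." so www.example.com and example.com count together.
            counts[host.removeprefix("www.")] += 1
    return counts
```

Aggregating these counters across many prompts and runs (`sum(counters, Counter())`) turns single answers into a visibility trend for a brand's domain.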

Two of these actors take one extra required field, which the API names in its validation message if you omit it: scraper.copilot needs "mode": "smart", and scraper.grok needs "mode": "MODEL_MODE_AUTO". scraper.perplexity also accepts an optional "web_search": true to ground the answer. scraper.gemini and scraper.aimode take just { prompt, country }.
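Keeping those per-actor extras in one mapping saves a validation round-trip per actor. The table below encodes only the values stated in this paragraph. It treats Perplexity's `web_search` as opt-in, which is an assumption. The helper is hypothetical, so confirm current requirements in the LLM Chat Scraper docs.

```python
# Extra input fields per AI-answer actor, as described in this guide.
# Copilot and Grok require `mode`; Perplexity's `web_search` is treated as optional.
EXTRA_INPUT = {
    "scraper.copilot": {"mode": "smart"},
    "scraper.grok": {"mode": "MODEL_MODE_AUTO"},
}


def ai_answer_input(actor: str, prompt: str, country: str = "US",
                    web_search: bool = False) -> dict:
    """Build the `input` object for a v2 AI-answer actor (hypothetical helper)."""
    payload = {"prompt": prompt, "country": country}
    payload.update(EXTRA_INPUT.get(actor, {}))  # copies values; mapping stays untouched
    if web_search and actor == "scraper.perplexity":
        payload["web_search"] = True
    return payload
```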

For the full per-actor field list, see the LLM Chat Scraper docs. For a worked end-to-end build on one of these actors, the Google AI Overview scraper guide walks through the citation-level capture.

Async actors: submit, then poll

Some site actors render a heavy page and answer asynchronously. The POST returns a taskId instead of the payload:

```json
{ "taskId": "ef2f7cef-…", "message": "task in progress" }
```

Poll the result endpoint until the task is done, then read the same payload shape:

```bash
curl "https://api.scrapeless.com/api/v1/scraper/result/$TASK_ID" \
  -H "x-api-token: $SCRAPELESS_API_KEY"
# while still running: { "state": "processing", "taskId": "…" }
```

scraper.shopeev2 follows this pattern. Submit a shopee.sg product URL ({ "url": "https://shopee.sg/<name>-i.<shopid>.<itemid>" }), then poll until the product JSON arrives. Storefronts are region-gated, so a URL on an unsupported storefront domain returns an "area not supported" error.
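The submit-then-poll loop can be sketched with the Python standard library. This version polls at a fixed interval and treats any payload without `"state": "processing"` as final. That stop condition is an assumption, and it means a failed task also comes back through this path, so inspect the returned payload. The transport and sleep functions are injectable, which makes the loop testable without network access. All names here are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

RESULT_URL = "https://api.scrapeless.com/api/v1/scraper/result/{task_id}"


def fetch_json(url: str, token: str) -> dict:
    """GET a URL with the x-api-token header and decode the JSON body."""
    req = urllib.request.Request(url, headers={"x-api-token": token})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def poll_task(task_id: str, token: str, *, interval: float = 5.0,
              max_wait: float = 300.0,
              fetch: Callable[[str, str], dict] = fetch_json,
              sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll the v1 result endpoint until the task stops reporting "processing".

    Returns the final payload; raises TimeoutError once max_wait is exceeded.
    """
    url = RESULT_URL.format(task_id=task_id)
    waited = 0.0
    while True:
        payload = fetch(url, token)
        if payload.get("state") != "processing":
            return payload  # finished: read it like a synchronous response
        if waited >= max_wait:
            raise TimeoutError(f"task {task_id} still processing after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

In use, submit first, then check whether the submit response carries a `taskId`. If it does, call `poll_task(resp["taskId"], os.environ["SCRAPELESS_API_KEY"])`. If it does not, the submit response already holds the payload.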

What you get back

| Actor family | Top-level keys | Where the data is |
|---|---|---|
| Site / SERP (v1) | actor-specific (amazon: `html`, `metadata`, `result`; google: `organic_results`, … top-level) | parsed structured fields; `scraper.amazon` also returns the full rendered `html` |
| AI answers (v2) | `status`, `task_id`, `task_result` | `task_result` holds the answer text, citations, and links |

Treat absent fields as nullable — modules vary by product, query, region, and model. Read what the actor surfaces first (scraper.amazon's result, the top-level SERP fields for scraper.google.search, or task_result for the v2 actors), and fall back to scraper.amazon's html only when you need a field the actor does not parse.
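The "read what the actor surfaces first" rule fits in one function. This sketch returns `result` for scraper.amazon and `task_result` for the v2 envelope. For any other v1 response it returns the top-level fields minus `html` and `metadata`. That last branch is based on the scraper.google.search shape described here. This guide does not spell out scraper.shopeev2's payload, so verify that branch before relying on it. The function name is hypothetical.

```python
def actor_data(actor: str, response: dict) -> dict:
    """Return the part of a Scraper API response that holds the parsed data.

    scraper.amazon -> `result`; v2 envelope -> `task_result`;
    other v1 actors -> top-level fields without raw `html`/`metadata`.
    Missing sections come back as {} so callers can treat fields as nullable.
    """
    if actor == "scraper.amazon":
        return response.get("result") or {}
    if "task_result" in response:  # v2 envelope: {status, task_id, task_result}
        return response.get("task_result") or {}
    return {k: v for k, v in response.items() if k not in ("html", "metadata")}
```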

FAQ

Q: How do I authenticate?

Every request carries the header x-api-token: <your key>. One account key works across all actors. Create a key on the free plan at app.scrapeless.com.

Q: Which endpoint does an actor use — v1 or v2?

Site and SERP actors (scraper.amazon, scraper.google.search, scraper.shopeev2) use /api/v1/scraper/request. The AI-answer actors (scraper.chatgpt, scraper.gemini, scraper.copilot, scraper.grok, scraper.perplexity, scraper.aimode) use /api/v2/scraper/execute.

Q: How do I find an actor's required input fields?

Send the request; if a field is missing, the API responds with a validation message naming it (for example, scraper.copilot reports that mode is required). The per-actor reference is in the Scrapeless API docs.

Q: Is scraping these sites legal?

These actors collect publicly visible data. Rules vary by jurisdiction and by each site's terms of service, so review the relevant ToS and consult counsel for your use case before running at scale. Never collect personal data protected under GDPR or CCPA.

Q: Do I need a proxy?

No. Residential egress and geo-routing are built into the actor — you send the input, the actor handles the network layer. Region-gated targets accept only supported storefront domains.

Q: Can I run this without an AI agent or SDK?

Yes. It is plain HTTP — curl, requests, fetch, or any HTTP client works directly. No SDK is required.

Conclusion

The Scraper API reduces a scrape to one decision and one request: pick the actor, send { actor, input } with your x-api-token, and read structured fields back. Site and SERP actors answer on /api/v1/scraper/request with parsed JSON whose shape is actor-specific; AI-answer actors answer on /api/v2/scraper/execute with a uniform { status, task_id, task_result } envelope; slow renders answer asynchronously via a taskId you poll. Write the client wrapper once and point it at whichever actor the pipeline needs.

Ready to Build Your AI-Powered Data Pipeline?

Join our community to claim a free plan and connect with developers building Scraper API pipelines: Discord · Telegram.

Sign up at app.scrapeless.com for free Scraper API credits, and point an actor at the sites, queries, or AI answers your pipeline needs.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
