
How to Scrape Grok Answers with the Grok Scraper API

Sophia Martinez

Specialist in Anti-Bot Strategies

10-Jun-2026

TL;DR

  • A Grok scraper API returns xAI's answer with both of its source panels as data. One POST to the scraper.grok actor captures the full answer plus web_search_results and x_search_results — the open-web pages and the X (Twitter) posts Grok cited, as separate arrays.
  • Three inputs, one of them unusual. prompt carries the question, country pins residential egress, and a required reasoning mode (MODEL_MODE_FAST, MODEL_MODE_EXPERT, or MODEL_MODE_AUTO) controls how hard Grok reasons before answering.
  • X citations are the differentiator. Grok blends live web search with X's real-time feed; capturing only the answer text throws away the half of the data that says who it credited.
  • The envelope matches the other LLM actors. { status, task_id, task_result }, one x-api-token, the same endpoint: a ChatGPT capture client extends to Grok by changing the actor name, adding the mode, and always sending a country.
  • Run metadata comes along free. Follow-up suggestions, footnotes, token counts, and the run's conversation identifiers arrive in the same payload, ready for audit trails.
  • Free to start. New Scrapeless accounts include free trial credits — sign up at app.scrapeless.com.

Introduction: the answer engine with a social feed inside

Grok answers questions by blending two source types no other major assistant combines: live web search and posts pulled straight from X. Ask it which tool to buy, which API holds up, which brand to trust, and the response folds web pages and X posts into one cited answer. For anyone tracking how a brand shows up in AI answers, that makes Grok a distinct surface — the citations include the social conversation, not just the indexed web.

Capturing those answers by hand runs into the usual obstacles: a login-gated interface, streaming output, geo-sensitive responses, and a DOM that was never meant to be parsed. Grok adds a twist of its own: the reasoning mode changes the answer, so a capture pipeline has to set it explicitly.

The scraper.grok actor turns all of that into one HTTP request: prompt, country, and mode in; structured answer and both citation panels out. This guide covers the request shape, the response schema, a runnable Python client, and the companion actors that cover the rest of the AI-answer landscape. For the ranked view of the category, see the best LLM scrapers guide.

What You Can Do With It

  • Share-of-citation tracking across two panels. Count which domains appear in web_search_results and which accounts appear in x_search_results for a fixed prompt set over time.
  • Brand monitoring where X drives the narrative. For categories where sentiment forms on X first, Grok's citations show which posts are shaping the model's answers.
  • Reasoning-mode comparison. Capture the same prompt under FAST, EXPERT, and AUTO and measure how depth changes the answer and the sources.
  • Multi-market capture. Pin runs per country and compare what Grok tells different markets about the same question.
  • Competitive answer analysis. Track when Grok starts or stops recommending a product, and trace the change to the citations behind it.
  • Dataset building. Store prompt–answer–panel triples as clean JSON for longitudinal analysis.

Why the Scrapeless Grok Scraper

The scraper.grok actor is part of the Scrapeless LLM Chat Scraper family inside the Universal Scraping API line:

  • Both citation panels as discrete arrays. Open-web sources and X posts arrive separately — a share-of-citation report reads each panel directly, no re-parsing.
  • The reasoning mode is a first-class input. You decide how hard Grok thinks per run, which keeps a scheduled series methodologically consistent.
  • Country-pinned residential egress. Runs route through residential proxies across 195+ countries, so locale-specific answers are reproducible.
  • One contract across platforms. The same endpoint, header, and { status, task_id, task_result } envelope cover the ChatGPT, Gemini, Perplexity, and Copilot actors.

The parameter reference lives in the LLM Chat Scraper docs.

Prerequisites

  • A Scrapeless account and API key — sign up at app.scrapeless.com.
  • curl for the quick test, or Python 3.10+ for the client below.
  • Basic familiarity with HTTP and JSON.

Store your key in the environment so it never lands in code:

```bash
export SCRAPELESS_API_KEY=your_api_token_here
```

How the Grok Scraper works

  • Endpoint: POST https://api.scrapeless.com/api/v2/scraper/execute
  • Actor: scraper.grok
  • Auth header: x-api-token: $SCRAPELESS_API_KEY

Request parameters

| Input field | Required | Description |
|---|---|---|
| prompt | yes | the question to send to Grok |
| country | yes | two-letter country code for the run's residential egress (e.g. US; JP and TW are unavailable) |
| mode | yes | reasoning depth: MODEL_MODE_FAST, MODEL_MODE_EXPERT, or MODEL_MODE_AUTO |
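Because all three inputs are required and two markets are unavailable, a client can reject a bad request before it costs a run. The sketch below is a hypothetical helper (build_grok_input is not part of any Scrapeless SDK) that checks only the constraints in the table above; upper-casing the country code is a choice of this sketch, and the API remains the authority on which codes it accepts.

```python
import re

# Constraints taken from the parameter table; the API may enforce more.
ALLOWED_MODES = {"MODEL_MODE_FAST", "MODEL_MODE_EXPERT", "MODEL_MODE_AUTO"}
UNAVAILABLE_COUNTRIES = {"JP", "TW"}
COUNTRY_RE = re.compile(r"^[A-Z]{2}$")


def build_grok_input(prompt: str, country: str, mode: str) -> dict:
    """Hypothetical helper: validate the three required inputs and return the request body."""
    if not prompt or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    country = country.strip().upper()  # normalise "us" -> "US" (sketch's own choice)
    if not COUNTRY_RE.match(country):
        raise ValueError(f"country must be a two-letter code, got {country!r}")
    if country in UNAVAILABLE_COUNTRIES:
        raise ValueError(f"country {country} is unavailable for scraper.grok")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}, got {mode!r}")
    return {
        "actor": "scraper.grok",
        "input": {"prompt": prompt, "country": country, "mode": mode},
    }


if __name__ == "__main__":
    body = build_grok_input("Which web scraping API handles JavaScript-heavy sites?", "us", "MODEL_MODE_FAST")
    print(body["input"]["country"])  # → US
```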

Quick capture with curl

```bash
curl -sS -X POST https://api.scrapeless.com/api/v2/scraper/execute \
  -H "Content-Type: application/json" \
  -H "x-api-token: ${SCRAPELESS_API_KEY}" \
  -d '{
    "actor": "scraper.grok",
    "input": {
      "prompt": "Which web scraping API handles JavaScript-heavy sites?",
      "country": "US",
      "mode": "MODEL_MODE_EXPERT"
    }
  }'
```

Response envelope

Illustrative sample (schema from a live scraper.grok run; values abridged):

```json
{
  "status": "success",
  "task_id": "52fc9c96-…",
  "task_result": {
    "user_query": "Which web scraping API handles JavaScript-heavy sites?",
    "full_response": "For JavaScript-heavy sites, the options that hold up are…",
    "web_search_results": [
      { "title": "…", "url": "https://…", "preview": "…", "description": "…", "favicon": "…", "image": "…" }
    ],
    "x_search_results": [],
    "follow_up_suggestions": [ "…" ],
    "footnotes": [],
    "tool_usages": [ "…" ],
    "token_count": 1024,
    "user_model": "…",
    "response_id": "…",
    "conversation": { "conversation_id": "…", "title": "…", "create_time": "…" }
  }
}
```

Field by field:

| Field | Type | What it holds |
|---|---|---|
| task_result.user_query | string | the prompt as Grok received it |
| task_result.full_response | string | Grok's complete answer text |
| task_result.web_search_results[] | array | open-web citations: title, url, preview, plus description, favicon, and image when present |
| task_result.x_search_results[] | array | the X posts Grok cited; empty when the prompt pulled no social sources |
| task_result.follow_up_suggestions[] | array | the follow-up questions Grok offers after the answer |
| task_result.footnotes[] | array | footnote entries, when the answer carries them |
| task_result.tool_usages[] | array | the tools the run invoked (search, browse) |
| task_result.token_count | number | the run's token usage |
| task_result.conversation | object | run identifiers (conversation_id, title, timestamps), useful as audit keys |

Get your API key on the free plan: app.scrapeless.com

Integrating the API in Python

A complete client: send the prompt, check the envelope, and print both citation panels.

```python
import os

import requests

ENDPOINT = "https://api.scrapeless.com/api/v2/scraper/execute"


def ask_grok(prompt: str, country: str = "US", mode: str = "MODEL_MODE_EXPERT") -> dict:
    """Send one prompt to the scraper.grok actor and return the parsed envelope."""
    resp = requests.post(
        ENDPOINT,
        headers={
            "Content-Type": "application/json",
            "x-api-token": os.environ["SCRAPELESS_API_KEY"],
        },
        json={
            "actor": "scraper.grok",
            "input": {"prompt": prompt, "country": country, "mode": mode},
        },
        timeout=300,  # EXPERT runs can take a while
    )
    resp.raise_for_status()  # surface HTTP-level errors (401, 5xx, ...)
    return resp.json()


if __name__ == "__main__":
    data = ask_grok("Which web scraping API handles JavaScript-heavy sites?")
    result = data.get("task_result") or {}
    web = result.get("web_search_results") or []
    x_posts = result.get("x_search_results") or []
    print(f"status={data.get('status')} web_sources={len(web)} x_sources={len(x_posts)}")
    # Entry fields are nullable, so fall back to "" before slicing.
    for i, src in enumerate(web[:5], 1):
        title = (src.get("title") or "")[:60]
        url = (src.get("url") or "")[:60]
        print(f"  [web {i}] {title} → {url}")
    for i, post in enumerate(x_posts[:5], 1):
        print(f"  [x {i}] {str(post)[:80]}")
```

For share-of-citation work, group web_search_results URLs by domain and x_search_results by account, and count per prompt — the two panels are independent signals and worth charting separately.
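A minimal sketch of that grouping, using only the standard library. Web entries carry a url field per the schema above. The shape of x_search_results entries is not documented here (the sample shows an empty array), so this sketch assumes each X post is a dict whose url looks like https://x.com/<handle>/status/<id> and derives the account from the path; adjust account_of once you have seen real X entries. tally_citations, domain_of, and account_of are hypothetical names.

```python
from collections import Counter
from typing import Any
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Lower-cased host without a leading 'www.'; '' when the URL has no host."""
    host = (urlparse(url).hostname or "").lower()
    return host.removeprefix("www.")


def account_of(post: Any) -> str:
    """ASSUMPTION: an X post is a dict whose 'url' looks like https://x.com/<handle>/status/<id>."""
    if not isinstance(post, dict):
        return ""
    path = urlparse(post.get("url") or "").path.strip("/")
    return path.split("/")[0].lower() if path else ""


def tally_citations(task_result: dict[str, Any]) -> tuple[Counter, Counter]:
    """Count web citations by domain and X citations by account for one capture."""
    domains: Counter = Counter()
    for src in task_result.get("web_search_results") or []:
        if isinstance(src, dict) and (d := domain_of(src.get("url") or "")):
            domains[d] += 1
    accounts: Counter = Counter()
    for post in task_result.get("x_search_results") or []:
        if a := account_of(post):
            accounts[a] += 1
    return domains, accounts
```

Summing these per-prompt counters across a prompt set yields one share-of-citation table per panel, which keeps the web and X signals separate.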

Picking the reasoning mode

The required mode is the input that has no ChatGPT equivalent, and it changes both latency and output:

  • MODEL_MODE_FAST — quickest answers; suits high-volume sweeps where breadth beats depth.
  • MODEL_MODE_EXPERT — deeper reasoning and typically richer sourcing; suits the prompts you chart over time. Allow for longer runs.
  • MODEL_MODE_AUTO — Grok chooses per prompt; convenient interactively, but a scheduled series is easier to interpret when the mode is held constant.

Whichever you pick, store it with each capture — comparing an EXPERT run against a FAST run is comparing two different processes.
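One way to make the mode travel with every capture is to write each result as a single JSON line that stamps the run's inputs next to its output. This is a sketch under assumptions: a local JSONL file stands in for whatever store you use, make_record and append_record are hypothetical helpers, and captured_at is the client's own clock rather than a field the API returns.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def make_record(prompt: str, country: str, mode: str, envelope: dict[str, Any]) -> dict[str, Any]:
    """Bundle the run's inputs with its output so mode and country are never lost."""
    result = envelope.get("task_result") or {}
    conversation = result.get("conversation") or {}
    return {
        "prompt": prompt,
        "country": country,
        "mode": mode,
        "captured_at": datetime.now(timezone.utc).isoformat(),  # client-side clock
        "task_id": envelope.get("task_id"),
        "conversation_id": conversation.get("conversation_id"),
        "status": envelope.get("status"),
        "task_result": result,
    }


def append_record(path: Path, record: dict[str, Any]) -> None:
    """Append one capture as a single JSON line (JSONL)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```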

Companion actors for the rest of the AI-answer landscape

The endpoint, header, and envelope stay the same across the family — only the actor name and platform-specific inputs change:

  • scraper.chatgpt: prompt plus optional country; returns result_text with content_references citations.
  • scraper.gemini: the same two-field input; returns result_text plus a citations array.
  • scraper.perplexity: required country and a web_search flag; returns web_results, media_items, and related prompts.
  • scraper.copilot: the Copilot answer surface under the same contract.
  • scraper.overview / scraper.aimode: Google's AI Overview block and AI Mode tab; covered end to end in the AI Overview guide.

Pricing for the line is usage-based with free trial credits on signup — current tiers are on the pricing page.

How to avoid common problems

  • An empty x_search_results is normal for many prompts. Technical and product questions often resolve entirely from the open web. Prompts about people, events, and sentiment are the ones that pull X posts — phrase accordingly when the X panel is the point.
  • Panel sizes swing run to run. The same prompt can cite 35 web sources one run and 20 the next. Store every capture with its conversation_id and read the series, not a single run.
  • Hold the mode constant in a series. Mode changes the reasoning process; mixing modes inside one tracked prompt set makes trend lines uninterpretable.
  • Treat fields as nullable. footnotes is often empty, web-source entries carry description/image only sometimes, and x_search_results may be [] — read what is present.
  • Mind the country list. country is required and JP/TW are unavailable; pick the markets you report on and keep them fixed per series.
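The series advice above can be enforced in code. The sketch below assumes stored capture records that each carry prompt, mode, country, and the raw task_result (a record layout chosen for this sketch, not an API format). It refuses to summarise a prompt whose captures mix modes or countries and reports each panel's size as a min/median/max range instead of a single run. series_summary is a hypothetical name.

```python
from statistics import median
from typing import Any, Iterable


def series_summary(records: Iterable[dict[str, Any]], prompt: str) -> dict[str, Any]:
    """Summarise citation-panel sizes for one prompt across stored captures."""
    runs = [r for r in records if r.get("prompt") == prompt]
    if not runs:
        raise ValueError(f"no captures for prompt {prompt!r}")
    modes = {r.get("mode") for r in runs}
    countries = {r.get("country") for r in runs}
    if len(modes) > 1 or len(countries) > 1:
        # Mixing modes or markets makes the trend line uninterpretable.
        raise ValueError(f"mixed series: modes={sorted(map(str, modes))} countries={sorted(map(str, countries))}")

    def sizes(key: str) -> list[int]:
        # Missing or null panels count as zero sources.
        return [len((r.get("task_result") or {}).get(key) or []) for r in runs]

    def spread(values: list[int]) -> dict[str, float]:
        return {"min": min(values), "median": median(values), "max": max(values)}

    return {
        "runs": len(runs),
        "mode": modes.pop(),
        "country": countries.pop(),
        "web": spread(sizes("web_search_results")),
        "x": spread(sizes("x_search_results")),
    }
```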

Conclusion: both panels, one request

Capturing Grok reduces to one call: POST { actor: "scraper.grok", input: { prompt, country, mode } } with your x-api-token, read full_response for the answer, and chart web_search_results and x_search_results as separate citation signals. Hold the mode constant, pin the country, store the conversation_id, and the same client scales from one prompt to a scheduled multi-market monitoring program.

FAQ

Q: Is scraping Grok answers legal?

The actor captures publicly rendered answer content. Rules vary by jurisdiction and by the platform's terms of service — review the relevant ToS and consult counsel for your use case, especially before redistributing captures. Never collect personal data protected under GDPR or CCPA.

Q: How do I authenticate?

Every request carries x-api-token: <your key>. One account key covers scraper.grok and every other Scrapeless actor. Create a key on the free plan at app.scrapeless.com.

Q: Do I need a proxy?

No. Residential egress and geo-routing are built into the actor; the required country input is the whole configuration.

Q: Why is mode required?

Grok's reasoning depth materially changes the answer, so the actor makes it explicit instead of defaulting silently. In code the values are the API enums — MODEL_MODE_FAST, MODEL_MODE_EXPERT, MODEL_MODE_AUTO.

Q: How do I separate web citations from X citations?

They already arrive separated: web_search_results holds the open-web pages, x_search_results holds the X posts. Read each array directly.

Q: Can I run this without an SDK or AI agent?

Yes. It is plain HTTP — curl, Python requests, Node fetch, or any HTTP client works directly against POST /api/v2/scraper/execute.
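As an illustration, here is the request rewritten against Python's standard-library urllib.request instead of requests, so it needs no third-party install. build_request and ask_grok_stdlib are hypothetical names; the endpoint, header, and body are the ones documented in this guide.

```python
import json
import os
import urllib.request

ENDPOINT = "https://api.scrapeless.com/api/v2/scraper/execute"


def build_request(prompt: str, country: str, mode: str, api_key: str) -> urllib.request.Request:
    """Build the POST request for scraper.grok with only the standard library."""
    body = {"actor": "scraper.grok", "input": {"prompt": prompt, "country": country, "mode": mode}}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-token": api_key},
        method="POST",
    )


def ask_grok_stdlib(prompt: str, country: str = "US", mode: str = "MODEL_MODE_FAST") -> dict:
    """Send the request; urlopen raises urllib.error.HTTPError on 4xx/5xx responses."""
    req = build_request(prompt, country, mode, os.environ["SCRAPELESS_API_KEY"])
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))
```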

Q: Does my ChatGPT capture code work for Grok?

The auth, endpoint, and envelope are identical. Change the actor name, add the required mode and country, and map the task_result keys (full_response instead of result_text, the two panels instead of content_references).
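A minimal adapter for that key mapping, assuming the names given in this guide: full_response and the two panels for Grok, result_text and content_references for ChatGPT. It further assumes content_references sits at the top level of ChatGPT's task_result, which the guide implies but does not show, and it passes those entries through untouched rather than guessing their shape. normalize_answer is a hypothetical helper.

```python
from typing import Any


def normalize_answer(actor: str, task_result: dict[str, Any]) -> dict[str, Any]:
    """Map actor-specific task_result keys onto one shape: answer text plus citation lists."""
    if actor == "scraper.grok":
        return {
            "answer": task_result.get("full_response") or "",
            "citations": {
                "web": task_result.get("web_search_results") or [],
                "x": task_result.get("x_search_results") or [],
            },
        }
    if actor == "scraper.chatgpt":
        # ASSUMPTION: content_references is a top-level task_result key.
        return {
            "answer": task_result.get("result_text") or "",
            "citations": {"content_references": task_result.get("content_references") or []},
        }
    raise ValueError(f"no key mapping for {actor!r} in this sketch")
```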

Ready to Build Your AI-Answer Data Pipeline?

Join our community to claim a free plan and connect with developers building AI-answer pipelines: Discord · Telegram.

Sign up at app.scrapeless.com for free trial credits, and point the scraper.grok actor at the prompts, modes, and markets your monitoring program needs.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
