How to Give Mastra AI Agents Real-Time Web Access
Lead Scraping Automation Engineer
Key Takeaways:
- Mastra agents are blind to the live web until you connect a tool, and MCP is the cleanest way in. A Mastra Agent reasons over its model and whatever tools you hand it. Point it at the Scrapeless MCP server and the agent gains live Google search, JavaScript rendering, and a full anti-detection cloud browser without writing a single scraper by hand.
- One MCPClient plus await mcp.listTools() wires it all in. Construct an MCPClient with a scrapeless server entry, pass await mcp.listTools() into new Agent({ tools }), and the agent gets a Google SERP scraper, a Trends scraper, HTML/Markdown/Screenshot helpers, and 16 browser-automation tools, all as typed TypeScript tools.
- The agent searches, renders, and drives a browser from plain prompts. Call agent.generate(...) or agent.stream(...) with a natural-language task and Mastra composes the right Scrapeless tool calls turn by turn instead of being capped at training-cutoff knowledge.
- Residential proxies and anti-detection are handled cloud-side. Every request routes through the Scrapeless anti-detection cloud browser with residential proxies in 195+ countries, so the agent gets a rendered, usable response on commercial sites without any proxy or fingerprint setup on your machine.
- 21 tools across SERP, stateless scraping, and browser automation. The Scrapeless MCP server exposes google_search, google_trends, scrape_html / scrape_markdown / scrape_screenshot, plus 16 browser_* tools, namespaced scrapeless_<tool> once Mastra loads them: one surface the agent's planner draws from each turn.
- stdio or HTTP-streamable transport. Spawn the server locally with npx, or point the same MCPClient at the streamable HTTP endpoint for serverless functions, remote workers, and CI runners.
- Free to start. New Scrapeless accounts include free Scraping Browser runtime; sign up at app.scrapeless.com.
Introduction: a TypeScript agent that can finally read the live web
Mastra is an open-source TypeScript framework for building AI agents. You define an Agent with a model, a set of instructions, and the tools it can call, then drive it from your own code with agent.generate(...) or agent.stream(...). It is a clean, type-safe way to ship agents inside a Node or serverless app. What a Mastra agent cannot do on its own is see the live web. Its knowledge stops at the model's training cutoff and whatever tools you explicitly give it.
Mastra is deliberate about that boundary: an agent's reach over the world is exactly the set of tools you attach. That makes the tool layer the place to add real-time web access — and the Model Context Protocol (MCP) is the standard, framework-agnostic way to do it. Mastra ships first-class MCP support through the @mastra/mcp package, so any MCP server becomes a set of native, typed Mastra tools. Until you connect one, the agent cannot pull a current SERP, read a competitor's pricing page, check the latest changelog, or render a JavaScript-only app.
This post closes that gap by wiring the Scrapeless MCP server into Mastra. A single MCPClient gives the agent Google search, JavaScript rendering, and a full anti-detection cloud browser, all reachable through the same natural-language prompts you already pass to agent.generate(...). For the same Scrapeless surface through another MCP client, see the Pi Agent integration.
What You Can Do With It
- Live SERP research from a TypeScript agent. Ask the agent to run google_search for a query and hand back the top results as JSON, so research happens inside your app instead of a separate browser tab.
- Competitor and pricing snapshots. Pass a URL in the prompt and have the agent render the page and extract plan names, prices, and features into a structured record your code can persist.
- Doc and changelog lookups that feed code. Have the agent fetch a library's current docs or release notes as clean markdown and reason against the rendered text rather than a stale memory of the API.
- Market and trend checks. Use google_trends to pull interest signals for a topic in a target region, then seed feature copy, content plans, or experiment ideas with current evidence.
- JavaScript-page extraction into a typed record. Point the agent at a single-page app; the cloud browser hydrates it and the agent parses the result into a typed object your workflow consumes.
- Multi-step browser flows. Chain browser_goto, browser_click, browser_type, and browser_scroll so the agent walks pagination, expands panels, or steps through a wizard before extracting.
- Screenshot capture for review. Use scrape_screenshot or browser_screenshot to grab a rendered page as an image the agent can return to your app.
- Search-then-read pipelines. Combine google_search with scrape_markdown so the agent finds the top results, reads each one, and summarizes them in a single agent turn.
Why the Scrapeless MCP Server
The Scrapeless MCP server is a customizable, anti-detection bridge between an AI agent and the live web. For Mastra specifically, it brings:
- An anti-detection cloud browser with JavaScript rendering. Pages are hydrated in a full Scrapeless Scraping Browser before extraction, so SPAs, infinite-scroll feeds, and lazy-loaded panels become first-class targets for browser_goto + browser_get_html.
- Residential proxies in 195+ countries. Geo-bound queries return the listings a local user would see, with proxy egress handled entirely on the Scrapeless side.
- One stdio command via npx, no scraper code. The server launches as a child process from npx -y scrapeless-mcp-server; there is nothing to build, host, or import into your project beyond the MCPClient.
- 21 tools spanning SERP, stateless scraping, and full browser automation. google_search and google_trends cover SERP data, scrape_html / scrape_markdown / scrape_screenshot cover one-shot page fetches, and 16 browser_* tools cover stateful navigation, clicking, typing, scrolling, and screenshots.
- Tools arrive as native Mastra tools. Because @mastra/mcp adapts MCP tools into Mastra's own tool type, the Scrapeless surface shows up exactly like any tool you would hand-write: typed, namespaced, and ready for the agent's planner.
The free plan is enough to wire this up and run real prompts; compare quotas on the pricing page when you outgrow it. Get your API key on the free plan at app.scrapeless.com.
Prerequisites
- Node.js 18 or newer on the workstation. Mastra targets modern Node, and the stdio MCP server is spawned with npx.
- A Mastra project with a model provider configured. Mastra's Agent takes a model; the agent loop needs a working model from whichever provider you wire in before any tool call runs.
- A Scrapeless account and API key. Sign up on the free plan at app.scrapeless.com and copy the key from Settings → API Key Management.
- Basic familiarity with TypeScript. The whole setup is two packages plus a small MCPClient definition.
Connect Scrapeless to Mastra
The setup is five steps; each is independently verifiable.
1. Install the Mastra MCP and core packages
Add the MCP client package and the agent core to your project:
```bash
npm install @mastra/mcp @mastra/core
```
@mastra/mcp provides the MCPClient class that connects to MCP servers, and @mastra/core provides the Agent class you attach the tools to.
2. Configure the MCPClient (stdio)
Create an MCPClient with a single scrapeless server entry. When a command is present, Mastra auto-detects the stdio transport and launches the server as a child process:
```ts
// npm install @mastra/mcp @mastra/core
import { MCPClient } from "@mastra/mcp";

const mcp = new MCPClient({
  id: "scrapeless-client",
  servers: {
    scrapeless: {
      command: "npx",
      args: ["-y", "scrapeless-mcp-server"],
      env: { SCRAPELESS_KEY: process.env.SCRAPELESS_KEY! },
    },
  },
});
```
One detail matters here: the Scrapeless MCP server reads its key from SCRAPELESS_KEY, not SCRAPELESS_API_KEY — the Scrapeless CLI and SDK use SCRAPELESS_API_KEY, but the MCP server is the documented exception. Keep the key in your environment (export SCRAPELESS_KEY=...) and pass it through the env object rather than hardcoding it. The server source lives at github.com/scrapeless-ai/scrapeless-mcp-server.
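Because the two variable names are easy to mix up, it can help to fail fast with a pointed message instead of letting the server start without a key. The following is a minimal sketch, not part of Mastra or Scrapeless: `resolveScrapelessKey` is a hypothetical helper, and it takes the environment as a plain object so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper: resolve the key the MCP server expects (SCRAPELESS_KEY).
type Env = Record<string, string | undefined>;

function resolveScrapelessKey(env: Env): string {
  const key = env.SCRAPELESS_KEY?.trim();
  if (key) return key;

  // The CLI/SDK variable is the usual source of confusion, so name it explicitly.
  if (env.SCRAPELESS_API_KEY) {
    throw new Error(
      "SCRAPELESS_API_KEY is set, but the MCP server reads SCRAPELESS_KEY. " +
        "Export SCRAPELESS_KEY with the same value.",
    );
  }
  throw new Error("SCRAPELESS_KEY is not set.");
}

// In a Node app: const key = resolveScrapelessKey(process.env);
```

The returned string can then be passed as the SCRAPELESS_KEY value in the env object, in place of the non-null assertion on process.env.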
3. Or use HTTP-streamable transport
If the host cannot reliably spawn npx — a serverless function, a remote worker, or a CI sandbox — point the same MCPClient at the Scrapeless HTTP endpoint instead of the local process. Pass a url and set the key as a header through requestInit:
```ts
const mcp = new MCPClient({
  servers: {
    scrapeless: {
      url: new URL("https://api.scrapeless.com/mcp"),
      requestInit: { headers: { "x-api-token": process.env.SCRAPELESS_KEY! } },
    },
  },
});
```
When you supply a url, Mastra tries Streamable HTTP first and falls back to the legacy SSE transport automatically. The same key value works in both modes; HTTP-streamable passes it as the x-api-token header rather than the SCRAPELESS_KEY env var. Stdio is the right default on a developer workstation; HTTP-streamable is the right default anywhere a long-lived child process is awkward to keep alive.
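If the same codebase runs both on a workstation and in a hosted runtime, the choice between the two transports can live in one small function. This is a sketch under assumptions: `scrapelessServer` and the two type names are hypothetical, and the returned objects simply mirror the two server entries shown above.

```typescript
// Hypothetical types mirroring the stdio and HTTP server entries shown above.
type StdioServer = { command: string; args: string[]; env: Record<string, string> };
type HttpServer = { url: URL; requestInit: { headers: Record<string, string> } };

// Build the `scrapeless` server entry for whichever transport the host supports.
function scrapelessServer(key: string, useHttp: boolean): StdioServer | HttpServer {
  if (useHttp) {
    // Hosted runtimes: the key travels as the x-api-token header.
    return {
      url: new URL("https://api.scrapeless.com/mcp"),
      requestInit: { headers: { "x-api-token": key } },
    };
  }
  // Local development: the key travels as the SCRAPELESS_KEY env var.
  return {
    command: "npx",
    args: ["-y", "scrapeless-mcp-server"],
    env: { SCRAPELESS_KEY: key },
  };
}
```

The result would be passed as servers: { scrapeless: scrapelessServer(key, useHttp) } when constructing the MCPClient. How useHttp is decided (an environment flag, a deployment setting) is left to the application.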
4. Attach the tools to an Agent
Load the tools with await mcp.listTools() and pass them straight into a new Agent. With this static approach, the tools are resolved once at construction and namespaced serverName_toolName:
```ts
import { Agent } from "@mastra/core/agent";

const agent = new Agent({
  id: "web-agent",
  name: "Web-Aware Agent",
  instructions: "Use the Scrapeless tools to search and read the live web.",
  model: "<your-provider-model>",
  tools: await mcp.listTools(), // -> scrapeless_google_search, scrapeless_scrape_markdown, scrapeless_browser_* ...
});
```
Use the model your provider exposes — Mastra's Agent takes any provider model id, and no specific model or version is required for the tools to work.
For a dynamic alternative, Mastra also offers await mcp.listToolsets(), which resolves tools per call and namespaces them serverName.toolName. Pass the toolsets at call time instead of at construction:
```ts
const result = await agent.stream(prompt, {
  toolsets: await mcp.listToolsets(),
});
```
listTools() is the simplest path when one client serves the whole app; listToolsets() suits multi-tenant setups where each request may carry different credentials or server selections. Use await mcp.disconnect() to clean up when you are done with the client.
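When one client is configured with several servers, the static serverName_toolName prefix makes it straightforward to hand an agent only one server's tools. The sketch below assumes nothing beyond that naming convention; `toolsForServer` is a hypothetical helper that works on any record keyed by tool name.

```typescript
// Hypothetical helper: keep only the tools whose key starts with `<server>_`.
function toolsForServer<T>(tools: Record<string, T>, server: string): Record<string, T> {
  const prefix = `${server}_`;
  const picked: Record<string, T> = {};
  for (const name of Object.keys(tools)) {
    if (name.startsWith(prefix)) {
      picked[name] = tools[name];
    }
  }
  return picked;
}
```

For example, toolsForServer(await mcp.listTools(), "scrapeless") would return just the scrapeless_* entries, which can then be passed to new Agent({ tools }).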
5. Verify by enumerating the 21 tools
Resolving the tools and inspecting their keys confirms the handshake completed and the server is reporting its full surface:
```ts
const tools = await mcp.listTools();
console.log(Object.keys(tools));
```
The output lists the server's 21 tools, each namespaced scrapeless_<tool> — the Google data tools (scrapeless_google_search, scrapeless_google_trends), the one-shot page helpers (scrapeless_scrape_html, scrapeless_scrape_markdown, scrapeless_scrape_screenshot), and the cloud-browser primitives (scrapeless_browser_create, scrapeless_browser_goto, scrapeless_browser_get_html, scrapeless_browser_get_text, scrapeless_browser_click, scrapeless_browser_type, scrapeless_browser_press_key, scrapeless_browser_scroll, scrapeless_browser_scroll_to, scrapeless_browser_screenshot, scrapeless_browser_snapshot, scrapeless_browser_wait, scrapeless_browser_wait_for, scrapeless_browser_go_back, scrapeless_browser_go_forward, scrapeless_browser_close).
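Rather than eyeballing the printed keys, the check can be automated. The following sketch compares the resolved keys against the 21 names listed above and reports anything missing; `EXPECTED_TOOLS` and `missingTools` are hypothetical names, and the list is copied from this article rather than fetched from the server, so it may lag behind a newer server release.

```typescript
// Unnamespaced tool names as listed in this article (2 + 3 + 16 = 21).
const EXPECTED_TOOLS: string[] = [
  "google_search", "google_trends",
  "scrape_html", "scrape_markdown", "scrape_screenshot",
  "browser_create", "browser_goto", "browser_get_html", "browser_get_text",
  "browser_click", "browser_type", "browser_press_key",
  "browser_scroll", "browser_scroll_to", "browser_screenshot", "browser_snapshot",
  "browser_wait", "browser_wait_for", "browser_go_back", "browser_go_forward",
  "browser_close",
];

// Return the namespaced names that are expected but absent from `keys`.
function missingTools(keys: string[], server = "scrapeless"): string[] {
  const present = new Set(keys);
  return EXPECTED_TOOLS
    .map((tool) => `${server}_${tool}`)
    .filter((name) => !present.has(name));
}
```

Calling missingTools(Object.keys(tools)) after the handshake should return an empty array when the full surface is reported; a non-empty result names exactly which tools did not load.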
How you actually use this: prompt your Mastra agent
After wiring the MCP server, you get live web data by calling the agent in your code — not by hand-writing tool calls. The agent reads the tool list the Scrapeless MCP server exposes and chooses scrapeless_google_search, scrapeless_scrape_markdown, or the scrapeless_browser_* tools as needed, composing them turn by turn from the natural-language task. There is no tool JSON to author on your side. You drive a single turn with agent.generate(prompt), or stream the run token by token with agent.stream(prompt).
Prompts you can paste
| Prompt | What the agent does |
|---|---|
| "Find the top Google results for vector database benchmarks 2026 and return them as JSON." | scrapeless_google_search with q, hl, gl → typed result rows. |
| "What search topics are rising for developer tools in the US right now?" | scrapeless_google_trends. |
| "Pull the page at https://example.com/docs as clean markdown." | scrapeless_scrape_markdown. |
| "Open https://pricing.example.com, it's a JavaScript app — render it and extract plan name, price, and features as JSON." | scrapeless_browser_create → scrapeless_browser_goto → scrapeless_browser_get_html → typed extract. |
| "Compare the pricing pages at https://a.example.com/pricing and https://b.example.com/pricing and tell me where they differ." | scrapeless_browser_create → scrapeless_browser_goto (A) → scrapeless_browser_get_html → scrapeless_browser_goto (B) → scrapeless_browser_get_html → diff. |
| "Take a full-page screenshot of https://example.com/landing." | scrapeless_scrape_screenshot. |
| "Grab the rendered HTML of https://example.com so the agent can read the markup." | scrapeless_scrape_html. |
| "Open https://example.com/jobs, wait for the listings to load, snapshot the page, then extract every job title and location as JSON." | scrapeless_browser_create → scrapeless_browser_goto → scrapeless_browser_wait_for → scrapeless_browser_snapshot → typed extract → scrapeless_browser_close. |
Worked example
You write (a single call in your TypeScript app):
```ts
const result = await agent.generate(
  "Use the Scrapeless google_search tool to find the top results for " +
    "'mastra ai github' and return the top 3 as a JSON array of {title, link}.",
);
console.log(result.text);
```
The agent's plan (in plain English):
- Call scrapeless_google_search with q: "mastra ai github", hl: "en", gl: "us".
- Receive an array of result rows and read the position, title, and link fields.
- Sort by position and keep the first three rows.
- Map each row to a {title, link} object.
- Return the JSON array as the agent's answer.
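If you would rather do the trimming in your own code than leave it to the model, the sort, trim, and map steps of the plan reduce to a few lines. This is a minimal sketch, assuming rows carry the position, title, and link fields named in the plan; `ResultRow` and `topResults` are hypothetical names, not part of Mastra or Scrapeless.

```typescript
// Minimal row shape assumed from the plan above.
interface ResultRow {
  position: number;
  title: string;
  link: string;
}

// Sort by position, keep the first `n` rows, and project to {title, link}.
function topResults(rows: ResultRow[], n = 3): { title: string; link: string }[] {
  return rows
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.position - b.position)
    .slice(0, n)
    .map(({ title, link }) => ({ title, link }));
}
```

Doing this step deterministically keeps the output shape stable even when the model's own formatting varies between runs.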
What you get back (illustrative shape — the agent works from rows like these):
```json
[
  { "title": "Mastra is an open-source TypeScript agent framework, ...", "link": "https://example.com/mastra" },
  { "title": "Mastra docs — building agents", "link": "https://example.com/mastra-agents" },
  { "title": "@mastra/core on npm", "link": "https://example.org/mastra-core" }
]
```
Field names match the google_search row shape; values are illustrative samples.
The stateless data tools return their payload as a body prefixed with Response:\n\n; the agent unwraps that prefix before parsing the JSON, so you never see it in the answer.
Shaping prompts
| Say this | Effect |
|---|---|
| "…from Germany" / "…German results" | Routes egress through proxyCountry and sets gl=de on the search. |
| "…as markdown, skip the nav and boilerplate" | Picks scrapeless_scrape_markdown for a clean text payload instead of raw HTML. |
| "…render it first, it's a single-page app" | Forces the scrapeless_browser_* path so extraction runs against the hydrated DOM. |
| "…top 5 only" | Trims the returned array to the first five rows. |
| "…include the snippet for each result" | Keeps the snippet field in the output rows. |
| "…close the session when you're done" | Adds a final scrapeless_browser_close with the sessionId from scrapeless_browser_create. |
Everything below is the under-the-hood reference — the tool surface, the exact return shapes, and the behavior the agent handles for you.
The Scrapeless MCP tool surface
Once the client connects, Mastra sees 21 tools spanning SERP data, stateless scraping, and full anti-detection cloud browser control. The names below are shown unnamespaced for readability; in Mastra they arrive as scrapeless_<tool> via listTools().
| Tool | What it does |
|---|---|
| google_search | Runs a Google search (q, hl, gl) and returns structured organic result rows. |
| google_trends | Pulls Google Trends interest data for a query. |
| scrape_html | Fetches a URL and returns its rendered HTML. |
| scrape_markdown | Fetches a URL and returns clean Markdown for the page. |
| scrape_screenshot | Captures a screenshot of a target URL. |
| browser_create | Opens a session on the anti-detection cloud browser. |
| browser_goto | Navigates the session to a URL. |
| browser_click | Clicks an element in the live page. |
| browser_type | Types text into an input or editable field. |
| browser_get_text / browser_get_html | Reads the page's text or HTML. |
| browser_screenshot | Captures a screenshot of the live session. |
| browser_snapshot | Returns an accessibility/structure snapshot of the page. |
| browser_wait / browser_wait_for | Waits a fixed interval, or for a condition/element. |
| browser_scroll / browser_scroll_to | Scrolls the page, or to a specific element. |
| browser_go_back / browser_go_forward | Moves through session history. |
| browser_press_key | Sends a keyboard key to the page. |
| browser_close | Ends the cloud browser session. |
Get your API key on the free plan: app.scrapeless.com
What You Get Back
A google_search call returns a JSON array of organic result rows. Each row carries the same keys, so the agent can map straight to title, link, and snippet:
```json
[
  {
    "position": 1,
    "title": "Web Scraping With TypeScript: A Complete Guide",
    "link": "https://example.com/typescript-web-scraping",
    "snippet": "A step-by-step guide to scraping the web with TypeScript and parsing HTML.",
    "source": "example.com"
  },
  {
    "position": 2,
    "title": "Scraping Dynamic Sites",
    "link": "https://example.org/dynamic-scraping",
    "snippet": "How to render JavaScript pages before extracting data.",
    "source": "example.org"
  }
]
```
Field names reflect the google_search tool output; values are illustrative samples.
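If your code persists these rows, a type guard gives you a typed record without trusting the payload blindly. The sketch below assumes the five fields shown in the sample above are always present as described; `SearchRow`, `isSearchRow`, and `parseSearchRows` are hypothetical names, and rows that do not match are dropped rather than repaired.

```typescript
// Row shape assumed from the sample above.
interface SearchRow {
  position: number;
  title: string;
  link: string;
  snippet: string;
  source: string;
}

// Runtime check that an unknown value has the assumed row shape.
function isSearchRow(value: unknown): value is SearchRow {
  if (typeof value !== "object" || value === null) return false;
  const row = value as Record<string, unknown>;
  return (
    typeof row.position === "number" &&
    typeof row.title === "string" &&
    typeof row.link === "string" &&
    typeof row.snippet === "string" &&
    typeof row.source === "string"
  );
}

// Parse a JSON string and keep only the well-formed rows.
function parseSearchRows(json: string): SearchRow[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("Expected a JSON array of result rows.");
  }
  return data.filter(isSearchRow);
}
```

If live output turns out to omit snippet or source on some rows, relaxing those two checks to optional fields would be the natural adjustment.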
A few honest observations once you start running prompts:
- Stateless tools like google_search and scrape_markdown return a body prefixed with Response:\n\n followed by the JSON payload; the agent unwraps that prefix automatically, so you work with the data, not the wrapper.
- The browser_* tools return plain text with no Response:\n\n prefix.
- Tool arguments are camelCase: pass sessionId, proxyCountry, and similar fields exactly as named.
- proxyCountry is a request, not a guarantee. It can defer to the region configured on your account, so confirm the egress region when geo-targeting matters.
- Values in tool output are content-dependent: result counts, ordering, and snippet text vary with the live query.
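The agent handles the prefix for you, but if you ever log or post-process raw tool output yourself, one function can cover both cases described above. This is a minimal sketch based only on those observations; `unwrapToolText` and `parseStatelessPayload` are hypothetical helpers.

```typescript
// Prefix that stateless tools put in front of their payload, per the notes above.
const RESPONSE_PREFIX = "Response:\n\n";

// Strip the prefix when present; browser_* output passes through unchanged.
function unwrapToolText(raw: string): string {
  return raw.startsWith(RESPONSE_PREFIX) ? raw.slice(RESPONSE_PREFIX.length) : raw;
}

// For tools whose payload is JSON (for example google_search), unwrap then parse.
function parseStatelessPayload(raw: string): unknown {
  return JSON.parse(unwrapToolText(raw));
}
```

Because the unwrap is a no-op on text without the prefix, the same call is safe on both stateless and browser tool output.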
Conclusion: search, render, and browse from TypeScript
The whole integration reduces to one MCPClient plus natural-language prompts. With a scrapeless server entry, await mcp.listTools() handed to new Agent({ tools }), and your key in SCRAPELESS_KEY, a Mastra agent gains live Google search, JavaScript rendering, and a full anti-detection cloud browser — exactly the web layer the agent does not ship on its own. You describe the task; the agent picks the tool.
If you are wiring up other agents, the same Scrapeless MCP server drops into them too: see the Pi Agent integration and the Scrapeless MCP server overview for the full tool reference. As a rule of thumb, prefer stdio transport for local development and HTTP-streamable for hosted runtimes, and choose listTools() for a single static client or listToolsets() for dynamic per-call sets. Full reference at docs.scrapeless.com.
Ready to Build Your AI-Powered Data Pipeline?
Join our community to claim a free plan and connect with developers building Mastra + Scrapeless MCP agents: Discord · Telegram.
Sign up at app.scrapeless.com for free Scraping Browser runtime and adapt the integration above to the SERPs, pages, and regions your team needs.
FAQ
Q: Why does a Mastra agent need an MCP server for web access at all?
Because a Mastra Agent can only call the tools you attach to it, and it has no built-in web search or browser. MCP is the standard, framework-agnostic way to add that capability, and @mastra/mcp adapts any MCP server into native Mastra tools. Connecting Scrapeless gives the agent live Google search, rendered page access, and a full anti-detection cloud browser in one move.
Q: What is the difference between listTools() and listToolsets()?
await mcp.listTools() resolves tools once and returns them namespaced serverName_toolName (for example scrapeless_google_search); you pass the result into new Agent({ tools }) at construction. await mcp.listToolsets() resolves tools per call and namespaces them serverName.toolName; you pass the result at call time via agent.stream(prompt, { toolsets }) or agent.generate(prompt, { toolsets }). Use listTools() for a single static client, and listToolsets() when each request may carry different credentials or server selections.
Q: Which environment variable holds the Scrapeless key?
SCRAPELESS_KEY. This is the documented exception — the Scrapeless CLI and SDK read SCRAPELESS_API_KEY, but the MCP server reads SCRAPELESS_KEY. In the stdio config you pass it through the env object as process.env.SCRAPELESS_KEY; in HTTP-streamable mode you send the same value as the x-api-token header.
Q: stdio vs HTTP-streamable — when should you use each?
Use stdio when the server runs locally alongside your app: Mastra launches scrapeless-mcp-server as a child process and talks to it over standard input/output. Use HTTP-streamable (a url of https://api.scrapeless.com/mcp with the x-api-token header via requestInit) when the runtime is serverless or remote and cannot keep a local child process alive. When you supply a url, Mastra tries Streamable HTTP first and falls back to legacy SSE automatically.
Q: Does this require a specific model?
No. Mastra's Agent takes any provider model id, so you use the model your provider exposes. The Scrapeless tools are model-agnostic; pick a model that handles tool calls well, and the agent composes the Scrapeless tools the same way regardless of provider.
Q: How do the tools end up typed in my project?
@mastra/mcp adapts each MCP tool into Mastra's own tool type, so the Scrapeless tools show up exactly like tools you would hand-write — namespaced and ready for the agent's planner. You attach them through tools: await mcp.listTools() and never author tool JSON yourself.
Q: Is web scraping via the agent legal?
Scraping publicly available data is generally permissible, but you are responsible for how you use it. Review each site's Terms of Service and respect robots.txt, and remember that rules around personal data and access vary by jurisdiction. When in doubt, get legal advice for your specific use case.
Q: Can you use this without Mastra?
Yes. The Scrapeless MCP server is a standard MCP server, so any MCP-compatible client can call it — or you can drive it directly over JSON-RPC (initialize, then tools/list and tools/call). Mastra is a convenience for TypeScript apps, not a requirement.
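For a sense of what driving the server directly involves, the sketch below builds the three JSON-RPC 2.0 requests named above without sending them. The method names come from the MCP specification; the protocolVersion string, the clientInfo values, and the `rpcRequest` helper are illustrative assumptions, and the search arguments reuse the q, hl, gl fields from the worked example.

```typescript
// Minimal JSON-RPC 2.0 request shape.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Hypothetical helper: build a request with an auto-incrementing id.
function rpcRequest(method: string, params?: Record<string, unknown>): JsonRpcRequest {
  const req: JsonRpcRequest = { jsonrpc: "2.0", id: nextId++, method };
  if (params !== undefined) req.params = params;
  return req;
}

// 1. Handshake. The version string and client name are assumptions.
const initialize = rpcRequest("initialize", {
  protocolVersion: "2025-03-26",
  capabilities: {},
  clientInfo: { name: "example-client", version: "0.1.0" },
});

// 2. Enumerate tools. Names are unnamespaced at this level (google_search).
const listTools = rpcRequest("tools/list");

// 3. Invoke one tool with its arguments.
const callSearch = rpcRequest("tools/call", {
  name: "google_search",
  arguments: { q: "mastra ai github", hl: "en", gl: "us" },
});
```

A real client also sends the notifications/initialized notification after the initialize response and has to manage the transport itself, which is the bookkeeping an MCP client library takes off your hands.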
At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.



