Why Your Elixir Scraper Gets Blocked, And How Residential Proxies + Cloud Browser Fix It

James Thompson

Scraping and Proxy Management Expert

03-Jun-2026

Key Takeaways:

Elixir is built for concurrent scraping. The BEAM runtime weaves thousands of lightweight processes onto one node, so a crawl that fans out across hundreds of URLs runs as plain Task.async_stream instead of a thread pool you have to babysit.
Req and HTTPoison fetch, Floki parses. Req is the modern, batteries-included HTTP client; HTTPoison is the long-standing hackney-backed option; Floki turns raw HTML into a tree you query with CSS selectors. Together they cover any page that ships rendered markup.
Crawly is the full crawling framework. It schedules requests across workers, handles pagination through follow-up requests, applies middlewares for user-agent rotation and request options, and pushes parsed items down a pipeline — Scrapy-style, but on the BEAM.
Scrapeless residential proxies route the fetches. A single proxy host, port, and Basic-auth header plug straight into Req's connect_options, HTTPoison's :proxy / :proxy_auth, or Crawly's RequestOptions middleware, giving every request a residential IP and pinning egress geography.
JS-heavy and anti-bot targets escalate to the Scrapeless Scraping Browser. Elixir does not drive the Chrome DevTools Protocol as cleanly as Node or Python, so the workload splits two ways: plain HTTP requests through Scrapeless residential proxies for the server-rendered majority, and a small cloud-browser call-out for the client-side-rendered minority.
Free to start. New Scrapeless accounts include free Scraping Browser runtime — sign up at app.scrapeless.com.

Introduction: Why Elixir, and where the friction starts

Elixir runs on the BEAM, the same virtual machine that has kept Erlang telecom switches online for decades. Its defining trait for scraping is cheap concurrency: spawning ten thousand processes is routine, each one isolated, each one able to hold a single in-flight HTTP request without blocking the others. A crawl that would need an async framework and careful pool tuning in another language is a Task.async_stream with a max_concurrency cap in Elixir.

The library story is mature. Req and HTTPoison fetch pages, Floki parses them with CSS selectors, and Crawly wraps the whole loop — scheduling, deduplication, pagination, and item pipelines — into a Scrapy-style framework that still feels like idiomatic Elixir. For static catalogues, sitemaps, and server-rendered pages, that stack is complete on its own.

Two things break it. First, IP reputation: a clean datacenter address gets flagged the moment a target runs even a basic bot manager, and no amount of header tuning fixes a blocked egress IP. Second, client-side rendering: a single-page app returns HTTP 200 with an empty <div id="app">, and Floki parses exactly what arrived — nothing. The page looks full in a browser and empty to the scraper.

This guide builds the Elixir stack in tiers. The HTTP tier uses Req, HTTPoison, and Crawly routed through Scrapeless residential proxies in 195+ countries. The rendered-JS tier escalates to the Scrapeless Scraping Browser, reached from Elixir without asking the BEAM to speak CDP directly. For the residential-proxy layer these fetches route through, see "What Is an SSL Proxy?"

What You Can Build

The two-tier pattern — Elixir libraries in front, Scrapeless behind the escalation — covers most jobs that defeat a plain HTTP scraper:

Concurrent catalogue crawls. Sitemaps, article archives, product listings — Task.async_stream fans out across the URL set with a bounded worker count and parses each page with Floki.
Scheduled monitoring with Crawly. Define a spider once, let it page through listings on a schedule, and push parsed items down a validation-and-storage pipeline.
Geo-specific snapshots. Pin the Scrapeless proxy country so pricing, availability, and consent walls return what a local user sees, not whatever your server's IP resolves to.
Resilient extraction behind bot managers. Route the fetch through residential egress so an ordinary home IP, not a datacenter range, makes the request.
RAG and LLM ingestion. Render publisher and documentation pages to clean text, then feed the extracted content into an embedding pipeline.
SPA and infinite-scroll pages. Escalate the client-side-rendered minority to the Scrapeless Scraping Browser, which runs the JavaScript cloud-side before you parse the result.

Why pair Elixir with Scrapeless

Elixir gives you concurrency, parsing, and a crawling framework; Scrapeless supplies the residential egress and the browser rendering that a server-side HTTP client cannot. The two slot together because the handoff is a standard HTTP proxy on one side and a documented cloud-browser endpoint on the other.

Residential proxies in 195+ countries. Exposed as a single proxy host, port, and Basic-auth credential — drops straight into Req, HTTPoison, or Crawly's RequestOptions middleware.
Per-request geo pinning. A country code in the proxy username controls egress geography with no extra handshake, so the same code pulls US, GB, DE, or JP views by swapping one segment.
Anti-detection cloud browser. For client-side-rendered pages, the Scrapeless Scraping Browser runs a self-developed Chromium with full cloud-side JavaScript rendering and per-session fingerprint randomization, so SPAs and lazy-loaded panels hydrate before extraction.
One API key for both tiers. Residential proxies and Scraping Browser bill against the same Scrapeless account, so the HTTP tier and the rendered tier share one credential.
Sticky-session option. Hold the same residential IP across a multi-step traversal when a flow needs continuity, or rotate per request for everything else.

Runtime is free to start and scales with usage — see Scrapeless pricing for the tiers, and get your API key on the free plan at app.scrapeless.com.

Prerequisites

Elixir 1.16+ and Erlang/OTP 26+ — check with elixir --version.
A Scrapeless account and API key — sign up for the free plan at app.scrapeless.com, then grab your key from Settings → API Key Management.
Residential-proxy credentials — visible in the dashboard under Proxies → Residential at app.scrapeless.com.
Basic familiarity with mix, CSS selectors, and the terminal.

Install: set up the mix project and dependencies

Create a new project and add the scraping libraries. mix new scaffolds the structure; the four deps cover fetching (req, httpoison), parsing (floki), and the full crawling framework (crawly):


mix new elixir_scraper --sup
cd elixir_scraper

Add the dependencies to mix.exs:


# mix.exs
defp deps do
  [
    {:req, "~> 0.5"},        # modern HTTP client (Finch/Mint under the hood)
    {:httpoison, "~> 2.2"},  # hackney-backed HTTP client, long-standing option
    {:floki, "~> 0.36"},     # HTML parser with CSS-selector queries
    {:crawly, "~> 0.17"}     # full crawling framework, Scrapy-style
  ]
end

Then pull them down:


mix deps.get

You do not need all four in a real project — req plus floki is the minimal fetch-and-parse pair. The guide shows each so you can pick the client that fits your stack.

Configure: store your Scrapeless credentials

Export your API key and residential-proxy credentials as environment variables so they stay out of source control. In the dashboard under Proxies → Residential, click Generate and the panel prints a colon-delimited connection string in the form <GATEWAY>:<PORT>:<CHANNEL_ID>-proxy-country_US-r_10m-s_<SESSION_ID>:<PASSWORD>:


export SCRAPELESS_API_KEY="your_api_token_here"
export SCRAPELESS_CHANNEL_ID="your_channel_id"          # printed at the start of the username
export SCRAPELESS_PROXY_PASS="your_channel_password"
export SCRAPELESS_PROXY_GATEWAY="gw-us.scrapeless.io"   # see regional gateways below

Regional gateways: gw-us.scrapeless.io (Americas), gw-eu.scrapeless.io (Europe), gw-ap.scrapeless.io (Asia-Pacific). Pick the gateway closest to your runtime to keep handshake latency low; the egress country is still controlled by the country_<CC> username segment regardless of which gateway you connect through. Port is 8789 for all.
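
If you would rather paste the dashboard's connection string once than export each piece separately, the colon-delimited form described above can be split in a few lines. This is a minimal sketch: the module name ProxyConnString and the returned map keys are hypothetical, and it assumes the string always has the four-part shape gateway, port, username, password.

```elixir
defmodule ProxyConnString do
  # Hypothetical helper: splits "<GATEWAY>:<PORT>:<USERNAME>:<PASSWORD>" into a map.
  # `parts: 4` stops splitting after the third colon, so a password that
  # itself contains colons stays intact.
  def parse(conn) when is_binary(conn) do
    with [gateway, port, username, password] <- String.split(conn, ":", parts: 4),
         {port_number, ""} <- Integer.parse(port) do
      {:ok, %{gateway: gateway, port: port_number, username: username, password: password}}
    else
      _ -> {:error, :invalid_connection_string}
    end
  end
end
```

Returning a tagged tuple rather than raising lets the caller decide whether a malformed string should stop the application at boot or just skip the proxy tier.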

The residential-proxy username is built from four parameters:

<CHANNEL_ID> — your channel identifier (printed at the start of the username on the dashboard).
country_<CC> — country pin as the two-letter ISO code: country_US, country_GB, country_DE, country_JP, etc. (use the code shown in the dashboard location selector).
r_<duration> — sticky-session rotation interval (e.g. r_10m holds the same IP for 10 minutes before rotating).
s_<SESSION_ID> — sticky-session identifier; reuse the same s_<id> to hold one IP across requests for the rotation window.

Drop the r_ and s_ segments for a fresh residential IP per request; keep them when a paginated traversal needs the same IP throughout.
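
The username segments above can be assembled with a small helper. This is a sketch under stated assumptions: the module name ProxyUsername and the :rotate / :session option names are made up for illustration, and requiring both sticky-session segments together is this sketch's choice, not a documented rule.

```elixir
defmodule ProxyUsername do
  # Builds "<CHANNEL_ID>-proxy-country_<CC>" and appends the sticky-session
  # segments ("-r_<duration>-s_<SESSION_ID>") only when both are supplied.
  def build(channel_id, country, opts \\ []) do
    base = "#{channel_id}-proxy-country_#{String.upcase(country)}"

    case {Keyword.get(opts, :rotate), Keyword.get(opts, :session)} do
      {nil, nil} ->
        # No sticky session: the gateway hands out a fresh IP per request.
        base

      {rotate, session} when is_binary(rotate) and is_binary(session) ->
        "#{base}-r_#{rotate}-s_#{session}"

      _ ->
        raise ArgumentError, "pass both :rotate and :session, or neither"
    end
  end
end

ProxyUsername.build("chan", "us")
# => "chan-proxy-country_US"
ProxyUsername.build("chan", "DE", rotate: "10m", session: "abc123")
# => "chan-proxy-country_DE-r_10m-s_abc123"
```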

Basic: fetch with Req through a residential proxy, parse with Floki

Req routes through an HTTP proxy with the :connect_options key, which it forwards to Finch and Mint underneath. Proxy authentication goes in :proxy_headers as a single Basic-auth header — Mint merges it into the CONNECT request. The username carries the country pin, so the proxy line itself selects the egress geography:


defmodule ElixirScraper.ReqClient do
  @port 8789

  # Read the gateway at runtime. A module attribute would freeze the value
  # at compile time and ignore later changes to the environment variable.
  defp gateway, do: System.get_env("SCRAPELESS_PROXY_GATEWAY", "gw-us.scrapeless.io")

  # Build the residential-proxy username with the country pin baked in.
  defp proxy_username(country) do
    channel = System.fetch_env!("SCRAPELESS_CHANNEL_ID")
    "#{channel}-proxy-country_#{country}"
  end

  # Encode "username:password" as a Basic credential for the proxy.
  defp proxy_auth_header(country) do
    user = proxy_username(country)
    pass = System.fetch_env!("SCRAPELESS_PROXY_PASS")
    "Basic " <> Base.encode64("#{user}:#{pass}")
  end

  @doc "Fetch a URL through Scrapeless residential egress in `country`."
  def fetch(url, country \\ "US") do
    Req.get(url,
      connect_options: [
        proxy: {:http, gateway(), @port, []},
        proxy_headers: [{"proxy-authorization", proxy_auth_header(country)}]
      ],
      headers: [{"user-agent", "Mozilla/5.0 (compatible; ElixirScraper/1.0)"}]
    )
  end
end

Call it and hand the body straight to Floki. Floki.parse_document/1 turns the HTML string into a tree; Floki.find/2 queries it with CSS selectors; Floki.text/1 and Floki.attribute/2 pull values out:


{:ok, resp} = ElixirScraper.ReqClient.fetch("https://books.toscrape.com/")
{:ok, document} = Floki.parse_document(resp.body)

titles =
  document
  |> Floki.find("article.product_pod h3 a")
  |> Floki.attribute("title")

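# Map per node: Floki.text/1 on the whole list would join every match
# into one string, which cannot be zipped with the list of titles.
prices =
  document
  |> Floki.find("article.product_pod p.price_color")
  |> Enum.map(&Floki.text/1)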

IO.inspect(Enum.zip(titles, prices), label: "first page")

Three things this locks in early:

The proxy is configured per request, not globally. That keeps one client free to pull different countries by passing a different country argument.
The Basic-auth header is the load-bearing line. Without proxy_headers, the CONNECT tunnel to the residential gateway is rejected for missing credentials.
Floki queries the parsed tree, not the raw string. Always parse_document/1 first, then find/2 — selectors run against the tree.

Advanced 1: the HTTPoison variant

HTTPoison predates Req and remains common in existing codebases. It is backed by hackney, which exposes proxies through two request options: :proxy as a {host, port} tuple and :proxy_auth as a {user, password} tuple. No manual Base64 — hackney builds the header:


defmodule ElixirScraper.HTTPoisonClient do
  @port 8789

  # Read the gateway at runtime. A module attribute would freeze the value
  # at compile time and ignore later changes to the environment variable.
  defp gateway, do: System.get_env("SCRAPELESS_PROXY_GATEWAY", "gw-us.scrapeless.io")

  # Build the residential-proxy username with the country pin baked in.
  defp proxy_username(country) do
    channel = System.fetch_env!("SCRAPELESS_CHANNEL_ID")
    "#{channel}-proxy-country_#{country}"
  end

  def fetch(url, country \\ "US") do
    opts = [
      proxy: {gateway(), @port},
      proxy_auth: {proxy_username(country), System.fetch_env!("SCRAPELESS_PROXY_PASS")},
      recv_timeout: 30_000
    ]

    headers = [{"User-Agent", "Mozilla/5.0 (compatible; ElixirScraper/1.0)"}]

    # Collapse the response into {:ok, body} or a tagged error.
    case HTTPoison.get(url, headers, opts) do
      {:ok, %HTTPoison.Response{status_code: 200, body: body}} -> {:ok, body}
      {:ok, %HTTPoison.Response{status_code: code}} -> {:error, {:http, code}}
      {:error, reason} -> {:error, reason}
    end
  end
end

The parsing half is identical — HTTPoison returns a body string, and Floki does the rest. Choose Req for new code (it ships JSON decoding, redirects, and connection pooling out of the box) and HTTPoison when you are extending a project already built on it.

Advanced 2: a Crawly spider with pagination and proxy egress

For anything beyond a handful of URLs, Crawly replaces the hand-rolled loop. A spider declares its start URLs and a parse_item/1 callback; Crawly schedules requests across workers, follows the new requests the callback returns (that is how pagination works), and pushes parsed items down a pipeline.

Wire the proxy through the RequestOptions middleware. It passes its keyword list straight to the underlying HTTPoison fetcher, so the same :proxy and :proxy_auth options from the HTTPoison variant apply to every request the spider makes:


# config/runtime.exs (evaluated at boot, so the env vars are read at runtime rather than at compile time)
import Config

config :crawly,
  closespider_itemcount: 200,
  concurrent_requests_per_domain: 3,
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest,
    {Crawly.Middlewares.UserAgent,
     user_agents: ["Mozilla/5.0 (compatible; ElixirScraper/1.0)"]},
    {Crawly.Middlewares.RequestOptions,
     [
       proxy: {System.get_env("SCRAPELESS_PROXY_GATEWAY", "gw-us.scrapeless.io"), 8789},
       proxy_auth:
         {"#{System.fetch_env!("SCRAPELESS_CHANNEL_ID")}-proxy-country_US",
          System.fetch_env!("SCRAPELESS_PROXY_PASS")},
       recv_timeout: 30_000
     ]}
  ],
  pipelines: [
    {Crawly.Pipelines.Validate, fields: [:title, :price]},
    {Crawly.Pipelines.DuplicatesFilter, item_id: :title},
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, extension: "jl", folder: "./output"}
  ]

The spider itself implements three callbacks. parse_item/1 does two jobs at once: it extracts the items on the current page, and it builds follow-up requests for the next page — that second list is what drives pagination:


defmodule BooksSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url, do: "https://books.toscrape.com/"

  @impl Crawly.Spider
  def init, do: [start_urls: ["https://books.toscrape.com/"]]

  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    # Extract one item per product card on this page.
    items =
      document
      |> Floki.find("article.product_pod")
      |> Enum.map(fn card ->
        %{
          title: card |> Floki.find("h3 a") |> Floki.attribute("title") |> List.first(),
          price: card |> Floki.find("p.price_color") |> Floki.text()
        }
      end)

    # Build the next-page request: this list is how Crawly paginates.
    next_requests =
      document
      |> Floki.find("li.next a")
      |> Floki.attribute("href")
      |> Enum.map(fn href ->
        href
        |> Crawly.Utils.build_absolute_url(response.request_url)
        |> Crawly.Utils.request_from_url()
      end)

    %Crawly.ParsedItem{items: items, requests: next_requests}
  end
end

Run it from iex -S mix:


Crawly.Engine.start_spider(BooksSpider)

Crawly walks every page by following the li.next a link the callback returns, writes each validated, deduplicated item to a .jl file under ./output (named after the spider), and stops at closespider_itemcount. Every request egresses through the Scrapeless residential proxy because the RequestOptions middleware injected the :proxy and :proxy_auth options into the request before the fetcher ran.
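
Pagination depends on turning the relative href in li.next a into an absolute URL before it becomes a request. The sketch below shows that resolution step with the standard library's URI.merge/2. It illustrates the idea only and is not Crawly's internal implementation; the module name NextPage is hypothetical.

```elixir
defmodule NextPage do
  # Resolves a (possibly relative) href against the URL of the page it was found on.
  # URI.merge/2 raises ArgumentError if `page_url` is not an absolute URL.
  def absolute(href, page_url) do
    page_url |> URI.merge(href) |> URI.to_string()
  end
end

NextPage.absolute("catalogue/page-2.html", "https://books.toscrape.com/")
# => "https://books.toscrape.com/catalogue/page-2.html"
NextPage.absolute("page-3.html", "https://books.toscrape.com/catalogue/page-2.html")
# => "https://books.toscrape.com/catalogue/page-3.html"
```

The second call is the reason to resolve against the current page's URL rather than the site root: from page 2 onward the next link is relative to the catalogue/ directory.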

Get your API key on the free plan: app.scrapeless.com

Avoiding blocks: residential egress, geo pinning, and backpressure

A scraper gets blocked for predictable reasons, and most of them are addressable from the configuration you already have:

Datacenter IP reputation. A server's IP range is the first signal a bot manager checks. Routing through Scrapeless residential proxies makes the request look like an ordinary home connection, which is the single largest lever against IP-reputation blocks.
Egress geography. Pages gate content by region — pricing, stock, consent walls. Pin the country with the country_<CC> username segment so the result matches the locale you intend to read.
Concurrency that looks like an attack. Cap parallelism at ≤3 requests per host. With Task.async_stream, that is max_concurrency: 3; with Crawly, it is concurrent_requests_per_domain: 3. Past that, a tight in-flight pool is indistinguishable from a flood.
A default user-agent. Req and HTTPoison send a library-default UA that is trivial to filter. Set a realistic browser user-agent (as the snippets above do) or rotate a list through Crawly's UserAgent middleware.
Pacing. For non-Crawly loops, space requests with a small Process.sleep/1 between batches rather than firing the whole set at once. Crawly paces requests for you through its scheduler.
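
For a hand-rolled loop, the concurrency cap and the pacing advice above can be combined as follows. This is a minimal sketch: BoundedCrawl, its option names, and the injected fetch_fun are assumptions for illustration, with fetch_fun standing in for a real client call such as the Req or HTTPoison fetch functions shown earlier.

```elixir
defmodule BoundedCrawl do
  # Processes `urls` in batches of `:max_concurrency` (default 3), pausing
  # `:pause_ms` after each batch. Returns a list of {url, result} pairs.
  def run(urls, fetch_fun, opts \\ []) do
    max = Keyword.get(opts, :max_concurrency, 3)
    pause_ms = Keyword.get(opts, :pause_ms, 0)

    urls
    |> Enum.chunk_every(max)
    |> Enum.flat_map(fn batch ->
      results =
        batch
        |> Task.async_stream(fetch_fun,
          max_concurrency: max,
          timeout: 30_000,
          on_timeout: :kill_task
        )
        |> Enum.to_list()

      # Space batches out instead of firing the whole set at once.
      Process.sleep(pause_ms)
      Enum.zip(batch, results)
    end)
    |> Enum.map(fn
      {url, {:ok, result}} -> {url, result}
      # A task that timed out or crashed surfaces as {:exit, reason}.
      {url, {:exit, reason}} -> {url, {:error, reason}}
    end)
  end
end
```

Task.async_stream/3 returns results in input order by default, which is why zipping each batch with its results is safe here.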

None of this rescues a page whose content arrives by JavaScript after the first paint — that is the next section.

JS-heavy and anti-bot targets: route through the Scrapeless Scraping Browser

Req, HTTPoison, and Crawly all return whatever bytes the origin sends. For a React, Vue, or Next.js app, those bytes are an empty shell plus a script tag — the content paints client-side, and Floki parses an empty tree. A server-side HTTP client cannot run that JavaScript; a cloud browser can.

A real workload splits into two tiers, and only the second one needs the Scrapeless Scraping Browser.

(a) HTTP requests through Scrapeless residential proxies — the rendered majority

Most pages on most sites ship server-rendered HTML. For those, the residential-proxy tier above is the whole answer: the Req and HTTPoison clients already egress through a residential IP, which clears IP-reputation gates and geo-restrictions without any browser at all. Keep this tier on the majority of pages that return content directly — it is the cheapest path, and Elixir's concurrency makes it fast.

(b) Calling out to the cloud browser — the client-side-rendered minority

Elixir does not drive the Chrome DevTools Protocol as cleanly as Node or Python, so for the JavaScript-rendered minority the idiomatic move is to keep Elixir as the orchestrator and call out to the Scrapeless Scraping Browser through a small rendering helper. Elixir spawns the helper as an external process with System.cmd/3, the helper connects to the cloud browser's documented WebSocket endpoint, runs the page, and prints the rendered HTML back to Elixir — which parses it with Floki exactly as before.

The cloud-browser endpoint is a single WebSocket URL with your API key and session parameters as query string values. A minimal Python renderer (saved as render.py) connects to it with Playwright:


# render.py — invoked by Elixir via System.cmd/3 for the JS-rendered minority.
import os
import sys
from urllib.parse import urlencode
from playwright.sync_api import sync_playwright

def scraping_browser_url(proxy_country="US", session_ttl=240):
    params = urlencode({
        "token": os.environ["SCRAPELESS_API_KEY"],
        "sessionTTL": session_ttl,
        "proxyCountry": proxy_country,
    })
    return f"wss://browser.scrapeless.com/api/v2/browser?{params}"

def render(url, country="US"):
    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(scraping_browser_url(country))
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.pages[0] if context.pages else context.new_page()
        # Warm the homepage first, then navigate to the target page.
        page.goto("https://quotes.toscrape.com/", wait_until="load")
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
        return html

if __name__ == "__main__":
    sys.stdout.write(render(sys.argv[1], sys.argv[2] if len(sys.argv) > 2 else "US"))

The rendering runs cloud-side in Scrapeless's anti-detection browser; the local Playwright install is only the protocol client. Warming the homepage before the target page seeds cookies and navigation state, which produces a cleaner render on pages that gate first-time visitors.

From Elixir, the call-out and parse stay in one function. System.cmd/3 blocks the calling process until the helper returns — fine inside a Task, since the BEAM keeps every other process running:


defmodule ElixirScraper.CloudBrowser do
  @doc """
  Render a JS-heavy `url` through the Scrapeless Scraping Browser and
  return the post-paint HTML for Floki to parse.
  """
  def render(url, country \\ "US") do
    case System.cmd("python", ["render.py", url, country], stderr_to_stdout: true) do
      {html, 0} -> {:ok, html}
      {output, code} -> {:error, {:render_failed, code, output}}
    end
  end

  def quotes(url) do
    with {:ok, html} <- render(url),
         {:ok, document} <- Floki.parse_document(html) do
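      # Map per node: Floki.text/1 on the whole list would join every match into one string.
      texts = document |> Floki.find("span.text") |> Enum.map(&Floki.text/1)
      authors = document |> Floki.find("small.author") |> Enum.map(&Floki.text/1)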
      {:ok, %{quotes: texts, authors: authors}}
    end
  end
end

The before/after is the whole point. A plain Req.get on https://quotes.toscrape.com/js/ returns 0 quote elements, because HTTP cannot execute the page's JavaScript. The same URL through ElixirScraper.CloudBrowser.render/2 returns the fully rendered DOM with all 10 quotes, because the cloud browser ran the JavaScript first. That is platform behavior, not a tuning trick.

For a tiered pipeline, fetch with Req first, count the elements you expect, and escalate only the pages that come back empty to CloudBrowser.render/2. Elixir's Task.async_stream runs the HTTP tier wide and the browser tier narrow, since cloud-browser sessions are scarcer than HTTP requests — keep the browser tier at max_concurrency: 3.
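
The escalation rule can be written as one function. In this sketch the three callbacks are injected so it stays independent of any particular client: fetch and render are assumed to return {:ok, html} or {:error, reason}, and count returns how many expected elements the HTML contains. TieredFetch and the callback names are hypothetical.

```elixir
defmodule TieredFetch do
  # Tries the HTTP tier first and escalates to the browser tier only when
  # the fetched HTML contains none of the expected elements.
  # Errors from either tier are returned unchanged.
  def get(url, fetch, render, count) do
    with {:ok, html} <- fetch.(url) do
      if count.(html) > 0 do
        {:ok, :http, html}
      else
        with {:ok, rendered} <- render.(url), do: {:ok, :browser, rendered}
      end
    end
  end
end
```

In the real pipeline, fetch would wrap the Req client, render would wrap CloudBrowser.render/2, and count would be the length of a Floki.find/2 result. Tagging the result with :http or :browser makes it easy to track how often escalation actually happens.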

Troubleshooting

Symptom	Likely cause	Fix
Floki returns `[]` for a selector that exists in the browser	Page renders content client-side; HTTP returned an app shell	Escalate the URL to `CloudBrowser.render/2`; parse the rendered HTML
Proxy CONNECT rejected / 407 from the gateway	Missing or wrong Basic-auth credentials	Confirm `proxy_headers` (Req) or `:proxy_auth` (HTTPoison) carries the channel username and password
`Floki.parse_document` returns `{:error, ...}`	Body is not HTML (JSON API, redirect page, or empty)	Check `resp.status`; for JSON endpoints decode the body instead of parsing it as HTML
Same content regardless of `country_<CC>`	The page does not vary by region, or the country segment was not updated	Verify the username segment changed; some pages are not geo-gated at all
Access-denied or challenge interstitial instead of content	Datacenter egress or first-visit gate	Route through residential egress and warm the site's homepage in the same session before the target page
Crawly stops after one page	`parse_item/1` returned no follow-up requests	Confirm the next-page selector matches and `Crawly.Utils.request_from_url/1` wraps each absolute URL
`System.cmd` errors with `:enoent`	The `python` executable is not on PATH	Use the full interpreter path, or invoke through a shell that resolves it
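
For the last row, one option is to resolve the interpreter before calling System.cmd/3, so a missing Python shows up as a tagged error instead of a raised exception. A minimal sketch, assuming the hypothetical module name PythonPath and a candidate list of python3 then python:

```elixir
defmodule PythonPath do
  # Returns the absolute path of the first candidate found on PATH.
  # System.find_executable/1 returns nil when the program is not found.
  def resolve(candidates \\ ["python3", "python"]) do
    case Enum.find_value(candidates, &System.find_executable/1) do
      nil -> {:error, :python_not_found}
      path -> {:ok, path}
    end
  end
end
```

The resolved path can then be passed to System.cmd/3 in place of the bare "python" string.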

A note on selector drift: when a target site reshuffles its markup, your Floki.find/2 calls silently return empty lists rather than raising. Re-check and tighten selectors against the new DOM whenever a previously working scraper starts returning blanks — treat an empty list as a signal to inspect, not a normal result.
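
One way to make that signal explicit is to wrap each selector result in a guard that logs and returns an error when nothing matched. This is a sketch, not part of Floki: SelectorGuard is a hypothetical helper that takes the list a query such as Floki.find/2 returns.

```elixir
defmodule SelectorGuard do
  require Logger

  # An empty match list becomes a logged, tagged error instead of a silent blank.
  def check([], selector) do
    Logger.warning("selector matched nothing: #{selector}")
    {:error, {:no_match, selector}}
  end

  # Any non-empty list of nodes passes through unchanged.
  def check(nodes, _selector) when is_list(nodes), do: {:ok, nodes}
end
```

Piping a query result through SelectorGuard.check/2 lets a `with` chain stop at the first drifted selector rather than writing empty items downstream.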

Conclusion: scale your Elixir scraping pipeline

The Elixir pattern reduces to four moves. Fetch with Req or HTTPoison (or let Crawly schedule the fetches) through Scrapeless residential proxies; parse with Floki's CSS selectors; paginate by returning follow-up requests from parse_item/1; and escalate the JavaScript-rendered minority to the Scrapeless Scraping Browser, reached from Elixir as an external rendering call-out rather than asking the BEAM to speak CDP directly.

From here the same shape composes into larger systems. For the residential-proxy layer in depth, see "What Is an SSL Proxy?" Before you ship: pin country_<CC> for geo-bound pages, hold concurrency at ≤3 per host, set a realistic user-agent, treat absent selectors as nullable, and keep the HTTP tier wide and the cloud-browser tier narrow.

Ready to Build Your AI-Powered Data Pipeline?

Join our community to claim a free plan and connect with developers building Elixir scraping pipelines: Discord · Telegram.

Sign up at app.scrapeless.com for free Scraping Browser runtime and adapt the patterns above to the pages and regions your pipeline needs. Full reference at docs.scrapeless.com.

FAQ

Q: Is web scraping with Elixir legal?

The language is irrelevant to legality. Scraping publicly available data is generally permissible in many jurisdictions, but the law is not uniform: review each site's Terms of Service, avoid collecting personal or copyrighted data you have no right to, and remember rules vary by jurisdiction. When in doubt, get legal advice for your specific use case. Scrapeless accesses publicly available data only.

Q: Do I need a proxy for Elixir scraping?

For anything at scale, yes. A server's datacenter IP is one of the first things a bot manager flags, and residential egress sharply cuts those blocks. A proxy is also required whenever a page gates content by region. Scrapeless supplies residential proxies in 195+ countries — set the country_<CC> username segment and route Req, HTTPoison, or Crawly through the gateway, so you do not have to source and rotate IPs yourself.

Q: Req or HTTPoison — which should I use?

For new code, Req: it ships JSON decoding, redirect following, and connection pooling through Finch, with less boilerplate. Choose HTTPoison when you are extending a project already built on it, or when you want hackney's :proxy / :proxy_auth tuple options directly. Both parse identically with Floki, so the choice is about the client ergonomics, not the scraping.

Q: When do I need the Scrapeless Scraping Browser instead of plain HTTP?

When the HTML your client returns is a JavaScript app shell with no content. The tell: a selector you can see in a real browser returns an empty list from Floki. That page renders client-side, so a server-side HTTP client never sees the data. Route those URLs through the cloud browser, which runs the JavaScript before you parse — and keep plain HTTP on the pages that already return content.

Q: Why call out to a rendering helper instead of driving the browser from Elixir directly?

Elixir does not have a first-class Chrome DevTools Protocol driver the way Node and Python do, so the cleanest pattern keeps Elixir as the orchestrator and delegates rendering to a small external helper invoked with System.cmd/3. Elixir still owns the crawl loop, the concurrency caps, the pagination, and the Floki parsing; only the JavaScript execution moves to the cloud browser. This keeps the Elixir code idiomatic and the integration to one external call.

Q: How many concurrent requests should I run?

Hold at ≤3 per host. With Task.async_stream, set max_concurrency: 3; with Crawly, set concurrent_requests_per_domain: 3. Public catalogues tolerate a little more, anti-bot-protected origins want less, but 3 is a safe default that keeps the in-flight pool from looking like an attack. Keep the cloud-browser tier even narrower, since rendered sessions are scarcer than HTTP requests.

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
