
How to Use Cypress to Bypass CAPTCHAs

Sophia Martinez

Specialist in Anti-Bot Strategies

23-Sep-2024

A CAPTCHA, short for "Completely Automated Public Turing test to tell Computers and Humans Apart", is a technique for distinguishing real users from automated bots. It is a task designed to be easy for people but difficult for bots. To deter bots, CAPTCHAs are typically placed on specific parts of a website.

The most widely used CAPTCHA providers are Google reCAPTCHA, hCaptcha, and BotDetect. They present one or more of the following types of challenge:

  • Text-based CAPTCHAs: Users must type a string of distorted characters or numbers.
  • Image-based CAPTCHAs: Users must identify specific objects in a grid of images.
  • Audio-based CAPTCHAs: Users must type the words they hear.
  • Puzzle CAPTCHAs: Users must complete a mini-game, such as clicking the correct object, or answer a simple question.

To solve these challenges automatically, you can use services that rely on human operators to answer them in real time, or integrate your program with CAPTCHA-solving libraries. That said, CAPTCHAs hard-coded into a page are rare, because they are inconvenient and worsen the user experience.

More often, CAPTCHAs are part of more sophisticated anti-bot systems such as WAFs (web application firewalls).

These systems display a CAPTCHA dynamically when they suspect the user may be a bot. In such cases, you can avoid triggering CAPTCHAs by making your bot behave like a human and run in a real browser. However, this is a never-ending arms race: you will need to update your automation script frequently to keep up with constantly evolving bot-detection algorithms.

A more efficient way to get around CAPTCHAs is to use an up-to-date tool based on user emulation, such as Scrapeless' CAPTCHA Solver.

Are you tired of CAPTCHAs and constant web scraping blocks?

Scrapeless: the best all-in-one online scraping solution available!

Utilize our formidable toolkit to unleash the full potential of your data extraction:

Best CAPTCHA Solver

Automated resolution of complex CAPTCHAs to ensure ongoing and smooth scraping.

Try it for free!

Cypress and CAPTCHAs: An Unhealthy Partnership

Cypress is a front-end testing tool built for the modern web. Although it can be used for web scraping and other general browser automation tasks, its primary use case is end-to-end (E2E) testing. It is therefore mainly intended for interacting with websites and web pages that you own or control.

Problems arise when you use Cypress to target external or third-party websites. The official documentation makes clear that the best course of action is to minimize interactions with third-party sites, and one of the main reasons it gives is the risk of being identified as a bot and served a CAPTCHA.

Why is this a problem? Because CAPTCHAs are designed specifically to stop automated programs, they can break your Cypress browser automation. That said, while difficult, bypassing CAPTCHAs in Cypress is possible. Read the following sections to learn how!

How to Handle CAPTCHAs in Cypress

As you just saw, Cypress's own documentation acknowledges CAPTCHAs as one of its biggest obstacles. But it is not time to throw in the towel yet. Let's look at some possible strategies for implementing CAPTCHA bypass logic in Cypress!

Method 1: Turn off the CAPTCHAs

Most CAPTCHA providers let you bypass or disable their challenges in a testing environment. So, if you control the website you need to automate, you can remove the CAPTCHA entirely in your test environment or replace it with a simpler one.

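For example, with reCAPTCHA v3 you can create a separate key for testing environments. For reCAPTCHA v2, Google provides the following test keys; with them, every verification request passes, and the widget displays a warning that they must not be used in production: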

  • Site key: 6LeIxAcTAAAAAJcZVRqyHh71UMIEGNQ_MXjiZKhI
  • Secret key: 6LeIxAcTAAAAAGG-vFI1TnRWxMZNFuojJ4WifJWe
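On a site you control, one way to apply this is to switch keys based on the environment and keep verifying tokens server-side. The following is a minimal Node.js sketch, not code from any particular framework: the helper names recaptchaKeys and buildVerifyRequest are hypothetical, and the environment variable names NODE_ENV, RECAPTCHA_SITE_KEY, and RECAPTCHA_SECRET_KEY are assumptions. The siteverify URL and its secret and response parameters are the ones Google documents for server-side reCAPTCHA verification.

```javascript
// Google's published reCAPTCHA v2 test keys: every verification passes,
// so they must only be used in test environments.
const TEST_KEYS = {
  siteKey: '6LeIxAcTAAAAAJcZVRqyHh71UMIEGNQ_MXjiZKhI',
  secretKey: '6LeIxAcTAAAAAGG-vFI1TnRWxMZNFuojJ4WifJWe',
}

// Hypothetical helper: pick test keys in the test environment, real keys otherwise.
// The environment variable names are assumptions for this sketch.
function recaptchaKeys(env = process.env) {
  if (env.NODE_ENV === 'test') return TEST_KEYS
  return { siteKey: env.RECAPTCHA_SITE_KEY, secretKey: env.RECAPTCHA_SECRET_KEY }
}

// Hypothetical helper: build the form-encoded POST for Google's siteverify endpoint.
function buildVerifyRequest(secretKey, token) {
  return {
    url: 'https://www.google.com/recaptcha/api/siteverify',
    body: new URLSearchParams({ secret: secretKey, response: token }).toString(),
  }
}

// Usage on the server (Node 18+ provides a global fetch):
// const { url, body } = buildVerifyRequest(recaptchaKeys().secretKey, token)
// const res = await fetch(url, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
//   body,
// })
// const { success } = await res.json()
```

Because the front end and the verification code read from the same helper, your Cypress runs get the always-passing test keys while production keeps the real ones.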

Method 2: Automate the CAPTCHA Interaction

Some CAPTCHAs only require you to tick a checkbox; one example is the reCAPTCHA "No CAPTCHA" widget.

These challenges may look simple, but they can be quite sophisticated, since they analyze your mouse movements to determine whether you are human. Not every CAPTCHA is that advanced, though. Some are designed only to stop simple bots and are easier to get around. In such cases, you can try to automate them with some Cypress logic.

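Keep in mind that Cypress cannot access cross-origin iframes by default, and CAPTCHA widgets are usually embedded in one. To work around this restriction, set chromeWebSecurity to false in your cypress.json file (in Cypress 10 and later, the same option goes in cypress.config.js instead). Note that this setting only affects Chromium-based browsers: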

```json
{
  "chromeWebSecurity": false
}
```

Next, you can select and click the CAPTCHA checkbox. For a reCAPTCHA "No CAPTCHA" widget, the code would look like this:

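```javascript
cy.get('iframe[src*="recaptcha"]')
  .its('0.contentDocument.body') // body of the first (checkbox) iframe
  .should('not.be.empty') // wait until the iframe content has loaded
  .then(cy.wrap)
  .find('#recaptcha-anchor') // the "I'm not a robot" checkbox
  .click()
```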

Keep in mind that this is only a stopgap and won't work in most cases. Modern CAPTCHAs are smart enough to tell a human click from an automated one; after all, that is precisely their purpose.
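To make the point about mouse movements concrete: a human cursor rarely travels in a perfectly straight line at constant speed. The sketch below generates a curved path between two points using a quadratic Bezier curve. The function name humanLikePath and the 100-pixel spread for the random control point are illustrative assumptions. You could feed these points to Cypress through cy.trigger('mousemove', ...), but Cypress dispatches synthetic events whose isTrusted property is false, so this alone is unlikely to fool a modern CAPTCHA.

```javascript
// Hypothetical helper: curved mouse path from `from` to `to`, as {x, y} points.
// `rand` is injectable so the output can be made deterministic in tests.
function humanLikePath(from, to, steps = 20, rand = Math.random) {
  // Offset the control point randomly (up to +/-50 px) so the path bends
  const control = {
    x: (from.x + to.x) / 2 + (rand() - 0.5) * 100,
    y: (from.y + to.y) / 2 + (rand() - 0.5) * 100,
  }
  const points = []
  for (let i = 0; i <= steps; i++) {
    const t = i / steps
    // Quadratic Bezier: B(t) = (1-t)^2 * P0 + 2(1-t)t * P1 + t^2 * P2
    const x = (1 - t) ** 2 * from.x + 2 * (1 - t) * t * control.x + t ** 2 * to.x
    const y = (1 - t) ** 2 * from.y + 2 * (1 - t) * t * control.y + t ** 2 * to.y
    points.push({ x: Math.round(x), y: Math.round(y) })
  }
  return points
}

// Usage inside a Cypress test (the events are synthetic: isTrusted === false):
// humanLikePath({ x: 10, y: 10 }, { x: 200, y: 120 }).forEach(({ x, y }) => {
//   cy.get('body').trigger('mousemove', { clientX: x, clientY: y })
// })
```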

Method 3: Use an Anti-Detect Browser

The two previous methods rely on assumptions that rarely hold for a real-world target. A better approach is to configure Cypress to control an anti-detect browser. If you are not familiar with this technology, an anti-detect browser is a customized browser designed to prevent websites from detecting automated behavior.

You can then tell Cypress to run your script in that browser with the following command:

```shell
cypress open --browser <path_to_your_browser>
```

Here, <path_to_your_browser> is the absolute path to the anti-detect browser's executable.

Similarly, by adding the following code to cypress.config.js, you can set up the Cypress UI to display your anti-detect browser as a selectable option:


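```javascript
import { defineConfig } from 'cypress'

export default defineConfig({
  e2e: {
    setupNodeEvents(on, config) {
      // Fill in the version your anti-detect browser reports (e.g. via --version)
      const version = '<ANTIDETECT_BROWSER_VERSION>'
      const majorVersion = parseInt(version.split('.')[0], 10)

      const antidetectBrowser = {
        name: '<ANTIDETECT_BROWSER_NAME>',
        channel: 'stable',
        family: 'chromium',
        displayName: '<ANTIDETECT_BROWSER_DISPLAY_NAME>',
        version,
        path: '<path_to_your_browser>',
        majorVersion,
      }

      // Add the custom browser to the list Cypress detected automatically
      return {
        browsers: config.browsers.concat(antidetectBrowser),
      }
    },
  },
})
```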
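The browser entry above needs version and majorVersion values. Rather than typing them in by hand, you could read them from the browser binary, similar to the custom-browser example in the Cypress documentation. The sketch below assumes the anti-detect browser accepts a --version flag and prints a dotted version number, as Chromium builds typically do on Linux and macOS; the helper names parseBrowserVersion and antidetectBrowserEntry are hypothetical.

```javascript
const { execFileSync } = require('child_process')

// Pull a dotted version such as "120.0.6099.109" out of `--version` output
function parseBrowserVersion(output) {
  const match = /(\d+)(?:\.\d+){1,3}/.exec(output)
  if (!match) throw new Error(`No version number found in: ${output}`)
  return { version: match[0], majorVersion: parseInt(match[1], 10) }
}

// Hypothetical helper: build the browser entry for config.browsers by asking
// the binary for its version (assumes it supports --version, like Chromium).
function antidetectBrowserEntry(path, name, displayName) {
  const output = execFileSync(path, ['--version'], { encoding: 'utf8' })
  const { version, majorVersion } = parseBrowserVersion(output)
  return { name, channel: 'stable', family: 'chromium', displayName, version, path, majorVersion }
}
```

Inside setupNodeEvents, you could then return { browsers: config.browsers.concat(antidetectBrowserEntry('<path_to_your_browser>', '<ANTIDETECT_BROWSER_NAME>', '<ANTIDETECT_BROWSER_DISPLAY_NAME>')) }. The sketch uses CommonJS; in an ES module config, import execFileSync from 'child_process' instead. On Windows, Chromium-based binaries may not print their version to stdout, so hard-coding the version can be simpler there.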

Keep in mind that running your Cypress code in an anti-detect browser only reduces the likelihood of being flagged as a bot. If anti-bot systems still recognize that you are using automated code, they can serve CAPTCHAs to stop you from continuing.
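Since a CAPTCHA can still appear, it helps to detect one and fail fast rather than let a test or scraping run hang on it. The sketch below checks page HTML for markers commonly left by reCAPTCHA and hCaptcha embeds (the g-recaptcha and h-captcha classes and the providers' script domains). The marker list and the looksLikeCaptcha name are assumptions; they will miss other providers and custom challenges.

```javascript
// Markers commonly found in pages embedding reCAPTCHA or hCaptcha.
// This list is an assumption and is not exhaustive.
const CAPTCHA_MARKERS = [
  /class=["'][^"']*\bg-recaptcha\b/i, // reCAPTCHA widget container
  /google\.com\/recaptcha|recaptcha\.net/i, // reCAPTCHA script or iframe URLs
  /class=["'][^"']*\bh-captcha\b/i, // hCaptcha widget container
  /hcaptcha\.com/i, // hCaptcha script or iframe URLs
]

// Return true if the HTML appears to contain a CAPTCHA widget
function looksLikeCaptcha(html) {
  return CAPTCHA_MARKERS.some((re) => re.test(html))
}

// Usage inside a Cypress test:
// cy.document().then((doc) => {
//   if (looksLikeCaptcha(doc.documentElement.outerHTML)) {
//     throw new Error('CAPTCHA detected, stopping the run')
//   }
// })
```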

Conclusion

In this article, you learned what CAPTCHAs are and why they are a major problem for Cypress. You also explored three ways to get around them, each of which has significant drawbacks.

Even with well-designed CAPTCHA bypass logic in Cypress, sophisticated bot-detection systems may still flag your script as automated. The best course of action is to connect to your target website through an unlocking API that returns the HTML of any page without a CAPTCHA.

One such API is Web Unlocker. Through proxy integration, it handles browser fingerprinting, automatically rotates the exit IP with every request, retries failed requests automatically, and solves CAPTCHAs for you. Dealing with anti-bot measures becomes hassle-free!

At Scrapeless, we only access publicly available data while strictly complying with applicable laws, regulations, and website privacy policies. The content in this blog is for demonstration purposes only and does not involve any illegal or infringing activities. We make no guarantees and disclaim all liability for the use of information from this blog or third-party links. Before engaging in any scraping activities, consult your legal advisor and review the target website's terms of service or obtain the necessary permissions.
