Scripting API to the rescue
We contemplated writing a solution that would save us time by automatically discovering the winning combination. At the beginning of each day, it would cycle through the different combinations of headers until it found the one that worked, and the spiders would then be set to use that combination. This wouldn’t solve the fundamental problem of our crawlers being identified, but it would spare the development team the grunt work.
We discovered that using browser scripting to replicate a user session was the trigger that tipped off the anti-bot system. We still needed the browser to access the website, but we had to find another way to execute user actions without being detected.
Zyte API’s Scripting API has a function called Page.evaluate(). It allows us to execute JavaScript (JS) code within a page context. So instead of using browser scripting to click different elements in the user session, we wrote the actions in JS and called Page.evaluate() to execute that code. Once we had access to the site, all the requests came from the same browser and IP, which made our activity harder to detect, and the JS performed the user actions faster.
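To give a rough idea of what "programming the actions in JS" can look like, here is a minimal sketch of the kind of in-page logic one might hand to Page.evaluate(). It is illustrative only and is not the code we ran: the selectors, the `runUserActions` name, and the small `QueryableDocument` interface are all assumptions made for this example, and the call to Page.evaluate() itself is left out because its exact signature is not covered here.

```typescript
// Minimal shape of the DOM pieces this sketch touches, so the logic can
// also be exercised outside a browser with a stub (an assumption made
// for this example, not part of any Zyte API).
interface ClickableElement {
  click(): void;
}

interface QueryableDocument {
  querySelector(selector: string): ClickableElement | null;
}

// Hypothetical selectors standing in for the steps of a user session.
const SESSION_STEPS: string[] = [
  "#accept-cookies",
  "button.search-submit",
  "a.next-page",
];

// Clicks each element in order and returns the selectors that were clicked.
// Stops at the first missing element, since later steps usually depend on it.
function runUserActions(
  doc: QueryableDocument,
  steps: string[] = SESSION_STEPS,
): string[] {
  const clicked: string[] = [];
  for (const selector of steps) {
    const element = doc.querySelector(selector);
    if (element === null) {
      break;
    }
    element.click();
    clicked.push(selector);
  }
  return clicked;
}
```

Inside a real page, the browser's own `document` would take the place of the stub (with a type cast if the compiler asks for one), so the clicks happen in the page's own JS context rather than through separate browser-scripting commands.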
Our Scripting API strategy eliminated the excessive time we had been spending to set up and discover the winning combination of headers. JS was the perfect programming language for performing a request and processing a response from the site as if we were using a browser. Our team’s development velocity (and sanity levels) returned to normal because we were no longer spending hours battling website bans. Now the client has consistent access to the target domain.
Interested in trying out Zyte API’s powerful ban handling features like the Scripting API? Sign up for your free trial and see how we handle your toughest websites.