
Inside the anti-block engine: How we built a 320,000-strategy web access hero

Read time: 10 min. Posted on June 19, 2026
Use case
For data gatherers, negotiating anti-bot tech is not for the faint-hearted. That’s why we built a multi-layered system that treads lightly but brings a host of tactics to the table.
By Iain Lennon

A few years ago, we noticed a fundamental shift in how the web defends itself.

For a long time, web scraping followed a predictable cat-and-mouse dynamic. You built a spider, pointed it at a target, and wrote a parser. If you got blocked, you rotated your IP addresses, tweaked your user-agent headers, and went back to work. It was a manual, iterative game, but it was highly manageable.

Then came what I call the “Great Hardening”.

Target sites stopped relying on simple IP blacklists. Instead, they began deploying sophisticated, commercial anti-bot platforms. These platforms don’t just look at where a request is coming from; they analyze TCP/IP stack fingerprints, TLS handshakes, browser engine behaviors, and real-time user interaction patterns.

Suddenly, the support tickets we received weren't about simple markup changes. They were about silent, complex failures: sessions that mysteriously expired after three requests, browser fingerprints that burned out instantly, CAPTCHAs appearing mid-flow, and "soft-blocking", where a site returns a 200 OK status but delivers an empty page or a challenge screen.

And the rate of change accelerated. Where sites once changed their defenses every few months, AI-powered anti-bot platforms started pushing changes faster than human teams could keep up with.

That was the moment we had to accept a brutal engineering reality: at scale, ban handling is no longer a configuration problem; it is an AI engineering problem.

To solve it, we had to stop building better bypass tools and start building an automated decision engine. Today, that engine chooses among more than 320,000 strategy permutations on every request. This is the story of why we built it, how it works under the hood, and what we learned along the way.

The ‘best config’ fallacy

When engineering teams try to build an anti-ban system in-house, they almost always start by assembling a toolkit. They buy a few proxy pools, write some retry logic, integrate third-party CAPTCHA management, and write custom browser automation scripts for their toughest targets.

But the problem isn't the tools themselves; it's the orchestration. On every single request, someone - or something - has to make a series of high-stakes decisions:

  • Can we get away with a lightweight HTTP request, or do we need to pay the massive latency and CPU tax of spinning up a headless browser?
  • Which proxy network and geographic location will trigger the least friction for this specific target right now?
  • How do we manage sessions and cookies so we look like a returning user without creating an obvious, machine-like footprint?
  • If a request fails, what does the failure actually mean? Is it a transient network hiccup, an aggressive rate limit, or a hard IP ban?
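The last question in that list can be made concrete with a small sketch. This is a hypothetical classifier, not a description of Zyte's engine: the category names, the status-code mapping, and the retry budgets are all assumptions chosen for illustration.

```python
from enum import Enum
from typing import Optional


class Failure(Enum):
    TRANSIENT = "transient"    # network hiccup: safe to retry soon
    RATE_LIMIT = "rate_limit"  # slow down before retrying
    HARD_BAN = "hard_ban"      # this identity is burned: change strategy


# Hypothetical per-failure retry budgets (illustrative numbers only).
RETRY_BUDGET = {Failure.TRANSIENT: 3, Failure.RATE_LIMIT: 1, Failure.HARD_BAN: 0}


def classify_failure(status: Optional[int], timed_out: bool = False) -> Optional[Failure]:
    """Map a response outcome to a failure category, or None if it is not a failure."""
    if timed_out or status is None:
        return Failure.TRANSIENT
    if status == 429:
        return Failure.RATE_LIMIT
    if status in (401, 403):
        return Failure.HARD_BAN
    if 500 <= status < 600:
        return Failure.TRANSIENT
    return None  # everything else is not treated as a failure in this sketch
```

A status-code mapping like this is only a starting point. It cannot see a block that arrives as a 200 OK, so a real classifier would also need to inspect the response body.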

If your team is making these decisions manually by writing custom configuration files for every target domain, you are playing a permanent, exhausting game of whack-a-mole. The moment an anti-bot vendor updates its detection algorithms, your static "best config" instantly decays.

We realized that the only way to win was to move the burden of adaptation from human engineers to an intelligent control layer. That engine's job is to choose the lightest possible strategy that works, and to constantly revise that choice based on real-time feedback.

The architecture: System 1 and System 2

To handle this at a scale of billions of requests, we designed a two-tiered architecture inspired by how the human brain processes decisions: a fast, instinctive layer and a slower, analytical layer. Readers of Daniel Kahneman will appreciate the analogy.

                  [ Incoming Request ]
                           │
                           ▼
┌──────────────────────────────────────────────────────┐
│                      SYSTEM 1                        │
│             (Real-Time / Critical Path)              │
│                                                      │
│  • Reads domain & historical success signals         │
│  • Selects lightest known working strategy           │
│  • Executes request (Latency Budget: Milliseconds)   │
└──────────────────────────┬───────────────────────────┘
                           │
                 If Success│ If Failure / Decay
                           │ (Asynchronous Trigger)
                           ▼
┌──────────────────────────┴───────────────────────────┐
│                      SYSTEM 2                        │
│            (Autoconfigurator v2 / Offline)           │
│                                                      │
│  • Isolates failed domain & forms hypotheses         │
│  • Generates candidate configurations                │
│  • Runs controlled tests & scores results            │
│  • Updates System 1's playbook for next time         │
└──────────────────────────────────────────────────────┘

System 1: Instinct on the critical path

System 1 sits directly in the request-response path. When a customer sends a request to Zyte API, System 1 has only milliseconds to decide how to route it. It cannot afford to run experiments or test hypotheses.

Instead, it relies on instant pattern recognition. It:

  • Looks at the target domain.
  • Identifies the active anti-bot vendor (e.g., by detecting its challenge flow).
  • Checks historical success rates.
  • Immediately selects a playbook from our library of known strategies.
  • Manages the TLS fingerprint.
  • Coordinates header coherence.
  • Routes the request through the optimal proxy pool.

Its goal is speed and efficiency - getting the data back to the user within their latency budget, using the cheapest possible infrastructure.
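As a rough illustration of "cheapest strategy that still works", the selection step might look like the sketch below. The `Strategy` fields, the cost units, the 0.95 threshold, and the playbook entries are all hypothetical; the real engine weighs far more signals than a single success rate.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    cost: float          # relative infrastructure cost (hypothetical units)
    success_rate: float  # historical success rate for this domain, 0.0 to 1.0


def pick_lightest(strategies: List[Strategy], min_success: float = 0.95) -> Optional[Strategy]:
    """Return the cheapest strategy whose historical success rate is acceptable."""
    viable = [s for s in strategies if s.success_rate >= min_success]
    return min(viable, key=lambda s: s.cost) if viable else None


# Hypothetical playbook entries for one domain.
playbook = [
    Strategy("http-datacenter", cost=1.0, success_rate=0.62),
    Strategy("http-residential", cost=4.0, success_rate=0.97),
    Strategy("browser-residential", cost=25.0, success_rate=0.99),
]

choice = pick_lightest(playbook)
print(choice.name if choice else "no viable strategy")  # prints "http-residential"
```

The browser strategy has the highest success rate here, but it is not chosen: the cheaper HTTP strategy already clears the threshold, so paying for a browser would buy very little.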

System 2 (Autoconfigurator v2): The analytical scientist

But what happens when a target site changes its security rules and System 1’s success rate starts to drop?

In a traditional setup, this is where an engineer gets paged, spends three hours debugging raw network traffic, writes a custom patch, and deploys it.

In our architecture, System 1 simply flags the domain as "degraded" and passes it to System 2, our Autoconfigurator v2.
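One simple way to decide that a domain is "degraded" is a rolling success rate over recent requests. The sketch below assumes that approach; the class name, window size, threshold, and minimum sample count are hypothetical, and the article does not say how the real flag is computed.

```python
from collections import deque


class DomainHealth:
    """Track recent outcomes for one domain and flag it when success decays."""

    def __init__(self, window: int = 100, threshold: float = 0.9, min_samples: int = 20):
        self.outcomes = deque(maxlen=window)  # True = success, False = failure
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, success: bool) -> None:
        self.outcomes.append(success)

    @property
    def success_rate(self) -> float:
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    @property
    def degraded(self) -> bool:
        # Require a minimum number of samples so a few unlucky requests do not trip the flag.
        return len(self.outcomes) >= self.min_samples and self.success_rate < self.threshold
```

Because the window has a fixed length, old failures age out on their own, and a domain that recovers stops being flagged without any manual reset.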

Autoconfigurator is our automated config “researcher”: it takes a domain that isn’t reliably accessible, generates candidate domain-level configurations, tests them, and then persists the best one so future requests don’t start from scratch.

Autoconfigurator runs entirely outside the critical latency path. It is an asynchronous, automated R&D system. When it takes over a degraded domain, it behaves like a laboratory scientist:

  1. Forming hypotheses: It analyzes the failure signals. Is the site blocking us because of our IP reputation? Is it detecting our TLS stack? Or does it require full JavaScript execution to establish a session?
  2. Generating candidates: It creates dozens of distinct configuration variants, mixing different proxy types, geographic locations, browser profiles, and session policies.
  3. Controlled testing: It runs isolated, low-volume test requests against the target domain using these candidate configurations, carefully scoring the results based on success rate, response latency, and infrastructure cost.
  4. Updating the playbook: Once it identifies the optimal configuration - for example, discovering that a site now requires a real browser context to set a session cookie, but can then be scraped using cheaper HTTP requests - it writes a new domain-specific playbook and pushes it back to System 1.
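The loop in steps 2 to 4 can be sketched in a few lines. Everything named here is an assumption made for the example: the option values, the scoring weights, the 0.9 acceptance threshold, and `fake_trial`, which stands in for real low-volume test requests.

```python
from itertools import product
from typing import Callable, Dict, List, NamedTuple, Optional

Config = Dict[str, str]


class Result(NamedTuple):
    success_rate: float  # fraction of test requests that succeeded
    latency_s: float     # mean response latency in seconds
    cost: float          # relative infrastructure cost


def generate_candidates() -> List[Config]:
    """Step 2: mix options across dimensions (values are illustrative)."""
    proxies = ["datacenter", "residential"]
    fetch_modes = ["http", "browser"]
    sessions = ["fresh", "reuse"]
    return [
        {"proxy": p, "fetch": f, "session": s}
        for p, f, s in product(proxies, fetch_modes, sessions)
    ]


def score(r: Result) -> float:
    """Step 3: reward success, penalise latency and cost (weights are assumptions)."""
    return r.success_rate - 0.02 * r.latency_s - 0.01 * r.cost


def autoconfigure(run_trial: Callable[[Config], Result], min_success: float = 0.9) -> Optional[Config]:
    """Steps 2-4: test every candidate and return the best acceptable one."""
    trials = [(c, run_trial(c)) for c in generate_candidates()]
    acceptable = [(c, r) for c, r in trials if r.success_rate >= min_success]
    if not acceptable:
        return None  # nothing worked in this sketch
    best, _ = max(acceptable, key=lambda cr: score(cr[1]))
    return best


def fake_trial(config: Config) -> Result:
    """Stand-in for controlled test requests against a target domain."""
    browser = config["fetch"] == "browser"
    residential = config["proxy"] == "residential"
    return Result(
        success_rate=0.98 if residential else 0.40,
        latency_s=3.0 if browser else 0.5,
        cost=25.0 if browser else (4.0 if residential else 1.0),
    )


best = autoconfigure(fake_trial)
print(best["proxy"], best["fetch"])  # prints "residential http"
```

Passing the trial function in as a parameter keeps the search logic separate from the network layer, which makes the loop easy to exercise with simulated results like these.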

By splitting the system this way, we get the best of both worlds: blazing-fast, cost-effective routing for the 95% of requests that behave predictably, and a self-healing, automated optimization loop for the difficult 5% long tail.

The power of a strategy permutation

When we say our engine navigates 320,000+ strategy permutations, we aren't throwing around a vanity metric. That number describes the size of the search space: the distinct combinations of access choices the engine can pick from on a given request.

A "strategy" is not a single magic bypass trick. It is a highly coordinated bundle of choices across multiple technical dimensions, as shown in the table below:

| Dimension | Technical choice | Impact on access |
|---|---|---|
| Fetch mode | Raw HTTP vs. headless browser rendering | Determines if we pay the CPU and latency cost of a full browser. |
| Identity profile | User-agent, TLS fingerprint, TCP/IP stack | Matches the client's network-level signals to look like a real consumer device. |
| Geo-coherence | IP location matching DNS, CDN, and header locales | Prevents blocks triggered by "impossible travel" or regional mismatches. |
| Session policy | Session reuse, state management, cookie isolation | Allows us to maintain logged-in or authorized states across requests. |
| Escalation rules | Automated CAPTCHA management, interactive clicking | Solves active friction points only when they are presented. |
| Retry strategy | Adaptive backoff, error-specific retry budgets | Prevents us from "hammering" a site and burning our own IP pools. |

If you multiply these options across dozens of proxy networks, hundreds of geographic regions, and thousands of browser profiles, the search space becomes massive.
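The arithmetic behind a number like this is a plain product of the option counts in each dimension. The counts below are invented so that the multiplication is easy to follow. They are not Zyte's real figures, and the article does not give the actual breakdown.

```python
from math import prod

# Hypothetical option counts per dimension (invented for illustration only).
options_per_dimension = {
    "fetch_mode": 2,
    "identity_profile": 100,
    "geo": 40,
    "session_policy": 4,
    "escalation_rule": 5,
    "retry_strategy": 2,
}

total = prod(options_per_dimension.values())
print(f"{total:,} strategy permutations")  # prints "320,000 strategy permutations"
```

Even with modest counts per dimension the product grows quickly, and adding a single new option to any one dimension multiplies the whole space rather than adding to it.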

The real magic of the engine isn't that we have these 320,000 options; it's that the signal layer knows how to interpret failures so we don't guess blindly.

If a site returns a "soft block" (like a challenge screen wrapped in a 200 OK), the engine doesn't just retry the request with the same settings. It reads the signal, realizes the fingerprint is burned, escalates the request to a browser context to solve the challenge, captures the session cookie, and then drops back to cheap HTTP for the remaining requests.
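That escalate-then-drop-back flow can be sketched as follows. The challenge markers, the `Session` class, and the two fake fetchers are hypothetical stand-ins; real challenge pages differ by anti-bot vendor, and a real engine would read many more signals than the response body.

```python
from typing import Dict, Tuple

Response = Tuple[int, str, Dict[str, str]]  # (status, body, cookies)

# Hypothetical markers; real challenge pages vary by anti-bot vendor.
CHALLENGE_MARKERS = ("captcha", "verify you are human")


def is_soft_block(status: int, body: str) -> bool:
    """A 200 OK that is empty or contains a challenge marker counts as a block."""
    if status != 200:
        return False
    text = body.strip().lower()
    return not text or any(marker in text for marker in CHALLENGE_MARKERS)


class Session:
    """Try cheap HTTP first; escalate to a browser only when soft-blocked."""

    def __init__(self, http_fetch, browser_fetch):
        self.http_fetch = http_fetch        # callable(url, cookies) -> Response
        self.browser_fetch = browser_fetch  # callable(url) -> Response
        self.cookies: Dict[str, str] = {}
        self.browser_calls = 0

    def get(self, url: str) -> str:
        status, body, _ = self.http_fetch(url, self.cookies)
        if is_soft_block(status, body):
            # Escalate once and keep the cookies so later requests can stay on HTTP.
            self.browser_calls += 1
            status, body, cookies = self.browser_fetch(url)
            self.cookies.update(cookies)
        return body


def fake_http(url: str, cookies: Dict[str, str]) -> Response:
    ok = cookies.get("session") == "abc"
    body = "<html>product data</html>" if ok else "<html>Verify you are human</html>"
    return (200, body, {})


def fake_browser(url: str) -> Response:
    return (200, "<html>product data</html>", {"session": "abc"})


session = Session(fake_http, fake_browser)
session.get("https://example.com/p/1")  # soft-blocked, so the browser is used
session.get("https://example.com/p/2")  # cookie is reused over plain HTTP
print(session.browser_calls)  # prints 1
```

In this simulation only the first request pays the browser cost. The second reuses the captured cookie over plain HTTP.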

This is what we mean by "the lightest strategy that works." It keeps success rates high while keeping our customers' API costs and latency as low as possible.

The reality of constant evolution

There is a common myth in the scraping industry that you can write a "perfect" unblocker, package it up, and sell it as a static product.

The reality of production web scraping is closer to Site Reliability Engineering (SRE). The web is a highly dynamic, living organism. Anti-bot vendors are constantly shipping updates, target sites are switching security providers overnight, and machine-learning models are continuously updating their detection heuristics.

This is why a solo developer or a small engineering team trying to build equivalent functionality in-house is fighting a losing battle. It isn't a question of engineering talent; it is a question of feedback loops and R&D bandwidth.

Because Zyte API processes billions of requests every week, our engine receives a massive, continuous stream of telemetry. We see anti-bot updates the moment they are deployed anywhere on the web. Our automated loops (System 2) catch and resolve the vast majority of these changes before customers even notice.

And for the genuinely new defensive techniques - the ones that require creative human engineering - we have a dedicated, full-time team of anti-bot researchers. Their entire job is to study emerging security technologies, design new bypass mechanics, and feed those capabilities into the engine's library.

Building a system like this requires an ongoing, multi-million-dollar commitment to research and development. It is a continuous, self-healing loop of signal, decision, configuration, and human R&D.

The web isn't going to become easier to scrape. But, by moving the burden of adaptation from human engineers to an automated decision engine, we can at least make the fight sustainable.

We didn't build an anti-ban engine because it was an elegant engineering exercise; we built it because, at our scale, it was the only sane way to survive.
