
Beyond the block: The front line of data access

Read Time
10 Mins
Posted on
December 1, 2025
A deep dive into the evolving battle for web data access—featuring insights from Castle, Scrapoxy, and Zyte at Extract Summit 2025. Learn how AI, anti-bots, economics, and authentication standards like Web Bot Auth are transforming scraping, security, and the future of the open internet.

Web data scraping has long been a cat-and-mouse game between data extractors and the websites they target.


But, increasingly, this relationship is no longer a simple contest of blocking and unblocking. It is evolving into a technological and economic arms race, where every request is a potential piece of intelligence in a complex access assessment.

In a panel discussion at Extract Summit 2025 in Dublin, Zyte CEO Shane Evans asked experts from both sides of the industry to take the pulse of this evolving space.


Redefining ‘Bad Bot’ through value, not morality

The old, simple classification of "good bots" (like search engines) and "evil bots" (scrapers) has been thrown out the window. Castle’s Antoine Vastel made it clear that the modern approach is now purely pragmatic.


"It's not even about ethics," Vastel explained. The decision to block or allow a bot now comes down to its value to the website owner. "If you're (a search engine), you're a ‘good’ bot because the customers benefit from you. If you have a contractual relationship with the website, they will consider you a good bot. All the bots that are not good bots are more or less handled as ‘bad’."


Vastel’s take suggests that even well-intentioned scrapers can be treated as hostile traffic if they bring no value to the site owner. This business-first logic frames the entire relationship, establishing that any unapproved automation is a target for blocking.

The economics of evasion: Raising the cost to play

Fabien Vauchelles, whose work focuses on the infrastructure that powers scraping, offered the counter-perspective. He argued that the primary goal of anti-bot systems isn't just to block, but to make scraping so expensive that it becomes unprofitable.

The real cost, he explained, comes from the need to build and maintain an entire technical stack for accessing websites. This involves managing not just IP addresses but also browser fingerprints, TLS signals, and presentation patterns. Any inconsistency across these layers can trigger a block.
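The kind of cross-layer consistency check Vauchelles alludes to can be sketched roughly as follows. The browser names, fingerprint labels, and header orderings below are invented for illustration; no vendor's actual rules are this simple:

```python
# Hypothetical sketch: an anti-bot cross-checking the layers of a request.
# Each claimed browser implies an expected TLS fingerprint family (e.g. a
# JA3-style hash) and a typical header ordering. All values are illustrative.
EXPECTED = {
    "chrome": {"tls_family": "ja3-chrome", "header_order": ["host", "user-agent", "accept"]},
    "firefox": {"tls_family": "ja3-firefox", "header_order": ["host", "user-agent", "accept"]},
}

def layers_consistent(claimed_browser: str, tls_family: str, header_order: list[str]) -> bool:
    """Return True only if every layer tells the same story."""
    expected = EXPECTED.get(claimed_browser)
    if expected is None:
        return False
    return (tls_family == expected["tls_family"]
            and header_order == expected["header_order"])

# A Chrome User-Agent sent over a non-Chrome TLS stack is an easy flag:
print(layers_consistent("chrome", "ja3-firefox", ["host", "user-agent", "accept"]))  # False
```

The expense Vauchelles describes comes from keeping every one of these layers plausible at once, for every request, as the checks themselves keep changing.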


"The goal of the anti-bot is to raise the bar every time... to make you access the data in a more expensive way." 

– Fabien Vauchelles, creator, Scrapoxy


This forces scrapers into a difficult choice: invest heavily in the internal skills and infrastructure to keep up, or purchase managed services that handle the complexity for them.

From IP blocks to subtle signals

The days of simple IP-based blocking are numbered. Castle’s Vastel emphasized that this method is becoming "more and more risky" due to the prevalence of shared residential and mobile IPs. A single block could inadvertently cut off thousands of legitimate users - a disaster for any e-commerce site paying for traffic.


Instead, the industry has moved toward a more nuanced, score-based approach. "It's more like a gradient," Vastel said. Rather than making a binary "bot/not bot" decision, systems now build a profile over time, analyzing a user's entire journey.
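A score-based assessment of this kind can be sketched as signals accumulating across a session, with the decision driven by the running total rather than any single observation. The signal names, weights, and thresholds here are invented for illustration:

```python
# Illustrative sketch of a "gradient" risk assessment: each signal observed
# during a session nudges a score up or down, and only the accumulated score
# drives the allow/challenge/block decision. All weights are invented.
SIGNAL_WEIGHTS = {
    "datacenter_ip": 0.30,
    "headless_hints": 0.35,
    "no_mouse_movement": 0.15,
    "impossible_travel": 0.25,
    "logged_in_history": -0.40,  # positive history lowers risk
}

def assess(session_signals: list[str], block_threshold: float = 0.6) -> str:
    score = sum(SIGNAL_WEIGHTS.get(s, 0.0) for s in session_signals)
    score = max(0.0, min(1.0, score))  # clamp to [0, 1]
    if score >= block_threshold:
        return "block"
    if score >= 0.3:
        return "challenge"  # e.g. serve a CAPTCHA or extra checks
    return "allow"

print(assess(["datacenter_ip", "headless_hints"]))         # 0.65 -> "block"
print(assess(["no_mouse_movement", "logged_in_history"]))  # clamped to 0.0 -> "allow"
```

The practical consequence is that no single signal is fatal: a shared residential IP alone will not trip the threshold, which is exactly why it has stopped being a useful blocking criterion on its own.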


This reality was echoed by Zyte’s Kenny Aires, who manages data projects on the front lines. He noted that, while services like the Zyte API can handle much of this complexity, his team also develops custom scripts to handle extreme "edge cases".

The escalation is particularly intense during peak retail seasons. "Two weeks before [Black Friday], we see the anti-bots upgrading," Aires shared, highlighting the constant pressure on his team to adapt.

AI enters the fray, on both sides

Artificial intelligence is the latest and most disruptive force in this arms race, and both sides are leveraging it.


For scrapers, large language models (LLMs) have become a powerful weapon. "It's really interesting what LLMs can do, especially with multimodal models," said Vauchelles. "You can really use … the models to (handle) CAPTCHAs."


Vastel acknowledged this reality, saying that traditional, image-based CAPTCHAs are now effectively broken. "You will always find an AI model capable of (handling) it," he said. 

The biggest unknown, however, is the rise of AI agents - tools like Perplexity and OpenAI's forthcoming search agent that browse the web on a user's behalf.


"Big platforms are asking questions. They don't really know what to do with AI agents."

– Antoine Vastel, head of research, Castle


The first challenge is simply identifying them, but the larger question of whether to treat them as valuable "good bots" or resource-draining "bad bots" remains unanswered.

A glimpse into the future: A ‘closed internet’?

Could this constant escalation be leading to a fundamental shift in the web's architecture?


Vauchelles offered a downbeat prediction: "I think we will move toward a closed internet." He envisions a future where direct, unauthenticated access to websites is restricted. Instead, most interactions will be mediated through authorized AI agents.

Initiatives like Web Bot Auth, an IETF draft with backing from major tech companies, are already laying the groundwork for such a system. It proposes a cryptographic method for "good bots" to authenticate themselves, proving their legitimacy without revealing user data.
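The core idea can be sketched in a few lines: the bot signs metadata about its request, and the site verifies the signature before deciding how to treat the traffic. The actual draft builds on RFC 9421 HTTP Message Signatures with asymmetric (Ed25519) keys published in a discoverable directory; the stdlib-only sketch below substitutes HMAC with a shared secret purely to stay dependency-free, and the signature-base format is simplified:

```python
# Stdlib-only sketch of the idea behind Web Bot Auth: sign request metadata,
# verify it server-side. The real draft uses asymmetric signatures per
# RFC 9421; HMAC with a shared secret here is only a stand-in.
import hashlib
import hmac

SHARED_SECRET = b"demo-key-not-real"  # stand-in for the bot's signing key

def sign_request(method: str, authority: str, path: str) -> str:
    """Bot side: sign a simplified signature base covering the request."""
    base = f"@method: {method}\n@authority: {authority}\n@path: {path}"
    return hmac.new(SHARED_SECRET, base.encode(), hashlib.sha256).hexdigest()

def verify_request(method: str, authority: str, path: str, signature: str) -> bool:
    """Site side: recompute the signature and compare in constant time."""
    expected = sign_request(method, authority, path)
    return hmac.compare_digest(expected, signature)

sig = sign_request("GET", "example.com", "/products")
print(verify_request("GET", "example.com", "/products", sig))  # True
print(verify_request("GET", "example.com", "/admin", sig))     # False: different path
```

Because the signature covers the request itself, a verified identity cannot simply be replayed against other resources, which is what makes the scheme workable without revealing user data.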


"You will access the website through these agents, and the anti-bot or the website will say, 'Okay, I let you pass’."

– Fabien Vauchelles, creator, Scrapoxy


"Perhaps for other users, you won't have the same access,” Vauchelles added.


This would create a tiered web, where verified agents get preferential treatment and everyone else - including independent scrapers and potentially even regular users - faces a higher wall.


While this could solve some security issues, it also threatens the open, permissionless nature that has defined the web for decades, making the ongoing debate over data access one that will shape the future of the internet itself.
