Five key takeaways from Extract Summit 2025

Posted on November 10, 2025
From AI-accelerated scraping to “dead internet” risks and rising access wars, these five takeaways from Extract Summit 2025 show where web data is heading next.
By Robert Andrews

What is the state of web data right now? The sixth annual Web Data Extract Summit brought together a couple of hundred of the web scraping space’s finest to answer that question.


Held on November 5 and 6, 2025, at the Gibson Hotel in Dublin, Ireland, the summit featured one day of debate and one full day of hands-on workshops.


The summit concluded with a sense that web data gathering is becoming simultaneously easier and more challenging.

1. AI is accelerating the hard work of scraping

After a year of hype over no-code AI data extraction tools, AI is settling into the scraping workflow as professional developers’ new best friend.


Scrapoxy creator Fabien Vauchelles demonstrated how he uses an LLM inside his code editor to reverse-engineer the inner workings of complex, obfuscated anti-bot mechanisms, saying: “This kind of work would have taken me three months in the past. I can do that in 15 minutes now.”

Zyte used Extract Summit Dublin to launch the beta of its Web Scraping Copilot, a Visual Studio Code extension with automated parsing code generation. Chief product officer Iain Lennon told data developers he aims to keep them in control of their pipeline with “partial autonomy” and “a sliding scale of choice”:


“We're asking ourselves, ‘How can we accelerate code for web scraping with AI?’ Code is still the best solution.”

– Iain Lennon, chief product officer, Zyte


Developers got hands-on with the tool during Extract Labs workshop sessions.


Such developments are rapidly accelerating the “time to data” for web scraping engineers. Zyte chief operating officer Suzanne Hassett told attendees:


“I have responsibility for over 100 developers - this is doubling our output. This is huge for us.”

– Suzanne Hassett, chief operating officer, Zyte

2. Models may choke on a dead internet

AI is not all good news. “Dead internet theory” - the idea that the internet is becoming a hollowed-out shell, populated more by bots than by humans - is now a statistical reality, said Domagoj Marić, AI customer delivery manager at Pontis Technology, in a talk that shocked listeners with a grim picture of a "synthetic web" drowning in AI-generated content.

Attendees were struck by Marić’s depiction of a web on which bots post text comments, 50% of traffic is now from non-human sources and generative AI is used to create photos and videos that are as irresistible as they are inauthentic.


There may be a looming irony: next-generation AI models whose forebears were used to create fake content are now feeding on their own derivative output as training input - a cannibalistic loop that could lead to “model collapse”.


Want to keep the internet human and your data grounded in reality? Marić urged attendees to “support authentic content”, “get off Facebook” and “not just be lurkers”.

3. The access wars are heating up

The cat-and-mouse game between web scrapers and the anti-bot systems that protect sites is escalating into a high-tech, high-stakes arms race, with speakers at Extract Summit 2025 declaring that the old rules of engagement are dead.

According to speakers’ talks (and the chat over coffee and Guinness), the battle has moved beyond simple IP-based blocking, with Antoine Vastel, head of research at anti-fraud platform provider Castle, saying an IP address is now considered a “weak signal”.


In its place? Anti-bot systems are looking to identify a scraper’s entire persona, attendees heard. This includes the network fingerprint (TLS/JA4), the browser fingerprint (Canvas, WebGL, audio context), and user behavior.
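To make the “persona” idea concrete, here is a minimal sketch of why rotating an IP address alone may not change how a client looks to a fingerprinting system. Everything in it is an assumption for illustration: the `ClientSignals` fields, the sample values and the `persona_id` helper are hypothetical, and real anti-bot systems combine far more signals (including behavior) in ways they do not publish.

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ClientSignals:
    """Hypothetical signals an anti-bot system might observe for one client."""
    ip_address: str
    tls_fingerprint: str   # stands in for a JA4-style string; value is made up
    canvas_hash: str       # stands in for a Canvas rendering fingerprint
    webgl_renderer: str    # stands in for the reported WebGL renderer


def persona_id(signals: ClientSignals) -> str:
    """Hash the fingerprint signals, deliberately ignoring the IP address."""
    material = "|".join(
        [signals.tls_fingerprint, signals.canvas_hash, signals.webgl_renderer]
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]


first_visit = ClientSignals(
    "203.0.113.10", "t13d1516h2_example", "canvas-a1b2", "ExampleGPU"
)
# Same browser, new IP address (for example, via a proxy).
rotated_ip = replace(first_visit, ip_address="198.51.100.7")
# Same IP address, but a different Canvas fingerprint.
new_canvas = replace(first_visit, canvas_hash="canvas-ffff")

same_after_ip_rotation = persona_id(first_visit) == persona_id(rotated_ip)
same_after_new_canvas = persona_id(first_visit) == persona_id(new_canvas)
print(same_after_ip_rotation, same_after_new_canvas)  # True False
```

In this toy model the identifier survives an IP change but not a change to the browser fingerprint, which is one way to read the “weak signal” remark.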


This technical escalation is driving up the cost of entry for data-gatherers. Scraping expert Fabien Vauchelles explained:


“The goal of the anti-bot (systems) is to raise the bar every time.”

– Fabien Vauchelles, creator, Scrapoxy


He framed the conflict as an economic tit-for-tat that is designed to make data access prohibitively expensive.

That skirmish also has seasons. Speaking on a panel, Kenny Aires of Zyte’s done-for-you data delivery team said that, two weeks before major shopping events like Black Friday, "we see the anti-bots upgrading... it's very challenging," creating a frantic scramble for scraping teams. 


“We see even small websites (begin to) use protection,” Aires said.

4. The devil is in the detail

For anyone gathering web data these days, attention to detail is emerging as a key skillset.


Kieron Spearing of Centric Software championed an "investigative mindset," urging developers to "take their time" to forensically deconstruct every request. Spearing argued that a single technical flaw can produce a "cascading failure" that can derail an entire large-scale operation. Meticulousness, he insisted, is the only path to building stable, scalable scrapers.

The legal status of scraped web data, too, hinges on many fine nuances.


For example, web data gatherers already know that website owners can declare their access preferences to crawlers in a robots.txt file.


But, speaking on a panel, Dr. Bernd Justin Jütte, associate professor in intellectual property law at University College Dublin, said: “A recent ruling in Germany...said the declaration doesn't have to be machine-readable... it can also be written in natural language into...the terms and condition of the website.” Respecting wishes articulated in myriad different formats could prove challenging.
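The machine-readable half of that picture is the easier one to check in code. As a minimal sketch, Python's standard-library `urllib.robotparser` can evaluate a robots.txt file against a crawler's user agent. The robots.txt content, the user-agent names and the `example.com` URLs below are made up for illustration, and nothing here covers preferences written in natural language in a site's terms.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: example-bot
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A generic crawler falls under the "*" rules.
allowed_product = parser.can_fetch("my-crawler", "https://example.com/products/1")
allowed_private = parser.can_fetch("my-crawler", "https://example.com/private/report")
# "example-bot" is disallowed everywhere by its own rule group.
allowed_named_bot = parser.can_fetch("example-bot", "https://example.com/products/1")

print(allowed_product, allowed_private, allowed_named_bot)  # True False False
```

A check like this only answers what the robots.txt file says. Whether a declaration elsewhere on the site also applies is the kind of question the panel suggested is getting harder to answer automatically.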

According to Dr. Nikos Minas, global IP counsel, Wesco International:


“Where do you get your data? That should be your primary concern.”

– Dr. Nikos Minas, global IP counsel, Wesco International

5. An ID card for your agent?

Though web publishers and data gatherers continue to size each other up, it’s no longer a two-sided world, as autonomous web agents also now enter the fray.


They may not be recognisable as data extraction tools, but some large operators are now beginning to block the likes of ChatGPT Agent and Perplexity’s Comet browser.


Scrapoxy’s Vauchelles warned:


“We are moving toward some kind of a closed internet.”

– Fabien Vauchelles, creator, Scrapoxy


“The future is pretty clear,” he said. “Major websites want to make deals and build authentication systems. The website will say ‘Okay I’ll let you pass’ - but, perhaps for other users, you won't have the same access.”

Castle’s Antoine Vastel, channeling defensive website owners’ perspective, sees promise in Web Bot Auth, a brand-new potential protocol for managing bot access.


“What I like with this standard is that it's crypto, so it's secure,” he told a panel. “Big platforms are asking questions. They don't really know what to do with AI agents. They first want to get visibility - you can't have a strategy if you don't know.”

Summary

Web data scraping is simultaneously becoming easier and more difficult.


Tools like web scraping APIs and AI add-ons are emerging to eliminate the busywork that goes into data gathering.


But Extract Summit 2025 heard that accessing sites on the open web is becoming more challenging and costly than ever for those who lack economies of scale and skill.


Want to go deeper? Watch all summit talks and panel discussions on the Extract Summit YouTube channel.
