The problem was a legacy project with 12,000 websites to crawl, and there’s no world where you write custom spiders for 12,000 websites, not with a human team and certainly not sustainably. So Javier built a workflow: a set of AI prompts that could analyze a website, figure out its structure, and generate a crawl configuration that a generic spider could then use.
If you want to understand exactly how a browser scraping service works at the infrastructure level, or you have a steady workload that you want running on hardware you already own, building one yourself teaches you things that matter. Here's how I did it
Data-gathering doesn’t have to be memory-intensive. You can fit the world’s weather on a 9cm-square board, when you move the work to a web scraping API.
For the last 30 days, I did one thing almost exclusively: I built scraping systems with AI agents, from the ground up, across real targets, with real deadlines. Not prototypes designed to impress in a demo, not isolated experiments running against a toy website, but production-grade pipelines that needed to ship and keep running.
I've been running a series of conversations with developers at Zyte to understand what's actually changed in the way they work since LLMs showed up. Not the headlines. The day-to-day. What they delegate, what they don't, what they notice, what surprises them. This one was different on two counts.
the next time you spin up a VPS to give it a persistent home, you spend the better part of an afternoon rebuilding from memory: installing Scrapy, wiring up Redis, configuring the systemd units, getting Playwright's Chromium dependencies in the right state. Here's a tool to help
Ayan's 4 agent team, using Claude's /goal, and the models and coding agents he uses to code effectively.
New legal and regulatory compulsions for web data have significant business consequences. So, how can technologists engineer their company’s risk profile lower?
In our interview, a QA expert warns - before you delegate web scraping quality assurance to AI, make sure you can describe what ‘good’ looks like for yourself.
New models can process larger inputs, and confuse themselves in the process. Context management techniques can solve the problem.
Proxies are essential to scraping at scale. So, how do full-stack web scraping APIs compare?
Quickly compare e-commerce products across any site with an agent, a skill and an AI-powered web scraping API.