Your VPS is ready, but now you need to work through the same sequence you have run a dozen times before: apt update, apt install python3-pip, pip install scrapy, playwright install chromium, the Chromium dependency list that never installs cleanly on the first try, Redis, possibly Postgres, whatever else this particular project needs.
Multi-agent orchestration is having its moment. The diagrams are everywhere now. Boxes for planners, boxes for hands, boxes for daemons, arrows to a shared brain, a human floating at the top. They keep getting prettier. The part where the web pushes back is still the part nobody draws.
The problem was a project with 12,000 websites to crawl, and there’s no world where you write custom spiders for 12,000 websites, not with a human team and certainly not sustainably. So Javier built a workflow: a set of AI prompts that could analyze a website, figure out its structure, and generate a crawl configuration that a generic spider could then use.
If you want to understand exactly how a browser scraping service works at the infrastructure level, or you have a steady workload that you want running on hardware you already own, building one yourself teaches you things that matter. Here's how I did it.
Data-gathering doesn’t have to be memory-intensive. You can fit the world’s weather on a 9cm-square board when you move the work to a web scraping API.
For the last 30 days, I did one thing almost exclusively: I built scraping systems with AI agents, from the ground up, across real targets, with real deadlines. Not prototypes designed to impress in a demo, not isolated experiments running against a toy website, but production-grade pipelines that needed to ship and keep running.
I've been running a series of conversations with developers at Zyte to understand what's actually changed in the way they work since LLMs showed up. Not the headlines. The day-to-day. What they delegate, what they don't, what they notice, what surprises them. This one was different on two counts.
The next time you spin up a VPS to give it a persistent home, you spend the better part of an afternoon rebuilding from memory: installing Scrapy, wiring up Redis, configuring the systemd units, getting Playwright's Chromium dependencies in the right state. Here's a tool to help.
Ayan's four-agent team, how he uses Claude's /goal, and the models and coding agents he relies on to code effectively.
Many data teams still think running a proxy-based scraping stack is the most cost-effective option. Industry pressures and our research suggest otherwise.
New legal and regulatory requirements for web data have significant business consequences. So, how can technologists engineer a lower risk profile for their company?
In our interview, a QA expert warns: before you delegate web scraping quality assurance to AI, make sure you can describe for yourself what ‘good’ looks like.