
Partial autonomy, full control: Why we built Web Scraping Copilot

Read Time
10 Mins
Posted on
November 12, 2025

At Zyte, we’re constantly thinking about the future of web data extraction.


That is why we developed our latest tool, Web Scraping Copilot, a Visual Studio Code extension. The core idea is simple but powerful: leveraging AI to support production web data.


"Production web data" is not about one-off scrapes or casual data collection; it is the kind of data gathering that powers the core products and services upon which a business runs. It’s about the specific, mission-critical scenario where data quality, reliability, and integration into existing systems are paramount.


This is the environment where our customers operate, and it comes with a unique and demanding set of requirements.

The demands of production-grade web data

Everyone in our industry is talking about the promise of AI-assisted scraping. But when we talk to engineering teams who manage large-scale data extraction, they consistently give us feedback on the essential characteristics a solution must have to be viable in their production environments.


  1. Deterministic quality: A production system cannot rely on a "best guess" at which data to capture. A solution where an AI guesses at runtime which HTML element corresponds to which field is fundamentally problematic for data that powers a live product. Quality needs to be testable, monitorable, and correctable (see the sketch below this list). We need predictable, reliable output, every time.

  2. Fit with existing practices and environments: Professional engineering teams have established workflows, version control systems, CI/CD pipelines, and coding standards. A new tool that forces them to abandon these practices and work in a separate, siloed platform creates friction and overhead. The right solution must fit into their world, not the other way around.

  3. Loosely coupled, including for ban solving: Many businesses in this space run multiple solutions for navigating web blocking, using several proxy vendors or web unblockers. They need the flexibility to choose the right solution for a given site on a given day. An extraction tool that is tightly coupled to a single anti-ban provider is a non-starter. It has to be agnostic.

  4. No additional runtime costs: The cost of running spiders is a major consideration. Adding a new, expensive API call for an LLM into the runtime of every single request can make a project economically unfeasible. The intelligence should be applied during the development phase, not as a recurring tax on execution.

  5. Keep the code: Retaining the code means retaining control: it prevents lock-in and lets teams keep evolving their approach as new tools emerge.


When I speak to leaders of enterprise-scale web scraping teams, the message they give me is clear: these characteristics are non-negotiable.
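
To make the first two requirements concrete, here is a minimal sketch of how a team might encode data quality as an ordinary test in their existing CI pipeline. The field names, checks, and fixture record are illustrative assumptions, not part of any Zyte product:

```python
# Hypothetical quality gate for scraped records. The field names and checks are
# illustrative assumptions; the point is that quality is asserted
# deterministically by a plain test running in the team's own CI.

def validate_record(record: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    if not record.get("name"):
        problems.append("name is missing or empty")
    try:
        if float(record.get("price", "")) <= 0:
            problems.append("price must be positive")
    except (TypeError, ValueError):
        problems.append("price is not a number")
    if not str(record.get("url", "")).startswith("https://"):
        problems.append("url must be an absolute https link")
    return problems


def test_fixture_record_passes():
    # In CI this record would come from running the spider against a saved page.
    record = {"name": "Example widget", "price": "19.99", "url": "https://example.com/p/1"}
    assert validate_record(record) == []
```

Because the check is ordinary code in version control, it runs in the same CI pipeline as everything else the team ships, and a failing field is something an engineer can diagnose and fix rather than re-prompt around.
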

The code vs. no-code dilemma

This brings us to a fundamental split in the market: code vs. no-code solutions.


No-code tools are wonderful for certain use cases. I see posts like this on LinkedIn all the time: "As a marketing professional, I wanted to gather a thousand leads for a sales campaign. Now I can do that in two hours, and I don't need an engineer." That’s a fabulous application of the technology, empowering non-developers to achieve their goals with agility and simplicity.


However, these kinds of no-code solutions don't tend to exhibit the key characteristics above. They often lack deterministic quality control, don’t fit with engineering practices, and create a black-box dependency. For production web data, code-based solutions remain the tool of choice precisely because they offer the control, testability, and integration that professional environments demand.


But this doesn’t mean we should ignore the power of AI. It just means we need to apply it in the right way.

Partial autonomy: The sweet spot for AI in engineering

The challenge, and the opportunity, lies in finding the right balance between the raw power of AI and the control required by engineers.


The brilliant Andrej Karpathy, former Director of AI at Tesla, captured this perfectly when he described how LLMs combine encyclopedic recall with jagged intelligence and a tendency to hallucinate.


We need to design workflows that harness the incredible generative capabilities of LLMs while systematically mitigating their weaknesses.


Karpathy calls the solution “partial autonomy” apps.


These have:


  • Strong context management: Giving the AI the right information to succeed.

  • Domain GUIs for human audit: Creating interfaces that allow a human expert to see what the AI is doing and easily verify or correct it.

  • An optimized generation-verification loop: Making it fast and easy for the user to generate a result, check it, and correct it.

  • A conceptual "autonomy slider": Ranges from Assist (where the AI acts as a helpful assistant) to Agent (where it acts fully autonomously).

This is the way to find balance: building systems that allow the engineer to choose the level of autonomy they are comfortable with for the task at hand. This is how trust is built. A low-risk prototype might sit further towards the "Agent" end of the slider, while a critical production system will start firmly at the "Assist" end.


In other words, just like at Tesla, autonomy is a journey.
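
One way to picture that slider is as a simple setting in code. This is a purely conceptual sketch; the level names and the gating logic are assumptions made for illustration, not anything shipped in the Copilot:

```python
from enum import Enum


class Autonomy(Enum):
    """Conceptual positions on the autonomy slider."""
    ASSIST = 1  # the AI suggests; the engineer writes and applies every change
    REVIEW = 2  # the AI drafts the change; the engineer approves it before it lands
    AGENT = 3   # the AI applies the change automatically; the engineer audits afterwards


def should_apply(level: Autonomy, engineer_approved: bool) -> bool:
    """An AI-proposed change only lands without sign-off at the AGENT end of the slider."""
    if level is Autonomy.AGENT:
        return True
    return engineer_approved
```

Moving a given task from ASSIST toward AGENT only as its output proves itself is exactly the trust-building journey this post describes.
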

Web Scraping Copilot: Partial autonomy for production web data

This brings me back to why we built the Web Scraping Copilot.


We created components that bring the power of partial autonomy directly into the developer’s existing environment, starting with VS Code. Our Copilot helps engineers generate web scraping code faster and more efficiently, but it adheres to the principles of production-grade systems.

It offers an "autonomy slider" for web scraping. At one end, it can generate a full spider from a simple prompt, accelerating the initial development process. But the output is always clean, maintainable code that the engineer owns. It doesn't execute AI at runtime. The intelligence is front-loaded in the development process, giving you the benefit without the recurring cost or reliability risk.
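
As an illustration of that principle (an assumption about the general shape of the output, not the Copilot's literal code), a generated spider is just ordinary Scrapy code: the selectors were chosen at development time, so nothing calls an LLM while the crawl runs:

```python
import scrapy


class ProductSpider(scrapy.Spider):
    """Plain Scrapy spider with static selectors; no AI service is involved at runtime."""
    name = "product"
    start_urls = ["https://example.com/catalogue/"]  # illustrative site and selectors

    def parse(self, response):
        # Follow each product link found on the listing page.
        for href in response.css("article.product a::attr(href)").getall():
            yield response.follow(href, callback=self.parse_product)

    def parse_product(self, response):
        # Static, reviewable selectors give predictable output on every run.
        yield {
            "name": response.css("h1::text").get(default="").strip(),
            "price": response.css("p.price::text").re_first(r"[\d.]+"),
            "url": response.url,
        }
```

Because it is plain code in the repository, it can be reviewed in a pull request, tested against saved pages, and run with whichever proxy or unblocking setup the team already uses.
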


This approach is already yielding results. In our own professional services teams—who handle critical data needs for clients where quality is non-negotiable—we are already seeing productivity gains of around 2.5x.


And this is just the beginning. As the tool improves, and as engineers become more confident in its output for specific sites, they can slide the autonomy level up.


Over time, for certain low-risk sites, that engineer might gain enough trust to say, "The last five pull requests have been perfect. Next time, just deploy it automatically." That’s a journey of establishing trust, giving engineers the final say.

An invitation to build the future with us

The Web Scraping Copilot is available today in the VS Code marketplace. It’s a beta release, and we are hungry for your feedback.


We were comfortable enough to do a live demo with it at the Web Data Extract Summit in Austin, Texas, in September 2025, but we know it’s at the early stage of its journey. We received fantastic feedback in our workshops and we are committed to acting on it.


This is our vision: to build tools that respect the realities of production engineering while harnessing the transformative power of AI. By focusing on deterministic quality, integration, and controllable autonomy, we believe we can build the future of web data extraction together.

