The best agent skill is the one that says the least

An interview with Konstantin Lopukhin, Zyte's head of R&D, on the scrape skill in Zyte's Claude Code plugin, reliable web data, and why evaluation is still the foundation.

AI agents can now do surprisingly useful things with a URL and a vague instruction. They can inspect a website, generate code, run tests, read errors, and try again, and along the way they can call tools, read documentation, and reach for Claude Skills, MCP servers, APIs, and whatever new acronym appeared on your feed this week.

So I asked Konstantin Lopukhin the question that has been sitting underneath almost every conversation in this series: if the agent can write the scraper now, what does the human still need to understand?

Neha Setia Nagpal and Konstantin Lopukhin on a recorded video call, captioned with two of the interview questions about what the human still needs to understand and what stays foundational.

Konstantin is head of R&D at Zyte, and his route into this work did not begin with agents. It started much earlier, in the Scrapinghub days, with open source, Scrapy, the long migration to Python 3, research projects, and a run of Kaggle competitions. That last part matters more than it first appears. When he talked about Kaggle, the thing he remembered most fondly was not winning a competition with a teammate, but what came after: reducing the winning solution down to roughly a hundred lines of code. The joy was in finding the bare essence.

That same instinct runs through his view of AI agents for web scraping. The question is not how to make the agent take more steps. The question is how to give it the right knowledge, the right tools, the right infrastructure, and the right evaluation loop so that it does fewer wrong things.

From Copilot to skills: giving the agent domain knowledge

Konstantin was careful to clarify something early. He was not the person leading Web Scraping Copilot or Zyte's Claude Code plugin, both of which were built and shipped by other teams. What his team contributed was research: how large language models behave on web scraping tasks, and how the quality of their output can be measured. That distinction reflects how the whole space is evolving, with product work, research work, and agent workflows becoming tightly connected. One team builds the interface, another tests model behavior, another encodes domain knowledge, and another worries about what happens when all of it has to run at scale.

Web Scraping Copilot and Zyte's Claude Code plugin point in the same direction but approach the problem differently. Web Scraping Copilot gives developers a structured product experience inside Visual Studio Code, helping generate Scrapy projects, parsing code, fixtures, and tests. In my earlier conversation with Adrian Chaves about whether you can trust AI-generated scraping code in production, we kept circling back to why that structure matters: generated code is only useful if it stays correct, readable, and maintainable when the website changes.

The plugin for Claude Code is more agentic. As Konstantin explained it, the plugin gives the agent a scrape skill, so a developer can point it at a site and describe the data they want, then let it work through the pages.

/scrape https://example.com/products
# describe the data you want, then let the agent work through the site

The important part is not that the skill contains anything magic. It contains knowledge. A skill tells the agent how to approach a domain, pointing it toward the right infrastructure, the right project structure, the right way to test, and the right tradeoffs to weigh. Konstantin compared it to picking up an ability in a game: once you have it, you can do the same work with higher quality and less wasted effort. That matters in web scraping, where a general-purpose coding agent makes very plausible mistakes. It might try to download HTML with curl, get blocked immediately, and then spend its time inventing elaborate workarounds instead of reaching for an unblocking API. Once it has the HTML, it might grab the first thing that looks like a price and write a selector around that, even when the page exposes the same value through a more stable structure or an API endpoint.
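To make the "more stable structure" point concrete, here is a minimal sketch of reading a price from schema.org JSON-LD metadata instead of guessing at a CSS class. It is not taken from the scrape skill. The sample HTML, the `JsonLdCollector` class, and the `price_from_jsonld` helper are all hypothetical, and it assumes the page embeds a single `Product` object, which many real pages do not.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect parsed <script type="application/ld+json"> blocks (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skip it rather than guess


def price_from_jsonld(html):
    """Return (price, currency) from the first Product offer, or None if absent."""
    collector = JsonLdCollector()
    collector.feed(html)
    for doc in collector.documents:
        if not isinstance(doc, dict) or doc.get("@type") != "Product":
            continue
        offers = doc.get("offers")
        if isinstance(offers, list):
            offers = offers[0] if offers else None
        if isinstance(offers, dict) and "price" in offers:
            return str(offers["price"]), offers.get("priceCurrency")
    return None


# Made-up page: the first thing that "looks like a price" is the old price.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Kettle",
 "offers": {"@type": "Offer", "price": "24.99", "priceCurrency": "EUR"}}
</script>
</head><body>
<span class="price old">29.99</span>
<span class="price">24.99</span>
</body></html>
"""

print(price_from_jsonld(SAMPLE_HTML))
```

A selector written around the first `.price` element would return the crossed-out 29.99 here, while the structured metadata carries the current price and its currency. Whether a given site exposes such metadata, and whether it is accurate, still has to be checked per site.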

To a general coding agent, the task looks like "write parsing code." To a web scraping developer, the task is much larger, and it covers a set of questions the agent rarely asks on its own:

Can we access the page reliably?
Are we using the least fragile source of the data?
Is the data actually correct?
Can this run again, on a schedule, without supervision?
Will we know when quality drops?
Can the spider be maintained when the site changes?

That is the domain knowledge a skill tries to encode.

Why the output is code, not just data

One of the most useful turns in the conversation was a simple challenge: if an agent can browse and extract data on the spot, why does Zyte's approach still care about producing scraping code at all? Konstantin's answer came down to scale and repeatability. Many Zyte users do not need data once; they need a lot of it, refreshed regularly, from jobs that may run for hours and have to stay polite to the target site while doing it. That world needs scheduling, monitoring, alerts, quality checks, and a way to recover when a site changes or starts blocking. A one-time answer does not survive contact with any of that.

A durable workflow needs something you can run again, which means code, tests, data contracts, and the operational layer around them. This is the principle Iain Lennon, Zyte's chief product officer, describes as partial autonomy: the model does its expensive thinking at development time to produce the extractor, and at runtime the job is mostly deterministic Scrapy doing cheap, repeatable work. The output can include both code and data, but the code is what makes the job finishable, because the first successful extraction is rarely the end of it.
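As one illustration of what the operational layer around a repeatable job can look like, the sketch below compares a run against a previous baseline and raises alerts when the item count or a field's fill rate drops. The function names, the sample items, and the 10% threshold are assumptions made for this example, not part of Scrapy, Scrapy Cloud, or Zyte API.

```python
def fill_rates(items, fields):
    """Fraction of items in which each field is present and non-empty."""
    total = len(items)
    rates = {}
    for field in fields:
        filled = sum(1 for item in items if item.get(field) not in (None, ""))
        rates[field] = filled / total if total else 0.0
    return rates


def check_run(baseline_items, current_items, fields, max_drop=0.10):
    """Return human-readable alerts when the current run looks worse than the baseline."""
    alerts = []
    # Fewer items than expected often means blocking or a changed listing page.
    if baseline_items and len(current_items) < len(baseline_items) * (1 - max_drop):
        alerts.append(
            f"item count fell from {len(baseline_items)} to {len(current_items)}"
        )
    before = fill_rates(baseline_items, fields)
    after = fill_rates(current_items, fields)
    # A field that suddenly goes empty often means a selector stopped matching.
    for field in fields:
        if before[field] - after[field] > max_drop:
            alerts.append(
                f"{field} fill rate fell from {before[field]:.0%} to {after[field]:.0%}"
            )
    return alerts


# Made-up runs: the job did not crash, but half of the prices went missing.
BASELINE = [{"name": f"item {i}", "price": "9.99"} for i in range(4)]
CURRENT = [
    {"name": "item 0", "price": "9.99"},
    {"name": "item 1", "price": ""},
    {"name": "item 2", "price": None},
    {"name": "item 3", "price": "9.99"},
]

for alert in check_run(BASELINE, CURRENT, ["name", "price"]):
    print("ALERT:", alert)
```

A check like this only says that something changed between runs. It does not say whether the values that are present are correct, which is a separate measurement.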

That answer sits inside a pattern the rest of this series keeps returning to. Tomasz Lesiak's warning about data quality was that before you delegate quality assurance to AI, you still have to define what good looks like: item coverage, field coverage, and field accuracy do not disappear just because a language model is in the loop. Julia Medina and Mihaela Popova made a related point from another angle, that the developer's work shifts toward design, evaluation, and deciding which parts of a workflow should be model-based and which should stay deterministic or infrastructure-backed. Konstantin's answer belongs in the same family. The agent can help generate the scraper, and reliable web data still needs the system around it.

What "high quality" means in web scraping

When Konstantin says quality, he is mostly talking about data quality, not code aesthetics. High-quality scraped data conforms to the schema, carries correct values in the correct format, and includes the required fields, so that a price is really a price and a date is really a date and the output can be checked against an agreed expectation. This is where Zyte's research work earns its place, because with fixed datasets and known expected outputs you can run different approaches against the same target and actually measure the result. You can compare models, compare harnesses, and test whether a skill improves the output or whether a newer model can already solve the task without as much hand-holding.
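A minimal sketch of that kind of measurement, assuming a tiny fixed dataset with known expected outputs. The schema, the field names, the sample items, and the `evaluate` function are invented for illustration and are not Zyte's evaluation harness. It also assumes extracted and expected items are already aligned one to one.

```python
import re
from datetime import date


def _is_iso_date(value):
    """True when the string parses as an ISO date such as 2024-03-01."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical schema: one format check per required field.
SCHEMA = {
    "name": lambda v: isinstance(v, str) and v.strip() != "",
    "price": lambda v: isinstance(v, str) and re.fullmatch(r"\d+(\.\d{2})?", v) is not None,
    "released": lambda v: isinstance(v, str) and _is_iso_date(v),
}


def evaluate(extracted, expected):
    """Per-field coverage, format validity, and accuracy against expected items."""
    if not expected or len(extracted) != len(expected):
        raise ValueError("extracted and expected must be non-empty and aligned")
    n = len(expected)
    report = {}
    for field, is_valid in SCHEMA.items():
        present = valid = correct = 0
        for got, want in zip(extracted, expected):
            value = got.get(field)
            if value is None:
                continue
            present += 1
            valid += bool(is_valid(value))
            correct += value == want[field]
        report[field] = {
            "coverage": present / n,
            "valid": valid / n,
            "accuracy": correct / n,
        }
    return report


EXPECTED = [
    {"name": "Kettle", "price": "24.99", "released": "2024-03-01"},
    {"name": "Toaster", "price": "39.00", "released": "2023-11-15"},
]
# Made-up extractor output: one well-formed but wrong price, one badly formatted date.
EXTRACTED = [
    {"name": "Kettle", "price": "29.99", "released": "2024-03-01"},
    {"name": "Toaster", "price": "39.00", "released": "15/11/2023"},
]

for field, scores in evaluate(EXTRACTED, EXPECTED).items():
    print(field, scores)
```

In this made-up run the price field has full coverage and full format validity but only 50% accuracy, which is the case a schema check alone would miss. Running two extraction approaches over the same `EXPECTED` set gives numbers that can be compared directly.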

Without that layer, it is easy to fool yourself. A spider can run without crashing and still extract the wrong field. A model can generate a confident selector that points at the wrong part of the page. An output can look perfectly structured and still be semantically wrong. Konstantin called selector invention one of the web scraping equivalents of hallucination, because the model does not always understand where in a large HTML document the correct value should come from, so it produces a plausible-looking path instead. In ordinary language-model usage, hallucination shows up as a made-up fact. In scraping, it shows up as a made-up path to a real-looking value.

Skills should help, but not over-prescribe

One of Konstantin's more counterintuitive points was that as agents improve, some skills may need to shrink. If a skill encodes domain knowledge, less of it sounds like a downgrade, but too much instruction can become a constraint. A skill that insists too firmly on parsing data out of HTML with a particular library can blind the agent to a simpler, more stable option, such as the same data sitting behind an API that returns clean JSON. Choosing the API could reduce load on the site, cut token usage, simplify the parsing, and produce more stable code, and a skill that is too specific can push the agent to spend more time and tokens solving the wrong version of the problem.

So the long-term direction is not bigger and bigger skills; it is better guidance pitched at the right level of abstraction. Some knowledge stays essential. If a spider should be hosted on Scrapy Cloud, or a workflow should use Zyte infrastructure because it leads to better operational outcomes, the agent needs to know that. But the instruction does not have to spell out every low-level detail, and sometimes it can point the agent at documentation or a tool and let the model choose the path. That is a quiet but real shift in how this work is done. Earlier AI workflows leaned on very explicit prompting: do this, then this, then this. Agentic workflows need something closer to good supervision: here is the goal, here are the constraints, here are the tools, and here is how the quality will be judged.

The competitive advantage is responsibility

When I asked Konstantin what he would ask if our roles were reversed, he said he might ask about the competitive advantage of a company like Zyte against general model providers. His answer came down to responsibility. If you point a general coding agent at the live web yourself, you own the result, and with it the access problem, the reliability problem, the latency problem, the quality problem, and the operational problem. A company like Zyte brings domain knowledge and infrastructure built specifically for web scraping: anti-ban expertise, extraction systems backed by AI extraction in Zyte API, hosting, monitoring, quality measurement, and years of experience keeping scraping workflows alive at scale.

This matters because live web data is not a prompt-completion problem. Developers building agents that depend on live data routinely underestimate anti-ban issues, latency, and reliability, and in high-stakes systems teams sometimes run multiple providers or parallel requests just to hold their success rate and response times steady. That is not the part most agent diagrams show. The diagrams show the planner, the browser, the memory, and the final answer, and they rarely show what it takes to get live web data reliably when sites change, block, rate-limit, stall, or quietly return different content. This is where web scraping stops being a demo and becomes infrastructure.
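The "multiple providers or parallel requests" pattern can be sketched as a race in which the first successful response wins. Everything below is illustrative: the three provider functions are stand-ins that simulate latency and failure rather than real services, and `fetch_first_success` is a hypothetical helper, not an API of any product mentioned here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout


def fetch_first_success(url, providers, timeout=5.0):
    """Call every provider in parallel and return (name, body) from the first success."""
    pool = ThreadPoolExecutor(max_workers=len(providers))
    futures = {pool.submit(fn, url): name for name, fn in providers.items()}
    try:
        for future in as_completed(futures, timeout=timeout):
            try:
                return futures[future], future.result()
            except Exception:
                continue  # this provider failed; wait for the next one to finish
    except FuturesTimeout:
        pass  # nothing succeeded within the time budget
    finally:
        # Do not block on stragglers; calls that have not started are cancelled.
        pool.shutdown(wait=False, cancel_futures=True)
    raise RuntimeError(f"no provider returned {url} within {timeout}s")


# Stand-in providers that only simulate latency and failure.
def slow_provider(url):
    time.sleep(0.3)
    return f"<html>slow copy of {url}</html>"


def failing_provider(url):
    raise ConnectionError("blocked")


def fast_provider(url):
    time.sleep(0.05)
    return f"<html>fast copy of {url}</html>"


PROVIDERS = {"slow": slow_provider, "failing": failing_provider, "fast": fast_provider}
name, body = fetch_first_success("https://example.com/products", PROVIDERS)
print(name, body)
```

The tradeoff is cost and load: every request is paid for several times, and if the providers all hit the same target site, the extra traffic works against staying polite to it. That is part of why this tends to be reserved for high-stakes, latency-sensitive paths.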

What junior developers should still learn

So what should a junior developer learn if the agent can now write the scraper? Konstantin did not say "nothing." He said close to the opposite, and the line stuck with me.

"A human who understands how things work paired with an agent will run circles around someone using the best agent without that mental model."

Over time, some tasks will become routine enough that agents handle them without review, but that moves the human to harder problems rather than removing the need for knowledge. In fact, Konstantin suggested agents may raise the bar instead of lowering it. A developer used to be able to specialize in writing good Python to a spec, and an agent can often write that code now, so the human still needs the Python and also the larger picture around it: the system, the data contract, the infrastructure, the failure modes, the costs, the evaluation process, and the tradeoffs between them.

That has been the throughline of this whole series. Adrian talked about maintainability and keeping parsing logic simple, Tomasz about defining quality before automating it, Julia and Mihaela about design and fundamentals, and Javier about AI multiplying whatever you already bring to it. Konstantin's version is that the agent changes the work without removing the need to understand the work.

The questions put to Konstantin Lopukhin, from what the human still needs to understand to what stays foundational when everything keeps shifting.

Evaluation stays foundational

Toward the end, I asked him what still feels foundational when everything around us keeps moving: agents, skills, MCP, benchmarks, model releases, pricing, open weights, closed weights, and the constant sense that the ground is shifting. His answer was immediate.

"Evaluation stays foundational."

Whether you are working with machine learning models, language models, agents, or scraping systems, you need good ways to measure quality, so you know when you are improving and when you are only telling yourself a nicer story. That second part is the harder one, because there are so many quiet ways to believe a system is better than it is. A benchmark can be too narrow and a sample too clean. A demo that succeeds once can fail at scale, a model can shine on easy pages and collapse on the long tail, and a spider can pass every syntax check while extracting the wrong thing. Evaluation is the discipline that keeps the story honest.

Konstantin named three inflection points in his own view of the field. The first was the realization around GPT-3 that large language models were a genuine shift. The second was the moment it became clear that models were good enough at writing code, which changed the paradigm from handing data to a model and asking it to extract, toward asking the model to produce code that does the work more cheaply. The third was the arrival of agents that can take on larger tasks by using tools, running code, inspecting results, and iterating. Even across those shifts, the foundation never became "use the newest model." It got sharper instead. Can you define the task? Can you measure the output? Can you detect failure? Can you improve the system without fooling yourself?

The dead end: not enough agency

When I asked about failures, Konstantin mentioned a product that almost launched and ultimately did not. Looking back, he reads it as a dead end rather than a near miss, because it was not flexible enough and did not give enough agency to the developers or the agents using it. That is a useful clue about where this is heading. The future is not a black box that magically returns data from any website, and it is not a pile of brittle scripts generated once and forgotten either. The useful middle is an agentic system with enough structure to be reliable and enough agency to adapt, carrying its own tools, infrastructure, domain knowledge, evaluation, and human responsibility placed at the right points.

The agents will keep getting better, skills may get smaller, models may get cheaper, open weights may shift the economics, and more of the workflow may quietly automate. The work will still need people who can ask better questions of the system, who know when to trust an output and when to load the page again, who understand that a selector can hallucinate too, and who have internalized that the strongest result is not the most confident answer but the one that survives evaluation.

That may be the real shift in web scraping development in the agent era. The job is no longer only to write the spider.

"The job is to design the loop that knows whether the spider is right."

If you want to try the agentic side of this for yourself, the scrape skill ships with Zyte's Claude Code plugin, Web Scraping Copilot brings the same approach into Visual Studio Code, and both sit on top of Zyte API for unblocking and extraction. You can start with a free Zyte API trial and point an agent at your first site. The interesting question is not whether it returns something. It is whether you can tell when it is wrong.