The recipe for a request: Scaling data extraction through investigation

Cooking a delicious meal for your wife at the end of the night? That's super easy.


Now, try cooking 300 plates every day at a Michelin Star level. It's a completely different ball game.

The same is true when we talk about data extraction requests. Sending a single request so that it gets through without a problem is something anyone can do. But scaling that to over 1,000 requests every second changes the game completely.


At Centric Software, we operate at this level, running over 5,000 scrapers that send 130 million requests daily. At this scale, you simply cannot be fixing things every single day. You have to shift your attention from reactive fixes to perfecting the initial development process.

The chef's secret

I used to be a Michelin-trained chef, and I remember working with a colleague who had an amazing capability. He would walk in with a massive list of tasks, and it would just disappear in seconds. I had no idea how he did it. One day, over a beer, I asked him his secret. He told me, "Just take your time."

It sounds counterintuitive, doesn't it? If I work slower on one task, I'll take longer to get to the next. But eventually, it made sense. Every mistake you make is costly. Every time you rush and have to re-do something, it takes time away from you. But, if you take your time, understand the task, and do it correctly the first time, the net amount of tasks you complete is far greater.


“Every minute you spend in the investigation is 10 times that saved in the implementation”

– Kieron Spearing, Data Collection Engineer, Centric Software


This is one of many lessons from the kitchen that can be directly translated to my work today as a Data Collection Engineer.

Adopting an investigative mindset

To build resilient scrapers that can handle thousands of requests per second, you need to adopt an investigative mindset. This is a methodical process for analyzing how a website works before you write a single line of code.


It can be broken down into three key phases:


  1. Learn how the website expects user interaction.

  2. Break down requests to their minimum requirements.

  3. Translate these discoveries into a resilient scraper.


This process ensures you understand the system deeply. As Albert Einstein said:


“If you can’t explain it simply, you don’t understand it well enough.”


A practical investigation: ‘Go shopping’

Let's walk through an example. The first step is to simply "Go Shopping." Open the target website in a browser and use it. How is the data represented naturally? How is a user expected to search for and buy a product?


As you interact with the site, your goal is to find where the data is coming from. Using your browser’s developer tools, you can inspect the network traffic and identify the specific API request that fetches the data you need.


Once you’ve located the request, it’s time for experimentation. This is where the fun begins.


  • Take the cURL of the request. Extract the raw request from your browser.

  • Bit by bit, remove components with the intention of breaking the request. Remove headers, cookies, and parameters one by one.

  • Fix it and repeat. When the request fails, you’ve found an essential component. Add it back, document it, and continue removing other parts.
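The remove-break-fix loop above can be sketched in a few lines of Python. Everything here is illustrative: `send` stands in for whatever actually replays the request (via `requests`, a proxy, or anything else) and returns whether it still succeeded.

```python
def minimize_headers(headers, send):
    """Return a minimal subset of `headers` with which the request still works.

    `send(candidate_headers)` must replay the request and return True on
    success. Each header is removed in turn; if the request breaks, that
    header is essential and goes back in.
    """
    essential = dict(headers)
    for name in list(headers):
        candidate = {k: v for k, v in essential.items() if k != name}
        if send(candidate):
            essential = candidate  # survived without it: not essential
        # else: the request broke, so keep `name` and document it
    return essential


# Usage with a stand-in sender that only needs two of the four headers:
def fake_send(h):
    return "Cookie" in h and "X-Api-Key" in h

minimal = minimize_headers(
    {"Cookie": "sid=1", "X-Api-Key": "k", "Accept": "*/*", "Referer": "https://example.com"},
    fake_send,
)
# minimal == {"Cookie": "sid=1", "X-Api-Key": "k"}
```

The same loop works for cookies and query parameters; in practice you would point `send` at the real endpoint and watch for status codes or ban pages instead of a boolean.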


The goal isn't just to end up with a working cURL command; it's to understand why each remaining component is needed.


This is why I avoid tools like Postman for this initial investigation: they can modify the request in subtle ways. A better approach is to use a reverse proxy like mitmproxy, which shows you exactly what is being sent. For running and documenting these investigations, especially in a team environment, I recommend a tool like Bruno.


During this process, you should be able to answer several key questions:


  • Are cookies required?

  • Are any headers essential?

  • Where are dynamic values generated?

  • Is proxy quality important?

  • Is the proxy tied to the request?

  • Is the header order important?
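One lightweight way to capture the answers to these questions is a small structured record that lives alongside the scraper. The field names below are purely illustrative, not part of any particular tool:

```python
from dataclasses import dataclass, field


@dataclass
class RequestRecipe:
    """Documented findings for one investigated endpoint (illustrative)."""
    url: str
    required_headers: dict = field(default_factory=dict)
    required_cookies: list = field(default_factory=list)
    dynamic_values: dict = field(default_factory=dict)  # value -> where it is generated
    header_order_matters: bool = False
    proxy_quality_matters: bool = False
    proxy_session_bound: bool = False  # is the proxy tied to the request/session?


# Example: findings for a hypothetical search endpoint.
recipe = RequestRecipe(
    url="https://example.com/api/search",
    required_headers={"x-api-key": "<from page source>"},
    required_cookies=["session_id"],
    dynamic_values={"session_id": "set by GET / on first visit"},
    header_order_matters=True,
)
```

Checking a record like this into version control turns the investigation into shared team knowledge rather than something that lives in one engineer's head.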

Common pitfalls vs. winning strategies

This investigative process helps avoid common pitfalls that lead to brittle scrapers and technical debt. By shifting your strategy, you build for resilience and scale from the very beginning.

Pitfalls (The Quick Way) vs. Strategies (The Resilient Way)

  • Pitfall: HTML-first approach: scraping data directly from the HTML structure.
    Strategy: API-first approach: find the underlying API that populates the page. APIs are more stable and less likely to change than front-end layouts.

  • Pitfall: Gathering the entire feed at once: creating a single, monolithic process to discover and collect all data.
    Strategy: Decouple discovery and collection: use one process to find all the product URLs and a separate process to collect the data for each URL. This prevents a single failure from stopping the entire operation (avoiding cascading failure) and allows for targeted retries.

  • Pitfall: Hitting the website without regard to latency: sending requests as fast as possible without any delays.
    Strategy: Retries with jitter and/or bounded exponential backoff: be respectful. Implement intelligent delays and backoff strategies to avoid overwhelming the server, which also reduces the chance of getting blocked.

  • Pitfall: Quick-fix approach: rushing to get a scraper working to meet a deadline, creating technical debt.
    Strategy: Well-documented investigations & sustainable mindset: take your time. Thoroughly document your findings. This creates a sustainable system that requires far less maintenance in the long run.
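The retry strategy above can be sketched as "full jitter" backoff: wait a random amount of time up to an exponentially growing, capped ceiling. Parameter names and defaults here are my own, not from any specific library:

```python
import random


def backoff_delays(base=1.0, cap=60.0, attempts=5):
    """Yield 'full jitter' delays: a random wait in [0, min(cap, base * 2**n)].

    The exponential term backs off quickly, the cap bounds the worst case,
    and the jitter spreads retries out so many clients do not retry in
    lockstep against the same server.
    """
    for attempt in range(attempts):
        yield random.uniform(0, min(cap, base * 2 ** attempt))


delays = list(backoff_delays(base=1.0, cap=8.0, attempts=5))
# Five delays, each between 0 and 8 seconds; early retries stay short.
```

In a real scraper you would sleep for each delay between retries, and give up (or alert) once the attempts are exhausted.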

By documenting your discoveries, you create a blueprint for a robust scraper. You understand your framework's limitations and can build any necessary tools.


Ultimately, scale comes from building resilient systems.


So, how can we send 1,000 requests per second as easily as we send one? The answer lies in the methodical, investigative process. Every minute you spend in the investigation is 10 times that saved in the implementation.


Because every chef knows that thorough preparation is the key to fine food.

By Kieron Spearing · Posted on April 16, 2026 · How To · 5 min read
