
Compliant web scraping with AI

Read Time
6 mins
Posted on
March 15, 2024
Leadership
By
Callum Henry

DISCLAIMER: This post is for information purposes only. The content is not legal advice and does not create an attorney-client relationship. 


Zyte’s flagship product, Zyte API, now includes built-in features that automate crawling with spider templates, along with our patented AI-powered automatic extraction, which gives you quality structured data quickly without writing custom parsing code. For scraping product data, this is a complete solution: a Zyte AI-Powered Spider template uses the automatic extraction feature by calling Zyte API’s AI models.
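
As a rough illustration of what "no custom parsing code" can look like in practice, the sketch below builds a single automatic extraction request for a product page. The endpoint, the `product` request field and the API-key-as-username authentication reflect our reading of the public Zyte API documentation and should be checked against the current reference before use; the function names and the example URL are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; confirm against the current Zyte API reference.
ZYTE_API_ENDPOINT = "https://api.zyte.com/v1/extract"


def build_extract_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a request for AI-extracted product data."""
    # Assumption: setting "product" to true asks for automatic product extraction.
    payload = {"url": page_url, "product": True}
    # HTTP Basic auth with the API key as the username and an empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        ZYTE_API_ENDPOINT,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_product(page_url: str, api_key: str) -> dict:
    """Send the request and return the 'product' object from the JSON response."""
    request = build_extract_request(page_url, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("product", {})
```

Calling `extract_product("https://example.com/some-product", "YOUR_API_KEY")` would perform a live, billable request, so the builder is kept separate to make the payload easy to inspect first.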

AI web scraping with Zyte API

While these tools facilitate efficient web scraping, it is important to keep in mind the basic principles of compliant web scraping. All projects should start with a compliance assessment that considers the key web scraping legal and compliance risk areas as they apply to your project. You can use our Compliant Web Scraping Checklist to help with this. 


To help you navigate these issues, we have also integrated a number of compliance-focused protections into our AI-powered web scraping solutions.


Agreement to terms, login and non-public data


If the data you want to extract is not publicly available on the internet — for example, it is behind a paywall, or a login page, or is not generally available to members of the public online — you need to conduct a thorough review of the website terms, or you might need to obtain permission from the website before extracting any data. 


Likewise, if you explicitly agree to any Terms of Service, Terms and Conditions or other policies — for example, by creating an account, by logging into a site, or by clicking ‘ok’ or ‘I agree’ to the site’s terms — you must comply with the policies that you have agreed to. 


While this requires a site-by-site analysis for every project, Zyte API protects against some of these risks by automatically blocking login for a large number of sites whose Terms of Service prohibit web scraping. This significantly reduces the risk of breaching website terms or policies, because Zyte API will not permit any attempt to access those restricted sites behind a login page.


Recently, a court in California made a significant ruling dealing with some of these issues in the ongoing litigation between Meta and Bright Data. For our analysis of this ruling, see our blog post: Court Rules Meta's Terms Do Not Prohibit Scraping of Public Data.


Personal data


By now, you should all be familiar with the EU’s General Data Protection Regulation (the GDPR). However, this area is becoming increasingly complex as other countries around the world bring in their own jurisdiction-specific personal data regulations. In particular, we are seeing a number of US state laws coming into effect this year.


It is important to stay on top of these developments to ensure that your project complies with the applicable personal data laws. 


In order to help you remain compliant, we have designed the AI-powered automatic extraction functionality in Zyte API so that it does not extract personal data fields in most cases. This means that, if you are using our smart spiders or our automatic extraction features, you shouldn’t end up with personal data that you weren’t expecting in your dataset. 


Where personal data is included within a schema, it is restricted to publicly available personal data where the lawful basis for processing that personal data and a balancing of the data subjects’ rights have been considered. For example, if you are scraping articles, the author field is included in the schema but the names of commenters on an article are not. You will still need to conduct your own analysis based on the jurisdiction you are in, but our AI-powered automatic extraction provides a good level of protection against data protection concerns.
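
If you post-process scraped records yourself, the same idea can be approximated with a field allowlist. The sketch below is only an illustration of that principle, not how Zyte API implements its schemas; the field names and the helper function are hypothetical.

```python
# Hypothetical allowlist for an article record: the author is kept,
# but commenter details are deliberately absent from the list.
ARTICLE_ALLOWED_FIELDS = frozenset(
    {"headline", "articleBody", "datePublished", "author"}
)


def keep_allowed_fields(record: dict, allowed=ARTICLE_ALLOWED_FIELDS) -> dict:
    """Return a copy of the record containing only allowlisted fields."""
    return {key: value for key, value in record.items() if key in allowed}


raw_record = {
    "headline": "Example headline",
    "author": "Jane Doe",
    "comments": [{"name": "A. Reader", "text": "Nice post"}],
}
clean_record = keep_allowed_fields(raw_record)
```

An allowlist fails safe: a field that nobody has reviewed is dropped by default, which is usually the preferable outcome where personal data is concerned.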


Copyright


One of the first factors to consider when assessing a web scraping project is whether or not the information you are seeking is protected by copyright. By its nature, data on someone else’s website is likely to be owned by them, but not all data is subject to copyright protection. Factual data (for example, a product name and price) is unlikely to be protected by copyright. But a creative or original work (for example, an article or image) is very likely to be protected by copyright.


If the data you are seeking includes copyrighted material, you need to determine if your use would constitute an infringement of that copyright. If so, you need to assess whether your use falls within an exception. Zyte’s Terms of Service also set out restrictions relating to the external use of web data. By complying with our Terms of Service, you are also more likely to stay on the right side of copyright laws. 


However, the simplest way of dealing with copyrighted material is to descope it from your project. To this end, we have excluded the most common types of potentially copyrighted data, including image, video, PDF and music downloads, from our AI automatic extraction feature. This means that you shouldn’t inadvertently infringe someone’s copyright.
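
For custom crawls outside automatic extraction, descoping can be sketched as a filter over the download queue. This is a minimal example under our own assumptions: the extension list and function names are hypothetical, and a file extension is only a heuristic (checking the response content type would be more reliable).

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical list of file types to descope: images, video, PDF and audio.
DESCOPED_EXTENSIONS = frozenset({
    ".jpg", ".jpeg", ".png", ".gif", ".webp",
    ".mp4", ".webm", ".mov",
    ".pdf",
    ".mp3", ".wav", ".flac",
})


def is_descoped(url: str) -> bool:
    """Return True if the URL path ends in a descoped file extension."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in DESCOPED_EXTENSIONS


def filter_download_queue(urls):
    """Keep only the URLs that are not descoped media or document files."""
    return [url for url in urls if not is_descoped(url)]
```

For example, `filter_download_queue(["https://example.com/products/123", "https://example.com/files/brochure.PDF?dl=1"])` keeps only the product page URL.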


Compliance partner for enterprise customers


We have extensive experience in web scraping best practices, with lawyers qualified in three key jurisdictions (US, UK and EU) who review hundreds of web scraping projects each year.


All Zyte API Enterprise customers receive compliance onboarding at the outset of a project. We provide a risk assessment to identify compliance risks and provide customers with information on the best next steps. We work with customers on any adjustments or preparatory work required to ensure compliance and, as customers expand their projects, we continue to work alongside them to help assess and mitigate risks along the way. 


Other risk areas


While there are no specific web scraping laws or regulations that tell you what you can and can’t do, there are a number of key risk areas and associated laws to navigate before commencing a web scraping project. Zyte API has been designed to help mitigate some of these risks, but there are other potential risk areas that it is important to be aware of, and each project needs to be assessed on a case-by-case basis. Most of these are set out in our Compliant Web Scraping Checklist, but we always recommend getting independent legal advice.


Zyte has a team of legal and compliance scraping experts who can help guide you on your web scraping compliance journey. Just reach out at legal@zyte.com. 
