Unless you’ve been living under a rock for the past few months, you know that the EU’s General Data Protection Regulation (GDPR) is upon us.
It is the most comprehensive data protection law ever introduced, and it fundamentally changes the way companies can use the personal data of their customers and prospects.
There are countless articles and guides about how GDPR will affect your company’s marketing efforts, lead generation, etc., and the changes you’ll need to make to ensure your company is in full compliance with the law.
But when it comes to web scraping... nothing.
This is strange given that web scraping has traditionally been the backbone of many companies' marketing, lead generation, and market intelligence efforts.
To shed some light on this grey area, I sat down with Sanaea Daruwalla, Head of Legal at Scrapinghub, to get her insights on how Scrapinghub ensures our clients are scraping personal data in a GDPR-compliant way.
In this guide I will share with you:
Before we get started though, I want to highlight a quick disclaimer.
Now with the technicalities out of the way, let’s talk about how you should evaluate your web scraping project for GDPR compliance.
This is the very first and most obvious question you should be asking yourself when you are starting a web scraping project.
The General Data Protection Regulation, or GDPR as it is more commonly known, only applies to personal data, which it defines as any information (often called personally identifiable information, or PII) that could be used to directly or indirectly identify a specific individual. Examples of personal data include a person's:
If you aren’t scraping personal data, then GDPR does not apply. However, if you are scraping personal data then move to step 2.
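If you're unsure which side of that line your crawl falls on, a cheap first pass is to screen sample output for fields and values that look like personal data. Below is a minimal Python sketch of such a screen. The field names in `PERSONAL_FIELD_HINTS` and the two regular expressions are illustrative assumptions, not a legal definition of personal data, and they will produce false positives and misses. A `False` result means "no obvious hits", not "GDPR does not apply".

```python
import re

# Illustrative, hypothetical field names that commonly hold personal data.
PERSONAL_FIELD_HINTS = {
    "name", "full_name", "email", "phone", "address",
    "ip_address", "date_of_birth", "profile_url",
}

# Deliberately rough patterns for values that often identify a person.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def looks_like_personal_data(record: dict) -> bool:
    """Return True if any field name or string value suggests personal data."""
    for key, value in record.items():
        if key.lower() in PERSONAL_FIELD_HINTS:
            return True
        if isinstance(value, str) and (EMAIL_RE.search(value) or PHONE_RE.search(value)):
            return True
    return False


product = {"sku": "A-100", "price": "19.99", "title": "Desk lamp"}
profile = {"handle": "jdoe", "bio": "Contact me at jane.doe@example.com"}
print(looks_like_personal_data(product))  # False
print(looks_like_personal_data(profile))  # True
```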
If you are scraping personal data then the next question you need to ask yourself is whether or not you are scraping the personal data of EU citizens or residents (note that the GDPR actually covers the EEA, which includes all EU countries, plus Iceland, Liechtenstein, and Norway, so it’s a bit broader than just the EU).
Because GDPR is an EEA-specific regulation, it only applies to the personal data of EEA citizens and residents. If you are scraping the personal information of residents of other countries (e.g. the US, Canada, or Australia), then GDPR may not apply. You just need to comply with the data protection laws of the jurisdictions you are scraping personal data from.
Ok, now we are starting to get into the nuts and bolts of GDPR. We now know we are scraping personal data and that EU citizens will be affected. The next question we need to ask ourselves is:
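Steps 1 and 2 together boil down to a simple gate. The sketch below assumes you can attach an ISO 3166-1 country code to the person a record describes (in practice often the hardest part) and treats an unknown country conservatively as "GDPR may apply". The country list reflects the EU's 27 member states after the UK's withdrawal in 2020, plus the three extra EEA countries. It is a simplification of GDPR's territorial scope, not a substitute for legal review, and `gdpr_may_apply` is a hypothetical helper name.

```python
from typing import Optional

# EU member states (27 since the UK left in 2020), as ISO 3166-1 alpha-2 codes.
# Note: ISO uses "GR" for Greece, while EU documents often use "EL".
EU_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR", "HU", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
})
# The EEA adds Iceland, Liechtenstein and Norway.
EEA_COUNTRIES = EU_COUNTRIES | {"IS", "LI", "NO"}


def gdpr_may_apply(has_personal_data: bool, subject_country: Optional[str]) -> bool:
    """Steps 1 and 2: personal data about a person located in the EEA."""
    if not has_personal_data:
        return False  # Step 1: no personal data, so GDPR does not apply
    if subject_country is None:
        return True   # unknown location: assume GDPR may apply
    return subject_country.upper() in EEA_COUNTRIES


print(gdpr_may_apply(True, "no"))   # True (Norway is in the EEA)
print(gdpr_may_apply(True, "US"))   # False
print(gdpr_may_apply(False, "DE"))  # False
```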
Do we have a lawful reason to scrape the personal data of these EU citizens?
Under GDPR, to use or hold the personal data of any EU citizen, a company must be able to rely on one or more of the following lawful reasons for storing or using that data; otherwise it will be in breach of the regulation. The six types of lawful reason are:
When a client comes to Scrapinghub looking to scrape the personal data of EU residents, we take it on a case-by-case basis, because it is vital that you can prove you have a lawful reason to scrape that data.
The most common legal reasons in the case of web scraping are legitimate interest and consent.
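Whichever of these reasons applies, it helps to record it per project so the justification exists before the first request is sent. Here is a minimal sketch: `LawfulBasis` enumerates the six lawful bases in Article 6(1) of GDPR, while `ScrapingProject` and its `has_documented_basis` check are hypothetical names for illustration, not part of any real tool.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class LawfulBasis(Enum):
    """The six lawful bases in GDPR Article 6(1), points (a) to (f)."""
    CONSENT = "a"
    CONTRACT = "b"
    LEGAL_OBLIGATION = "c"
    VITAL_INTERESTS = "d"
    PUBLIC_TASK = "e"
    LEGITIMATE_INTERESTS = "f"


@dataclass
class ScrapingProject:
    name: str
    # Each claimed basis maps to a short written justification.
    lawful_bases: Dict[LawfulBasis, str] = field(default_factory=dict)

    def has_documented_basis(self) -> bool:
        """True only if at least one basis is claimed and every claim is justified."""
        return bool(self.lawful_bases) and all(
            text.strip() for text in self.lawful_bases.values()
        )


project = ScrapingProject("account-aggregator")
print(project.has_documented_basis())  # False: nothing recorded yet
project.lawful_bases[LawfulBasis.CONSENT] = "Users opt in before we log in on their behalf"
print(project.has_documented_basis())  # True
```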
First, let’s take a look at consent...
For most web scrapers, demonstrating that you have consent from the individual to scrape their personal data will be the main (and often only) method by which you can lawfully scrape the personal data of EU residents.
Prior to the commencement of GDPR, there was a lot of discussion within the web scraping community on whether EU residents had implicitly given their consent for companies to scrape their personal data if it was available on public websites (no login required to see the data).
The argument was that by uploading personal data to a public site you are giving consent for that data to be viewed and stored by 3rd parties.
However, after an in-depth review of this argument by Sanaea (Head of Legal at Scrapinghub) and external legal experts contracted by Scrapinghub, we concluded that this interpretation of the regulation wasn’t compliant with GDPR.
As a result, to scrape the personal data of EU residents you now need to demonstrate that you have the explicit consent of the individual before scraping their personal data.
A lot of web scrapers might not like this position, but after a careful review of all the guidance documents provided by the Commission, Scrapinghub believes that adopting this policy is the only way to guarantee that you and your company don't fall foul of GDPR.
Obviously, this interpretation of the GDPR regulations will significantly curtail most web scraping projects focused on extraction of the personal information of EU residents for lead generation, market analysis, etc.
However, it will still enable some companies to scrape the personal data of EU citizens if they have obtained their explicit consent to do so. An example of this would be companies like Mint.com, where users give Mint consent to log into their online banking accounts and retrieve their banking transactions so that they can be tracked and displayed in a more user-friendly format on Mint.com.
Next, we’ll look at using “legitimate interest” as your lawful reason for scraping the personal data of EU citizens.
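In a consent-based setup like the Mint.com example, the scraper should only ever process records for people with a valid, explicit opt-in on file. A minimal sketch, assuming a hypothetical `ConsentRecord` kept by your own application and a hypothetical `subject_id` key on each scraped record:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ConsentRecord:
    subject_id: str
    explicit: bool  # an affirmative opt-in, not inferred from a public profile
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

    def is_valid(self, at: datetime) -> bool:
        """Explicit consent that was granted, and not yet withdrawn, at time `at`."""
        if not self.explicit or self.granted_at > at:
            return False
        return self.withdrawn_at is None or self.withdrawn_at > at


def consented_records(
    records: List[dict], consents: Dict[str, ConsentRecord], at: datetime
) -> List[dict]:
    """Keep only records whose subject has valid explicit consent at time `at`."""
    kept = []
    for record in records:
        consent = consents.get(record.get("subject_id"))
        if consent is not None and consent.is_valid(at):
            kept.append(record)
    return kept


now = datetime(2018, 6, 1, tzinfo=timezone.utc)
consents = {
    "u1": ConsentRecord("u1", explicit=True, granted_at=datetime(2018, 5, 1, tzinfo=timezone.utc)),
    "u2": ConsentRecord("u2", explicit=False, granted_at=datetime(2018, 5, 1, tzinfo=timezone.utc)),
}
print(consented_records([{"subject_id": "u1"}, {"subject_id": "u2"}], consents, now))
# [{'subject_id': 'u1'}]
```

Records with no consent entry at all are dropped, so the default is "don't process" rather than "process until told otherwise".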
The other likely lawful reason available to web scrapers is if they can demonstrate they have a legitimate interest in scraping/storing/using this personal data.
Although this lawful reason is viable for web scrapers, most companies will find it very difficult to demonstrate that they have a legitimate interest in scraping someone's personal data.
In most cases, only governments, law enforcement agencies, etc. will be deemed to have a legitimate interest in scraping the personal data of their citizens, as they will typically be doing so for the public good.
As mentioned in Step 3, when a client approaches Scrapinghub looking to scrape the publicly available personal data of EU residents, we take it on a case-by-case basis and work with the client to ensure that this data is being extracted in a GDPR-compliant manner.
During this stage, not only do we look at the company's lawful reason for scraping personal data, we also look at the type of personal data they want to extract, the extent of the proposed data collection, and how they plan to use the data after extraction.
There are a number of reasons for taking this approach:
Under GDPR, certain types of data are classed as “sensitive”. These include any type of personal data that could indicate a person's:
Scraping sensitive data means that you are subject to additional rules and need specific consent for this data to be scraped and stored. Therefore, unless you have clear, explicit consent and a legitimate reason to scrape this data, you should avoid scraping it.
An important principle of GDPR is that companies should only store and process as much data as is required to accomplish a given task.
Given web scraping's ability to extract large quantities of data from a website, there is sometimes a temptation to capture as much data as possible in case it proves useful in the future. Obviously, this mindset isn’t in line with GDPR.
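One practical safeguard is to strip sensitive fields at extraction time, so they never reach storage unless a project has explicitly cleared them. The categories below follow GDPR Article 9 ("special categories" of personal data); the field names they are keyed on are hypothetical and would need to match your own schema. Matching on field names only catches structured fields: free text such as a bio can still reveal sensitive information.

```python
from typing import List, Tuple

# Hypothetical field names, mapped to the GDPR Article 9 special category they reveal.
SPECIAL_CATEGORY_FIELDS = {
    "ethnicity": "racial or ethnic origin",
    "political_party": "political opinions",
    "religion": "religious or philosophical beliefs",
    "union_member": "trade union membership",
    "genetic_profile": "genetic data",
    "fingerprint": "biometric data",
    "medical_conditions": "health data",
    "sexual_orientation": "sex life or sexual orientation",
}


def drop_sensitive_fields(record: dict) -> Tuple[dict, List[str]]:
    """Return the record without special-category fields, plus the names that were dropped."""
    cleaned = {k: v for k, v in record.items() if k not in SPECIAL_CATEGORY_FIELDS}
    dropped = sorted(k for k in record if k in SPECIAL_CATEGORY_FIELDS)
    return cleaned, dropped


clean, dropped = drop_sensitive_fields(
    {"job_title": "Engineer", "religion": "example", "medical_conditions": "example"}
)
print(clean)    # {'job_title': 'Engineer'}
print(dropped)  # ['medical_conditions', 'religion']
```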
As a result, when Scrapinghub is evaluating a scraping project we often work with client companies to minimise the amount of personal data they extract from a website and to define retention periods to ensure they comply with GDPR. You should adopt a similar evaluation process for your own scraping projects to ensure you comply with GDPR’s minimisation requirements.
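Both halves of that review, collecting only what the purpose needs and deleting it once the retention period runs out, can be enforced in the pipeline itself. A minimal sketch, where `ALLOWED_FIELDS` and the 90-day `RETENTION` are placeholder values you would set per project, and each stored row is assumed to carry a timezone-aware `scraped_at` timestamp:

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Placeholder project settings: the only fields the stated purpose needs, and how long to keep them.
ALLOWED_FIELDS = {"company", "job_title", "work_email"}
RETENTION = timedelta(days=90)


def minimise(record: dict) -> dict:
    """Keep only allow-listed fields; everything else is discarded before storage."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}


def purge_expired(rows: Iterable[dict], now: datetime) -> List[dict]:
    """Drop rows whose 'scraped_at' timestamp is older than the retention period."""
    return [row for row in rows if now - row["scraped_at"] <= RETENTION]


now = datetime(2018, 9, 1, tzinfo=timezone.utc)
row = minimise({"company": "Acme", "job_title": "CTO", "home_address": "1 Main St"})
row["scraped_at"] = now - timedelta(days=10)
old = {"company": "Initech", "scraped_at": now - timedelta(days=120)}
print(purge_expired([row, old], now))  # only the Acme row survives
```

An allow-list is used rather than a deny-list so that any new field appearing on a page is excluded by default. Minimised data can still be personal data (a work email identifies someone), so the other checks still apply.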
Even if you can argue that you have a legitimate interest in this data or have the user's consent to extract and store their personal data, under GDPR you need a clear and lawful reason for doing so and must be able to demonstrate that the data will be used for legitimate business purposes.
If the proposed scraping project doesn’t raise any red flags after being evaluated on these criteria then we will generally commence the scraping project.
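The evaluation criteria above can be written down as a simple checklist that produces the red flags to discuss before a crawl starts. The `ProjectProposal` fields and the flag wording are hypothetical; this is a sketch of how such a review could be structured, not Scrapinghub's actual process.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ProjectProposal:
    # All fields are illustrative assumptions about what a review would capture.
    lawful_basis: Optional[str]      # e.g. "consent" or "legitimate_interests"
    purpose: str                     # the documented business purpose
    requested_fields: Set[str]       # what the client wants to extract
    needed_fields: Set[str]          # what the purpose actually requires
    retention_days: Optional[int]
    sensitive_fields: Set[str] = field(default_factory=set)


def red_flags(p: ProjectProposal) -> List[str]:
    """Return human-readable reasons to pause the project; empty means no flags."""
    flags = []
    if not p.lawful_basis:
        flags.append("no lawful basis identified")
    if not p.purpose.strip():
        flags.append("no documented purpose")
    if p.sensitive_fields:
        flags.append("requests special-category data: " + ", ".join(sorted(p.sensitive_fields)))
    extra = p.requested_fields - p.needed_fields
    if extra:
        flags.append("collects more than the purpose needs: " + ", ".join(sorted(extra)))
    if p.retention_days is None:
        flags.append("no retention period defined")
    return flags


proposal = ProjectProposal(
    lawful_basis=None,
    purpose="lead generation",
    requested_fields={"name", "email", "religion"},
    needed_fields={"name", "email"},
    retention_days=None,
    sensitive_fields={"religion"},
)
for flag in red_flags(proposal):
    print("-", flag)
```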
As outlined in Step 3, a web scraper may only scrape personal data from a website under GDPR if it has the individual's explicit consent or can demonstrate a legitimate interest in scraping and storing their data.
If consent is withdrawn, or a data subject request (DSAR) asking for the personal data to be deleted is received, then the company that scraped this data must either delete or anonymize it, because it no longer has a legal basis to hold it.
Finally, your web scraping project is just about ready to go, but the last item to check off your list is making sure your proxies are GDPR compliant, specifically any residential proxies you might be using.
Because GDPR treats IP addresses as personal data, you need to ensure that any EU residential IPs you use as proxies are GDPR compliant.
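Handling a withdrawal or deletion request means finding every row about that person and either removing it or stripping it down to fields that no longer identify them. A minimal sketch, assuming rows are keyed by a hypothetical `subject_id` and that the hypothetical `IDENTIFYING_FIELDS` set lists everything that points back to the individual. Whether what remains is truly anonymous depends on the dataset, since combinations of innocuous fields can still re-identify someone.

```python
from typing import Dict, List, Set

# Hypothetical: fields that identify the person. Anything else is assumed
# non-identifying, which must be checked per dataset.
IDENTIFYING_FIELDS: Set[str] = {"subject_id", "name", "email", "phone", "profile_url"}


def handle_erasure(rows: List[Dict], subject_id: str, anonymise: bool = False) -> List[Dict]:
    """Return a new dataset with the subject's personal data deleted or stripped."""
    result = []
    for row in rows:
        if row.get("subject_id") != subject_id:
            result.append(row)
        elif anonymise:
            # Keep only non-identifying fields. Hashing an identifier instead of
            # removing it would be pseudonymisation, which GDPR still treats as personal data.
            result.append({k: v for k, v in row.items() if k not in IDENTIFYING_FIELDS})
        # Otherwise the row is dropped entirely (deletion).
    return result


dataset = [
    {"subject_id": "u1", "name": "Jane Doe", "industry": "Retail"},
    {"subject_id": "u2", "name": "John Roe", "industry": "Finance"},
]
print(handle_erasure(dataset, "u1"))                  # u1's row is deleted
print(handle_erasure(dataset, "u1", anonymise=True))  # u1's row keeps only 'industry'
```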
This means that you need to ensure that the owner of that residential IP has given their explicit consent for their home or mobile IP to be used as a web scraping proxy.
If you operate your own residential IPs, then you will need to handle this consent yourself. However, if you obtain residential proxies from a third-party provider, then you need to ensure that they have obtained consent and are in compliance with GDPR before using the proxies for your web scraping project.
That is everything you need to know about future web scraping projects. But what does GDPR mean for personal data that you may have already extracted from websites?
Luckily, you just need to use the same process outlined above to ensure the GDPR compliance of any old web scraping projects:
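Proxy rotation code can enforce this by refusing to use residential exits without consent evidence on file. In the sketch below, the `Proxy` type and its `owner_consent_ref` field (a reference to the provider's proof that the IP owner opted in) are hypothetical, and the addresses come from reserved documentation ranges. It applies the rule to every residential proxy rather than only EU ones, a deliberately conservative choice that avoids relying on IP geolocation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Proxy:
    address: str
    residential: bool
    # Hypothetical: reference to the provider's evidence that the IP owner consented.
    owner_consent_ref: Optional[str] = None


def usable_proxies(pool: List[Proxy]) -> List[Proxy]:
    """Keep datacenter proxies, and residential ones only with documented owner consent."""
    return [p for p in pool if not p.residential or p.owner_consent_ref]


pool = [
    Proxy("203.0.113.10:8000", residential=False),
    Proxy("198.51.100.7:8000", residential=True),  # no consent evidence: excluded
    Proxy("192.0.2.44:8000", residential=True, owner_consent_ref="consent-123"),
]
print([p.address for p in usable_proxies(pool)])
```

A consent reference is only as trustworthy as the provider's opt-in process behind it, so it supports, rather than replaces, the due diligence described above.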
GDPR is perhaps the most impactful data protection law ever passed, and it will change the way data is extracted from websites forever.
If you are considering commencing a web scraping project for your business that might extract personal data from public websites and you want to ensure it is GDPR compliant, then don’t hesitate to reach out to us. Our engineering team of 60+ crawl engineers and data scientists can build a custom web scraping solution for your specific needs.
If you're interested in web scraping and would like to join a 100% remote team of some of the leading web scraping experts, then be sure to check out our jobs page. We're growing fast and need people like you to help turn the web into useful data.