One common misconception about scraping personal data is that public personal data does not fall under the GDPR. Many businesses assume that because the data has already been made public on another website, it is fair game to scrape. In reality, the GDPR makes no blanket exception for public personal data: the same analysis you would conduct for any other personal data must also be conducted before scraping public personal data (see our previous posts on GDPR for web scraping and web scraping legal check). It is also worth noting that the GDPR does contain some exemptions, and the ICO provides a great overview of them. In this post, we will focus on public personal data in general, as it comes up frequently as a point of confusion.
Disclaimer: I am not a lawyer, and the recommendations in this guide do not constitute legal advice. Our Head of Legal is a lawyer, but she’s not your lawyer, so none of her opinions or recommendations in this guide constitute legal advice from her to you. The commentary and recommendations outlined below are based on Zyte's (formerly Scrapinghub) experience helping our clients (startups to Fortune 100s) maintain GDPR compliance while scraping billions of web pages each month. If you want assistance with your specific situation then you should consult a lawyer.
A recent decision from the Polish GDPR regulator clearly sets forth the necessity to comply with GDPR even when dealing with public personal data.
In March 2019, the Polish regulator issued a fine of roughly £187,000 against a company for scraping public personal data and reusing that data without notifying the data subjects. The company in question is said to have taken personal data on over six million Polish citizens from the country’s Central Electronic Register and Information on Economic Activity. However, it informed only the roughly 90,000 individuals for whom it had email addresses, asserting that “high operational costs” prevented it from doing more. The company argued that notifying everyone for whom it did not have an email address would involve a disproportionate effort, but the Polish regulator did not find that convincing. It is unclear whether the company conducted a full data protection impact assessment (DPIA), which is something we always recommend if you are scraping any type of personal data without the data subject’s explicit consent or a contractual agreement.
Despite the company’s arguments regarding the disproportionate costs, the Polish regulator found that the company should have used the postal addresses and telephone numbers it held to notify individuals about (1) the data it used, (2) the source of that data, (3) the “purpose and the period of the planned data processing,” and (4) their rights under the GDPR. In other words, the regulator found that even when you are taking public personal data, and even when the operational burden of notification is high, you still have very strict obligations to the data subjects that you must comply with.
This is a clear signal that there is likely no way around your obligation to notify individuals when you scrape their public personal data. If you have their email address, telephone number, physical address, or any other means of contacting them, you are obliged to provide the requisite notifications. Furthermore, if you are being investigated, make sure you are visibly taking action to rectify any issues, or you may open yourself up to further unnecessary fines. Finally, if you do decide to take the DPIA route, ensure that it is well documented and that, wherever there is a way to notify the data subjects, you do so.
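To make the notification point concrete, here is a minimal Python sketch of how a scraping pipeline might plan notices, assuming you store whatever contact details you collected alongside each record. Every name in it (`DataSubject`, `CHANNEL_ORDER`, `pick_channel`, `plan_notifications`, `build_notice`) is hypothetical and invented for illustration, and the notice wording is a placeholder rather than approved legal text. The idea it shows is simply that a missing email address should trigger a fallback to post or phone, not an exemption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical preference order: try the cheapest channel first, then fall back.
CHANNEL_ORDER = ("email", "postal_address", "phone")


@dataclass
class DataSubject:
    """Hypothetical record for one person whose public data was scraped."""
    name: str
    email: Optional[str] = None
    postal_address: Optional[str] = None
    phone: Optional[str] = None


def pick_channel(subject: DataSubject) -> Optional[str]:
    """Return the first available contact channel, or None if there is none."""
    for channel in CHANNEL_ORDER:
        if getattr(subject, channel):
            return channel
    return None


def plan_notifications(subjects: List[DataSubject]) -> Dict[str, List[str]]:
    """Group subjects by the channel that can reach them.

    Anyone with no contact details at all lands in "unreachable", which is
    the group a documented assessment would need to address.
    """
    plan: Dict[str, List[str]] = {channel: [] for channel in CHANNEL_ORDER}
    plan["unreachable"] = []
    for subject in subjects:
        plan[pick_channel(subject) or "unreachable"].append(subject.name)
    return plan


def build_notice(subject: DataSubject, data_categories: List[str],
                 source: str, purpose: str, retention_period: str) -> str:
    """Draft a notice covering the four points the Polish regulator listed."""
    return "\n".join([
        f"Dear {subject.name},",
        f"1. Data we hold about you: {', '.join(data_categories)}",
        f"2. Source of the data: {source}",
        f"3. Purpose and planned period of processing: {purpose}; {retention_period}",
        "4. Your rights under the GDPR: placeholder, to be written with your lawyer",
    ])
```

Running `plan_notifications` over a dataset before a crawl goes live gives a quick count of how many people can be reached by each channel, which is useful input for any DPIA you document.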
It is really important for web scraping companies to stay up to date with the rules and regulations around data extraction in order to remain compliant. At the Web Data Extraction Summit, we will discuss issues like this and many more, so make sure you attend to pick up best-practice tips for keeping your scraping process productive, respectful, and compliant.
If you are considering commencing a web scraping project for your business that might extract personal data from public websites and you want to ensure it is GDPR compliant, then don’t hesitate to reach out to us. Our engineering team of 60+ crawl engineers and data scientists can build a custom web scraping solution for your specific needs.