This is the foundational element for so much of the innovation happening today, but it’s also where the regulatory story begins.
Innovation: Public data fuels creativity
The value of public web data is undeniable. On the innovation side of the scale, the arguments are clear:
Public web data is arguably the largest data set in the world, and its potential uses are vast.
Web data can be used for countless business intelligence purposes, driving smarter decisions and creating new opportunities.
AI isn’t going anywhere, and we need good data to train it. Public data is the fuel for this technological revolution.
Fundamentally, we believe that public data should remain public.
Regulation: Logged-out public data capture may be permitted
Historically, the primary legal threat to web scraping came from the Computer Fraud and Abuse Act (CFAA), a US anti-hacking law. This was concerning because violations carried not only civil penalties (monetary damages) but also potential criminal penalties.
However, a few years ago, landmark court rulings in cases like hiQ Labs, Inc. v. LinkedIn Corp. and Van Buren v. United States clarified the landscape. The courts indicated that if you have lawful access to the data, meaning anyone can go to a public website and see it, you are not violating the CFAA.
So, the question then became: “Can it nevertheless be a violation of a site’s Terms of Service (ToS)?” This year, we saw a major ruling in the Meta v. Bright Data case that answers this question. The court ruled that Bright Data did not violate Meta's ToS.
However, while many headlines declared that all public data scraping is now okay, that's not quite what the case said. The court's decision was specific to the facts: Bright Data was scraping data that was not behind a login, and on those facts its activity did not violate Meta’s ToS.
Following this, we saw X (formerly Twitter) settle its lawsuit against Bright Data. While the terms are confidential, one can make an educated guess that X saw the outcome of the Meta case and decided it wasn't worth pursuing. The courts are favoring innovation.
Takeaway: Not everything is fair game
Just because the courts have been ruling in favor of scraping public data doesn’t mean it’s all fair game. What you do with the data still matters a lot, and what type of public data matters too. We're seeing courts look more closely at data usage, especially when it involves pirated or illegally obtained content, which leads us to our next topic.