Case Study

Debunk EU scrapes millions of news articles with Zyte

Disinformation expert uses Zyte’s Automatic Extraction service to check authenticity of Baltic region news stories

DebunkEU.org is using Zyte’s Automatic Extraction service to monitor and expose disinformation campaigns spread across media outlets in the Baltic region and further afield. To achieve this, Debunk EU currently scrapes news websites worldwide in over 40 languages, including Russian, Chinese, Iranian, Arabic, German, French, Ukrainian and Georgian, as well as Balkan and Baltic languages. With the help of our easy-to-use Automatic Extraction API, plus friendly technical support from the Zyte team, Debunk EU is scraping around 1.5 million news articles every month from thousands of news sources.

“We’re really happy with the quality of Zyte’s Automatic Extraction. We are also very satisfied by the level of technical support we get. Without Zyte we simply wouldn’t be able to do what we do.”

Girius Merkys, CTO at DebunkEU.org

About Debunk EU

DebunkEU.org is an independently funded think tank and non-governmental organization that tracks disinformation and misinformation campaigns across media outlets in the Baltic countries and Poland, as well as in the United States and North Macedonia.

Its team of over 50 analysts and active volunteers conducts detailed fact-checking and research into disinformation concerns in the Baltic countries and Poland. The think tank reports on topics including misinformation about COVID-19 and vaccines, political turmoil in Belarus and Russia, and attempts to target NATO activities.

Debunk EU publishes over 100 reports per year and runs a programme of educational media literacy campaigns. It also works closely with national institutions in partner countries, which provide further valuable insights into the situation in the Baltics.

Debunk EU's goals

Debunk EU aims to counter disinformation and information campaigns, with the goal of providing insights into complex issues in a concise, understandable and informative way.

Challenges

In 2017, Debunk EU started exploring options for collecting news articles from various sources. “At that time all the commercial options were really expensive, so we developed our own extraction solution based on Scrapy,” explains Debunk EU CTO Girius Merkys. “It was OK, but we had something like 200 domains to monitor and it required a lot of maintenance.”

As time passed, Debunk EU faced the growing challenge of monitoring more and more domains. “Some small countries that we’re interested in might have over a thousand news outlets,” states Girius. “In the disinformation space it’s common to see lots of simple WordPress-based websites controlled by one entity, all running the same story to give the impression that ‘it must be true’.”

Girius also notes that the process of debunking false or misleading content online can be both costly and time-consuming. “It’s difficult to fact-check a piece of information if you do not know where to start. What’s more, debunking disinformation costs way more than creating it.”

In parallel with the constantly increasing number of media outlets to monitor, Girius observes that extracting online news articles efficiently is becoming steadily more resource-intensive: “To analyze so much data is quite a challenge. Page designs are also changing more and more frequently, and JavaScript-based sites are becoming more popular. It’s very difficult to scrape that kind of content – sometimes it’s impossible.”

Solution

To deal with the rapidly growing scale and complexity of extracting millions of news articles, Debunk EU approached Zyte to provide a cost-effective and easy-to-use automated article extraction solution that would minimize development overheads for the busy Debunk EU team.

With the help of Zyte’s Automatic Extraction API, Debunk EU is able to track the evolution of disinformation campaigns by monitoring over 1.5 million online articles every month.

“As we’ve scaled up we didn’t want the hassle of having to keep maintaining Scrapy,” says Girius. “Also, because we are a non-commercial NGO we needed an affordable solution – and that’s something Zyte has been able to offer us, plus technical assistance because of the sheer volume of requests we have every month.”

As well as the quality and reliability of article extraction, Girius also welcomes the efficient support offered by the Zyte team: “We’re very happy with the help we get. Without it we wouldn’t be able to do our work and publish more than 100 reports every year. I really like the article list service. It really just makes everything much easier for us. We just give the link of the domain, then we get the article list and we just scrape it with your API. It’s automatic and it’s really convenient.”
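The two-step workflow Girius describes (submit a domain link, receive an article list, then extract each listed article) can be sketched roughly as follows. This is an illustrative outline only, not Zyte's client code: the `extract` callable, the `"articleList"` and `"article"` page-type labels, the response field names and the `fake_extract` stand-in are all assumptions made for the example, and no real network call is made.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (url, page_type) -> extracted data as a dict.
# A real integration would wrap calls to the extraction service here.
Extractor = Callable[[str, str], Dict]


def collect_articles(domain_url: str, extract: Extractor) -> List[Dict]:
    """Fetch the article list for a domain, then extract each article."""
    listing = extract(domain_url, "articleList")
    seen = set()
    articles = []
    for item in listing.get("articles", []):
        url = item.get("url")
        if not url or url in seen:
            continue  # skip entries without a link and repeated links
        seen.add(url)
        articles.append(extract(url, "article"))
    return articles


def fake_extract(url: str, page_type: str) -> Dict:
    """In-memory stand-in for the real service, for demonstration only."""
    if page_type == "articleList":
        return {"articles": [
            {"url": url + "/story-1"},
            {"url": url + "/story-2"},
            {"url": url + "/story-1"},  # duplicate link
        ]}
    return {"url": url, "headline": "Headline for " + url}


if __name__ == "__main__":
    for article in collect_articles("https://news.example", fake_extract):
        print(article["url"])
```

Passing the extractor in as a function keeps the workflow independent of any particular client library, so the stand-in can be swapped for a real API wrapper without changing `collect_articles`.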

Hitting all the targets

Business results

Data extraction at scale: 1.5M new articles accessed every month

World-wide coverage: 42 languages covered for article extraction

Global reach: 6,000 non-credible domains and 60,000 mainstream domains monitored worldwide

Summary

With help from Zyte’s Automatic Extraction API, Debunk EU is able to access millions of news articles every year – with the capacity to grow smoothly as it monitors a greater range of media outlets in more territories.

Debunk EU is able to extract news articles on a large scale in 42 languages. Want to do that too?

Check out Automatic Extraction