
A Deep Dive into News Sentiment by Scraping 10,000 Economic and Business Articles.

Following the news lately has been pretty rough, right?

Have you ever stopped to think about how an accumulation of negative economic news can influence our general outlook? We did too, so we did a little data journalism of our own, scraping ~10,000 news articles, and we think you will be interested in what we found.

As a company that lives and breathes data, we wanted to use our own tech to explore this from a quantitative perspective, so we took a deep dive into the sentiment of last year's business news and explored how social and political factors impacted the economic and business environment.

It wasn’t hard to notice articles about major economies going into recession and the IMF cutting its global economic growth forecasts, with high inflation rates also being a recurring topic across the world.

Dive into this analysis of news sentiment and the main political and economic factors that drive it, and see how economic news articles do more than just report on the state of the economy. They also have the power to shape public opinion and influence politics.

How we collected and measured news sentiment

As many of our team are based in the Republic of Ireland, we focused on news about the Irish economy to better understand how people feel about it. We set out to analyze changes in news sentiment and identify the political and economic factors that may have contributed to them:

  1. How sentiment in the news has varied over the last few months.
  2. The main political and economic factors that have influenced these trends.

In this context, we went through nearly 10,000 articles from four major news sources.

  1. We focused only on articles about business and the economy, not other topics like sports or entertainment, as we believe those topics are less likely to influence public opinion about the economy. (Are we being biased here? Maybe. Perhaps that's a study for another day.)
  2. We evaluated the sentiment of that business and economic news over the last 12 months, scoring each item as overall positive, neutral, or negative.
  3. We investigated the key points on this timeline that show sudden spikes or drops in the sentiment score. To understand the reason for each sudden change, we analyzed the news articles in these short time windows using word clouds.

Finally, we looked to make sense of these observations.

Key Factors Behind News Sentiment 

The sentiment analysis was run on news headlines rather than full article bodies, because the headline tends to give away the overall sentiment of the story and headlines get a much higher rate of impressions. In summary:

  • Only article headlines were used in this analysis.
  • We analyzed the sentiment of business news over the last year to see how social and political factors affect economic and business sentiment.
  • We extracted the titles from multiple sources to reduce bias and increase coverage.
  • Further, more advanced analysis can be done by considering other attributes of news articles.

We used a Hugging Face model for sentiment analysis, which gives each headline a score between -1 and +1, where -1 represents a highly negative sentiment and +1 a highly positive one.
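To make the scoring concrete, here is a minimal sketch of how a classifier's output could be mapped onto that -1 to +1 scale. It assumes the model returns a label (POSITIVE or NEGATIVE) plus a confidence between 0 and 1, which is the shape of result a Hugging Face text-classification pipeline produces. The signed_score helper and the label names are assumptions for illustration, not necessarily the exact setup used in this study.

```python
def signed_score(prediction):
    """Map a {'label': ..., 'score': ...} prediction onto the [-1, 1] scale.

    Assumption: 'score' is the classifier's confidence in 'label',
    and labels are 'POSITIVE' or 'NEGATIVE'. Anything else counts as neutral.
    """
    label = prediction["label"].upper()
    confidence = float(prediction["score"])
    if label == "POSITIVE":
        return confidence
    if label == "NEGATIVE":
        return -confidence
    return 0.0


# Hypothetical prediction for a single headline.
example = {"label": "NEGATIVE", "score": 0.75}
print(signed_score(example))  # -0.75
```

With a mapping like this, a confidently negative headline lands close to -1 and a confidently positive one close to +1, matching the examples that follow.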

Examples of the headlines and associated sentiment score:

  • Dominos plans recruitment drive in UK and Ireland (score: 0.9971979856491089)
  • McAfee Ireland sees big jump in revenues as profits also rise (score: 0.9987636804580688)
  • EML warns of growth hit due to possible Central Bank actions (score: -0.9964591860771179)

The 4 steps of news sentiment analysis

The following steps were followed during the analysis:

  • Data collection: Major news websites in Ireland were visited over a period of one year. Articles were collected from the business/economy sections of these websites, giving us data from a total of 9,925 articles.
  • Extraction: For each article, we collected the headline and date_published fields using our Zyte Automatic Extraction API and our open source web scraping framework, Scrapy. It would be easy to add the article copy and comments too.
  • Sentiment timeline: We first compute a sentiment score for each headline, then average the headline scores over a window of two weeks. We therefore end up with one sentiment score per two-week window over a period of 12 months, which we plot as a timeline.
  • Investigate key points on the timeline: We analyze the timeline and pick a set of key points where we notice a sudden increase or drop in sentiment. These key points are further investigated by plotting word clouds for those windows. For a longer time period, you may consider automating this by measuring standard deviations rather than 'eye-balling' the sentiment chart.

A note on data collection: we restricted our analysis to media houses from Ireland, as we believe the reasons for an economic slowdown can differ from place to place and we wanted to focus on one country to understand the factors better.
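The sentiment timeline step can be sketched in a few lines of Python. This is a rough illustration under stated assumptions rather than the code behind the charts: each headline is assumed to already have a publication date and a score, the windows are fixed 14-day buckets counted from an origin date we pick ourselves, and the function names and sample scores are made up.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW_DAYS = 14  # two-week windows, as described above


def window_start(published, origin):
    """Return the first day of the two-week window containing `published`."""
    offset = (published - origin).days // WINDOW_DAYS
    return origin + timedelta(days=offset * WINDOW_DAYS)


def sentiment_timeline(scored_headlines, origin):
    """Average headline scores per window.

    `scored_headlines` is an iterable of (date_published, score) pairs.
    Returns {window_start: (mean_score, article_count)}, ordered by date.
    """
    buckets = defaultdict(list)
    for published, score in scored_headlines:
        buckets[window_start(published, origin)].append(score)
    return {
        start: (sum(scores) / len(scores), len(scores))
        for start, scores in sorted(buckets.items())
    }


# Made-up scores, purely to show the shape of the data.
sample = [
    (date(2022, 1, 21), 0.75),
    (date(2022, 1, 25), 0.25),
    (date(2022, 2, 4), -0.5),
    (date(2022, 2, 10), -0.25),
]
timeline = sentiment_timeline(sample, origin=date(2022, 1, 21))
for start, (mean_score, count) in timeline.items():
    print(start.strftime("%m-%d-%y"), mean_score, count)
```

Keeping the article count next to each average makes it easy to spot windows that rest on too few headlines to be trusted.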

Furthermore, we consider the news articles from four main websites: rte.ie, independent.ie, irishtimes.com and irishexaminer.com. 

We chose these for two main reasons:

  1. They are among the most prominent media houses in Ireland.
  2. Their business news is easy to extract, as the relevant articles have “business” in the URL.
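A URL filter along these lines is simple to write. The sketch below assumes that “business” shows up as its own path segment and that subdomains such as www should count as the same site. Both are our assumptions for illustration, and the example URLs are hypothetical rather than real article links.

```python
from urllib.parse import urlparse

NEWS_DOMAINS = ("rte.ie", "independent.ie", "irishtimes.com", "irishexaminer.com")


def is_business_article(url):
    """Return True if the URL is on one of the four sites and has a 'business' path segment."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Accept the bare domain and any subdomain of it (e.g. www.rte.ie).
    on_known_site = any(
        host == domain or host.endswith("." + domain) for domain in NEWS_DOMAINS
    )
    segments = [segment for segment in parsed.path.lower().split("/") if segment]
    return on_known_site and "business" in segments


# Hypothetical URLs, for illustration only.
print(is_business_article("https://www.rte.ie/news/business/2022/0101/example/"))
print(is_business_article("https://www.rte.ie/sport/2022/0101/example/"))
```

In a Scrapy spider, a check like this could decide which discovered links are worth following.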

Visual representation of Irish news sentiment

To take a closer look at changes in sentiment over the last year, we started by creating a visual representation of the sentiment score of news articles throughout the entire year.

The timeline for the news sentiment from the beginning of 2022 to the end of 2022 is shown below in Fig. 1.

You’ll see the sentiment score for two-week periods. The date on the bottom (X-axis), in mm-dd-yy format, shows the start of each two-week period. The sentiment score is an average of the sentiment in the headlines during that two-week period. We also show how many articles were included in each two-week period in Fig. 2.

Fig. 1
Fig. 2

In Fig. 2 we can see that the first time window contains relatively few articles (<50), which can lead to biased conclusions, so it makes sense to ignore that window.

Looking at the sentiment score from 01-21-22 onwards, the year started with a positive sentiment (+0.3). However, after a certain point (02-12-22) it dropped to -0.1 and fluctuated from there on. Apart from this initial drop, the trends are not very clear from the chart, so we dug a bit deeper into the data to analyze them.

Our focus is understanding the reason for the peaks and valleys of the sentiment score on these charts since there appears to be a ripple effect from there on. 

So, we first select the key-points which show a relatively big shift in the sentiment score. 

The key-points considered for the analysis are shown below in Fig. 3 with the red vertical marker. For each of these five key-points we see a relatively unusual sentiment score.

Fig. 3
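We picked these key-points by eye, but the selection could be automated by measuring how far each window sits from the average in standard deviations. Here is a minimal sketch under a few assumptions: the timeline is a list of (label, mean score, article count) tuples, windows with fewer than 50 articles are skipped, and the 1.5 standard deviation threshold is a hypothetical starting value to tune, not a figure from this study.

```python
from statistics import mean, pstdev


def find_key_points(timeline, threshold=1.5, min_articles=50):
    """Flag windows whose mean score is unusually far from the overall average.

    `timeline` is a list of (label, mean_score, article_count) tuples.
    Windows with fewer than `min_articles` articles are ignored.
    """
    usable = [(label, score) for label, score, count in timeline if count >= min_articles]
    if len(usable) < 2:
        return []
    scores = [score for _, score in usable]
    centre = mean(scores)
    spread = pstdev(scores)
    if spread == 0:
        return []  # every window has the same score, nothing stands out
    return [
        (label, score)
        for label, score in usable
        if abs(score - centre) / spread >= threshold
    ]


# Made-up timeline, purely for illustration.
sample_timeline = [
    ("window-0", -5.0, 10),  # too few articles, skipped
    ("window-1", 0.0, 400),
    ("window-2", 0.0, 380),
    ("window-3", 0.0, 410),
    ("window-4", 0.0, 395),
    ("window-5", 1.0, 420),
]
print(find_key_points(sample_timeline))
```

A stricter threshold flags fewer windows, so it is worth comparing the flagged windows against the chart before trusting them.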

Next, we analyzed the article headlines associated with each of these key-points separately. To understand the reason behind each trend, we looked at the main themes in the articles by:

  • Building word clouds for each two-week period to surface the main themes covered in the news articles.
  • Including only articles with positive sentiment for the peak key-points and only articles with negative sentiment for the valley key-points.
  • Checking a random sample of headlines for each of these key-points.

This way, we are able to get a better understanding of the reason behind the trend.
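Under the hood, a word cloud is a picture of word frequencies, so the filtering described above can be sketched without any plotting library. The snippet below is an illustration only: the stopword list is deliberately tiny, top_terms is a hypothetical helper, and the scores attached to the sample headlines are invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "for", "will", "with", "that", "are", "has", "its"}


def top_terms(scored_headlines, polarity, n=10):
    """Count the most frequent words in headlines of the given polarity.

    `scored_headlines` is an iterable of (headline, score) pairs and
    `polarity` is either "positive" or "negative".
    """
    counts = Counter()
    for headline, score in scored_headlines:
        matches = score > 0 if polarity == "positive" else score < 0
        if not matches:
            continue
        words = re.findall(r"[a-z]+", headline.lower())
        counts.update(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(n)


# Headlines from the key-point windows, with made-up scores.
sample_headlines = [
    ("Russian oligarchs to contest ‘spurious’ EU sanctions", -0.9),
    ("Russian invasion will slow growth and lead to higher energy prices – EU", -0.95),
    ("Arrabawn's revenue and earnings rise as debts fall", 0.9),
]
print(top_terms(sample_headlines, "negative", n=1))  # [('russian', 2)]
```

The resulting counts can be passed to a word cloud library of your choice, or simply read as a ranked list.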

The word clouds and headlines are shown below.

Understanding a sentiment dip with a visual headline analysis

Key-point 1 (02-18-22)

The considerable drop in sentiment for this period is understandable, as it is when the Russia-Ukraine crisis started. Most of the headlines mentioned Russia and Ukraine and the impact of the crisis on the economy.

Headlines:

  • What happens when beneficiaries die before they have a chance to inherit?
  • Tech unicorn Flipdish records losses of €2m
  • Russian oligarchs to contest ‘spurious’ EU sanctions
  • Alibaba reports slowest revenue growth since going public as competition bites
  • Russian invasion will slow growth and lead to higher energy prices – EU

Analyzing a positive surge by using headlines

Key-point 2 (04-15-22)

There seems to be relatively positive sentiment around this time due to positive signs in the economy. It is interesting to note that “covid” and “open” are associated with the headlines that had positive sentiment, possibly due to the lifting of restrictions.

There also seem to be some indications of business doing well overall, with words like “raise” appearing.

Headlines:

  • Last Supper for Jurys Inn as owners rebrand chain as Leonardo
  • John Joyce: Why mysterious giant bull calves are forcing me to reconsider my breeding plans
  • Arrabawn's revenue and earnings rise as debts fall
  • Energy cost inflation must be eased on manufacturing now
  • Ukraine's farmers fight on the front line of global food crisis

Another decline in news sentiment

Key-point 3 (06-10-22)

For this period, the narrative in negative news seems to have shifted from the Russia-Ukraine crisis to the cost-of-living crisis, with inflation and prices often discussed in the news.

Headlines: 

  • Private health insurers refund State subsidy scheme €262,000
  • Incidence of low paid work in Ireland higher than expected
  • Ireland and Denmark ranked as most expensive EU countries
  • Bank of Ireland’s regular tech crashes leave it on edge of customer badlands
  • Irish tourism sector sees 'unprecedented' level of inflation

Irish Small Business Markets show improved positive news sentiment

Key-point 4 (10-28-22)

As we reach the end of our analysis, we see signs of a resurgence in positive sentiment around small businesses in the Irish market. Small businesses seem to be receiving positive news, as shown in the example headlines.

Headlines:

  • Member states can prioritise gas for fertiliser production - EU
  • Bauer Media Audio agrees deal to buy Cork’s Red FM
  • Markets in Europe advance ahead of interest rate increases
  • Penneys owner pledges no new price rises before autumn 2023 as customers tighten belts
  • EY Ireland plans to grow to 5,100 staff, as global reorganisation looms

Looking across the different word clouds, the year started on a positive note and then sentiment dropped significantly with the Russia-Ukraine crisis. After that, sentiment in the markets fluctuated, mostly on the negative side, with drivers ranging from the cost-of-living crisis to some positive news for local businesses.

Conclusion

Putting this study together was incredibly fun, and we were pleasantly surprised by how much you can achieve in such a short time with relatively easy-to-use tools. It was never intended to be an exhaustive research piece, but it did show how economic news articles do more than just report on the state of the economy. They also have the power to shape public opinion and influence politics.

That’s why economic news is important for understanding not just the economy, but also the broader political and social landscape. It is commonly used as a research tool in the fields of social and political science.

In summary, using web data extraction at scale with sentiment analysis we were able to create a fairly simple but effective approach to analyze business and economic news over the past year to understand its impact on social and political factors.

Want to perform an advanced news sentiment analysis?

For this study we used only article headlines, although we believe a more advanced analysis is possible by including other attributes of news articles, like the article body and description.

For a more advanced analysis, machine learning (ML) and artificial intelligence (AI) techniques and models can be used to extract valuable insights from the large amounts of data available on the web.

Don't let the overwhelming amount of news hold you back. Get started with Zyte Automatic Extraction today.
