John Campbell
7 Mins
May 13, 2021

Alternative data for hedge funds & portfolio management

2020 and 2021 have so far not been business as usual. Last year began with the continuation of the longest bull market in modern economic history, which traces its beginnings to the market low of early March 2009, in the wake of the financial crisis triggered by sub-prime mortgages. The S&P 500 closed at 676.53 on 9th March 2009 and at 3,386.15 on 19th February 2020. That is a gain of roughly 400%, meaning the index ended at about five times its starting level, in a little over a decade; a run rivalling the 417% gain experienced in the 1990s.

Then came March 2020, and what had once been brushed off as a new, mild, flu-like disease confined to mainland China came knocking on everyone's doorstep. Most if not all financial markets came to an abrupt halt and have recuperated somewhat since (some sectors better than others). While this remains true today, the quick transition to a mainly socially distanced, 'virtual/online-first' world has accelerated a trend seen over the past few years: buy-side spend on alternative data has grown by 160% since 2018, topping $1.7Bn in 2020 alone.

In the Alternative Investment Management Association (AIMA) report Casting the Net: How Hedge Funds Are Using Alternative Data, more than half the respondents said they actively used alternative data. Another 14% were exploring options.

While precise definitions of alternative data vary, there is almost universal agreement that alternative data is data sourced from places other than the traditional sources: SEC filings, financial performance reports (usually compiled by organizations such as Bloomberg and Refinitiv/Thomson Reuters), internal proprietary portfolio performance data, and other well-known sources. As I discussed in my latest article, Alternative Data is not just Facebook data! Datasets sourced exclusively from social media networks may well not be of sufficient quality, and they require deft preparation and parsing before being applied to the financial services space.

It is safe to say that throughout the history of modern capitalism, and even in antiquity (curious fact: in ancient Babylon, merchants used measurements of the Euphrates' depth to inform their commodity prices), businesses have sought to gain an advantage over their competitors, either by uncovering patterns and trends that weren't readily available through conventional analysis or by exploiting ones that competitors had not leveraged. What is different this time around is the so-called data explosion we are currently experiencing. IDC estimates that 1.2 zettabytes of data were created in 2010, and that by 2025 this figure will balloon to 175 zettabytes. That, combined with advances in data gathering, processing, and analysis technologies (sophisticated data extraction, knowledge graphs, natural language processing (NLP), and entity resolution) and the continuous growth in raw compute power from the likes of Dell EMC, HPE, and AWS, is what sets the current moment apart.

As Michael Megaw from SS&C recently put it, 'Alternative data has become a disruptor in the hedge fund industry', placing organizations that successfully transform to harness it, and that augment their existing research methodologies by applying it, in prime position to reap benefits far greater than the laggards in the industry.

A word of caution. This wave of enthusiasm might give some the wrong idea about what this kind of data can do for organizations in the financial services space, be it capital markets, investment management, high street and commercial banking, or insurance. Alternative data does not replace existing datasets or sound quantitative research methodologies. It serves as a means to an end, significantly augmenting and enhancing traditional research methods with more profound, more comprehensive insights.

Based on the trends evidenced in the market, everything points to continued growth in all aspects of alternative data: data available, sell-side data providers, adoption, and integration into existing workflows. As the eagerness to just get your hands on this new 'miracle' data wanes and it becomes more pervasive across the entire capital markets and investment management industry, a shift towards data quality above quantity and diversity will be inevitable. The focus on value will, in turn, probably lead to consolidation among the players in this space, particularly when data quality issues are costing organizations in excess of $9.7 million a year on average.

All hedge funds are not created equal

First things first: all hedge funds are not created equal. A simple way to tell the big boys from the smaller, more niche players is to look at their assets under management (AUM).

Common sense dictates that mid-sized and larger players, those with $5 billion or more and $10 billion or more under management, respectively, would have more resources at their disposal to harness and productize the 'promise' held by alternative datasets. This seems to be the case: EY, in its latest annual Global Alternative Fund Survey, showed that while 44% of funds overall have dedicated FTEs to leverage this onslaught of alternative data, that figure jumps to 60% for funds with over $10 billion AUM. For smaller funds (less than or equal to $2 billion AUM), the figure drops to about 32%. That said, it still shows that smaller players are nimble enough to allocate resources, punch above their weight, and subsequently reap the benefits.

In terms of performance, particularly for alternative investment funds, which are even more inclined to leverage alternative data, 58% of investors say that their managers have met or exceeded their performance expectations during the market volatility caused by the pandemic.

Top types of alternative data used by hedge funds

There are thousands of alternative datasets out there, pitched by various vendors. One leading vendor states that it has 1500+ ready-to-consume datasets, so you can see why categorizing them becomes all the more important.

That said, most hedge funds using alternative data rely on at least one of the following types of data regularly:

  • Web data 🌐
  • Transaction data/Consumer spending 💵
  • Social Media & related sentiment data 🔗
  • App usage 📱
  • Web traffic 📈
  • Geo location 📍
  • Satellite imagery 🛰️
  • Email receipts 📧

Generating Alpha & Risk Management

Delivering a return that can be attributed solely to the hedge fund manager's savviness and skill set is the holy grail of generating alpha. To deliver this, fund managers need to identify and leverage an edge: spotting opportunities that others have missed or underestimated and, vitally, giving them the right weight in their investment portfolio strategy. Tricky, isn't it?

In a recent study by Eagle Alpha, Olga Kokareva from Quantstellation summed it up brilliantly, showing how the usage of alternative data by hedge fund managers differs significantly depending on whether they pursue a so-called fundamental or quantitative approach:

"It's important to understand that usage of alternative data by fundamental hedge fund managers and by quantitative hedge funds are two very different processes. Fundamental hedge fund managers normally use alternative data to reinforce their investment thesis that they derived from their regular research process. For example, a manager can hold a long position in a retailer, and they are thinking about closing it, but they are not sure. So, instead of waiting for the next quarterly report, they can start looking at foot traffic data or credit card data. If the sales numbers are indeed going down, they might close this position earlier." 
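To make the retailer example in the quote a little more concrete, here is a minimal Python sketch of how a foot-traffic series might be checked for a decline ahead of a quarterly report. Everything in it is a hypothetical illustration: the weekly visit counts are made up, and the function names, the four-week window, and the -5% threshold are assumptions rather than anything taken from the Eagle Alpha study.

```python
def relative_change(series, window=4):
    """Relative change of the latest window's total versus the window before it."""
    if window <= 0 or len(series) < 2 * window:
        raise ValueError("need at least two full windows of observations")
    recent = sum(series[-window:])
    prior = sum(series[-2 * window:-window])
    if prior == 0:
        raise ValueError("prior window sums to zero; change is undefined")
    return recent / prior - 1.0


def supports_early_exit(series, window=4, threshold=-0.05):
    """True when the decline is at least as steep as the (assumed) threshold."""
    return relative_change(series, window) <= threshold


# Hypothetical weekly store visits for a retailer held as a long position.
weekly_visits = [1200, 1180, 1210, 1190, 1100, 1080, 1050, 1040]

change = relative_change(weekly_visits)
print(f"Four-week change in visits: {change:.1%}")
print("Signal supports closing early:", supports_early_exit(weekly_visits))
```

In practice a signal like this would only be one input among many; the point is simply that the data arrives weekly rather than quarterly.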

On the other hand, quantitative hedge funds derive their investment hypotheses and theses purely from insights in the data available to them, more often than not applying advanced machine learning models. This has been taking place for years, certainly since the late 1970s and 80s. Leveraging alternative datasets to improve probabilistic models, for example, can therefore only be described as a natural extension of an existing practice.

But alternative data usage is not confined to stock selection use cases; it extends to the broader discipline of risk management. After all, from a capital allocation standpoint, investment risk management is something most hedge fund managers are known for. The idea of risk-adjusted returns, spearheaded by the now-famous Sharpe ratio developed by William Sharpe, a 1990 Nobel laureate in economics, has been paramount to sound portfolio management.
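As a rough illustration of the risk-adjusted return idea, the Sharpe ratio divides the average return in excess of a risk-free rate by the volatility of those excess returns. The sketch below is a minimal Python version under stated assumptions: the function name, the sample monthly returns, and the annualization by the square root of 12 are illustrative choices, not figures from any of the studies cited here.

```python
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from per-period returns.

    `risk_free_rate` is expressed per period (e.g. per month), matching
    the frequency of `returns`.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate volatility")
    excess = [r - risk_free_rate for r in returns]
    volatility = stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("zero volatility; Sharpe ratio is undefined")
    return mean(excess) / volatility * periods_per_year ** 0.5


# Hypothetical monthly returns for two portfolios with the same average return.
steady = [0.01, 0.03, -0.01, 0.02]
choppy = [0.06, -0.04, 0.07, -0.04]

print("steady:", round(sharpe_ratio(steady), 2))
print("choppy:", round(sharpe_ratio(choppy), 2))
```

Both hypothetical portfolios average the same monthly return, but the choppier one scores lower because its returns are more volatile.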

The problem, however, still lies in imperfect information, which renders mathematical constructs like the Sharpe ratio inherently flawed. Although perfect information is impossible in practice, what alternative data promises is to reveal hidden risks that can make a significant difference in risk-reward calculations. Insurance and lending organizations have already started layering alternative data on top of traditional datasets for this exact purpose. Hedge funds are somewhat lagging behind in applying alternative data to manage risk, with only 23% of market leaders using it to help improve their risk management processes.

Risks & Challenges

Being able to use alternative data efficiently and effectively, in other words using it to produce investment returns or generate operational efficiencies, requires five key components:

  1. Having adequate human capital
  2. Having the correct infrastructure
  3. Having the correct processes, e.g. master data management
  4. Navigating the regulatory environment that governs the collection, usage, and distribution of this data
  5. Being able to demonstrate to investors ROI

I don't intend to drill into each element highlighted above in detail (stay tuned for the next article), but I will highlight the main points regarding infrastructure, processes, and adequate human capital.

Respondents to the previously mentioned AIMA study (49% of 'market leaders' and 54% of the 'rest of the market') put having the appropriate infrastructure and human capital as the key challenge to delivering on the promise of alternative data.

When asked to break down the main components of this challenge, 77% of market leaders and 54% of the rest of the market answered that the biggest one is the inability to back-test alternative data. Many of these ever-larger datasets just don't go far enough back in time to make any significant contribution to historical-based models.
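A simple first screen for the back-testing problem is to measure how much history a dataset actually covers before investing effort in it. The Python sketch below shows one way this could look; the function names and the five-year minimum are hypothetical assumptions for illustration, and the right threshold would depend on the model being tested.

```python
from datetime import date


def history_in_years(observation_dates):
    """Span between the earliest and latest observation, in years."""
    if not observation_dates:
        return 0.0
    span = max(observation_dates) - min(observation_dates)
    return span.days / 365.25


def is_backtestable(observation_dates, min_years=5.0):
    """True when the dataset covers at least `min_years` of history.

    The default of five years is an assumed, illustrative threshold.
    """
    return history_in_years(observation_dates) >= min_years


# Hypothetical dataset whose first observation is from January 2018.
dataset_dates = [date(2018, 1, 1), date(2019, 6, 30), date(2021, 1, 1)]

print(f"History covered: {history_in_years(dataset_dates):.1f} years")
print("Enough for a five-year back-test:", is_backtestable(dataset_dates))
```

Coverage alone says nothing about gaps or quality within the span, so a check like this is only a starting point.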

Another key insight is that over half of respondents within the 'market leaders' segment and more than 60% in the 'rest of market' segment stressed the difficulty of sourcing quality datasets. Quality of data is paramount not only to hedge fund portfolio management performance; it also speaks to a wider issue that stems from the 'data explosion' phenomenon I alluded to earlier.

Organizations are now almost drowning in data; getting data is not the problem, deriving actionable insights from it is. To do that, data needs to be clean and insight-ready. In too many cases it is not, which, according to Gartner, costs organizations close to $10 million per year on average. Almost five years ago, IBM estimated that the cost of bad data in the United States alone was a staggering $3 trillion per year.

Circling back to hedge funds in particular, even after the correct alternative dataset has been identified, the question of master data management comes to the fore. Data governance and stewardship, semantic consistency between databases and systems, permanency risk (how far into the future will the datasets be usable?), and data robustness and consistency (can the data be mapped to fixed references such as CUSIPs in the US or SEDOLs in the UK and Ireland?) all need to be addressed.
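One small, concrete piece of mapping data to fixed references is checking that an identifier is well formed before trying to join on it. The Python sketch below validates a nine-character CUSIP using the standard published check-digit rule (a "double every second character, sum the digits, modulus 10" scheme). The function names are hypothetical, and a valid check digit only shows the identifier is well formed, not that it refers to a live security.

```python
def cusip_check_digit(base):
    """Compute the check digit for the first eight characters of a CUSIP."""
    if len(base) != 8:
        raise ValueError("expected exactly eight characters")
    special = {"*": 36, "@": 37, "#": 38}
    total = 0
    for position, char in enumerate(base.upper(), start=1):
        if char in "0123456789":
            value = int(char)
        elif "A" <= char <= "Z":
            value = ord(char) - ord("A") + 10  # A=10, B=11, ..., Z=35
        elif char in special:
            value = special[char]
        else:
            raise ValueError(f"invalid CUSIP character: {char!r}")
        if position % 2 == 0:  # double every second character
            value *= 2
        total += value // 10 + value % 10  # add the digits of the value
    return (10 - total % 10) % 10


def is_valid_cusip(cusip):
    """True when the string is nine characters with a matching check digit."""
    cusip = cusip.strip().upper()
    if len(cusip) != 9 or cusip[-1] not in "0123456789":
        return False
    try:
        return cusip_check_digit(cusip[:8]) == int(cusip[-1])
    except ValueError:
        return False


# 037833100 is the widely published CUSIP for Apple Inc. common stock.
print(is_valid_cusip("037833100"))
# Same identifier with a corrupted final digit.
print(is_valid_cusip("037833101"))
```

Rows that fail a check like this can be quarantined for review instead of silently dropping out of a join against reference data.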

This situation points to the need for organizations to have the right amount and type of human capital to make the most of alternative data. The fact that the number of full-time employees dedicated to managing these datasets has jumped by ~450% in the last couple of years is testament to the race between organizations to hire the best candidates, and fast.

Where to from here

If the recent study by AIMA and the recently published EY 2020 Global Alternative Fund Survey are anything to go by, the pervasiveness of alternative data within the hedge fund space is only going to increase in the immediate future.

Although the number of datasets might continue to grow, there will undoubtedly be a flight to quality sooner rather than later, along with a consolidation of players. With big hitters such as Bloomberg announcing they are doubling down on alternative data and Refinitiv consolidating its own offering, we can say one thing: the era of investment strategies augmented by alternative data is here for the long run.

How can Zyte help

Here at Zyte, we specialize in two things: delivering custom data feeds explicitly optimized for you, by taking unstructured data from the web and providing it in a structured format, and enabling organizations that have their own internal data collection teams through robust, resilient, always-on infrastructure designed for web data extraction.

We help some of the largest FSIs navigate the complexity of web data extraction for alternative data-related use cases, ensuring compliance standards are met and, equally importantly, ensuring a healthy data pipeline.