Ian Kerins
June 19, 2018

A sneak peek inside what hedge funds think of alternative financial data

Unbeknownst to many, there is a data revolution happening in finance.

In their never-ending search for alpha, hedge funds and investment banks are increasingly turning to new alternative sources of data to give them an informational edge over the market.

On May 31st, Scrapinghub got the chance to see this revolution firsthand. Mills Horton and Thad Chappell of Scrapinghub were invited to Eagle Alpha’s Alternative Data Showcase in New York City, where some of the leading voices in finance told them where they think the market is going.

This article will give you a glimpse of what we learned: where this emerging market is headed, the key challenges that data providers will face, and how they can take advantage of this opportunity.

Alternative Data In Capital Markets is Exploding

There is no question about it: alternative data is very hot right now.

To get an informational edge, hedge funds and other financial institutions are increasingly turning to non-traditional data providers who can give them sources of data that have yet to be commoditized.

During one of the first talks of the day, the speakers discussed how more and more hedge funds are building out internal data science and analyst teams. These teams are tasked with using alternative data sources to inform decisions in areas as diverse as internal recruitment and hiring, and investment strategy when reviewing an asset or a company portfolio.

They stated that as reliance on data grows, hedge funds and investment banks are increasingly adding Chief Data Officers to their C-suites.

However, the most striking insight of the conference was the growth in demand these industry leaders expect for alternative sources of financial data. Attendees identified roughly 1,500 alternative datasets that they use today and predicted this will grow to 5,000 by 2020: more than a threefold rise (an increase of roughly 230%) in less than two years.
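As a quick sanity check on those growth figures, the short sketch below works through the arithmetic. It assumes 1,500 as the starting count and 5,000 as the 2020 projection, as quoted above; the function names are purely illustrative.

```python
def growth_multiple(start: float, end: float) -> float:
    """How many times larger the end value is than the start value."""
    return end / start


def percent_increase(start: float, end: float) -> float:
    """Growth expressed as a percentage of the starting value."""
    return (end - start) / start * 100


# Figures quoted by conference attendees (approximate).
datasets_today = 1500
datasets_2020 = 5000

multiple = growth_multiple(datasets_today, datasets_2020)
increase = percent_increase(datasets_today, datasets_2020)

print(f"Growth multiple: {multiple:.2f}x")
print(f"Percent increase: {increase:.0f}%")
```

The two numbers describe the same growth in different ways: the projected total is a little over three times today's count, which corresponds to an increase of a little over 230%.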

They also estimated that 78% of hedge funds are now using alternative data to inform their investment decisions, and that all of the remaining 22% expect to start incorporating it into their market analysis efforts this coming year.

The Role of Web Extracted Data

The attendees of the conference also made it very clear that web-scraped data has a big role to play in this alternative data explosion.

Eagle Alpha has identified 747 datasets across 24 data categories, and 20 of those 24 categories can be obtained via web scraping (Scrapinghub has experience with all 20). The only categories that can’t are Consumer Transactions, Consumer Credit, Satellite and Weather, and Mobile App Usage.

During 1:1 meetings with hedge funds and investment banks, Mills and Thad discovered that most financial institutions are already scraping the web for alternative financial data and are very keen to partner directly with the web scraping firms who are building datasets for data providers.

These industry experts believe that the web scraping landscape will change significantly over the next few years as more emphasis is placed on datasets and initiatives seeking out data beyond the US, particularly Chinese data.

As a result, there is strong interest amongst financial institutions in getting direct access to web data, instead of buying off-the-shelf data that has already been commoditized.

These conversations also highlighted some very interesting and unexpected insights into the challenges hedge funds face when incorporating alternative data into their decision-making processes.

Historical Data More Important Than Real-Time Data

With all the talk of hedge funds using high-frequency trading, Thad and Mills expected the focus to be on near real-time data.

However, after talking to numerous firms, it became clear that many regard historical data as the most important criterion when adopting alternative data.

To incorporate alternative data into their trading algorithms, investment managers first need to demonstrate return on data (ROD), which requires benchmarking and backtesting their investment thesis with historical data (three years as a minimum).

This is one of the primary reasons why hedge funds and investment banks have typically purchased data only from incumbent data providers. Numerous firms said that most data vendors still don’t understand that they need long trial periods (up to a year) to prove to their management team that the data is worth the investment and to get budget approval.

The biggest barrier to adoption for hedge funds isn’t obtaining the alternative data. It is the ability to backtest historical data to prove that it will deliver the return on data they require.

This puts many of the pioneering web scraping companies, like Scrapinghub, in the strongest possible position to take advantage of this market opportunity. Only the companies that pioneered the large-scale extraction of web data have been extracting high-quality data long enough to generate the historical datasets these hedge funds need to validate their investment theses.


The other key challenge financial institutions face when obtaining alternative data is compliance. With the staggering amounts of money involved, any legal compliance issues could cost financial institutions millions (if not billions) of dollars.

As a result, where and how these financial institutions obtain their alternative data is a key requirement that must be satisfied before they can make use of a data feed. This poses challenges when considering the legal basis for extracting financial web data.

Even though the vast majority of hedge funds and investment banks are already scraping websites for useful data, many are worried about their legal compliance requirements when doing so.

This is another huge opportunity for dedicated web scraping providers who have years of experience mitigating compliance issues and ensuring that they extract data in as polite a fashion as possible.
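To make "polite" extraction a little more concrete, here is a minimal sketch of one common courtesy: checking a site's robots.txt rules before requesting a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the user agent name and the URLs are made-up examples, and this is only an illustration of the general idea, not a description of any provider's actual compliance process or a substitute for legal advice.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical crawler name used for illustration only.
USER_AGENT = "example-research-bot"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, url: str) -> bool:
    """Return True if the rules allow our user agent to fetch the URL."""
    return parser.can_fetch(USER_AGENT, url)


def requested_delay(parser: RobotFileParser) -> Optional[int]:
    """Return the Crawl-delay (in seconds) requested for our user agent, if any."""
    return parser.crawl_delay(USER_AGENT)


parser = build_parser(SAMPLE_ROBOTS_TXT)
for url in ("https://example.com/prices/", "https://example.com/private/report"):
    print(url, "allowed" if may_fetch(parser, url) else "disallowed")
print("Requested delay between requests:", requested_delay(parser), "seconds")
```

A real crawler would load the live robots.txt for each site and wait for the requested delay between requests; respecting these signals is only one part of a broader compliance review.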

Wrapping Things Up

In conclusion, Eagle Alpha’s Alternative Data Showcase was a very valuable conference. Thad Chappell, Sales Executive at Scrapinghub, even went as far as to say that it was the most productive conference he’d ever attended, a sentiment shared by numerous other attendees.

“Eagle Alpha’s alternative finance event was by far the most productive conference I’ve ever attended in terms of 1:1 meetings discussing use cases and experiences around what my company offers and how we can help. All attendees and vendors know why each of us are there, and the 11 minute 1:1's forced vendors and attendees to be direct in what they offer, and if and how that fits into what the firm is looking to achieve at a high level and sometimes very granular projects were discussed,” Thad said.

Eagle Alpha hosts an Alternative Data Showcase every couple of months, so if you are interested in learning more about how alternative data is being used in finance, be sure to check it out.

If you are interested in scraping the web for alternative financial data, or any other web data, be sure to talk to us at Zyte (formerly Scrapinghub).