How to launch a large-scale web data extraction project

LexisNexis's web scraping journey from concept to iteration to running a large-scale project

From defining data requirements that serve your business needs and selecting the right way to access web data for your project to scraping compliant, high-quality data, we've come a long way on this journey to success with web data.

In this final webinar, we will help you connect the dots between all the stages covered so far and put what you've learned into practice to launch a web data project, from conception and iteration to design and execution.

Join our special guests Eric Platow, Senior Director of Data Science at LexisNexis, and Neha Setia Nagpal, Web Data Evangelist at Zyte, as they discuss LexisNexis's journey to scrape 200k+ websites.

Learn how Eric worked with Zyte to develop a set of techniques and tools for extracting web data from both older websites built on traditional technologies and newer websites built with sophisticated tech. Eric will also share a sneak peek of the process LexisNexis uses to scrape, clean, process, and consume web-extracted data.

  • How Eric planned and executed a high-end web scraping project covering 200k+ websites on a tight six-month deadline.

  • How to derive project requirements and project rules from business requirements.

  • How to deal with the legal implications of public web data extraction, and how to change scope and vendors accordingly.

  • Which tools were essential for quality assurance and website validation, and can still be reused in other projects.

Hosted by

  • Eric Platow - Senior Director of Data Science at LexisNexis

  • Neha Setia Nagpal - Web Data Evangelist at Zyte