Webinar

How to launch a large-scale web data extraction project

James Kehoe

30 min read · March 17, 2023

LexisNexis's web scraping journey from concept to iteration to running a large-scale project

From defining data requirements that serve the business, to selecting the right way to access web data for your project, to extracting compliant, high-quality data, we've come a long way in this journey to success with web data.

In this final webinar, we will help you connect the dots between all the stages covered so far and apply what you have learned to launch a web data project, from conception and iteration to design and execution.

Join our special guests Eric Platow, Senior Director of Data Science at LexisNexis, and Neha Setia Nagpal, Web Data Evangelist at Zyte, as they discuss LexisNexis's journey to scrape 200k+ websites.

Learn how Eric worked with Zyte to develop a set of techniques and tools for extracting web data from both older websites built on traditional technologies and newer websites built with sophisticated tech. Eric will also share a sneak peek of the process LexisNexis uses to scrape, clean, process, and consume web-extracted data.

How Eric planned and executed a high-end web scraping project covering 200k+ websites on a tight six-month deadline.
How to derive project requirements and project rules from business requirements.
How to deal with the legal implications of public web data extraction, changing scope and vendors accordingly.
Which tools were essential for quality assurance and website validation, and can still be reused in other projects.

Hosted by

Eric Platow - Senior Director of Data Science at LexisNexis
Neha Setia Nagpal - Web Data Evangelist at Zyte

