Sarah Lang
4 Mins
October 12, 2021

Extract Summit 2021: Highlights and key takeaways

*For information on 2022 Extract Summit visit this link*

It’s a wrap! Last week, for the third time, Extract Summit brought together web data experts and enthusiasts to learn, share and inspire. Sessions, workshops, panels, contests – this year’s summit had so much to offer, I don’t even know where to start.

Extract Summit at a glance

With all the uncertainties still swirling around COVID-19 in 2021, we decided to stay safe and host a virtual event again. Yet, we wanted to offer all attendees and speakers an outstanding experience and the opportunity to connect with each other. Using the event platform Hubilo helped us organize an interactive and fun event. 

“The event interface looks sick! I'm not gonna lie, remote conferences don't feel the same as the real thing because you don't ‘feel there’. I like how Zyte is trying to change that!” said one of the 2,000 participants.

We had a lot of fun connecting with all attendees, especially in these disconnected times.

1 day, 2 tracks, 24 speakers: Key takeaways

Every year, we try to put together a well-balanced agenda delivered by web data extraction thought leaders and web scraping experts, to give you the best overview of current web data trends. This year's line-up of speakers covered many different fields and aspects of web data. Below are a few highlights of the day; you can watch all the recordings here.

A demonstration of the hybrid web scraping approach, adaptive learning, and a sneak-peek into Zyte’s data extraction quality evaluation

Konstantin Lopukhin, Head of Data Science at Zyte, took a deep dive into the data extraction quality evaluation process, discussed common pitfalls, and shared how Zyte handles them to guarantee the highest quality of extracted web data.

Mikhail Korobov, Head of Development for Automatic Extraction at Zyte, walked us through an experiment in which he extracted data from 20 websites in 3 hours using a hybrid web scraping approach that combines classic and fully automated methods.

Continuing the theme of automated web scraping backed by machine learning, Joshua Odmark, founder and CTO at Pandio, gave a live demonstration of how adaptive learning with PandioML works.

Talks about different use cases and lessons learned

We’re always keen to learn how companies are using web data to thrive, so it’s no surprise that we had a lot of interesting presentations showcasing the usage and importance of data.

Abhijit HK, CEO at Codewave, shared his experience building data dashboards, along with some hacks for building web scraping spiders. Niall Hurley, CEO at Eagle Alpha, introduced us to the world of alternative data for finance, explained the customer journey and gave us a few interesting use cases. Linus Nilsson from NilssonHedge showcased his hedge fund database – including input routines, cleaning strategies, and how he keeps its quality high.

Kabir Fahria, System Developer at Codemill, presented a great case study on the use of web data for contextual advertising.

To round things off with some practical tips, Eric Platow, Senior Architect at LexisNexis, took us on his journey of taming the World Wide Web and the lessons learned after scraping 100K.

Legal hot topics in web data extraction

An all-time favorite for all of us is the discussion around legal aspects. We had a panel full of experts: Victoria Vlahoyiannis and Kate O-Brien, Legal Counsels at Zyte, Tricia Higgins, the Co-founder and CEO of Fort Privacy, and Nina Fletcher, a Legal Counsel at YipitData.

Together they covered website terms and conditions and when they are legally binding, GDPR in the context of web scraping, and the recent Van Buren case.

Deep dives into anti-bot and headless browsers, an AMA session, and all things technical

As the biggest event within the web data extraction industry, Extract Summit also covers very technical topics. Thanks to our experts from diverse backgrounds, we were able to host an AMA session to answer burning questions about web scraping best practices, anti-ban management and reverse engineering.

Evgeny Slaikovsky, one of our talented reverse engineers, talked about the cat-and-mouse game of anti-bot evolution. Paweł Miech, Senior Technical Team Lead within the development department, explained what headless browsers are and when we should and shouldn’t use them.

Rain Leander, Technical Evangelist at Cockroach Labs, gave an overview of the world of data structure and storage and explored the pros and cons of three major types of databases available today.

Ljubica Lazarevic, Developer Advocate at Neo4j, showed us how she built a scraper and used a graph database to recommend conferences to submit talks to – an interesting session not only for our fellow developer advocates!

Scrapy and hands-on coding sessions

We had two live coding sessions in which the speakers showed their web scraping skills in action:

Attila Tóth, Developer Advocate at Timescale, guided us step by step through building a real estate market monitoring tool with Scrapy.

His colleague, Jônatas Paganini, showed us live how he builds a small blog scraper with TimescaleDB.

Live coding contest & other highlights

Besides the amazing talks Extract Summit had to offer, we wanted to give all developers the chance to show off their own web scraping skills, so we hosted a live coding contest. It was huge fun for all involved!

Speaking of highlights, we definitely have to mention our live comedy show with Damian Clark and Eddie Mullarkey. They gave us some nice giggles and made the break a different kind of experience.

If you want to be a part of Extract Summit 2022, you can pre-register here.