Zyte's CEO Shane Evans shares a 15-year vision for effortless, AI-driven web data extraction and introduces the 2026 Web Scraping Industry Report with 26 actionable insights.
We held the 2022 Web Data Extraction Summit three weeks ago. I wanted to extend a huge thank you to everyone who came, especially our guest speakers, who shared some great insights throughout the day.
We are delighted to announce that Extract Summit 2022 will be returning to an in-person format after two years of being virtual. This time, it’s going to be in London!
We put Zyte’s own Automatic Extraction API head-to-head with a commercial rival and an open-source alternative to find out who’s the top dog in product extraction.
Zyte is participating in Memex, an ambitious DARPA project that tackles the huge challenge of crawling, indexing, and making sense of areas of the Deep Web: web content that is not indexed by traditional search engines such as Google and Bing.
We’re proud to announce the developer release of Portia, our new open source visual scraping tool based on Scrapy. Check out this video!
This time last year, Pablo and I were chatting about the year gone by and what to expect in 2013. I noticed that our team had almost doubled in size over that year, and we wondered whether that could possibly continue in 2013.
We're excited to introduce Dash, a major update to our scraping platform. This release is the final step in migrating to our new storage back end and contains improvements to almost every part of our infrastructure. In this post I'd like to introduce some of the highlights.
MongoDB was used early on at Zyte to store scraped data because it's convenient. Scraped data is represented as (possibly nested) records which can be serialized to JSON.
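As a minimal sketch of what such a record might look like (the field names below are hypothetical and not taken from Zyte's actual schema), a nested Python dict round-trips through JSON using only the standard library:

```python
import json

# Hypothetical scraped record; the field names are illustrative assumptions.
record = {
    "url": "https://example.com/products/1",
    "name": "Example product",
    "price": {"amount": 19.99, "currency": "USD"},  # nested sub-record
    "tags": ["sample", "demo"],
}

# Serialize the nested record to a JSON string, then parse it back.
serialized = json.dumps(record, sort_keys=True)
restored = json.loads(serialized)
```

Because the record contains only dicts, lists, strings, and numbers, the restored value compares equal to the original.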
We have recently started letting more users into the Automatic Extraction private beta. We're receiving a lot of applications following the shutdown of Needlebase, and we're increasing our capacity to accommodate these users.