Scrapinghub (now Zyte) crawls the deep web
"The easiest way to think about Memex is: How can I make the unseen seen?"
-- Dan Kaufman, director of the innovation office at DARPA
Scrapinghub is participating in Memex, an ambitious DARPA project that tackles the huge challenge of crawling, indexing, and making sense of areas of the Deep Web, that is, web content not indexed by traditional search engines such as Google and Bing. By current estimates, this content dwarfs Google’s total indexed content by a ratio of almost 20 to 1. It includes all sorts of criminal activity that, until Memex became available, had proven very hard to track down systematically.
The inventor of Memex, Chris White, appeared on 60 Minutes to explain how it works and how it could revolutionize law enforcement investigations.
Scrapinghub will be participating alongside Cloudera, Elephant Scale, and Openindex as part of the Hyperion Gray team. We’re delighted to bring our web scraping expertise and open source projects, such as Scrapy, Splash, and Crawl Frontier, to a project that has such a positive impact in the real world.
We hope to share more news regarding Memex and Scrapinghub in the coming months!