Open source at our heart with Scrapy and friends

Where it all started

Make building spiders a breeze

Scrapy is an open source Python framework built specifically for web scraping by Zyte co-founders Pablo Hoffman and Shane Evans. Out of the box, Scrapy spiders download HTML, parse and process the data, and save it as CSV, JSON, or XML.

Robust Web Scraping Capabilities

Powerful open source technology

Scrapy ships with a wide range of built-in extensions and middlewares for handling cookies and sessions, as well as HTTP features such as compression, authentication, caching, user agents, robots.txt, and crawl depth restriction. It is also easy to extend: you can add custom middlewares or pipelines to your web scraping projects to get the specific functionality you require.
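As a rough illustration of what a custom pipeline can look like, the sketch below normalizes a price field. The class name, the "price" field, and the sample item are hypothetical; the only Scrapy convention relied on is that an item pipeline is a plain class exposing a process_item method that returns the item.

```python
class PriceToFloatPipeline:
    """Hypothetical item pipeline: turns a price string like '$1,299.00' into a float."""

    def process_item(self, item, spider):
        raw = item.get("price")
        if isinstance(raw, str):
            # Strip the currency symbol and thousands separators before converting.
            item["price"] = float(raw.replace("$", "").replace(",", "").strip())
        return item


# Pipelines are plain classes, so they can be exercised without running a crawl.
cleaned = PriceToFloatPipeline().process_item(
    {"name": "Laptop", "price": "$1,299.00"}, spider=None
)
print(cleaned)
```

In a real project, a pipeline like this would typically be enabled through the ITEM_PIPELINES setting, for example `ITEM_PIPELINES = {"myproject.pipelines.PriceToFloatPipeline": 300}`, where the module path is an assumption.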

Giving you the power of Data Extraction

Scrapy

Scrapy is our open source web crawling framework written in Python. Scrapy is one of the most widely used and highly regarded frameworks of its kind; very powerful yet easy to use.

Splash

Splash is our lightweight, scriptable browser as a service with an HTTP-based API.

Spidermon

Spidermon is our battle-tested open source spider monitoring library for Scrapy.

DateParser

DateParser is our library for parsing human-readable dates and times. It supports 18 languages.

Portia

Portia is our tool for building spiders through a friendly, visual user interface. No programming knowledge required.

Eli5

Eli5 is a library for debugging machine learning classifiers and explaining their predictions.

Scrapely

Scrapely is a library for generating parsers for web pages.

ScrapyJS

ScrapyJS is our middleware for Splash, making it easy to use Splash in your Scrapy projects.

Frontera

Frontera is a framework for managing your crawl logic and policies.

Formasaurus

Formasaurus uses machine learning to figure out the type of an HTML form: is it a login, search, sign-up, password recovery, or contact form?

W3lib

W3lib provides a number of useful web-related functions for your web scraping projects.

ScrapyRT

ScrapyRT lets you reuse your spider’s logic to extract data from web pages through a single HTTP request.

Loginform

Loginform is a library that detects and fills login forms on specified URLs.

Webstruct

Webstruct is our library for building NER systems that work with HTML.

Queuelib

Queuelib lets you create disk-based queues in Python.

Adblockparser

Adblockparser is a library for parsing and matching against Adblock Plus filters.

MDR

MDR is a library for detecting and extracting list data from web pages.

Webpager

Webpager is a library for classifying whether a link on a web page is a pagination link or not.

Skinfer

Skinfer is a tool we developed to infer schemas from a sample of JSON data.

Scrapy-StreamItem

Scrapy-StreamItem provides support for working with streamcorpus’ StreamItems.

Wappalyzer-Python

Wappalyzer-Python is a Python-based wrapper for Wappalyzer.
