Introducing the Datasets Catalog: A Treasure Trove of Data



Folks using Portia and Scrapy are engaged in a variety of fascinating web crawling projects, so we wanted to provide you with a way to share your data extraction prowess with the world.

With this need in mind, we’re pleased to introduce the latest addition to our Zyte platform: the Datasets Catalog!

This new feature lets you immediately share the results of your Zyte projects as publicly searchable datasets. Not only is this a great way to collaborate with others, but it can also save you time when you build on other people’s datasets in your own projects.


As fans of the open data movement, we hope that this new feature will ease the process of disseminating data. Open data has been used to help foster transparency in governmental and corporate systems worldwide. Researchers and developers have also benefited from the mutual sharing of information. A couple of our own engineers have even used open data to power transportation apps and to help journalists expose corruption.

Read on to get some ideas on how to use the Datasets Catalog in your workflow.

The Datasets Catalog at a Glance

We are launching the Datasets Catalog with the following features:

- Publish the data collected by your Portia or Scrapy spiders (web crawlers) as easily accessible datasets
- Give each dataset a name and a description to highlight your scraped data and help others find the information they need
- Let others discover your datasets through search engines like Google
- Browse publicly available datasets that other people are sharing
- Choose how to share each dataset using three privacy settings:
  - Public datasets are accessible to anyone (even people without a Zyte account) and are indexed by search engines
  - Restricted datasets are accessible only to users you explicitly grant access to (they need a Zyte account)
  - Private datasets are accessible only to members of your organization

How Does It Work?

You can find the new “Datasets” option in the top navigation bar. On the main Datasets Catalog page, you can browse available datasets along with those you have recently visited.

Publishing your scraped data as a complete dataset takes just one click. This tutorial will get you started on publishing and sharing your extracted data.

Wrap Up

And there you have it: a way not only to showcase your web crawling and data extraction skills, but also to help others with the information you provide.

We invite you to contribute your datasets and play your part in driving the open data movement forward. Reach out to us on Twitter to let us know which datasets you would like to see featured and whether you have any suggestions for improving the Datasets experience.

We're excited to see what you come up with!