Folks using Portia and Scrapy are engaged in a variety of fascinating web crawling projects, so we wanted to provide you with a way to share your data extraction prowess with the world.
With this need in mind, we’re pleased to introduce the latest addition to our Scrapinghub platform: the Datasets Catalog!
This new feature allows you to immediately share the results of your Scrapinghub projects as publicly searchable datasets. Not only is this a great way to collaborate with others, but you can also save time by using other people’s datasets in your own projects.
As fans of the open data movement, we hope that this new feature will ease the process of disseminating data. Open data has been used to help foster transparency in governmental and corporate systems worldwide. Researchers and developers have also benefited from the mutual sharing of information. A couple of our own engineers have even used open data to power transportation apps and to help journalists expose corruption.
Read on to get some ideas on how to use the Datasets Catalog in your workflow.
We are launching the Datasets Catalog with the following features:
You can find this new “Datasets” option in the menu located at the top navigation bar. On the main Datasets Catalog page, you can browse available datasets along with those that you have recently visited.
Publishing your scraped data into complete datasets takes just one click. This tutorial will get you started on publishing and sharing your extracted data.
And there you have it: a way not only to showcase your web crawling and data extraction skills, but also to help others with the information that you provide.
We invite you to contribute your datasets and play your part in helping drive the open data movement forward. Reach out to us on Twitter to let us know which datasets you would like to see featured and whether you have any recommendations for improving the whole Datasets experience.
We’re excited to see what you come up with!