Attila Toth
2 Mins
April 8, 2021

Zyte Developers Community newsletter issue #2

Hi there,

If you are not signed up already for the Zyte Developers Community newsletter, you can sign up here.

In this issue:

  • Web scraping to make police data easily accessible
  • Free & powerful tool for XPath / CSS selectors
  • Job aggregator site built with web scraping
  • Turn podcasts into an RSS feed with web scraping
  • Scrapy to increase transparency around public meetings

Web scraping to make police data easily accessible

What started as a single Reddit post months ago has turned into something very real: the Police Data Accessibility Project. The project's goal is to become a trusted, complete source of easily downloadable police data. Contributors can help by writing scrapers, among other things.

This project also shows that web scraping can provide huge value in cases where data is publicly available, but not easy to access.

Free & powerful tool for XPath / CSS selectors

Writing CSS selectors or XPath (or, God help us, even regex) to extract data is a necessary part of developing spiders. This browser extension, SelectorsHub, can help you come up with working locators quickly and easily. It's free, available for all popular browsers, and even has a community.
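To illustrate why a structured locator is usually preferable to a regex, here is a minimal sketch using only the Python standard library. The markup, class names, and variable names are made up for this example, and `xml.etree.ElementTree` supports only a limited subset of XPath, so treat it as a stand-in for the selector engine you would use in a real spider.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a scraped page.
PAGE = """
<html>
  <body>
    <div class="job"><h2>Data Engineer</h2></div>
    <div class="job"><h2>Backend Developer</h2></div>
    <div class="ad"><h2>Sponsored</h2></div>
  </body>
</html>
""".strip()

root = ET.fromstring(PAGE)

# XPath-style locator: only <h2> elements directly inside a div with class "job".
titles_xpath = [h2.text for h2 in root.findall(".//div[@class='job']/h2")]

# Naive regex: matches every <h2>, including the one inside the ad block.
titles_regex = re.findall(r"<h2>(.*?)</h2>", PAGE)

print(titles_xpath)
print(titles_regex)
```

The locator expresses where the data lives in the document tree, so it skips the ad block without extra logic, while the regex picks it up as well.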

Job aggregator site built with web scraping

Another fun project that uses web scraping: a remote job aggregator. The site currently lists 30,000+ remote jobs from multiple job posting sites, all powered by web scraping.

Turn podcasts into an RSS feed with web scraping

Iacopo Garizio shared his project that turns audio files into an RSS feed using Scrapy. The open-sourced package provides a "Podcast Pipeline" for your Scrapy project that turns the scraped podcast information into an RSS feed. You can save the feed locally or in an S3 bucket, then point your podcast player to the URL of the file and listen to the episodes.
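The general idea of turning scraped episode data into a feed can be sketched with the standard library alone. This is not the package's code or API: the function name `build_podcast_feed`, the episode dictionary keys, and the example URLs are all assumptions made for illustration, and the MIME type is hard-coded to MP3 for simplicity.

```python
import xml.etree.ElementTree as ET


def build_podcast_feed(title, link, episodes):
    """Build a minimal RSS 2.0 document from scraped episode dicts.

    Each episode is assumed to have "title" and "audio_url" keys
    (hypothetical field names chosen for this sketch).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Scraped episodes of {title}"

    for episode in episodes:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = episode["title"]
        # Podcast players locate the audio file through the <enclosure> element.
        ET.SubElement(
            item,
            "enclosure",
            url=episode["audio_url"],
            length=str(episode.get("size", 0)),
            type="audio/mpeg",
        )

    return ET.tostring(rss, encoding="unicode")


# Hypothetical scraped data.
feed_xml = build_podcast_feed(
    "Example Show",
    "https://example.com/show",
    [
        {"title": "Episode 1", "audio_url": "https://example.com/ep1.mp3"},
        {"title": "Episode 2", "audio_url": "https://example.com/ep2.mp3", "size": 1234},
    ],
)
print(feed_xml)
```

In a Scrapy project, logic like this would typically live in an item pipeline that collects items during the crawl and writes the feed when the spider closes.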

Scrapy to increase transparency around public meetings

City Scrapers is an open-source initiative to increase access and transparency around public meetings across the U.S. by making it easier for everyone to know when and where those meetings are held. The project is built with Scrapy, and all code is open-sourced on GitHub.