
Zyte Developers Community newsletter issue #3

April 28, 2021

Hi there,


If you haven't signed up for the Zyte Developers Community newsletter yet, you can sign up here.

In this issue:

  • Scrapy 2.5.0 is out
  • Recipe scraping app (with source code)
  • Web scraping in Elixir
  • Easy table scraping with R

Scrapy 2.5.0 is out

The first new Scrapy release of the year is here!

Highlights:
- Official Python 3.9 support
- Experimental HTTP/2 support
- New get_retry_request() function to retry requests from spider callbacks
- New headers_received signal that allows stopping downloads early
- New Response.protocol attribute

Release notes here.

Recipe scraping app

As part of the #100DaysOfCode challenge, @mango_mero created an awesome Django demo app that scrapes recipe information in real time using Beautiful Soup. The source code is available on GitHub.

Web scraping in Elixir

If you are using Elixir for web dev and are considering a web scraping project, you might want to check out Crawly, a high-level web crawling & scraping framework for Elixir. Check out the documentation and the quickstart guide.

Easy table scraping with R

Extracting data from HTML tables can be messy. For one-off jobs, though, there's an easy alternative. If you're using RStudio, there's an addin that makes it easy to scrape tables: datapasta. You literally just copy the table from the page, paste it into the tool, and you get the data in structured form. Here's a tutorial video.

Written by Attila Toth