Scrapy is one of the few popular Python packages (almost 10k GitHub stars) that is not yet compatible with Python 3. The team and community around it are working to make it compatible as soon as possible. Here's an overview of what has happened so far.
You're invited to read along and participate in the porting process from Scrapy's GitHub repository.
If you have asked around, you have likely heard an answer like "It's based on Twisted, and Twisted is not fully ported yet, you know?". Many blame Twisted, but other things are actually holding back the Scrapy Python 3 port.
When it comes to Twisted, its most important parts are already ported. But if we want Scrapy spiders to download from HTTP URLs, we are really going to need a fix or workaround for Twisted's HTTP agent, since it doesn't work on Python 3.
The Scrapy team started moving towards Python 3 support two years ago by porting some of Scrapy's dependencies. For about a year now, a subset of the Scrapy tests has been run under Python 3 on each commit. Apart from Twisted, one bottleneck that blocked progress for a while was that most of Scrapy requires Request or Response objects to work. That bottleneck was recently resolved, as described below.
During EuroPython 2015, from July 20 to 28, several Scrapy core developers gathered in Bilbao and made progress on the porting process. It was the first time they had met in person after several years of working together.
There was a Scrapy sprint scheduled for the weekend. Mikhail Korobov, Daniel Graña, and Elias Dorneles teamed up to prepare Scrapy for it by porting Request and Response in advance. This way, it would be easier for other people to join and contribute during the weekend sprint.
In the end, time was short for them to fully port Request and Response before the weekend sprint. Some of the issues they faced were:
To unblock further porting, the team decided to use bytes for HTTP headers and momentarily disable some of the tests for non-ascii URL handling, thus eliminating the two bottlenecks that were holding things back.
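The decision to use bytes for HTTP headers can be illustrated with a minimal sketch. The helper names `to_bytes` and `normalize_headers` below are hypothetical and written only for this illustration; they are not claimed to be Scrapy's actual implementation, and the UTF-8 default is an assumption.

```python
def to_bytes(text, encoding="utf-8"):
    """Return `text` as bytes, encoding it first if it is a str.

    Hypothetical helper for illustration; the default encoding is an assumption.
    """
    if isinstance(text, bytes):
        return text
    if isinstance(text, str):
        return text.encode(encoding)
    raise TypeError("expected str or bytes, got %s" % type(text).__name__)


def normalize_headers(headers, encoding="utf-8"):
    """Convert every header name and value to bytes.

    Storing headers as bytes keeps them in the form they take on the wire,
    so the same code behaves identically on Python 2 and Python 3.
    """
    return {
        to_bytes(name, encoding): to_bytes(value, encoding)
        for name, value in headers.items()
    }


if __name__ == "__main__":
    print(normalize_headers({"Content-Type": "text/html", b"Accept": b"*/*"}))
```

With this approach, callers may pass either str or bytes, while everything stored internally has a single, unambiguous type.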
The sprint itself was quiet but productive. Some highlights on what developers did there:
In the end the plan worked as expected. After porting Request and Response objects and making some hard decisions, the road to contributions is open.
In the weeks that followed the sprint, developers continued to work on the port. They also received important contributions from the community: Scrapy has already merged several pull requests (for example from @GregoryVigoTorres and @nyov).
Before the sprint there were ~250 tests passing in Python 3. The number is now over 600. These recent advances helped increase our test coverage under Python 3 from 19% to 54%.
Our next major goal is to port the Twisted HTTP client so spiders can actually download something from remote sites.
There is still a long way to go before Scrapy fully supports Python 3, but the port is in much better shape now. Join us and contribute to porting Scrapy following these guidelines. We have added a badge to GitHub to show the progress of Python 3 support. The percentage is calculated as the number of tests that pass in Python 3 divided by the total number of tests available. Currently, 633 tests pass on Python 3 out of 1153 in total.
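As a quick check of the badge arithmetic, here is a small sketch using the numbers above. The function name `py3_progress` is hypothetical, and the use of integer truncation (rather than rounding) is an assumption chosen because it matches the 54% figure quoted earlier.

```python
def py3_progress(passing, total):
    """Return the share of tests passing on Python 3 as a whole percentage.

    Truncates towards zero; whether the real badge truncates or rounds
    is an assumption made for this sketch.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return passing * 100 // total


if __name__ == "__main__":
    print("Python 3 progress: %d%%" % py3_progress(633, 1153))
```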
Thanks for reading!