Future features
The team has some specific developments in mind. As Evans puts it: “That means enhancing support for modern web technologies (especially JavaScript-heavy sites), improving integrations with headless browsers like Playwright, and continuing to streamline the developer experience – particularly around configuration and observability.”
Execution control
“We are planning enhancements to retry logic, rate limiting, delay handling, etc.,” Korobov says.
These changes would improve how Scrapy handles failed requests, crawl speed, and politeness toward target sites, so that it can scrape the web more smoothly, more reliably, and more respectfully.
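For context, retries, delays, and throttling are configured today through Scrapy settings. The fragment below is a rough sketch of such a configuration, not a preview of the planned enhancements. The setting names exist in current Scrapy, but the values and the `POLITE_SETTINGS` name are assumptions chosen for illustration.

```python
# Illustrative settings fragment. The keys are existing Scrapy settings;
# the values and the POLITE_SETTINGS name are assumptions for this sketch.
POLITE_SETTINGS = {
    # Failure handling: retry failed requests a limited number of times.
    "RETRY_ENABLED": True,
    "RETRY_TIMES": 3,
    "RETRY_HTTP_CODES": [429, 500, 502, 503, 504],
    # Speed: wait between requests to the same site, with some jitter.
    "DOWNLOAD_DELAY": 1.0,
    "RANDOMIZE_DOWNLOAD_DELAY": True,
    "CONCURRENT_REQUESTS_PER_DOMAIN": 4,
    # Politeness: adapt the delay to how quickly the server responds.
    "AUTOTHROTTLE_ENABLED": True,
    "AUTOTHROTTLE_START_DELAY": 1.0,
    "AUTOTHROTTLE_MAX_DELAY": 30.0,
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 2.0,
    "ROBOTSTXT_OBEY": True,
}
```

A dictionary like this could be assigned to a spider's `custom_settings` attribute, or its entries placed in a project's `settings.py`.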
Modernised engine
“We’re rewriting Scrapy core from Twisted to asyncio primitives; the plan is to eventually make the Twisted reactor optional,” says Korobov.
Such a move would help make Scrapy simpler, faster, and more compatible with the asyncio-based asynchronous code that has become the default style in modern Python.
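To give a sense of the style in question, here is a minimal sketch of plain asyncio code using only the standard library. It is not Scrapy's engine or its API: `fetch` and `crawl` are hypothetical names, and the "request" is simulated with a short sleep rather than real network traffic.

```python
import asyncio


async def fetch(url: str, delay: float = 0.01) -> str:
    """Hypothetical stand-in for a network request; it only sleeps."""
    await asyncio.sleep(delay)
    return f"response for {url}"


async def crawl(urls: list[str], max_concurrency: int = 2) -> list[str]:
    """Fetch all URLs concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> str:
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


urls = [
    "https://example.com/a",
    "https://example.com/b",
    "https://example.com/c",
]
results = asyncio.run(crawl(urls))
```

Code written this way composes directly with other asyncio-based libraries, which is the kind of compatibility a move to asyncio primitives is meant to make easier.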
Better organization
Mikhail Korobov reveals: “We’re exploring different ways to organize the web scraping code, such as page objects (web-poet library), spider templates, etc.
“Scrapy spiders are easy to get started, but having all the code in a single spider class also can get in the way.
“These ‘new’ paradigms will allow developers to make web scraping projects more maintainable in the long run. They also turn out to be a better fit for various AI tools.”
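The page object idea can be illustrated with a small sketch that uses only the standard library. This is not the web-poet API: `ProductPage`, its fields, and the sample HTML are hypothetical, and the regex-based parsing is a simplification chosen to keep the example self-contained.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProductPage:
    """Hypothetical page object: owns the extraction logic for one page type."""

    html: str

    @property
    def name(self) -> Optional[str]:
        # Take the text of the first <h1> element, if there is one.
        match = re.search(r"<h1[^>]*>(.*?)</h1>", self.html, re.S)
        return match.group(1).strip() if match else None

    @property
    def price(self) -> Optional[float]:
        # Read the number that follows an element with class="price".
        match = re.search(r'class="price"[^>]*>\s*\$?([\d.]+)', self.html)
        return float(match.group(1)) if match else None

    def to_item(self) -> dict:
        return {"name": self.name, "price": self.price}


sample_html = '<html><h1> Blue Widget </h1><span class="price">$19.99</span></html>'
item = ProductPage(sample_html).to_item()
```

Because the extraction logic lives outside the spider class, it can be tested against saved HTML without running a crawl, and a spider callback only needs to build the page object from the response body and yield its item.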