
Scrapy in 2026: New release brings modern async crawling standards

Read time: 6 min
Posted on: January 12, 2026
Scrapy 2.14.0 is released with a major under-the-hood modernization. Say goodbye to Twisted Deferreds.

The world’s most-used open source data extraction framework just rang in the new year with a new release that brings a big structural shift.


If you have been waiting for Scrapy’s full embrace of modern Python async/await patterns, version 2.14.0 is the release for you.

You might not see flashy new scraping tools in this release. But you will see a framework that is significantly more robust, future-proof, and aligned with modern standards. Think of this as an infrastructure upgrade, one that swaps out aging copper wiring for fiber optics.

The ‘async’ revolution

For years, Scrapy has relied heavily on Twisted’s Deferred objects. While powerful, they predate modern Python’s native async capabilities. In 2.14.0, Scrapy replaces a huge chunk of these internals with native coroutines.


This release introduces AsyncCrawlerProcess and AsyncCrawlerRunner. These are counterparts to the standard runners you know, designed to offer coroutine-based APIs.


What does this mean for you? If you are running Scrapy from a script (common in production pipelines), AsyncCrawlerProcess allows your crawler to play much nicer with other asyncio libraries.


It looks remarkably similar to the CrawlerProcess you are used to. You don't need to rewrite your setup entirely but, under the hood, you are now running on a modernized, coroutine-friendly foundation.


It is now easier to integrate Scrapy into broader asynchronous applications without fighting against conflicting event loops or legacy Deferred chains.

Smarter scheduling by default

If you run large-scale crawls, you know that managing concurrency is an art. In 2.14.0, the DownloaderAwarePriorityQueue is now the default priority queue.


Previously, Scrapy’s scheduler could be a bit "blind," pushing requests without fully understanding the downloader's current load for specific domains. The new default queue is "downloader aware": it manages request priorities more intelligently based on the downloader's state.


You don’t need to change a single line of code; your crawls should simply run smoother, with fewer bottlenecks when scraping multiple domains.
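If you would rather pin the queue explicitly, or compare against the earlier behavior, the SCHEDULER_PRIORITY_QUEUE setting controls which class is used. The fragment below is a sketch of a project settings.py; whether you need it at all depends on your project.

```python
# settings.py (fragment)

# Pin the downloader-aware queue explicitly. On 2.14.0 this matches the
# default, so the line is optional and only documents the choice.
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.DownloaderAwarePriorityQueue"

# To compare against the previous default, swap in the plain priority queue:
# SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.ScrapyPriorityQueue"
```

Running the same crawl with each value is a simple way to check how much the new default changes throughput on a multi-domain job.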

Action required: Clean up your spiders

Scrapy is standardizing how spiders are configured, deprecating the use of class attributes for specific download settings in favor of the dictionary-based custom_settings.


The old way: If your spiders define download_timeout or user_agent directly as class attributes, you will start seeing deprecation warnings. The fix is to move those values into custom_settings.


Other improvements

Scrapy 2.14.0 also ships several smaller changes:


  • Automatic image rotation: The ImagesPipeline now automatically transposes images based on EXIF data. If you scrape mobile-uploaded content (like real estate or classifieds), this fixes those annoying "sideways" photos.

  • Python 3.9 support dropped: To keep the framework modern, Scrapy 2.14.0 now requires Python 3.10+.

  • Better custom download handlers: For advanced users building custom protocol handlers, the API has been documented and improved with a new BaseDownloadHandler class, making it easier to extend Scrapy’s core capabilities.
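To make the image rotation item concrete, the sketch below shows what EXIF-based transposition does, using Pillow directly. This is not Scrapy's implementation, and the helper names are invented for the example; it only illustrates the idea of applying the orientation tag so the pixels end up upright.

```python
import io

from PIL import Image, ImageOps

ORIENTATION_TAG = 0x0112  # standard EXIF Orientation tag


def make_sideways_jpeg() -> bytes:
    """Build a tiny 4x2 JPEG whose EXIF says it must be rotated to display."""
    img = Image.new("RGB", (4, 2), "white")
    exif = Image.Exif()
    exif[ORIENTATION_TAG] = 6  # orientation 6: stored rotated by 90 degrees
    buf = io.BytesIO()
    img.save(buf, format="JPEG", exif=exif)
    return buf.getvalue()


def upright(data: bytes) -> Image.Image:
    """Return a copy of the image with the EXIF orientation applied."""
    with Image.open(io.BytesIO(data)) as img:
        return ImageOps.exif_transpose(img)


fixed = upright(make_sideways_jpeg())
print(fixed.size)  # width and height are swapped relative to the stored 4x2
```

With the pipeline handling this step for you, a manual pass like this over downloaded images should no longer be necessary.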

Stack to the future

Scrapy 2.14.0 is about longevity.


By adopting async internals and modernizing the scheduling logic, Scrapy’s developers are ensuring it remains the go-to framework for serious web data extraction in 2026 and beyond.


Check out the full release notes or the Scrapy website for more information.
