
Stop using Python requests for web scraping: Use these modern modules instead
By Ayan Pahwa, Developer Advocate, Zyte
It has been developers' HTTP library of choice for years. But, when it comes to web data extraction, there are alternatives worth considering.
While the 'Requests' library remains the default choice for many Python developers due to its reliability and extensive documentation, the Python HTTP landscape has evolved considerably.
Modern alternatives now offer significant advantages, including built-in asynchronous support, HTTP/2 compatibility, enhanced performance, and up-to-date TLS handling.
This article introduces and compares three such contemporary clients: HTTPX, curl_cffi, and rnet, detailing their unique features and practical applications.
The problem with Requests for web scraping
Before going further, it's worth being clear about the scope of Requests' limitations: for simple API interactions with well-behaved endpoints, it remains the de facto standard.
However, a major drawback of the Requests library when it comes to web scraping is its predictable HTTP client fingerprint. This fingerprint, a unique combination of TLS version, cipher suites, HTTP headers, and connection characteristics, is sent with every request, and is well-known and cataloged by anti-bot systems.
Consequently, if you're interacting with any endpoint, including APIs or services protected by anti-ban vendors, your request can be blocked purely based on how the requests library identifies itself. This happens even before your credentials or payload are scrutinized, highlighting a significant limitation when targeting systems that perform client-side validation.
Beyond fingerprinting, the other major limitation of the requests library is its lack of native asynchronous support. This is particularly problematic for workloads that involve numerous HTTP requests: without async, calls execute sequentially, and the program's thread remains blocked for the entire duration of each individual request.
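The cost of that blocking is easy to simulate: five simulated 100 ms requests take roughly 500 ms when awaited one by one, but only about 100 ms when run concurrently (asyncio.sleep stands in for network I/O here; no real HTTP is involved):

```python
import asyncio
import time

async def fake_request(delay: float = 0.1) -> str:
    # Stand-in for a network call; asyncio.sleep simulates I/O latency.
    await asyncio.sleep(delay)
    return "ok"

async def sequential(n: int) -> float:
    # Await each "request" one after another, like blocking requests.get calls.
    start = time.perf_counter()
    for _ in range(n):
        await fake_request()
    return time.perf_counter() - start

async def concurrent(n: int) -> float:
    # Launch all "requests" at once; the event loop overlaps the waits.
    start = time.perf_counter()
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return time.perf_counter() - start

seq = asyncio.run(sequential(5))
conc = asyncio.run(concurrent(5))
print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

The sequential loop is what any synchronous client gives you; the gather version is what the async-capable alternatives below make possible.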
For straightforward scenarios, the standard requests API call remains perfectly functional, as demonstrated in a quick example.
import requests

response = requests.get(
    "https://jsonplaceholder.typicode.com/posts/1",
    timeout=10,
)
response.raise_for_status()

data = response.json()
print(data["title"])
Clean and simple. For a one-off call to a standard REST API, this is fine. The gaps start showing when you need concurrency, HTTP/2, or when the target endpoint does any kind of client validation.
Install the Alternatives
pip install httpx or uv add httpx
pip install curl-cffi or uv add curl-cffi
pip install rnet or uv add rnet
1. HTTPX
HTTPX is the most direct upgrade from Requests as the API is nearly identical. If you know Requests, you already know most of HTTPX. What it adds is first-class async support, HTTP/2, and a more modern internal architecture.
Where it differs from Requests is the explicit use of a Client context manager (strongly recommended over module-level function calls) and the AsyncClient for async usage. This gives you connection pooling and proper resource cleanup by default.
HTTPX is the right starting point if you're looking for a migration that requires minimal code changes.
Example: Sync
import httpx
with httpx.Client(timeout=10.0) as client:
    response = client.get("https://jsonplaceholder.typicode.com/posts/1")
    response.raise_for_status()
    data = response.json()
    print(data["title"])
Example: Async (calling the Zyte API)
Async is where HTTPX really earns its keep. Here it's used to fire multiple requests to the Zyte API concurrently; each request blocks on the server side until extraction is complete, while your event loop stays free to send others in parallel:
import os
import asyncio
import httpx
API_KEY = os.environ["ZYTE_API_KEY"]
ENDPOINT = "https://api.zyte.com/v1/extract"
urls = [
    "https://example.com",
    "https://httpbin.org",
]

async def fetch(client: httpx.AsyncClient, url: str) -> dict:
    response = await client.post(
        ENDPOINT,
        json={"url": url, "browserHtml": True},
        auth=(API_KEY, ""),
    )
    response.raise_for_status()
    return response.json()

async def main():
    async with httpx.AsyncClient(timeout=60.0) as client:
        results = await asyncio.gather(*[fetch(client, url) for url in urls])
        for result in results:
            print(result["url"], "—", len(result["browserHtml"]), "chars")

asyncio.run(main())
Notes:
- raise_for_status() raises httpx.HTTPStatusError on 4xx/5xx responses.
- HTTP/2 support requires pip install httpx[http2] and passing http2=True to the client.
- The 60-second timeout accounts for the Zyte API's server-side blocking behavior: it holds the connection open until extraction completes.
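When you scale the gather pattern above from a handful of URLs to hundreds, you'll usually want to cap how many requests are in flight at once so you don't overwhelm the API or your own connection pool. A minimal sketch using asyncio.Semaphore; the fetch function here is a stand-in that simulates request latency, and you'd replace it with the real HTTPX call:

```python
import asyncio

async def fetch(url: str) -> str:
    # Stand-in for the real HTTPX/Zyte API call; sleeps to simulate latency.
    await asyncio.sleep(0.05)
    return f"<html for {url}>"

async def fetch_bounded(sem: asyncio.Semaphore, url: str) -> str:
    # The semaphore caps how many fetches run concurrently.
    async with sem:
        return await fetch(url)

async def main(urls: list[str], limit: int = 10) -> list[str]:
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(fetch_bounded(sem, u) for u in urls))

urls = [f"https://example.com/page/{i}" for i in range(25)]
results = asyncio.run(main(urls, limit=10))
print(len(results), "pages fetched")
```

Results come back in the same order as the input list, which keeps downstream processing simple.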
2. curl_cffi
curl_cffi wraps libcurl with Python bindings and adds something HTTPX doesn't have: TLS fingerprint impersonation. It can mimic the TLS handshake of Chrome, Firefox, Safari, and other browsers. For API calls hitting endpoints protected by anti-ban or similar systems, this can be the difference between getting a response and getting a 403.
The interface closely mirrors Requests, with the addition of the impersonate parameter. It supports both sync and async usage. For most API calls where fingerprinting isn't a concern, curl_cffi behaves just like Requests; the impersonate parameter is opt-in.
Example: Sync
from curl_cffi import requests
response = requests.get(
    "https://jsonplaceholder.typicode.com/posts/1",
    impersonate="chrome",
    timeout=10,
)
response.raise_for_status()
data = response.json()
print(data["title"])
Example: Async (calling the Zyte API)
import os
import asyncio
from curl_cffi.requests import AsyncSession
API_KEY = os.environ["ZYTE_API_KEY"]
ENDPOINT = "https://api.zyte.com/v1/extract"
payload = {
    "url": "https://example.com",
    "browserHtml": True,
}

async def call_zyte_api():
    async with AsyncSession(impersonate="chrome") as session:
        response = await session.post(
            ENDPOINT,
            json=payload,
            auth=(API_KEY, ""),
            timeout=60,
        )
        response.raise_for_status()
        data = response.json()
        print(data["url"], "—", len(data["browserHtml"]), "chars")

asyncio.run(call_zyte_api())
Notes:
- impersonate="chrome" sends Chrome's TLS fingerprint on every request made through this session.
- Other supported values include "firefox", "safari", "chrome110", and more; check the curl-cffi docs for the full list.
- The sync interface (from curl_cffi import requests) is nearly identical to the requests module, making it the easiest drop-in if you only need sync.
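As an aside, the auth=(API_KEY, "") tuple used in the Zyte examples is ordinary HTTP Basic auth: the key is the username and the password is empty. A stdlib sketch of the Authorization header that these clients end up sending (the key value below is made up for illustration):

```python
import base64

def basic_auth_header(username: str, password: str = "") -> str:
    # HTTP Basic auth (RFC 7617): base64-encode "username:password".
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"

# Illustrative key only; real Zyte API keys come from your account dashboard.
header = basic_auth_header("my-api-key")
print(header)
```

Knowing this makes it easy to debug auth failures with any client: dump the outgoing headers and check that the Authorization value decodes to "your-key:".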
3. rnet
rnet is the newest of the three. Like much modern Python tooling, it's built on a Rust core, making it async-first and performance-oriented. Like curl_cffi, it supports TLS impersonation, but its primary differentiator is throughput. It is designed for high-concurrency workloads where you're firing many requests simultaneously.
The API surface is different from Requests, so it's not a drop-in replacement. But the patterns are clean and modern, and for async-heavy workloads it's worth the minor adjustment.
Example: Sample library code
import asyncio
from rnet import Impersonate, Client
async def main():
    # Build a client
    client = Client(impersonate=Impersonate.Firefox139)
    # Use the API you're already familiar with
    resp = await client.get("https://tls.peet.ws/api/all")
    # Print the response
    print(await resp.text())

if __name__ == "__main__":
    asyncio.run(main())
Notes:
- rnet is async-first; sync support is limited.
- Response body methods like .json() and .text() are awaitable.
- The Rust core makes it particularly well-suited for high-throughput concurrent workloads.
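For the high-throughput workloads rnet targets, it often helps to process each response as soon as it completes rather than waiting for the whole batch to finish. A stdlib sketch of that pattern with asyncio.as_completed; fetch here is a stand-in that simulates a client call with variable latency:

```python
import asyncio

async def fetch(url: str) -> tuple[str, int]:
    # Stand-in for a real client.get call; sleep length varies per URL.
    await asyncio.sleep(0.01 * (hash(url) % 5))
    return url, 200

async def main(urls: list[str]) -> list[tuple[str, int]]:
    done = []
    # Handle each response the moment it arrives instead of after gather().
    for coro in asyncio.as_completed([fetch(u) for u in urls]):
        url, status = await coro
        done.append((url, status))
    return done

urls = [f"https://example.com/{i}" for i in range(10)]
results = asyncio.run(main(urls))
print(len(results), "responses handled")
```

Unlike gather, results arrive in completion order, not input order, so pair each result with its URL (as the tuple above does) if ordering matters downstream.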
Comparison Table
| Feature | Requests | HTTPX | curl_cffi | rnet |
|---|---|---|---|---|
| Sync Support | ✅ Yes | ✅ Yes | ✅ Yes | ⚠️ Limited |
| Async support | ❌ No | ✅ Yes | ✅ Yes | ✅ Yes (primary) |
| HTTP/2 | ❌ No | ✅ With extra dependencies | ✅ Via libcurl | ✅ Built-in |
| Performance | Baseline | Good | Good–High | High |
| TLS impersonation | ❌ No | ❌ No | ✅ Yes | ✅ Yes |
When to use which
Use Requests for simple, one-off scripts, internal tooling, or any situation where you're hitting a cooperative API endpoint and don't need concurrency. Nothing wrong with it in that context.
Use HTTPX when you need async, want the closest migration path from Requests, or need HTTP/2. It's the safest default upgrade for most projects.
Use curl_cffi when TLS fingerprint control matters, whether that's because you're hitting an anti-ban wall or an API with strict client validation, or any service that checks how a client identifies itself at the TLS layer.
Use rnet when raw async performance is the priority. Its Rust foundation makes it the strongest choice for high-concurrency workloads where you're firing many requests simultaneously and need low overhead.
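The guidance above can be collapsed into a small decision helper. This is purely illustrative; the function name and flags are made up for this sketch:

```python
def pick_http_client(
    need_async: bool = False,
    need_http2: bool = False,
    need_tls_impersonation: bool = False,
    high_throughput: bool = False,
) -> str:
    # Mirrors the recommendations above: throughput first, then TLS
    # control, then async/HTTP/2, falling back to Requests otherwise.
    if high_throughput:
        return "rnet"
    if need_tls_impersonation:
        return "curl_cffi"
    if need_async or need_http2:
        return "httpx"
    return "requests"

print(pick_http_client())                             # simple one-off script
print(pick_http_client(need_tls_impersonation=True))  # anti-bot wall
```

In practice the flags overlap (curl_cffi also does async, rnet also impersonates), so treat this as a first-pass heuristic rather than a hard rule.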
The optimal choice is determined by several factors: your concurrency requirements, the target endpoint's sensitivity to client identification, and how closely you want the new code to track your existing requests implementation.