Why your API responses look like gibberish: the gzip decompression trap
The script was working. Requests were going out, responses were coming back with HTTP 200. But the response body was unreadable noise, a wall of binary characters that crashed the JSON parser and reported "no data found." No error code, no timeout, no network failure; just garbage where structured data should be.
The culprit was gzip compression. Specifically, the mismatch between what the HTTP client promised it could handle and what it actually did with the compressed bytes it received.
This is a common trap in web scraping and API clients, and it tends to waste an hour because nothing looks obviously wrong. Here is what is happening, why Python's standard library makes it worse, and how to fix it for good.
What gzip compression is and why APIs use it
gzip is a lossless compression format based on the DEFLATE algorithm. Originally built for compressing files on Unix systems, it became the web's dominant response compression method because the trade-off is excellent: a typical JSON API response compresses to 20% to 30% of its original size, with negligible CPU cost on modern hardware.
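You can see the trade-off locally with nothing but the standard library. This sketch builds a synthetic, repetitive product-listing payload (so it compresses better than the 20% to 30% typical of real-world JSON) and measures the ratio:

```python
import gzip
import json

# Synthetic JSON payload resembling a product-listing API response.
# Repetitive keys and values compress extremely well.
records = [
    {"id": i, "name": f"Product {i}", "price": 19.99, "in_stock": True}
    for i in range(500)
]
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

ratio = len(compressed) / len(raw)
print(f"raw: {len(raw)} bytes, gzip: {len(compressed)} bytes, ratio: {ratio:.0%}")
```

Real API responses are less repetitive than this synthetic example, but the direction of the result is the same: structured JSON is highly compressible.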
For web scraping workloads where you are fetching dozens or hundreds of pages, that bandwidth reduction is meaningful. Compressed responses arrive faster, consume less egress on the server, and allow more concurrent connections to run without hitting network limits. In one real-world parallel-fetching scenario, keeping gzip enabled cut total wall-clock time by roughly 60% compared to uncompressed sequential fetches.
How HTTP compression negotiation works
HTTP compression uses a two-header handshake:
Accept-Encoding is sent by the client in the request. It declares which compression formats the client supports:
```http
Accept-Encoding: gzip, deflate
```

Content-Encoding is sent by the server in the response. It declares which compression format was actually applied to the response body:

```http
Content-Encoding: gzip
```

The contract is: the client advertises its capabilities, the server compresses and labels the response, and the client is responsible for decompressing before reading. The phrase "responsible for decompressing" is where things break.
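The contract can be simulated end to end without a live server. In this sketch, a plain dict stands in for real HTTP response headers: the "server" compresses the body and labels it, and the "client" honors the label before parsing:

```python
import gzip
import json

# --- server side: compress the body and label it ---
body = json.dumps({"status": "ok"}).encode("utf-8")
response_headers = {"Content-Encoding": "gzip"}  # the server's label
response_body = gzip.compress(body)              # the compressed payload

# --- client side: honor the label before parsing ---
if response_headers.get("Content-Encoding") == "gzip":
    response_body = gzip.decompress(response_body)

data = json.loads(response_body)
print(data)  # {'status': 'ok'}
```

Skip the client-side step and json.loads receives the compressed bytes instead, which is exactly the failure described above.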
The urllib problem
Most HTTP clients abstract away this responsibility. curl --compressed handles decompression transparently. Python's requests library decompresses automatically. You never see the compressed bytes.
Python's urllib, however, is lower-level. When you manually set an Accept-Encoding header in a urllib.request call, you are signaling to the library: "I know what I am doing; give me the raw bytes."
And it does exactly that. It sends your header, receives the compressed response, and hands you the compressed binary blob without touching it. The Content-Encoding: gzip header is right there in the response, but urllib never acts on it: the library does not decompress response bodies at all. When you do not send Accept-Encoding, servers simply reply uncompressed, which is why the problem only surfaces once you add the header yourself.
The result: your JSON parser receives data starting with the gzip magic bytes \x1f\x8b instead of the { it expects. It fails. You see gibberish, or a json.JSONDecodeError, or a silent "no data found" if your error handling swallows the exception.
This is not a urllib bug; it is intentional behavior. The library assumes that if you set the header yourself, you own the decompression step. The problem is that many scrapers copy request headers from curl or browser dev tools, which include Accept-Encoding: gzip, deflate by default, without realizing they have opted into manual decompression handling.
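The symptom is easy to reproduce locally without any network call. Compressing a small JSON document stands in for what urllib hands you when the server gzips the response:

```python
import gzip
import json

# What urllib hands you when the server compresses the response:
payload = gzip.compress(b'{"price": 42}')

print(payload[:2] == b"\x1f\x8b")  # True: gzip magic bytes, not the expected '{'

try:
    json.loads(payload)
    parsed = True
except ValueError:  # UnicodeDecodeError and JSONDecodeError both subclass ValueError
    parsed = False

print(parsed)  # False: the parser cannot read compressed bytes
```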
Why this happens with web scraping APIs
Zyte API is standards-compliant. When your client sends Accept-Encoding: gzip, deflate, Zyte API returns compressed responses as it should. The data is there, fully extracted and structured, just wrapped in gzip. The API is doing nothing wrong. The issue is entirely in the client-side handling.
This is not specific to Zyte API. Any well-implemented HTTP API or web server that supports compression will exhibit this behavior. The same trap appears when scraping any site that enables gzip, calling any REST API that respects Accept-Encoding, or consuming any streaming response from a CDN.
Detecting gzip compression reliably
gzip data always begins with the two-byte sequence 0x1f 0x8b. This magic number gives you a format-level check that is more reliable than parsing the Content-Encoding header, because some servers compress the body but omit or misconfigure the header.
The detection pattern is simple:
```python
import gzip

# response is the object returned by urllib.request.urlopen()
raw_bytes = response.read()
if raw_bytes[:2] == b"\x1f\x8b":
    raw_bytes = gzip.decompress(raw_bytes)
body = raw_bytes.decode("utf-8", errors="replace")
```

Both gzip and zlib are part of Python's standard library, so no additional dependencies are needed.
The complete fix with Zyte API
Here is a minimal, working example of a Zyte API call with proper compression handling:
```python
import base64
import gzip
import json
import urllib.request


def fetch_from_zyte(url: str, api_key: str) -> dict:
    auth_string = base64.b64encode(f"{api_key}:".encode()).decode()

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {auth_string}",
        "Accept-Encoding": "gzip, deflate",
    }

    payload = json.dumps({"url": url, "product": True}).encode()

    req = urllib.request.Request(
        "https://api.zyte.com/v1/extract",
        data=payload,
        headers=headers,
        method="POST",
    )

    with urllib.request.urlopen(req) as resp:
        raw_bytes = resp.read()

    # Detect and decompress gzip
    if raw_bytes[:2] == b"\x1f\x8b":
        raw_bytes = gzip.decompress(raw_bytes)

    return json.loads(raw_bytes.decode("utf-8", errors="replace"))
```

Two lines added after resp.read(): that is the entire fix.
Handling deflate, too
If you want a reusable utility that covers both common encodings:
```python
import gzip
import zlib


def decompress_response(raw_bytes: bytes) -> bytes:
    # gzip: magic number 0x1f 0x8b
    if raw_bytes[:2] == b"\x1f\x8b":
        return gzip.decompress(raw_bytes)

    # zlib/deflate: common header byte 0x78
    if raw_bytes[:1] == b"\x78":
        return zlib.decompress(raw_bytes)

    return raw_bytes
```

Call this on any raw response body and it returns decompressed bytes, or the original bytes unchanged if no compression is detected. One caveat: some servers send raw deflate streams with no zlib header at all; those will not match the 0x78 check, and decompressing them requires zlib.decompress(data, -zlib.MAX_WBITS).
When to use requests instead
If you are using the requests library, this problem does not arise. Decompression is handled transparently:
```python
import requests

response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(api_key, ""),
    json={"url": url, "product": True},
)

data = response.json()  # already decompressed
```

The case for urllib is zero external dependencies: useful when you are packaging a lightweight script or running in an environment where you cannot install packages. The case for requests is that it handles this (and many other edge cases) for you. Choose based on your constraints, but if you go the urllib route, keep the two-line decompression check in mind.
The diagnostic checklist
If your scraper or API client is returning what looks like binary garbage:
- Check whether your response starts with the bytes \x1f\x8b; that is gzip-compressed data
- Check whether you are manually setting Accept-Encoding in a low-level HTTP client
- Check the response's Content-Encoding header: gzip confirms what happened
- Add the two-line magic-byte check and gzip.decompress() call
- Do not remove Accept-Encoding from your headers; keep compression enabled for the bandwidth savings
The issue surfaces in any language or framework where you are working close to the HTTP layer: Go's net/http when you set the Accept-Encoding header yourself (which turns off its transparent gzip decompression), Rust's reqwest with automatic decompression disabled, Node.js's http module, which never decompresses for you. The diagnostic is always the same: check the first two bytes.
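The checklist above can be folded into a tiny sniffing helper. sniff_body is a hypothetical name introduced here for illustration; it guesses what a raw response body contains from its leading bytes:

```python
import gzip
import zlib


def sniff_body(raw: bytes) -> str:
    """Guess what a raw HTTP body contains from its first bytes (illustrative helper)."""
    if raw[:2] == b"\x1f\x8b":
        return "gzip"
    if raw[:1] == b"\x78":
        return "zlib/deflate"
    if raw[:1] in (b"{", b"["):
        return "json"
    return "unknown"


print(sniff_body(gzip.compress(b"{}")))  # gzip
print(sniff_body(zlib.compress(b"{}")))  # zlib/deflate
print(sniff_body(b'{"ok": true}'))       # json
```

Logging the sniff result on parse failure turns an hour of head-scratching into a one-line diagnosis.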
Summary
gzip compression cuts HTTP response sizes by 70% to 80%, which makes it worth keeping enabled in any high-volume scraping workload.
The catch is that low-level HTTP clients like Python's urllib hand you the raw compressed bytes when you set Accept-Encoding yourself, and do not decompress automatically.
The fix is to check for the gzip magic number after reading the response body and decompress with gzip.decompress() when it is present.
Two lines of code, no extra dependencies, and your responses go from unreadable noise back to clean, parseable JSON.
Learn more: Zyte API documentation | Zyte API automatic extraction