How bandwidth is counted
We count bytes transferred between the proxy gateway and the target (the upstream leg). That includes:
- The HTTP request headers and body you send.
- The HTTP response headers and body you receive.
- TLS handshake overhead on HTTPS connections.
We do not count the connection between your client and the gateway (the downstream leg). Failed requests that produce no upstream bytes — 407 auth errors, for example — cost nothing.
Redirects count, because each hop is its own upstream request. A 301 from HTTP to HTTPS is billed as a separate request, and a 302 to a different page fetches and bills that page as well. If your target redirects aggressively, budget for 1–2 extra round-trips per initial URL.
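The counting rules above can be sketched as a small calculation. This is an illustration only, not the gateway's actual metering code: the `Hop` structure, the 5,000-byte TLS handshake figure, and the assumption that every HTTPS hop opens a fresh connection are all hypothetical.

```python
from dataclasses import dataclass

# Assumed figure for illustration only; real handshakes vary with the
# certificate chain and TLS version.
ASSUMED_TLS_HANDSHAKE_BYTES = 5_000


@dataclass
class Hop:
    """One upstream request/response pair (hypothetical structure)."""
    request_bytes: int    # request headers + body sent to the target
    response_bytes: int   # response headers + body received from the target
    https: bool = True    # whether this hop opens a TLS connection


def billable_bytes(hops: list[Hop]) -> int:
    """Sum upstream bytes over every hop in a redirect chain."""
    total = 0
    for hop in hops:
        total += hop.request_bytes + hop.response_bytes
        if hop.https:
            # Assumes no connection reuse between hops.
            total += ASSUMED_TLS_HANDSHAKE_BYTES
    return total


# An http:// URL that 301-redirects to https://, which then serves a 120 KB page.
chain = [
    Hop(request_bytes=400, response_bytes=350, https=False),  # the 301 response
    Hop(request_bytes=400, response_bytes=120_000),           # the final page
]
print(billable_bytes(chain))
```

The redirect hop itself is cheap here; the cost of aggressive redirects comes mostly from intermediate pages that return a full body.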
Typical request sizes
| Request type | Typical size | Notes |
|---|---|---|
| Lightweight API call (JSON) | 2–15 KB | REST endpoints, price tickers, status checks |
| SERP result page | 80–200 KB | Google, Bing — HTML only, no images |
| E-commerce product page | 150–400 KB | HTML + inline JSON; varies hugely by site |
| Social media profile | 200–600 KB | Heavy on inline JS and data payloads |
| News / blog article | 50–150 KB | Mostly text; can spike with embeds |
| Full browser page (Playwright) | 1–5 MB | HTML + JS + CSS + images + fonts |
| Image / asset fetch | 10 KB – 2 MB | Avoid unless you actually need the image |
Per-task estimates
Rules of thumb for common scraping jobs. Except for the Playwright row, these assume bare HTTP requests (no headless browser) and no image fetching:
| Task | Est. per 1,000 pages | Est. per 1,000,000 pages |
|---|---|---|
| SERP scraping (1 query = 1 page of results) | ~0.1 GB | ~100 GB |
| E-commerce product pages | ~0.3 GB | ~300 GB |
| Social media profiles | ~0.4 GB | ~400 GB |
| Lightweight JSON API calls | ~0.01 GB | ~10 GB |
| Real estate / job listings | ~0.2 GB | ~200 GB |
| Playwright full-page renders | ~2 GB | ~2 TB |
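The arithmetic behind the table is pages × average page size. A minimal sketch, assuming decimal units (1 GB = 1,000,000 KB) and using the per-page averages implied by the table rows; the function and dictionary names are made up for this example.

```python
def estimate_gb(pages: int, avg_kb_per_page: float) -> float:
    """Estimated upstream traffic in decimal GB (1 GB = 1,000,000 KB)."""
    return pages * avg_kb_per_page / 1_000_000


# Per-page averages implied by the table above (e.g. ~0.1 GB per 1,000 pages
# works out to ~100 KB per page).
AVG_KB_PER_PAGE = {
    "serp": 100,
    "ecommerce_product": 300,
    "social_profile": 400,
    "json_api": 10,
    "listings": 200,
    "playwright_render": 2_000,
}

for task, avg_kb in AVG_KB_PER_PAGE.items():
    print(f"{task}: {estimate_gb(1_000_000, avg_kb):,.0f} GB per 1,000,000 pages")
```

Swap in your own average once you have measured one; the table values are only starting points.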
Run a pilot batch of 100 requests, measure the total bytes in your proxy client, then extrapolate. That gives a project-specific number far more accurate than any generic estimate.
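One way to do the extrapolation, sketched under assumptions: the per-request byte counts are taken from whatever your proxy client or dashboard reports, and the function name and sample numbers below are hypothetical. It returns the mean-based estimate plus a more cautious figure based on the 90th-percentile request size.

```python
import statistics


def extrapolate_gb(pilot_sizes_bytes: list[int], planned_requests: int) -> dict[str, float]:
    """Project total traffic (decimal GB) from per-request pilot measurements."""
    mean_bytes = statistics.fmean(pilot_sizes_bytes)
    # Last of the nine decile cut points, i.e. the 90th percentile.
    # Requires at least two measurements.
    p90_bytes = statistics.quantiles(pilot_sizes_bytes, n=10)[-1]
    return {
        "expected_gb": mean_bytes * planned_requests / 1_000_000_000,
        "p90_gb": p90_bytes * planned_requests / 1_000_000_000,
    }


# Made-up measurements; a real pilot would have around 100 entries.
pilot = [180_000, 220_000, 150_000, 400_000, 210_000]
print(extrapolate_gb(pilot, planned_requests=1_000_000))
```

If the two figures are far apart, page sizes on your target vary a lot and a larger pilot is worth running before you commit to a plan.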
Reducing bandwidth spend
Small changes can cut spend significantly:
- Fetch API endpoints, not pages. Many e-commerce and social sites expose JSON APIs that return structured data in 5–20 KB instead of full HTML pages at 200–500 KB.
- Block assets in headless browsers. Images, fonts, and tracking scripts make up the bulk of a full-page load. Block them in Playwright with `route` to cut browser-mode bandwidth by 60–80%.
- Accept-Encoding: gzip. Most HTTP clients send this automatically, and HTML typically compresses 5–10× with gzip. Verify that your client is sending this header and that the server is responding with `Content-Encoding: gzip`.
- Fetch only what you need. If you only need the first product on a page, stop reading the response body once you have parsed it. Most HTTP clients support aborting the download mid-stream.
- Cache aggressively on your end. Pages you have already scraped don't need to be re-fetched unless the data is time-sensitive. ETag or Last-Modified headers can help, but for most scraping workloads a simple timestamp-based cache is enough.
- Retry fewer times on hopeless targets. Four retries of a persistent 403 cost 4× the bandwidth of the first attempt for zero additional data. Detect quickly and move on.
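The asset-blocking tip could look roughly like this with Playwright's Python sync API. Treat it as a sketch: the set of blocked resource types, the `fetch_html` helper, and the proxy address in the usage note are assumptions, and tracking scripts are not covered here because they would need URL-based matching rather than a resource-type check.

```python
# Resource types to drop. An assumed starting set; adjust to your target.
BLOCKED_RESOURCE_TYPES = {"image", "font", "media"}


def should_block(resource_type: str) -> bool:
    """Decide whether a request of this Playwright resource type is skipped."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def fetch_html(url: str, proxy_server: str) -> str:
    """Render a page through the proxy while aborting heavy asset requests."""
    # Imported here so should_block() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        if should_block(route.request.resource_type):
            route.abort()        # request never reaches the target
        else:
            route.continue_()    # let documents, scripts, XHR, etc. through

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy={"server": proxy_server})
        page = browser.new_page()
        page.route("**/*", handle)   # intercept every request on this page
        page.goto(url)
        html = page.content()
        browser.close()
    return html
```

A call might look like `fetch_html("https://example.com", "http://gateway.example:8000")`, where both values are placeholders. Compare measured bytes with and without the handler on your own target before relying on any particular savings figure.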