Bandwidth budget calculator — Docs

How bandwidth is counted

We count bytes transferred between the proxy gateway and the target (the upstream leg). That includes:

The HTTP request headers and body you send.
The HTTP response headers and body you receive.
TLS handshake overhead on HTTPS connections.

We do not count the connection between your client and the gateway (the downstream leg). Failed requests that produce no upstream bytes — 407 auth errors, for example — cost nothing.
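
As a rough illustration of the accounting above, the sketch below adds up the counted parts of one upstream exchange. The class and function names are hypothetical, and the TLS handshake size is a placeholder assumption, not a measured or documented figure.

```python
from dataclasses import dataclass

# Assumption: placeholder handshake cost. Measure your own traffic for a real figure.
ASSUMED_TLS_HANDSHAKE_BYTES = 5_000


@dataclass
class UpstreamExchange:
    """Byte counts for one request/response pair on the upstream leg."""
    request_header_bytes: int
    request_body_bytes: int
    response_header_bytes: int
    response_body_bytes: int
    new_tls_handshake: bool = False  # True when a new HTTPS connection is opened
    reached_upstream: bool = True    # False for e.g. a 407 rejected at the gateway


def counted_bytes(ex: UpstreamExchange) -> int:
    """Return the bytes that would count toward the bandwidth budget."""
    if not ex.reached_upstream:
        # No upstream bytes were produced, so nothing is counted.
        return 0
    total = (
        ex.request_header_bytes
        + ex.request_body_bytes
        + ex.response_header_bytes
        + ex.response_body_bytes
    )
    if ex.new_tls_handshake:
        total += ASSUMED_TLS_HANDSHAKE_BYTES
    return total
```

A 100 KB HTML page fetched over a fresh HTTPS connection with 500 bytes of request headers and 400 bytes of response headers would come to 105,900 bytes under these assumptions.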

Redirects count. Each hop is its own upstream request: a 301 to HTTPS costs one request for the redirect and another for the HTTPS URL, and a 302 to a different page also fetches that page. If your target redirects aggressively, budget for 1–2 extra requests per initial URL.
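
To see how redirects change a budget, here is a minimal arithmetic sketch. The function name and all input figures are illustrative assumptions; in particular, the per-hop cost (redirect response, repeated request headers, and possibly a new TLS handshake) should come from your own measurements.

```python
def bytes_with_redirects(
    initial_urls: int,
    final_page_bytes: int,
    redirects_per_url: int,
    bytes_per_redirect_hop: int,
) -> int:
    """Estimate total upstream bytes when every URL passes through some redirects."""
    per_url = final_page_bytes + redirects_per_url * bytes_per_redirect_hop
    return initial_urls * per_url


# Hypothetical job: 1,000 URLs, 150 KB final pages, 2 redirect hops of 1.5 KB each.
total = bytes_with_redirects(1_000, 150_000, 2, 1_500)
print(total)  # 153000000
```

With small redirect responses the overhead stays modest (here about 2%), but it grows quickly if each hop opens a new HTTPS connection or returns a full page body.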

Typical request sizes

Request type	Typical size	Notes
Lightweight API call (JSON)	2–15 KB	REST endpoints, price tickers, status checks
SERP result page	80–200 KB	Google, Bing — HTML only, no images
E-commerce product page	150–400 KB	HTML + inline JSON; varies hugely by site
Social media profile	200–600 KB	Heavy on inline JS and data payloads
News / blog article	50–150 KB	Mostly text; can spike with embeds
Full browser page (Playwright)	1–5 MB	HTML + JS + CSS + images + fonts
Image / asset fetch	10 KB – 2 MB	Avoid unless you actually need the image

Per-task estimates

Rules of thumb for common scraping jobs. Except for the Playwright row, these assume bare HTTP requests (no headless browser) and no image fetching:

Task	Est. per 1,000 pages	Est. per 1,000,000 pages
SERP scraping (1 query = 1 page of results)	~0.1 GB	~100 GB
E-commerce product pages	~0.3 GB	~300 GB
Social media profiles	~0.4 GB	~400 GB
Lightweight JSON API calls	~0.01 GB	~10 GB
Real estate / job listings	~0.2 GB	~200 GB
Playwright full-page renders	~2 GB	~2 TB
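
The table's figures follow from pages × average page size. The sketch below reproduces them, assuming decimal units (1 GB = 1,000,000 KB) and using the per-page averages implied by the table rows. The names are hypothetical.

```python
KB_PER_GB = 1_000_000  # assumption: decimal units

# Average KB per page implied by the table above (e.g. 0.1 GB / 1,000 pages = 100 KB).
IMPLIED_AVG_KB = {
    "serp": 100,
    "ecommerce": 300,
    "social": 400,
    "json_api": 10,
    "listings": 200,
    "playwright": 2_000,
}


def estimate_gb(pages: int, avg_kb_per_page: float) -> float:
    """Estimated bandwidth in GB for a number of pages at a given average size."""
    return pages * avg_kb_per_page / KB_PER_GB


for task, kb in IMPLIED_AVG_KB.items():
    print(f"{task}: {estimate_gb(1_000_000, kb):,.0f} GB per 1M pages")
```

Swapping in your own average page size gives a first-pass budget before any pilot run.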

Run a pilot batch of 100 requests, measure the total bytes in your proxy client, then extrapolate. That gives a project-specific number far more accurate than any generic estimate.
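
The extrapolation itself is simple. A minimal sketch, assuming you have recorded the byte count of each pilot request (the function name and sample numbers are hypothetical):

```python
from statistics import mean


def extrapolate_bytes(pilot_sizes: list[int], planned_requests: int) -> float:
    """Scale the mean bytes per pilot request up to the planned request count."""
    if not pilot_sizes:
        raise ValueError("pilot_sizes must contain at least one measurement")
    return mean(pilot_sizes) * planned_requests


# Hypothetical pilot: three measured requests, extrapolated to 1,000 requests.
projected = extrapolate_bytes([100_000, 200_000, 300_000], 1_000)
print(f"{projected / 1e9:.2f} GB")  # 0.20 GB
```

If page sizes vary widely, a larger pilot gives a steadier mean.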

Reducing bandwidth spend

Small changes can cut spend significantly:

Fetch API endpoints, not pages — many e-commerce and social sites expose JSON APIs that return structured data in 5–20 KB instead of full HTML pages at 200–500 KB.
Block assets in headless browsers — images, fonts, and tracking scripts make up the bulk of a full-page load. Block them in Playwright with page.route() to cut browser-mode bandwidth by 60–80%.
Accept-Encoding: gzip — most HTTP clients send this automatically. gzip typically shrinks HTML by 5–10×. Verify your client is sending this header and the server is responding with Content-Encoding: gzip.
Fetch only what you need — if you only need the first product on a page, stop reading the response body after you've parsed it. Most HTTP clients support aborting the download mid-stream.
Cache aggressively on your end — pages you've already scraped don't need to be re-fetched unless the data is time-sensitive. ETag or Last-Modified headers can help, but for most scraping workloads a simple timestamp-based cache is enough.
Retry fewer times on hopeless targets — four retries of a persistent 403 add four more requests' worth of bandwidth for zero additional data. Detect quickly and move on.
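
As one example of the asset-blocking tip, the sketch below uses Playwright's sync Python API to abort requests for images, fonts, and media before they are fetched. The blocked resource types and tracker host substrings are illustrative assumptions, not a recommended list. Tune them per target, since blocking too much can break pages.

```python
# Assumption: which resource types and hosts to block is a per-project choice.
BLOCKED_RESOURCE_TYPES = {"image", "font", "media"}
BLOCKED_URL_SUBSTRINGS = ("google-analytics.com", "doubleclick.net")  # illustrative only


def should_block(resource_type: str, url: str) -> bool:
    """Decide whether a request should be aborted to save bandwidth."""
    if resource_type in BLOCKED_RESOURCE_TYPES:
        return True
    return any(fragment in url for fragment in BLOCKED_URL_SUBSTRINGS)


def fetch_html_without_assets(url: str) -> str:
    """Render a page in headless Chromium while aborting blocked requests."""
    # Imported lazily so should_block() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    def handle(route):
        request = route.request
        if should_block(request.resource_type, request.url):
            route.abort()
        else:
            route.continue_()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", handle)  # intercept every request the page makes
        page.goto(url)
        html = page.content()
        browser.close()
    return html
```

Calling fetch_html_without_assets("https://example.com/") returns the rendered HTML. Comparing byte counts with and without the route handler on a pilot batch shows how much a given target actually saves.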