Concurrency limits per pool type
Start here before touching any pool size setting. The binding constraint is almost always target-side tolerance, not gateway capacity.
| Pool type | Typical P50 latency | Recommended starting concurrency | Notes |
|---|---|---|---|
| Datacenter rotating | 60–120 ms | 50–200 concurrent | Fast; increase until the target starts returning 429s |
| Residential rotating | 180–350 ms | 30–100 concurrent | Higher latency, so matching datacenter RPS takes more requests in flight |
| Mobile | 200–500 ms | 16–50 concurrent | Bandwidth-priced, so avoiding unnecessary retries matters more |
| Static datacenter | 60–120 ms | 20–50 per IP | Fixed IP; the target may rate-limit that specific IP |
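The residential note follows from Little's law: requests in flight equal throughput multiplied by the time each request takes. Below is a minimal sketch for estimating the concurrency a target RPS needs at a given P50 latency. required_concurrency is a hypothetical helper. Sizing from P50 understates the need when tail latency is high.

```python
import math


def required_concurrency(target_rps: float, p50_latency_ms: float) -> int:
    """Little's law: requests in flight = throughput x time each request takes."""
    return math.ceil(target_rps * p50_latency_ms / 1000)


if __name__ == "__main__":
    # Same 500 RPS goal at a datacenter-like vs a residential-like latency.
    print(required_concurrency(500, 90))   # 45
    print(required_concurrency(500, 250))  # 125
```

The result is only what the RPS goal demands. The target's tolerance for concurrent requests still sets the ceiling.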
Python asyncio / aiohttp
aiohttp exposes connection pool limits via TCPConnector. Set limit for the total pool and limit_per_host for per-target concurrency.
```python
import asyncio

import aiohttp

# Placeholder credentials and gateway host: substitute your own.
PROXY = "http://USER:PASS@GATEWAY_HOST:8080"


async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    async with session.get(url, proxy=PROXY) as resp:
        resp.raise_for_status()  # turn 4xx/5xx (including 429) into exceptions
        return await resp.text()


async def main(urls: list[str]) -> list[str | BaseException]:
    connector = aiohttp.TCPConnector(
        limit=100,  # total simultaneous connections
        limit_per_host=20,  # per target endpoint; tune down for fragile targets
        ttl_dns_cache=300,  # cache DNS results (here, the gateway host) for 5 minutes
        enable_cleanup_closed=True,  # abort SSL transports whose shutdown never completes
    )
    # total covers the whole request, including time spent queued for a pooled connection
    timeout = aiohttp.ClientTimeout(total=20)
    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
        tasks = [fetch(session, url) for url in urls]
        # return_exceptions=True keeps one failure from cancelling the batch;
        # failed URLs appear as exception objects in the result list.
        return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    urls = [f"https://target.com/item/{i}" for i in range(500)]
    results = asyncio.run(main(urls))
    ok = [r for r in results if isinstance(r, str)]
    print(f"{len(ok)}/{len(urls)} succeeded")
```
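limit_per_host counts connections per endpoint, not per resolved IP address. aiohttp defines an endpoint as the (host, port, is_ssl) triple taken from the URL. If your target redirects to a CDN hostname, the CDN hostname gets its own limit bucket, so total concurrency against the site can exceed limit_per_host. Keep this in mind when debugging unexpected slowdowns or rate limiting.
Python httpx async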
httpx.AsyncClient uses a connection pool internally. Control the pool size via limits:
```python
import asyncio

import httpx

# Placeholder credentials and gateway host: substitute your own.
PROXY = "http://USER:PASS@GATEWAY_HOST:8080"

limits = httpx.Limits(
    max_connections=100,  # total pool size
    max_keepalive_connections=40,  # idle connections to hold open
    keepalive_expiry=30,  # seconds before an idle connection closes
)


async def main(urls: list[str]) -> list[httpx.Response | BaseException]:
    async with httpx.AsyncClient(
        proxy=PROXY,  # the proxy= argument needs httpx 0.26 or newer
        limits=limits,
        timeout=20,  # applies to connect, read, write, and pool acquisition
        http2=True,  # multiplexes requests per connection; needs: pip install "httpx[http2]"
    ) as client:
        tasks = [client.get(url) for url in urls]
        return await asyncio.gather(*tasks, return_exceptions=True)


if __name__ == "__main__":
    asyncio.run(main(["https://target.com/"] * 200))
```
HTTP/2 multiplexing lets one TCP connection carry many concurrent requests, so fewer connections are needed for the same load. It is most effective when the target supports HTTP/2 and you have a small number of high-request-volume sessions.
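The claim about fewer connections can be given rough numbers. Over HTTP/1.1 without pipelining, each in-flight request occupies its own connection. Over HTTP/2, one connection carries up to the server's advertised SETTINGS_MAX_CONCURRENT_STREAMS. connections_needed below is a hypothetical back-of-the-envelope helper. Its default of 100 streams is an assumption: RFC 9113 recommends servers allow at least 100, but the real value is whatever the server advertises.

```python
import math


def connections_needed(concurrency: int, http2: bool, max_streams: int = 100) -> int:
    """Rough count of TCP connections needed to keep `concurrency` requests in flight."""
    if not http2:
        return concurrency  # HTTP/1.1 without pipelining: one request per connection
    # HTTP/2: each connection multiplexes up to max_streams concurrent streams.
    return math.ceil(concurrency / max_streams)


if __name__ == "__main__":
    print(connections_needed(200, http2=False))  # 200
    print(connections_needed(200, http2=True))   # 2
```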
Keep-alive and connection reuse
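Every new connection to the proxy gateway costs a round trip for the TCP handshake plus at least one more for the TLS handshake (one with TLS 1.3, two with a full TLS 1.2 handshake). For HTTPS targets tunneled through an HTTP gateway, the CONNECT exchange adds another round trip. At 200 ms round-trip time, that's at least 400 ms of overhead before the first byte of your actual request is sent. Keep-alive eliminates this for subsequent requests on the same connection.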
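The handshake arithmetic generalizes to a simple cost model. Connection setup costs a fixed number of round trips, and keep-alive spreads that cost across every request the connection serves. The helpers below are hypothetical sketches. They assume no TLS session resumption and no TCP Fast Open, and they ignore the gateway's own upstream connect time.

```python
def setup_overhead_ms(
    rtt_ms: float, tls_round_trips: int = 1, connect_tunnel: bool = False
) -> float:
    """Connection setup time before the first request byte.

    Assumes 1 RTT for the TCP handshake, `tls_round_trips` RTTs for TLS
    (1 for TLS 1.3, 2 for a full TLS 1.2 handshake), and 1 RTT for an
    HTTP CONNECT to the gateway when tunnelling HTTPS.
    """
    round_trips = 1 + tls_round_trips + (1 if connect_tunnel else 0)
    return rtt_ms * round_trips


def overhead_per_request_ms(rtt_ms: float, requests_per_connection: int, **kwargs) -> float:
    """Setup cost amortized over all requests that reuse one kept-alive connection."""
    return setup_overhead_ms(rtt_ms, **kwargs) / requests_per_connection


if __name__ == "__main__":
    print(setup_overhead_ms(200))                                          # 400
    print(setup_overhead_ms(200, tls_round_trips=2, connect_tunnel=True))  # 800
    print(overhead_per_request_ms(200, 50))                                # 8.0
```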
- aiohttp: connections are kept alive by default. Leave connector_owner=True (the default) so the connector is closed along with the session.
- httpx: keep-alive is on by default. keepalive_expiry controls how long idle connections are held; don't set it too high or you'll accumulate stale connections.
- requests: use a Session (not bare requests.get), because the session owns the connection pool. Bare calls create and discard connections every time.
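With requests in a thread pool, a common way to keep connection reuse without sharing one Session across threads is to create one session per worker thread on first use. Below is a minimal sketch using threading.local. per_thread is a hypothetical helper, and the demo passes object as a stand-in factory so it runs without network access. In real code the factory would be requests.Session.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


def per_thread(factory: Callable[[], T]) -> Callable[[], T]:
    """Return a getter that creates one factory() result per thread, on first use."""
    local = threading.local()

    def get() -> T:
        if not hasattr(local, "value"):
            local.value = factory()  # first call in this thread
        return local.value

    return get


if __name__ == "__main__":
    get_session = per_thread(object)  # real code: per_thread(requests.Session)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each task records which object its worker thread received.
        seen = set(pool.map(lambda _: id(get_session()), range(100)))
    print(f"{len(seen)} distinct sessions for 100 tasks")  # never more than 4
```

In a real worker, the call would be get_session().get(url, proxies={"http": PROXY, "https": PROXY}, timeout=20), with get_session = per_thread(requests.Session).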
requests.Session is not thread-safe. In threaded workers, give each thread its own session. For async code, share a single aiohttp.ClientSession across coroutines; it is designed for concurrent use.
Per-domain concurrency caps
When scraping multiple domains in parallel, cap concurrency per domain to avoid hammering one target while another sits idle. A semaphore per domain is the standard pattern:
```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

import aiohttp

# Placeholder credentials and gateway host: substitute your own.
PROXY = "http://USER:PASS@GATEWAY_HOST:8080"
MAX_PER_DOMAIN = 10

# One semaphore per domain, created lazily the first time a domain is seen.
sems: dict[str, asyncio.Semaphore] = defaultdict(
    lambda: asyncio.Semaphore(MAX_PER_DOMAIN)
)


async def fetch(session: aiohttp.ClientSession, url: str) -> str:
    domain = urlparse(url).netloc
    async with sems[domain]:  # wait here while this domain is at its cap
        async with session.get(url, proxy=PROXY) as resp:
            return await resp.text()


async def main(urls: list[str]) -> list[str | BaseException]:
    connector = aiohttp.TCPConnector(limit=200, limit_per_host=MAX_PER_DOMAIN)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(
            *(fetch(session, u) for u in urls),
            return_exceptions=True,
        )
```
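You can check that the per-domain cap behaves as intended without hitting a real target. A minimal simulation replaces the HTTP call with asyncio.sleep and records the peak number of in-flight requests per domain. The domains, the cap of 3, and crawl are hypothetical. The semaphore logic mirrors the fetch function above.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlparse

MAX_PER_DOMAIN = 3  # small cap so the effect is easy to see


async def crawl(urls: list[str]) -> dict[str, int]:
    """Run simulated fetches under per-domain semaphores; return peak in-flight per domain."""
    sems = defaultdict(lambda: asyncio.Semaphore(MAX_PER_DOMAIN))
    in_flight = defaultdict(int)
    peak = defaultdict(int)

    async def fake_fetch(url: str) -> None:
        domain = urlparse(url).netloc
        async with sems[domain]:  # same gate as the real fetch()
            in_flight[domain] += 1
            peak[domain] = max(peak[domain], in_flight[domain])
            await asyncio.sleep(0.01)  # stands in for session.get(...)
            in_flight[domain] -= 1

    await asyncio.gather(*(fake_fetch(u) for u in urls))
    return dict(peak)


if __name__ == "__main__":
    urls = [f"https://a.example/{i}" for i in range(20)]
    urls += [f"https://b.example/{i}" for i in range(5)]
    print(asyncio.run(crawl(urls)))  # {'a.example': 3, 'b.example': 3}
```

Domain a.example has 20 queued URLs and b.example only 5, yet both peak at the cap. The busy domain never takes slots from the quiet one.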
For the full system-level view — file descriptors, ephemeral ports, conntrack — see high-throughput tuning.