net/http with a proxy Transport
Build a single *http.Transport with the proxy URL parsed via url.Parse and reuse it for the lifetime of the program. Recreating it per request defeats keep-alive and tanks throughput.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

func newClient() (*http.Client, error) {
	// USER, PASS, and proxy.example.com are placeholders for your own
	// credentials and proxy gateway host.
	proxyURL, err := url.Parse("http://USER:PASS@proxy.example.com:8080")
	if err != nil {
		return nil, err
	}
	tr := &http.Transport{
		Proxy:                 http.ProxyURL(proxyURL),
		MaxIdleConns:          200,
		MaxIdleConnsPerHost:   50,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		ForceAttemptHTTP2:     true,
	}
	return &http.Client{
		Transport: tr,
		Timeout:   20 * time.Second, // covers the whole exchange, including reading the body
	}, nil
}

func main() {
	client, err := newClient()
	if err != nil {
		panic(err)
	}
	resp, err := client.Get("https://api.ipify.org")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.StatusCode, string(body))
}
The proxy URL scheme is http:// even when the target is HTTPS; net/http handles the CONNECT tunnel automatically.
A reusable client
Wrap the transport in a small struct so the rest of the app gets a clean injection point. The same client serves every request from every goroutine.
type ProxyClient struct {
	http *http.Client
}

func NewProxyClient(proxyURL string) (*ProxyClient, error) {
	u, err := url.Parse(proxyURL)
	if err != nil {
		return nil, err
	}
	return &ProxyClient{
		http: &http.Client{
			Transport: &http.Transport{
				Proxy:               http.ProxyURL(u),
				MaxIdleConns:        200,
				MaxIdleConnsPerHost: 50,
				IdleConnTimeout:     90 * time.Second,
				ForceAttemptHTTP2:   true,
			},
			Timeout: 20 * time.Second,
		},
	}, nil
}

// Do forwards to the shared client, which is safe for concurrent use.
func (c *ProxyClient) Do(req *http.Request) (*http.Response, error) {
	return c.http.Do(req)
}
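To see the wrapper route traffic without touching a real proxy, the following minimal sketch points it at a local stand-in built with net/http/httptest. The helper name fetchViaFakeProxy and the target http://target.invalid/ping are hypothetical and exist only for this demo; the wrapper is trimmed to the fields the demo needs.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"time"
)

// ProxyClient mirrors the wrapper above, trimmed for the demo.
type ProxyClient struct {
	http *http.Client
}

func NewProxyClient(proxyURL string) (*ProxyClient, error) {
	u, err := url.Parse(proxyURL)
	if err != nil {
		return nil, err
	}
	return &ProxyClient{http: &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(u)},
		Timeout:   5 * time.Second,
	}}, nil
}

func (c *ProxyClient) Do(req *http.Request) (*http.Response, error) {
	return c.http.Do(req)
}

// fetchViaFakeProxy (hypothetical helper) starts a local stand-in proxy,
// sends one plain-HTTP request through it, and returns the proxy's reply.
func fetchViaFakeProxy(target string) (string, error) {
	proxy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// A forward proxy receives the absolute target URL in the request line.
		fmt.Fprintf(w, "proxied %s", r.URL.String())
	}))
	defer proxy.Close()

	client, err := NewProxyClient(proxy.URL)
	if err != nil {
		return "", err
	}
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := fetchViaFakeProxy("http://target.invalid/ping")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

The target host is never resolved because the request goes to the proxy address instead, which makes the same pattern usable in unit tests for code that takes a *ProxyClient.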
Sticky sessions
Append -session-TOKEN to the username inside the proxy URL. Two requests using the same token route through the same exit IP for the session lifetime.
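// sessionURL embeds the session token in the proxy username.
// proxy.example.com is a placeholder for your proxy gateway host.
// This fragment needs "fmt" and "sync" in the import block.
func sessionURL(user, pass, token string) string {
	return fmt.Sprintf("http://%s-session-%s:%s@proxy.example.com:8080",
		user, token, pass)
}

// One token per flow: keep a pool of clients keyed by session token.
type Pool struct {
	mu      sync.Mutex
	clients map[string]*ProxyClient
	user    string
	pass    string
}

func (p *Pool) ForToken(token string) (*ProxyClient, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if c, ok := p.clients[token]; ok {
		return c, nil
	}
	c, err := NewProxyClient(sessionURL(p.user, p.pass, token))
	if err != nil {
		return nil, err // never cache a nil client
	}
	if p.clients == nil {
		p.clients = make(map[string]*ProxyClient) // allow a zero-value Pool
	}
	p.clients[token] = c
	return c, nil
}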
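Credentials that contain characters such as @ or : break a proxy URL assembled by plain string formatting. A possible alternative, sketched here under assumptions, builds the URL with url.UserPassword so the userinfo is escaped automatically. The function name sessionProxyURL, the host proxy.example.com:8080, and the sample credentials are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// sessionProxyURL (hypothetical helper) appends -session-TOKEN to the
// username and lets net/url escape the credentials.
func sessionProxyURL(host, user, pass, token string) *url.URL {
	return &url.URL{
		Scheme: "http",
		User:   url.UserPassword(user+"-session-"+token, pass),
		Host:   host,
	}
}

func main() {
	// Placeholder host and credentials, for illustration only.
	u := sessionProxyURL("proxy.example.com:8080", "alice", "p@ss:word", "abc123")

	// Round-trip through the string form to confirm the password survives.
	parsed, err := url.Parse(u.String())
	if err != nil {
		panic(err)
	}
	pass, _ := parsed.User.Password()
	fmt.Println(parsed.User.Username(), parsed.Host, pass == "p@ss:word")
}
```

The returned *url.URL can be passed straight to http.ProxyURL, which skips a second parse.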
Retry with exponential backoff
Most errors are upstream target flakiness. Retry with backoff and jitter, cap the attempt count, and skip retrying on terminal codes (407 / 451).
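// This fragment needs "io", "math/rand", "net/http", and "time".
var retryable = map[int]bool{
	429: true, 502: true, 503: true, 504: true, 522: true, 524: true,
}

func (c *ProxyClient) DoWithRetry(req *http.Request, maxAttempts int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		// Clone does not copy the request body, so as written this is
		// only safe for bodyless requests such as GET.
		resp, err := c.http.Do(req.Clone(req.Context()))
		if err == nil && !retryable[resp.StatusCode] {
			return resp, nil
		}
		if attempt+1 >= maxAttempts {
			// Out of attempts: return the last result with the body
			// still open so the caller can read and close it.
			return resp, err
		}
		if resp != nil {
			// Drain a bounded amount, then close, so the connection can be reused.
			io.Copy(io.Discard, io.LimitReader(resp.Body, 64<<10))
			resp.Body.Close()
		}
		// Exponential backoff (200ms, 400ms, 800ms, ...) plus up to 50% jitter.
		wait := (200 * time.Millisecond) << attempt
		wait += time.Duration(rand.Int63n(int64(wait / 2)))
		select {
		case <-time.After(wait):
		case <-req.Context().Done():
			return nil, req.Context().Err() // stop waiting once the caller cancels
		}
	}
}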
See the error reference for which codes are worth retrying. On rotating products each retry typically pulls a fresh exit IP, so a transient block usually clears on the first retry.
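The backoff arithmetic is easier to reason about when it is pulled out of the retry loop. The sketch below is one way to do that, not the only one: backoffDelay, withJitter, baseDelay, maxDelay, and demoRNG are hypothetical names, and the 5-second cap is an assumed value rather than a documented recommendation.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Assumed values for illustration: 200ms base, 5s cap.
const (
	baseDelay = 200 * time.Millisecond
	maxDelay  = 5 * time.Second
)

// demoRNG is seeded with a fixed value so the demo is repeatable.
var demoRNG = rand.New(rand.NewSource(1))

// backoffDelay doubles base once per previous attempt and caps the result at max.
func backoffDelay(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d
}

// withJitter adds a random extra in [0, d/2) so concurrent clients spread out.
func withJitter(d time.Duration, rng *rand.Rand) time.Duration {
	if d < 2 {
		return d // too small to jitter; also keeps Int63n's argument positive
	}
	return d + time.Duration(rng.Int63n(int64(d/2)))
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		d := backoffDelay(attempt, baseDelay, maxDelay)
		fmt.Println(attempt, d, withJitter(d, demoRNG))
	}
}
```

Capping the delay keeps a long retry sequence from sleeping for minutes, and the loop-based doubling avoids the overflow that a raw shift or math.Pow conversion can hit for large attempt counts.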
Worker-pool concurrency
Bound concurrency with a worker pool: a fixed-size set of goroutines pulling from a buffered job channel. One transport, many goroutines, no connection storms.
// This fragment needs "context", "io", "log", "net/http", and "sync".
func crawl(ctx context.Context, client *ProxyClient, urls []string, workers int) {
	jobs := make(chan string, workers*2)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
				if err != nil {
					log.Printf("bad url %s: %v", u, err)
					continue
				}
				resp, err := client.DoWithRetry(req, 4)
				if err != nil {
					log.Printf("err %s: %v", u, err)
					continue
				}
				// Drain before closing so the connection returns to the idle pool.
				io.Copy(io.Discard, resp.Body)
				resp.Body.Close()
			}
		}()
	}
feed:
	for _, u := range urls {
		select {
		case <-ctx.Done():
			break feed // stop queueing; workers finish what is already buffered
		case jobs <- u:
		}
	}
	close(jobs)
	wg.Wait()
}
Start with a worker count of min(50, MaxIdleConnsPerHost) and tune by watching the target's 5xx and 429 rates. Beyond a point you're just hammering the target; see high-throughput tuning for the full discussion.
Common pitfalls
- Ignoring response bodies: always close the body, even on error responses. Otherwise the connection stays in CLOSE_WAIT and fills the connection pool.
- Per-request transport: building http.Transport inside a request handler kills keep-alive. One transport for the program.
- Default http.DefaultClient: it has no timeout. A misbehaving target hangs the goroutine forever. Always set http.Client.Timeout.
- Forgetting context: pass context.Context into every request. Cancelling the context stops in-flight work cleanly, which matters during deploys and shutdowns.
- Reading bodies with io.ReadAll (or the deprecated ioutil.ReadAll) on huge responses: bound the read with io.LimitReader(body, MAX) or stream-process.