Async Proxy Client

async_proxy_client

Asynchronous HTTP client for Scrape.do's Proxy Mode.

Defines AsyncScrapeDoProxyClient, the asyncio-native version of ScrapeDoProxyClient. Configures httpx.AsyncClient instances with Scrape.do's Proxy Mode endpoint (proxy.scrape.do:8080) and reuses the same retry / hook / validator semantics as the API-mode async client.

Because Scrape.do encodes per-request parameters into the proxy URL's password field, each unique (token, params) combination needs its own httpx.AsyncClient. This module maintains a bounded LRU pool of clients keyed on the formatted proxy URL — repeated requests with the same params reuse the same TCP / TLS / HTTP-2 connection state. An asyncio.Lock guards the miss/eviction critical section so concurrent coroutines don't race to construct redundant clients.
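As a rough illustration of why each (token, params) pair yields a distinct pool key, the sketch below builds a proxy URL with the parameters placed in the password field. The helper name build_proxy_url, the key sorting, and the exact percent-encoding are assumptions made for this example; the SDK's own formatting may differ.

```python
from typing import Dict
from urllib.parse import quote, urlencode

PROXY_HOST = "proxy.scrape.do:8080"  # endpoint named in the module description


def build_proxy_url(token: str, params: Dict[str, str]) -> str:
    """Place the encoded params in the password field (illustrative only)."""
    # Sorting keys makes logically equal parameter sets produce the same key.
    query = urlencode(sorted(params.items()))
    # Percent-encode everything so '&' and '=' are safe inside the userinfo part.
    return f"http://{quote(token, safe='')}:{quote(query, safe='')}@{PROXY_HOST}"


url_a = build_proxy_url("TOKEN", {"render": "false", "geoCode": "us"})
url_b = build_proxy_url("TOKEN", {"geoCode": "us", "render": "false"})
url_c = build_proxy_url("TOKEN", {"render": "true"})
```

Here url_a and url_b are equal and would share one pooled client, while url_c would get its own.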

AsyncScrapeDoProxyClient

Asynchronous HTTP client for Scrape.do's Proxy Mode.

asyncio-native version of ScrapeDoProxyClient, backed by httpx.AsyncClient. Routes requests through proxy.scrape.do:8080 instead of calling api.scrape.do directly. Reuses the same RequestParameters model, the same retry strategy, the same async event hooks (AsyncClientEventHooks), and the same session-validation contract (AsyncSessionValidator).

Features
  • Local API parameter validation via the RequestParameters Pydantic model, plus the proxy-mode-specific cross-checks in validate_proxy_params.

  • Status code error parsing and customisable retry intervals for rate-limited requests. Non-blocking sleeps via await asyncio.sleep(...).

  • Strongly-typed interface for responses via the ScrapeDoResponse Pydantic model.

  • Connection reuse via a bounded LRU pool of httpx.AsyncClient instances keyed on the formatted proxy URL, with an asyncio.Lock guarding the miss/eviction critical section.

Concurrency Limit and Server Errors

This client intercepts and manages Scrape.do's specific gateway errors (429, 502, 510), automatically applying a customisable retry strategy before the error can reach the application.
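The retry behaviour can be pictured with a small loop like the one below. It is a minimal sketch, not the SDK's implementation: send_with_retries and the fake sender are hypothetical, and the sketch simply returns the last status once retries run out, whereas the real client may raise instead.

```python
import asyncio
from typing import Awaitable, Callable, Tuple, Union

RETRYABLE_STATUSES = {429, 502, 510}  # gateway errors listed above

Backoff = Union[float, Callable[[int], float]]


async def send_with_retries(
    send: Callable[[], Awaitable[int]],
    max_retries: int = 3,
    retry_backoff: Backoff = 0.01,
) -> int:
    """Call `send` until it returns a non-retryable status or retries run out."""
    attempt = 0
    while True:
        status = await send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status
        delay = retry_backoff(attempt) if callable(retry_backoff) else retry_backoff
        await asyncio.sleep(delay)  # non-blocking: other coroutines keep running
        attempt += 1


async def _retry_demo() -> Tuple[int, int]:
    responses = iter([429, 502, 200])  # hypothetical sequence of gateway replies
    calls = 0

    async def fake_send() -> int:
        nonlocal calls
        calls += 1
        return next(responses)

    status = await send_with_retries(fake_send, max_retries=3, retry_backoff=0.0)
    return status, calls


final_status, total_calls = asyncio.run(_retry_demo())
```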

SDK Event Hooks (event_hooks)

This client reuses the asynchronous AsyncClientEventHooks TypedDict — same shape, same lifecycle, same async-only callback signatures as the API-mode async client.

TLS Verification
  • Scrape.do's Proxy Mode upgrades and forwards HTTPS requests on your behalf (MITM-style), so validating HTTPS targets against the system CAs alone would fail.

  • The default verify value is DEFAULT_PROXY_SSL_CONTEXT, an ssl.SSLContext preloaded with system CAs + Scrape.do's bundled CA, so HTTPS targets validate correctly through the proxy.

  • Pass verify=True if you've installed Scrape.do's CA into your OS keychain, or verify=False to disable validation entirely (discouraged).
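A context that trusts the system CAs plus one extra CA bundle can be built with the standard library alone. The sketch below shows the general technique; build_proxy_ssl_context and its extra_ca_file argument are hypothetical names, and how DEFAULT_PROXY_SSL_CONTEXT is actually constructed is not shown in this reference.

```python
import ssl
from typing import Optional


def build_proxy_ssl_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """System CAs plus, optionally, one extra CA bundle (e.g. a proxy's root CA)."""
    ctx = ssl.create_default_context()  # loads the system trust store
    if extra_ca_file is not None:
        # Adds to the trust store; it does not replace the system CAs.
        ctx.load_verify_locations(cafile=extra_ca_file)
    return ctx


# Pass the path to a real PEM file to trust an additional CA.
ctx = build_proxy_ssl_context()
```

A context built this way keeps hostname checking and certificate verification enabled, which is the point of preferring it over verify=False.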

Additional httpx.AsyncClient Configuration

The following httpx.AsyncClient parameters can be provided as keyword arguments and will be passed directly to every pooled client.

  • verify
  • cert
  • http1
  • http2
  • timeout
  • limits
  • transport
  • default_encoding

Additionally, the following httpx.AsyncClient.request parameters can be provided as keyword arguments during request execution.

  • timeout (r_timeout)
  • extensions

For more information on their behaviour and default values, please consult the official httpx documentation.

Unsupported HTTPX Client Arguments

The underlying httpx.AsyncClient instances are strictly managed by the pool to prevent invalid configurations from being sent to Scrape.do. For this reason, arguments not listed in the previous section are intentionally blocked and shouldn't be changed.

Connection Pool
  • Each unique formatted proxy URL gets its own httpx.AsyncClient.

  • Two requests with the same RequestParameters reuse the same pooled client (and therefore the same TCP / TLS / HTTP-2 state) for transport-level efficiency.

  • Two requests with different parameters get different clients.

  • When max_pooled_clients is exceeded, the least-recently-used client is closed.

  • Cookies are not preserved across requests on the pooled client - the jar is cleared after every call. Scrape.do owns the cookie lifecycle through setCookies (in), scrape.do-cookies / pureCookies=true (out), and sessionId (server-side session jars). Pooling is purely a transport concern.
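The keying and eviction rules above can be sketched with collections.OrderedDict. This is a synchronous simplification under stated assumptions: FakeClient stands in for httpx.AsyncClient, LruPool is a hypothetical name, and setting a closed flag stands in for actually closing the evicted client.

```python
from collections import OrderedDict


class FakeClient:
    """Stand-in for httpx.AsyncClient; only records whether it was closed."""

    def __init__(self, proxy_url: str) -> None:
        self.proxy_url = proxy_url
        self.closed = False


class LruPool:
    def __init__(self, max_clients: int) -> None:
        self.max_clients = max_clients
        self._clients: "OrderedDict[str, FakeClient]" = OrderedDict()

    def get(self, proxy_url: str) -> FakeClient:
        client = self._clients.get(proxy_url)
        if client is not None:
            self._clients.move_to_end(proxy_url)  # mark as most recently used
            return client
        if len(self._clients) >= self.max_clients:
            _, evicted = self._clients.popitem(last=False)  # least recently used
            evicted.closed = True  # a real pool would close the client here
        client = FakeClient(proxy_url)
        self._clients[proxy_url] = client
        return client


pool = LruPool(max_clients=2)
a1 = pool.get("url-a")
b1 = pool.get("url-b")
a2 = pool.get("url-a")  # hit: same object, and "url-a" becomes most recent
c1 = pool.get("url-c")  # miss on a full pool: evicts "url-b"
```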

Parameters:

Name Type Description Default
api_token Optional[str]

The Scrape.do API key. If omitted, the client will attempt to load it from the 'SCRAPE_DO_API_KEY' environment variable.

None
max_retries int

The maximum number of retry attempts for retryable Scrape.do gateway errors (HTTP 429, 502, and 510).

3
retry_backoff Union[float, Callable[[int], float]]

The strategy used to calculate the delay between retries. Can be a static float (seconds) or a callable that accepts the current attempt number (0-indexed) and returns a float. Defaults to a jittered exponential backoff when set to None.

None
event_hooks Optional[AsyncClientEventHooks]

A dictionary of SDK-native async hooks to execute during different points of the request lifecycle.

None
max_pooled_clients int

Maximum number of httpx.AsyncClient instances to keep alive in the LRU pool. Defaults to 16.

16
verify Union[SSLContext, str, bool]

SSL verification configuration. Defaults to DEFAULT_PROXY_SSL_CONTEXT which trusts both system CAs and Scrape.do's bundled CA.

DEFAULT_PROXY_SSL_CONTEXT
cert Optional[CertTypes]

Client-side certificates for mutual TLS authentication.

None
http1 bool

Enable HTTP/1.1 support.

True
http2 bool

Enable HTTP/2 multiplexing for higher concurrency.

False
timeout TimeoutTypes

The default timeout (in seconds) applied to all network phases. Defaults to 60s.

60.0
limits Limits

Configuration for maximum connection pool sizes within each pooled httpx.AsyncClient.

DEFAULT_LIMITS
transport Optional[AsyncBaseTransport]

A completely custom async transport engine.

None
default_encoding Union[str, Callable[[bytes], str]]

The fallback text encoding used if a target website omits a charset header.

'utf-8'
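For retry_backoff, any callable that maps a 0-indexed attempt number to a delay in seconds fits the documented type. The two functions below are illustrative sketches; the base and cap values are assumptions, and the formula is one common "full jitter" variant rather than the SDK's actual default.

```python
import random


def jittered_exponential_backoff(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def linear_backoff(attempt: int) -> float:
    """Deterministic alternative: 1s, 2s, 3s, ... for attempts 0, 1, 2, ..."""
    return float(attempt + 1)
```

Either function could be passed as retry_backoff; a plain float such as 2.0 gives a fixed delay instead.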

_get_or_create_client(proxy_url) async

Returns a pooled httpx.AsyncClient for the given proxy URL.

Pool Behavior

Fast path (cache hit)

  • The dict lookup and move_to_end contain no await, so no other coroutine can run between them. The lock is skipped.

  • move_to_end bumps the entry to the back of the LRU ordering before returning.

Slow path (cache miss)

  • Acquires self._pool_lock and re-checks the pool (double-checked locking).

  • A concurrent coroutine that won the lock first may have populated the entry while we were waiting. If so, return that client.

  • Otherwise evict (if full) and construct a new one.

Why The Lock Matters
  • Two coroutines racing on a cache miss for the same URL would each construct a fresh httpx.AsyncClient.

  • Only one wins the dict write. The other client leaks: garbage collection does not call aclose(), so its connections stay open until the OS cleans them up.

  • The lock serializes creation to prevent this.
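The fast path, slow path, and re-check described above can be sketched as follows. AsyncPool and _make_client are hypothetical stand-ins, and the asyncio.sleep(0) inside _make_client is an artificial suspension point added so the race is observable in a demo.

```python
import asyncio
from collections import OrderedDict
from typing import Tuple


class AsyncPool:
    def __init__(self, max_clients: int = 16) -> None:
        self.max_clients = max_clients
        self._clients: "OrderedDict[str, object]" = OrderedDict()
        self._pool_lock = asyncio.Lock()
        self.constructed = 0  # counts constructions, for the demo only

    async def _make_client(self, proxy_url: str) -> object:
        # Stand-in for building a client; the await forces a suspension
        # point inside the critical section.
        await asyncio.sleep(0)
        self.constructed += 1
        return object()

    async def get_or_create(self, proxy_url: str) -> object:
        # Fast path: no await between lookup and return, so no lock is needed.
        client = self._clients.get(proxy_url)
        if client is not None:
            self._clients.move_to_end(proxy_url)
            return client
        # Slow path: serialize creation and re-check after acquiring the lock.
        async with self._pool_lock:
            client = self._clients.get(proxy_url)
            if client is not None:
                self._clients.move_to_end(proxy_url)
                return client
            if len(self._clients) >= self.max_clients:
                self._clients.popitem(last=False)  # a real pool would aclose() it
            client = await self._make_client(proxy_url)
            self._clients[proxy_url] = client
            return client


async def _race() -> Tuple[int, bool]:
    pool = AsyncPool()
    clients = await asyncio.gather(*(pool.get_or_create("same-url") for _ in range(10)))
    return pool.constructed, all(c is clients[0] for c in clients)


constructed, all_same = asyncio.run(_race())
```

Ten coroutines miss at the same time, but only the first one to take the lock constructs a client; the rest find it on the re-check.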

Parameters:

Name Type Description Default
proxy_url str

The formatted proxy URL (with api_token inserted and parameters URL-encoded into the password field) returned by RequestParameters.to_proxy_url().format(...).

required

Returns:

Type Description
AsyncClient

A pooled httpx.AsyncClient configured for the given proxy URL.

aclose() async

Closes every pooled httpx.AsyncClient and clears the pool.

It is recommended to use the client as an async context manager to ensure resources are released automatically.

__aenter__() async

Async context manager entry. Returns the AsyncScrapeDoProxyClient instance. The pool starts empty.

Returns:

Type Description
Self

The AsyncScrapeDoProxyClient instance.

__aexit__(exc_type, exc_val, exc_tb) async

Closes every pooled httpx.AsyncClient without swallowing exceptions.

Parameters:

Name Type Description Default
exc_type Optional[type[BaseException]]

The type of the exception.

required
exc_val Optional[BaseException]

The instance of the exception.

required
exc_tb Optional[TracebackType]

The traceback information.

required

Returns:

Type Description
Literal[False]

False, since no exceptions are swallowed.
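The context-manager contract (return self on entry, close everything on exit, return False so exceptions propagate) can be shown with a stand-in class. PooledResource and its open_clients list are hypothetical and exist only for this sketch.

```python
import asyncio
from types import TracebackType
from typing import List, Optional, Tuple, Type


class PooledResource:
    """Stand-in for a client that owns several closable connections."""

    def __init__(self) -> None:
        self.open_clients: List[str] = []

    async def aclose(self) -> None:
        self.open_clients.clear()  # the real method closes each pooled client

    async def __aenter__(self) -> "PooledResource":
        return self

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> bool:
        await self.aclose()
        return False  # falsy return: the exception, if any, propagates


async def _context_demo() -> Tuple[bool, int]:
    resource = PooledResource()
    propagated = False
    try:
        async with resource as r:
            r.open_clients.append("client-for-url-a")
            raise RuntimeError("scrape failed")
    except RuntimeError:
        propagated = True
    return propagated, len(resource.open_clients)


propagated, remaining = asyncio.run(_context_demo())
```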

execute(request, session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Executes a fully prepared and validated Scrape.do request through Proxy Mode, asynchronously.

Intended Usage

Use this method if you have manually constructed a PreparedScrapeDoRequest object for bulk routing, custom configurations, or task reusability.

Sessions (sessionId)

If you configure a request with a session_id, Scrape.do will attempt to route your traffic through the same proxy address. However, it can still silently rotate this address for various reasons. If it rotates during a multi-step scraping task, any target-specific WAF state or cookies accumulated will be lost, which may cause the task to fail.

Validating Sessions (session_validator)
  • To prevent unexpected errors caused by dropped sessions, you can pass a custom async function as the session_validator argument of the client's execute method.

  • This function will be await-ed internally by the client after each stateful request (sessionId is not None) to determine whether or not a RotatedSessionError exception should be raised to signal that this session is no longer valid.

  • The function should take the current request's ScrapeDoResponse object as its only argument, and return Awaitable[bool].

  • If the awaited value is True, this method will raise the RotatedSessionError instead of returning the response object. (The request's ScrapeDoResponse object can still be accessed later on using the exception's response attribute.) Otherwise, no additional action is taken.
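The validator contract above can be sketched with stand-in types. FakeResponse, FakeRotatedSessionError, logged_out, and finish_request are all hypothetical; in particular, the fields on the real ScrapeDoResponse are not listed in this reference, so status_code and text are assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional, Tuple


@dataclass
class FakeResponse:
    """Stand-in for ScrapeDoResponse; both fields are hypothetical."""
    status_code: int
    text: str


class FakeRotatedSessionError(Exception):
    """Stand-in for RotatedSessionError; carries the response as documented."""

    def __init__(self, response: FakeResponse) -> None:
        super().__init__("session rotated")
        self.response = response


async def logged_out(response: FakeResponse) -> bool:
    # True means "the session was lost": here, the target served its login page.
    return "Please sign in" in response.text


async def finish_request(
    response: FakeResponse,
    session_id: Optional[int],
    session_validator: Optional[Callable[[FakeResponse], Awaitable[bool]]],
) -> FakeResponse:
    # Validators only run for stateful requests (a session id is set).
    if session_validator is not None and session_id is not None:
        if await session_validator(response):
            raise FakeRotatedSessionError(response)
    return response


async def _validator_demo() -> Tuple[str, str]:
    ok = await finish_request(FakeResponse(200, "cart: 3 items"), 7, logged_out)
    try:
        await finish_request(FakeResponse(200, "Please sign in"), 7, logged_out)
    except FakeRotatedSessionError as exc:
        return ok.text, exc.response.text  # response stays reachable on the error
    return ok.text, ""


kept_text, rotated_text = asyncio.run(_validator_demo())
```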

Parameters:

Name Type Description Default
request PreparedScrapeDoRequest

The validated request payload.

required
session_validator Optional[AsyncSessionValidator]

A custom async function called to determine whether or not to raise a RotatedSessionError exception.

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions for this specific request.

None

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

Raises:

Type Description
APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

ValueError

If the request.api_params RequestParameters instance contains an invalid parameter configuration for Proxy Mode.

request(method, target_url, params=None, session_validator=None, *, headers=None, body=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs) async

Builds and executes a Scrape.do request through Proxy Mode, asynchronously.

Parameter Configuration

Like AsyncScrapeDoClient.request, this method provides smart routing based on the arguments provided. You can configure the request in two distinct ways:

  • Keyword Arguments (Default): Pass the target URL and Scrape.do parameters directly as **api_kwargs (e.g. render=False, geoCode="us").

  • Pre-built Parameters: Pass a fully validated RequestParameters object via the params argument.

Raw api.scrape.do URLs Are Not Accepted
  • Unlike the API-mode client, proxy mode has no equivalent of a raw api.scrape.do/?... URL.

  • Passing one as target_url simply targets that URL through the Scrape.do proxy, which is almost certainly not what you want.

Parameter Restrictions

To prevent silent overwrites, the client enforces that only one of the parameter configurations can be used at a time. Mixing params with **api_kwargs raises ValueError.

Pre-built Parameters Configuration

When passing an already constructed RequestParameters instance to the params argument, its url attribute will be ignored and replaced by the provided target_url.
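The two rules above (never mix params with keyword arguments, and let target_url override the pre-built object's url) could be enforced along these lines. Params is a simplified stand-in for RequestParameters and resolve_params is a hypothetical helper, not the SDK's internal function.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Params:
    """Stand-in for RequestParameters with a few illustrative fields."""
    url: str
    render: bool = False
    geoCode: Optional[str] = None


def resolve_params(target_url: str, params: Optional[Params] = None, **api_kwargs: Any) -> Params:
    if params is not None and api_kwargs:
        raise ValueError("pass either params or keyword arguments, not both")
    if params is not None:
        # The pre-built object's own url is discarded in favour of target_url.
        return replace(params, url=target_url)
    return Params(url=target_url, **api_kwargs)


from_kwargs = resolve_params("https://a.example", render=True)
prebuilt = Params(url="https://old.example", geoCode="us")
from_params = resolve_params("https://a.example", prebuilt)
```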

Parameters:

Name Type Description Default
method HttpMethod

The HTTP method to forward to the target website.

required
target_url str

The destination website URL.

required
params Optional[RequestParameters]

A pre-validated parameter object.

None
session_validator Optional[AsyncSessionValidator]

A custom async function called to determine whether or not to raise a RotatedSessionError exception. See AsyncScrapeDoProxyClient.execute docstring for more information.

None
headers Optional[Dict[str, str]]

Custom HTTP headers to forward to the target.

None
body Optional[Union[Dict[str, Any], str, bytes]]

The payload to send to the target website.

None
payload_type PayloadType

Dictates how the client encodes the body.

'json'
r_timeout Union[TimeoutTypes, UseClientDefault]

Request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None
**api_kwargs Unpack[RequestParametersDict]

Scrape.do API configuration parameters (e.g., render=False).

{}

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

Raises:

Type Description
ValueError

If configuration constraints are violated.

APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

get(url, params=None, session_validator=None, *, headers=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs) async

Async wrapper for executing a GET request through Proxy Mode.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

Name Type Description Default
url str

The target website URL.

required
params Optional[RequestParameters]

A pre-validated parameter object.

None
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception. (See AsyncScrapeDoProxyClient.execute docstring for more information)

None
headers Optional[Dict[str, str]]

Custom HTTP headers to forward.

None
r_timeout Union[TimeoutTypes, UseClientDefault]

Request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None
**api_kwargs Unpack[RequestParametersDict]

Scrape.do API configuration parameters.

{}

Raises:

Type Description
ValueError

If configuration constraints are violated.

APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.
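A typical use of get is to fetch several pages concurrently with identical parameters, so that every call maps to the same pooled client. The sketch below is written against the documented call shape get(url, **api_kwargs) only; fetch_all and RecordingClient are hypothetical, and the test double replaces a real AsyncScrapeDoProxyClient so the example runs without network access or an API token.

```python
import asyncio
from typing import Any, Dict, List, Sequence, Tuple


async def fetch_all(client: Any, urls: Sequence[str]) -> List[Any]:
    """Fetch every URL concurrently with identical Scrape.do parameters."""
    # Same keyword arguments for every call, so one pooled client serves them all.
    return await asyncio.gather(*(client.get(url, geoCode="us") for url in urls))


class RecordingClient:
    """Test double exposing the same get(url, **api_kwargs) call shape."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    async def get(self, url: str, **api_kwargs: Any) -> str:
        self.calls.append((url, api_kwargs))
        return f"response for {url}"


recorder = RecordingClient()
results = asyncio.run(
    fetch_all(recorder, ["https://example.com/a", "https://example.com/b"])
)
```

With the real client, fetch_all would be awaited inside an async with block so the pool is closed afterwards.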

post(url, params=None, session_validator=None, *, body=None, headers=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs) async

Async wrapper for executing a POST request through Proxy Mode.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

Name Type Description Default
url str

The target website URL.

required
params Optional[RequestParameters]

A pre-validated parameter object.

None
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception. (See AsyncScrapeDoProxyClient.execute docstring for more information)

None
body Optional[Union[Dict[str, Any], str, bytes]]

The payload to send to the target website.

None
headers Optional[Dict[str, str]]

Custom HTTP headers to forward.

None
payload_type PayloadType

Dictates how the client encodes the body.

'json'
r_timeout Union[TimeoutTypes, UseClientDefault]

Request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None
**api_kwargs Unpack[RequestParametersDict]

Scrape.do API configuration parameters.

{}

Raises:

Type Description
ValueError

If configuration constraints are violated.

APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.
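For post, the body and payload_type arguments travel alongside the Scrape.do parameters. The helper below is a sketch against the documented signature only; submit_json and FakePostClient are hypothetical, and 'json' is used because it is the documented default (other payload_type values are not listed in this reference).

```python
import asyncio
from typing import Any, Dict


async def submit_json(client: Any, url: str, fields: Dict[str, Any]) -> Any:
    """POST a dict as JSON through the proxy, forwarding one custom header."""
    return await client.post(
        url,
        body=fields,
        payload_type="json",  # documented default, spelled out for clarity
        headers={"Accept": "application/json"},
    )


class FakePostClient:
    """Test double exposing the same keyword-only arguments as post."""

    async def post(self, url: str, *, body: Any = None, headers: Any = None,
                   payload_type: str = "json", **api_kwargs: Any) -> Dict[str, Any]:
        return {"url": url, "body": body, "headers": headers, "payload_type": payload_type}


echoed = asyncio.run(
    submit_json(FakePostClient(), "https://example.com/form", {"q": "shoes"})
)
```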