Async Proxy Client
async_proxy_client
Asynchronous HTTP client for Scrape.do's Proxy Mode.
Defines AsyncScrapeDoProxyClient, the asyncio-native version of
ScrapeDoProxyClient.
Configures httpx.AsyncClient instances with Scrape.do's Proxy Mode
endpoint (proxy.scrape.do:8080) and reuses the same
retry / hook / validator semantics as the API-mode async client.
Because Scrape.do encodes per-request parameters into the proxy URL's
password field, each unique (token, params) combination needs its own
httpx.AsyncClient. This module maintains a bounded LRU pool of clients
keyed on the formatted proxy URL — repeated requests with the same
params reuse the same TCP / TLS / HTTP-2 connection state. An
asyncio.Lock guards the miss/eviction critical section so concurrent
coroutines don't race to construct redundant clients.
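The pool key described above can be sketched as follows. This is a minimal illustration, not the SDK's formatter: `format_proxy_url` is a hypothetical helper, and the exact way parameters are serialised into the password field (query-string style, sorted keys) is an assumption made here so that equal parameter sets yield the same key.

```python
from urllib.parse import urlencode

PROXY_HOST = "proxy.scrape.do:8080"  # endpoint named in this module's description


def format_proxy_url(token: str, params: dict) -> str:
    """Hypothetical helper: token as the username, parameters as the password."""
    # Sorting is an assumption: it makes equal parameter sets map to one pool key.
    password = urlencode(sorted(params.items()))
    return f"http://{token}:{password}@{PROXY_HOST}"


url_a = format_proxy_url("TOKEN", {"render": "false", "geoCode": "us"})
url_b = format_proxy_url("TOKEN", {"geoCode": "us", "render": "false"})
print(url_a == url_b)  # same params in a different order give the same key
```

Under this sketch, any change to the token or to a parameter value produces a different URL and therefore a different pooled client.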
AsyncScrapeDoProxyClient
Asynchronous HTTP client for Scrape.do's Proxy Mode.
An asyncio-native version of ScrapeDoProxyClient, backed by
httpx.AsyncClient. Routes requests through proxy.scrape.do:8080
instead of calling api.scrape.do directly.
Reuses the same RequestParameters model, the same retry strategy,
the same async event hooks (AsyncClientEventHooks), and the same
session-validation contract (AsyncSessionValidator).
Features
- Local API parameter validation via the RequestParameters Pydantic model, plus the proxy-mode-specific cross-checks in validate_proxy_params.
- Status code error parsing and customisable retry intervals for rate-limited requests. Non-blocking sleeps via await asyncio.sleep(...).
- Strongly-typed interface for responses via the ScrapeDoResponse Pydantic model.
- Connection reuse via a bounded LRU pool of httpx.AsyncClient instances keyed on the formatted proxy URL, with an asyncio.Lock guarding the miss/eviction critical section.
Concurrency Limit and Server Errors
This client intercepts and manages Scrape.do's specific gateway errors (429, 502, 510), automatically applying a customisable retry strategy before the error can reach the application.
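The retry behaviour described above could look roughly like the sketch below. It is not the SDK's implementation: `resolve_delay` and `send_with_retries` are hypothetical names, the `send` callable stands in for a real request, and passing a 1-based attempt number to a callable backoff is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Union

RETRYABLE_STATUS = {429, 502, 510}  # gateway errors named above
Backoff = Union[float, Callable[[int], float]]


def resolve_delay(backoff: Backoff, attempt: int) -> float:
    """Return the delay for a retry attempt from a static value or a callable."""
    return backoff(attempt) if callable(backoff) else float(backoff)


async def send_with_retries(
    send: Callable[[], Awaitable[int]],
    max_retries: int = 3,
    backoff: Backoff = 0.0,
) -> int:
    """Call `send` until the status is not retryable or retries run out."""
    attempt = 0
    while True:
        status = await send()
        if status not in RETRYABLE_STATUS or attempt >= max_retries:
            return status
        attempt += 1
        # Non-blocking wait: other coroutines keep running during the delay.
        await asyncio.sleep(resolve_delay(backoff, attempt))


async def demo() -> int:
    responses = iter([429, 502, 200])  # simulated gateway replies

    async def fake_send() -> int:
        return next(responses)

    return await send_with_retries(fake_send, backoff=lambda n: 0.01 * n)
```

Running `asyncio.run(demo())` retries twice and then returns the final status.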
SDK Event Hooks (event_hooks)
This client reuses the asynchronous AsyncClientEventHooks TypedDict,
with the same shape, lifecycle, and async-only callback signatures
as the API-mode async client.
TLS Verification
- Scrape.do's Proxy Mode upgrades and forwards HTTPS requests on your behalf (MITM-style), so HTTPS target validation against the normal system CAs would fail.
- The default verify value is DEFAULT_PROXY_SSL_CONTEXT, an ssl.SSLContext preloaded with system CAs + Scrape.do's bundled CA, so HTTPS targets validate correctly through the proxy.
- Pass verify=True if you've installed Scrape.do's CA into your OS keychain, or verify=False to disable validation entirely (discouraged).
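A context of the kind described above can be approximated with the standard library. This is a sketch only: `build_proxy_ssl_context` and its `extra_ca_file` argument are hypothetical, and the location of Scrape.do's CA file is deliberately left to the caller rather than guessed.

```python
import ssl
from typing import Optional


def build_proxy_ssl_context(extra_ca_file: Optional[str] = None) -> ssl.SSLContext:
    """System CAs plus, optionally, one extra CA bundle (path supplied by the caller)."""
    context = ssl.create_default_context()  # loads the system trust store
    if extra_ca_file is not None:
        # Adds the extra CA on top of the system CAs instead of replacing them.
        context.load_verify_locations(cafile=extra_ca_file)
    return context


context = build_proxy_ssl_context()
```

An ssl.SSLContext built this way can be passed wherever a verify value of type SSLContext is accepted.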
Additional httpx.AsyncClient Configuration
The following httpx.AsyncClient parameters can be provided as
keyword arguments and will be passed directly to every pooled
client.
- verify
- cert
- http1
- http2
- timeout
- limits
- transport
- default_encoding
Additionally, the following httpx.AsyncClient.request parameters
can be provided as keyword arguments during request execution.
- timeout (r_timeout)
- extensions
For more information on their behaviour and default values, please
consult the official
httpx documentation.
Unsupported HTTPX Client Arguments
The underlying httpx.AsyncClient instances are strictly managed
by the pool to prevent invalid configurations from being sent to
Scrape.do. For this reason, arguments not listed in the previous
section are intentionally blocked and shouldn't be changed.
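One way to enforce such an allow-list is sketched below. The source does not say how blocked arguments are handled, so raising TypeError is an assumption, and `filter_client_kwargs` is a hypothetical helper rather than part of the SDK.

```python
from typing import Any

# Constructor keywords listed above as forwarded to every pooled client.
ALLOWED_CLIENT_KWARGS = frozenset(
    {"verify", "cert", "http1", "http2", "timeout", "limits", "transport", "default_encoding"}
)


def filter_client_kwargs(**kwargs: Any) -> dict:
    """Hypothetical guard: reject any keyword outside the allow-list."""
    blocked = sorted(set(kwargs) - ALLOWED_CLIENT_KWARGS)
    if blocked:
        # Assumption: the real client may reject these in a different way.
        raise TypeError(f"Unsupported httpx.AsyncClient arguments: {', '.join(blocked)}")
    return dict(kwargs)
```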
Connection Pool
- Each unique formatted proxy URL gets its own httpx.AsyncClient.
- Two requests with the same RequestParameters reuse the same pooled client (and therefore the same TCP / TLS / HTTP-2 state) for transport-level efficiency.
- Two requests with different parameters get different clients.
- When max_pooled_clients is exceeded, the least-recently-used client is closed.
- Cookies are not preserved across requests on the pooled client: the jar is cleared after every call. Scrape.do owns the cookie lifecycle through setCookies (in), scrape.do-cookies / pureCookies=true (out), and sessionId (server-side session jars). Pooling is purely a transport concern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| api_token | Optional[str] | The Scrape.do API key. If omitted, the client will attempt to load it from the 'SCRAPE_DO_API_KEY' environment variable. | None |
| max_retries | int | The maximum number of retry attempts for retryable Scrape.do gateway errors (HTTP 429, 502, and 510). | 3 |
| retry_backoff | Union[float, Callable[[int], float]] | The strategy used to calculate the delay between retries. Can be a static | None |
| event_hooks | Optional[AsyncClientEventHooks] | A dictionary of SDK-native async hooks to execute during different points of the request lifecycle. | None |
| max_pooled_clients | int | Maximum number of | 16 |
| verify | Union[SSLContext, str, bool] | SSL verification configuration. Defaults to | DEFAULT_PROXY_SSL_CONTEXT |
| cert | Optional[CertTypes] | Client-side certificates for mutual TLS authentication. | None |
| http1 | bool | Enable HTTP/1.1 support. | True |
| http2 | bool | Enable HTTP/2 multiplexing for higher concurrency. | False |
| timeout | TimeoutTypes | The default timeout (in seconds) applied to all network phases. Defaults to 60s. | 60.0 |
| limits | Limits | Configuration for maximum connection pool sizes within each pooled | DEFAULT_LIMITS |
| transport | Optional[AsyncBaseTransport] | A completely custom async transport engine. | None |
| default_encoding | Union[str, Callable[[bytes], str]] | The fallback text encoding used if a target website omits a charset header. | 'utf-8' |
_get_or_create_client(proxy_url)
async
Returns a pooled httpx.AsyncClient for the given proxy URL.
Pool Behavior
Fast path (cache hit)
- The dict lookup and move_to_end don't await, so no other coroutine can preempt. The lock is skipped.
- move_to_end bumps the entry to the back of the LRU ordering before returning.
Slow path (cache miss)
- Acquires self._pool_lock and re-checks the pool (double-checked locking).
- A concurrent coroutine that won the lock first may have populated the entry while we were waiting. If so, return that client.
- Otherwise evict (if full) and construct a new one.
Why The Lock Matters
- Two coroutines racing on a cache miss for the same URL would each construct a fresh httpx.AsyncClient.
- Only one wins the dict write. The other leaks: garbage collection does not call aclose automatically, so the connection state stays open until OS cleanup.
- The lock serializes creation to prevent this.
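The fast path, slow path, and eviction described above can be sketched with an OrderedDict and an asyncio.Lock. This is an illustration under assumptions, not the SDK's code: `FakeClient` stands in for httpx.AsyncClient so the example runs without a network, and `ClientPool`, `get_or_create`, and the `created` counter are hypothetical names.

```python
import asyncio
from collections import OrderedDict


class FakeClient:
    """Stand-in for httpx.AsyncClient so the sketch runs without a network."""

    def __init__(self, proxy_url: str) -> None:
        self.proxy_url = proxy_url
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class ClientPool:
    def __init__(self, max_pooled_clients: int = 16) -> None:
        self._max = max_pooled_clients
        self._pool: "OrderedDict[str, FakeClient]" = OrderedDict()
        self._pool_lock = asyncio.Lock()
        self.created = 0  # counts constructions, for demonstration only

    async def get_or_create(self, proxy_url: str) -> FakeClient:
        # Fast path: no await between lookup and move_to_end, so no lock is needed.
        client = self._pool.get(proxy_url)
        if client is not None:
            self._pool.move_to_end(proxy_url)
            return client
        async with self._pool_lock:
            # Re-check: another coroutine may have filled the entry while we waited.
            client = self._pool.get(proxy_url)
            if client is not None:
                self._pool.move_to_end(proxy_url)
                return client
            if len(self._pool) >= self._max:
                _, evicted = self._pool.popitem(last=False)  # least recently used
                await evicted.aclose()
            client = FakeClient(proxy_url)
            self.created += 1
            self._pool[proxy_url] = client
            return client


async def demo() -> tuple:
    pool = ClientPool(max_pooled_clients=2)
    same = await asyncio.gather(*(pool.get_or_create("url-a") for _ in range(5)))
    await pool.get_or_create("url-b")
    await pool.get_or_create("url-a")  # refreshes url-a, so url-b becomes the oldest
    await pool.get_or_create("url-c")  # pool is full: url-b is evicted and closed
    return pool.created, all(c is same[0] for c in same), list(pool._pool)
```

In `demo`, five concurrent lookups of the same key share a single client, and adding a third key to a pool of size two closes the least recently used entry.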
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| proxy_url | str | The formatted proxy URL (with | required |
Returns:
| Type | Description |
|---|---|
| AsyncClient | A pooled |
aclose()
async
Closes every pooled httpx.AsyncClient and clears the pool.
It is recommended to use the client as an async context manager to ensure resources are released automatically.
__aenter__()
async
Async context manager entry. Returns the
AsyncScrapeDoProxyClient instance. The pool starts empty.
Returns:
| Type | Description |
|---|---|
| Self | The |
__aexit__(exc_type, exc_val, exc_tb)
async
Closes every pooled httpx.AsyncClient without swallowing
exceptions.
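The "close everything, swallow nothing" contract can be illustrated with a small stand-in class. `PooledResource` is hypothetical and only mirrors the documented behaviour: `__aexit__` releases resources and returns False so that any exception raised inside the block propagates.

```python
import asyncio


class PooledResource:
    """Hypothetical stand-in for a client that owns closable resources."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "PooledResource":
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> bool:
        await self.aclose()
        return False  # False means the exception, if any, propagates


async def demo() -> tuple:
    resource = PooledResource()
    try:
        async with resource:
            raise RuntimeError("boom")
    except RuntimeError as exc:
        # The resource was closed even though the block raised.
        return resource.closed, str(exc)
    return resource.closed, ""
```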
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exc_type | Optional[type[BaseException]] | The type of the exception. | required |
| exc_val | Optional[BaseException] | The instance of the exception. | required |
| exc_tb | Optional[TracebackType] | The traceback information. | required |
Returns:
| Type | Description |
|---|---|
| Literal[False] | |
execute(request, session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Executes a fully prepared and validated Scrape.do request
through Proxy Mode, asynchronously.
Intended Usage
Use this method if you have manually constructed a
PreparedScrapeDoRequest object for bulk routing, custom
configurations, or task reusability.
Sessions (sessionId)
If you configure a request with a session_id, Scrape.do will
attempt to route your traffic through the same proxy address.
However, it can still silently rotate this address for various
reasons. If it rotates during a multi-step scraping task, any
target-specific WAF state or cookies accumulated will be lost,
which may cause the task to fail.
Validating Sessions (session_validator)
-
- In order to prevent unexpected errors due to dropped sessions, you can pass a custom async function to the session_validator argument of the client's execute method.
- This function will be awaited internally by the client after each stateful request (sessionId is not None) to determine whether or not a RotatedSessionError exception should be raised to signal that this session is no longer valid.
- The function should take the current request's ScrapeDoResponse object as its only argument, and return Awaitable[bool].
- If the awaited value is True, this method will raise the RotatedSessionError instead of returning the response object. (The request's ScrapeDoResponse object can still be accessed later on using the exception's response attribute.) Otherwise, no additional action is taken.
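The validator contract above can be sketched with local stand-ins. Nothing here is imported from the SDK: `FakeResponse` (with an assumed `text` field), the local `RotatedSessionError`, `login_wall_validator`, and `finish_request` are all hypothetical and only mirror the documented behaviour.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class FakeResponse:
    """Stand-in for ScrapeDoResponse; the `text` field is an assumption."""
    text: str


class RotatedSessionError(Exception):
    """Local stand-in that keeps the response on a `response` attribute."""

    def __init__(self, response: FakeResponse) -> None:
        super().__init__("session rotated")
        self.response = response


Validator = Callable[[FakeResponse], Awaitable[bool]]


async def login_wall_validator(response: FakeResponse) -> bool:
    # Returning True signals that the session is no longer valid.
    return "Please log in" in response.text


async def finish_request(
    response: FakeResponse,
    session_id: Optional[int],
    session_validator: Optional[Validator],
) -> FakeResponse:
    # The validator only runs for stateful requests (session_id is set).
    if session_id is not None and session_validator is not None:
        if await session_validator(response):
            raise RotatedSessionError(response)
    return response


async def demo() -> str:
    try:
        await finish_request(FakeResponse("Please log in"), 7, login_wall_validator)
    except RotatedSessionError as exc:
        return exc.response.text  # the response stays reachable on the exception
    return "no rotation"
```

A validator like this is typically cheap: it inspects the response for a marker (a login wall, a CAPTCHA page) that only appears once the target-side state has been lost.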
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| request | PreparedScrapeDoRequest | The validated request payload. | required |
| session_validator | Optional[AsyncSessionValidator] | A custom async function called to determine whether or not to raise a | None |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | A request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions for this specific request. | None |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
Raises:
| Type | Description |
|---|---|
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
| ValueError | If the |
request(method, target_url, params=None, session_validator=None, *, headers=None, body=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)
async
Builds and executes a Scrape.do request through Proxy Mode,
asynchronously.
Parameter Configuration
Like
AsyncScrapeDoClient.request,
this method provides smart routing based on the arguments
provided. You can configure the request in two distinct ways:
- Keyword Arguments (Default): Pass the target URL and Scrape.do parameters directly as **api_kwargs (render=False, geoCode="us").
- Pre-built Parameters: Pass a fully validated RequestParameters object via the params argument.
Raw api.scrape.do URLs Are Not Accepted
- Unlike the API-mode client, proxy mode has no equivalent of a raw api.scrape.do/?... URL.
- Passing one as target_url simply targets that URL through the Scrape.do proxy, which is almost certainly not what you want.
Parameter Restrictions
To prevent silent overwrites, the client enforces that only
one of the parameter configurations can be used at a time.
Mixing params with **api_kwargs raises ValueError.
Pre-built Parameters Configuration
When passing an already constructed RequestParameters
instance to the params argument, its url attribute will
be ignored and replaced by the provided target_url.
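The two rules above (no mixing of params with keyword arguments, and target_url overriding a pre-built object's url) can be sketched as follows. `Params` is a simplified, hypothetical stand-in for RequestParameters with assumed fields, and `resolve_params` is not an SDK function.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class Params:
    """Simplified stand-in for RequestParameters (fields are assumptions)."""
    url: str
    render: bool = False
    geoCode: Optional[str] = None


def resolve_params(target_url: str, params: Optional[Params] = None, **api_kwargs: Any) -> Params:
    if params is not None and api_kwargs:
        # Mixing both styles would make it unclear which value wins.
        raise ValueError("Pass either params or keyword arguments, not both.")
    if params is not None:
        # The pre-built object's url is ignored in favour of target_url.
        return replace(params, url=target_url)
    return Params(url=target_url, **api_kwargs)
```

Because `replace` returns a new object, the caller's pre-built instance is left unchanged and can be reused with other target URLs.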
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | HttpMethod | The HTTP method to forward to the target website. | required |
| target_url | str | The destination website URL. | required |
| params | Optional[RequestParameters] | A pre-validated parameter object. | None |
| session_validator | Optional[AsyncSessionValidator] | A custom async function called to determine whether or not to raise a | None |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward to the target. | None |
| body | Optional[Union[Dict[str, Any], str, bytes]] | The payload to send to the target website. | None |
| payload_type | PayloadType | Dictates how the client encodes the | 'json' |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | Request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
| **api_kwargs | Unpack[RequestParametersDict] | Scrape.do API configuration parameters (e.g., | {} |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
Raises:
| Type | Description |
|---|---|
| ValueError | If configuration constraints are violated |
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
get(url, params=None, session_validator=None, *, headers=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)
async
Async wrapper for executing a GET request through Proxy Mode.
Inherits the smart routing logic, parameter validation, and execution
constraints of the base
request
method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The target website URL. | required |
| params | Optional[RequestParameters] | A pre-validated parameter object. | None |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward. | None |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | Request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
| **api_kwargs | Unpack[RequestParametersDict] | Scrape.do API configuration parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If configuration constraints are violated. |
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
post(url, params=None, session_validator=None, *, body=None, headers=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)
async
Async wrapper for executing a POST request through Proxy Mode.
Inherits the smart routing logic, parameter validation, and execution
constraints of the base
request
method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The target website URL. | required |
| params | Optional[RequestParameters] | A pre-validated parameter object. | None |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| body | Optional[Union[Dict[str, Any], str, bytes]] | The payload to send to the target website. | None |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward. | None |
| payload_type | PayloadType | Dictates how the client encodes the | 'json' |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | Request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
| **api_kwargs | Unpack[RequestParametersDict] | Scrape.do API configuration parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If configuration constraints are violated. |
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |