Async Client
async_client
Asynchronous HTTP client for the Scrape.do API.
Defines AsyncScrapeDoClient, the asyncio-native version of
ScrapeDoClient. Mirrors the sync client's surface — smart routing,
retry strategy, session validation, and event hooks — via await-based
methods backed by httpx.AsyncClient.
Hooks and session validators on this client are async-only. Their
type aliases (AsyncClientEventHooks and AsyncSessionValidator) declare
callables that return Awaitable[None] and Awaitable[bool] respectively,
so hooks can perform I/O while the request executes.
AsyncScrapeDoClient
Asynchronous HTTP client for executing Scrape.do API requests.
asyncio-native version of ScrapeDoClient, backed by httpx.AsyncClient.
Mirrors the sync client's surface — smart routing, retry strategy,
session validation, and event hooks — but every IO-bound method is
async/await.
Features
- Local API parameter validation via the RequestParameters Pydantic model.
- Status code error parsing and customisable retry intervals for rate-limited requests.
- Strongly-typed interface for responses via the ScrapeDoResponse Pydantic model.
Concurrency Limit and Server Errors
This client intercepts and manages Scrape.do's specific gateway errors
(429, 502, 510), automatically applying a customisable retry strategy
before the error can reach the application. The sleep between retries
is non-blocking — await asyncio.sleep(...) rather than the sync
client's time.sleep(...).
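To make the retry behaviour concrete, here is a minimal sketch of a non-blocking retry loop. It is not the SDK's actual implementation: `send`, `resolve_delay`, and `execute_with_retries` are hypothetical names, and only the retryable status codes (429, 502, 510) and the float-or-callable backoff shape are taken from this page.

```python
import asyncio
from typing import Awaitable, Callable, Union

RETRYABLE_STATUSES = {429, 502, 510}  # gateway errors named on this page

Backoff = Union[float, Callable[[int], float]]


def resolve_delay(backoff: Backoff, attempt: int) -> float:
    """Return the delay for an attempt: a static value or a computed one."""
    return backoff(attempt) if callable(backoff) else float(backoff)


async def execute_with_retries(
    send: Callable[[], Awaitable[int]],  # hypothetical: returns a status code
    max_retries: int = 3,
    backoff: Backoff = 0.01,
) -> int:
    status = await send()
    for attempt in range(1, max_retries + 1):
        if status not in RETRYABLE_STATUSES:
            break
        # Yield control to the event loop instead of blocking the thread.
        await asyncio.sleep(resolve_delay(backoff, attempt))
        status = await send()
    return status
```

Because the wait uses `asyncio.sleep`, other tasks scheduled on the same event loop keep running while one request is backing off.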
SDK Event Hooks (event_hooks)
This client implements SDK-specific async event hooks. See
AsyncClientEventHooks for the available lifecycle hooks and their
required signatures. Hooks must be async-callable (returning
Awaitable[None]).
Additional httpx.AsyncClient Configuration
The following httpx.AsyncClient parameters can be provided as
keyword arguments and will be passed directly to the underlying
object.
- verify
- cert
- http1
- http2
- timeout
- limits
- transport
- default_encoding
Additionally, the following httpx.AsyncClient.request parameters
can be provided as keyword arguments during request execution.
- timeout (r_timeout)
- extensions
For more information on their behaviour and default values, please
consult the official httpx documentation.
Unsupported HTTPX Client Arguments
The underlying httpx.AsyncClient object is strictly managed by the
instance to prevent invalid configurations from being sent to the
Scrape.do API. For this reason, arguments not listed in the previous
section are intentionally blocked and shouldn't be changed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| api_token | Optional[str] | The Scrape.do API key. If omitted, the client will attempt to load it from the 'SCRAPE_DO_API_KEY' environment variable. | None |
| max_retries | int | The maximum number of retry attempts for retryable Scrape.do gateway errors (HTTP 429, 502, and 510). | 3 |
| retry_backoff | Union[float, Callable[[int], float]] | The strategy used to calculate the delay between retries. Can be a static | None |
| event_hooks | Optional[AsyncClientEventHooks] | A dictionary of SDK-native async hooks to execute during different points of the request lifecycle. | None |
| verify | Union[SSLContext, str, bool] | Configures SSL certificate verification. Defaults to True (secure). | True |
| cert | Optional[CertTypes] | Client-side certificates for mutual TLS authentication. | None |
| http1 | bool | Enable HTTP/1.1 support. | True |
| http2 | bool | Enable HTTP/2 multiplexing for higher concurrency. | False |
| timeout | TimeoutTypes | The default timeout (in seconds) applied to all network phases. Defaults to 60s, raised from httpx's 5s default to accommodate Scrape.do proxy round-trips (browser rendering, geo-routing, fingerprinting). | 60.0 |
| limits | Limits | Configuration for maximum connection pool sizes. | DEFAULT_LIMITS |
| transport | Optional[AsyncBaseTransport] | A completely custom async transport engine. | None |
| default_encoding | Union[str, Callable[[bytes], str]] | The fallback text encoding used if a target website omits a charset header. | 'utf-8' |
aclose()
async
Closes the underlying HTTPX async connection pool.
It is recommended to use the client as an async context manager to ensure resources are released automatically.
__aenter__()
async
Async context manager entry.
Returns:
| Type | Description |
|---|---|
| Self | instance with an opened HTTPX async connection pool. |
__aexit__(exc_type, exc_val, exc_tb)
async
Calls aclose to close the underlying HTTPX async connection
pool without swallowing any exceptions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| exc_type | Optional[type[BaseException]] | The type of the exception. | required |
| exc_val | Optional[BaseException] | The instance of the exception. | required |
| exc_tb | Optional[TracebackType] | The traceback information. | required |
Returns:
| Type | Description |
|---|---|
| Literal[False] | |
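The lifecycle methods above follow the standard async context manager protocol. The sketch below uses a hypothetical stand-in class, `DemoAsyncClient`, rather than the real client, to show the effect of `__aexit__` returning `False`: the resource is closed and the exception still propagates to the caller.

```python
import asyncio
from types import TracebackType
from typing import Literal, Optional, Type


class DemoAsyncClient:
    """Hypothetical stand-in mimicking the lifecycle methods documented above."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        # A real client would release its connection pool here.
        self.closed = True

    async def __aenter__(self) -> "DemoAsyncClient":
        return self

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> Literal[False]:
        await self.aclose()
        return False  # returning False lets any exception propagate


async def main() -> bool:
    client = DemoAsyncClient()
    try:
        async with client:
            raise RuntimeError("boom")
    except RuntimeError:
        pass  # the error reached the caller, and the client was still closed
    return client.closed
```

Running `asyncio.run(main())` returns `True` for this stand-in, since `aclose` runs on exit even when the body raises.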
execute(request, session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Executes a fully prepared and validated Scrape.do request asynchronously.
Async counterpart of ScrapeDoClient.execute.
Acts as the core execution funnel, applying the retry backoff logic,
evaluating gateway errors and sessions, and isolating cookies between
sequential executions. Sleeps between retries are non-blocking
(await asyncio.sleep(...)).
Intended Usage
Use this method if you have manually constructed a
PreparedScrapeDoRequest object for bulk routing, custom
configurations, or task reusability.
Sessions (sessionId)
If you configure a request with a session_id, Scrape.do will
attempt to route your traffic through the same proxy address.
However, it can still silently rotate this address for various
reasons. If it rotates during a multi-step scraping task, any
target-specific WAF state or cookies accumulated will be lost,
which may cause the task to fail.
Validating Sessions (session_validator)
- To prevent unexpected errors caused by dropped sessions, you can pass a custom async function to the client's execute method via its session_validator argument.
- The client awaits this function internally after each stateful request (sessionId is not None) to determine whether a RotatedSessionError exception should be raised to signal that the session is no longer valid.
- The function should take the current request's ScrapeDoResponse object as its only argument and return Awaitable[bool].
- If the awaited value is True, this method raises RotatedSessionError instead of returning the response object. Otherwise, no additional action is taken.
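As an illustration of the validator contract described above, here is a minimal sketch. `FakeResponse` is a hypothetical stand-in for ScrapeDoResponse (the real model's fields may differ), and the "login wall" heuristic is an assumption chosen only for the example.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for ScrapeDoResponse; real fields may differ."""

    status_code: int
    text: str


async def login_wall_validator(response: FakeResponse) -> bool:
    """Return True when the session appears to have been rotated."""
    # Placeholder for real I/O, such as a follow-up liveness check.
    await asyncio.sleep(0)
    # Assumption: the target serves a login page once session state is lost.
    return response.status_code == 401 or "Please log in" in response.text
```

A coroutine function of this shape would be supplied as `session_validator=login_wall_validator`; a `True` result corresponds to the case where RotatedSessionError is raised.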
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| request | PreparedScrapeDoRequest | The validated request payload. | required |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | A request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions for this specific request. | None |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
Raises:
| Type | Description |
|---|---|
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
execute_from_url(method, full_url, headers=None, body=None, payload_type='json', session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Executes an async request using a raw, pre-configured api.scrape.do URL.
Async counterpart of ScrapeDoClient.execute_from_url.
Intended Usage
This method is designed for scenarios where you have generated a
Scrape.do URL elsewhere and simply need to execute it. It parses
the URL to extract and validate the parameters, and then passes the
PreparedScrapeDoRequest to the execute method.
URL Format
The api.scrape.do URL can be either url-encoded or not. Both
will have their parameters extracted and be properly re-encoded
before the request is sent.
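The extract-then-re-encode step can be sketched with the standard library alone. This is an illustration of the idea, not the SDK's parser: `normalise_scrape_do_url` is a hypothetical helper, and the `token` and `url` query parameters in the usage note are assumed for the example.

```python
from typing import Dict, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit


def normalise_scrape_do_url(full_url: str) -> Tuple[Dict[str, str], str]:
    """Extract query parameters and rebuild a consistently encoded URL."""
    parts = urlsplit(full_url)
    # parse_qsl percent-decodes values, so encoded and raw inputs converge.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    # urlencode percent-encodes every value again before sending.
    rebuilt = f"{parts.scheme}://{parts.netloc}{parts.path}?{urlencode(params)}"
    return params, rebuilt
```

With this helper, `https://api.scrape.do/?token=T&url=https%3A%2F%2Fexample.com%2Fpage` and `https://api.scrape.do/?token=T&url=https://example.com/page` yield the same parameters and the same rebuilt URL. Note that this naive approach cannot disambiguate an unencoded target URL that carries its own `&`-separated query string; how the SDK handles that case is not covered here.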
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | HttpMethod | The HTTP method to forward to the target website. | required |
| full_url | str | The complete, pre-formatted | required |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward to the target. | None |
| body | Optional[Union[Dict[str, Any], str, bytes]] | The payload to send to the target website. | None |
| payload_type | PayloadType | Dictates how the client encodes the | 'json' |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | A request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
Raises:
| Type | Description |
|---|---|
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
request(method, target_url, params=None, session_validator=None, *, headers=None, body=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)
async
Async interface for building and executing a Scrape.do request.
Async counterpart of ScrapeDoClient.request.
Depending on the parameter configuration it either constructs a
PreparedScrapeDoRequest object and passes it to the execute
method, or calls the execute_from_url method on the target_url.
Parameter Configuration
This method provides smart routing based on the arguments provided. You can configure the request in three distinct ways:
- Keyword Arguments (Default): Pass the target URL and Scrape.do parameters directly as **api_kwargs (render=True, geoCode="us").
- Pre-built Parameters: Pass a fully validated RequestParameters object via the params argument.
- Raw Scrape.do URL: Pass a full api.scrape.do URL as the target_url.
Parameter Restrictions
To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.
- When using the default Keyword Arguments (**api_kwargs) configuration, passing a value to the params argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.
- When using the Pre-built Parameters (params) configuration, passing any **api_kwargs argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.
- When using the Raw Scrape.do URL configuration, passing any **api_kwargs argument, or a value to the params argument, will raise a ValueError.
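The mutual-exclusivity rule above can be sketched as a small standalone check. `pick_configuration` is a hypothetical helper written for illustration, and detecting a raw URL by comparing the hostname to `api.scrape.do` is an assumption; the SDK's own routing logic may differ.

```python
from typing import Any, Dict, Optional
from urllib.parse import urlsplit


def pick_configuration(
    target_url: str,
    params: Optional[object] = None,
    api_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    """Return the configuration in use, or raise ValueError if several are mixed."""
    # Assumption: a raw Scrape.do URL is recognised by its hostname.
    is_raw_url = urlsplit(target_url).hostname == "api.scrape.do"
    used = [
        name
        for name, active in (
            ("raw_url", is_raw_url),
            ("params", params is not None),
            ("api_kwargs", bool(api_kwargs)),
        )
        if active
    ]
    if len(used) > 1:
        raise ValueError(f"Conflicting configurations: {', '.join(used)}")
    # With nothing extra supplied, the keyword-argument route is the default.
    return used[0] if used else "api_kwargs"
```

Collecting every active configuration before raising lets the error message name all conflicting inputs at once, which is easier to debug than failing on the first pair found.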
Pre-built Parameters Configuration
When passing an already constructed RequestParameters instance
to the params argument, its url attribute will be ignored and
replaced by the provided target_url.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | HttpMethod | The HTTP method to forward to the target website. | required |
| target_url | str | The destination website URL (or a raw Scrape.do endpoint). | required |
| params | Optional[RequestParameters] | A pre-validated parameter object. | None |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward to the target. | None |
| body | Optional[Union[Dict[str, Any], str, bytes]] | The payload to send to the target website. | None |
| payload_type | PayloadType | Dictates how the client encodes the | 'json' |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | Request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
| **api_kwargs | Unpack[RequestParametersDict] | Scrape.do API configuration parameters (e.g., | {} |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
Raises:
| Type | Description |
|---|---|
| ValueError | If configuration constraints are violated. |
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
get(url, params=None, session_validator=None, *, headers=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)
async
Async wrapper for executing a GET request.
Inherits the smart routing logic, parameter validation, and
execution constraints of the base request method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The target website URL (or raw Scrape.do URL). | required |
| params | Optional[RequestParameters] | A pre-validated parameter object. | None |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward. | None |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | Request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
| **api_kwargs | Unpack[RequestParametersDict] | Scrape.do API configuration parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If configuration constraints are violated. |
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
post(url, params=None, session_validator=None, *, body=None, headers=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)
async
Async wrapper for executing a POST request.
Inherits the smart routing logic, parameter validation, and
execution constraints of the base request method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| url | str | The target website URL (or raw Scrape.do URL). | required |
| params | Optional[RequestParameters] | A pre-validated parameter object. | None |
| session_validator | Optional[AsyncSessionValidator] | A custom async function to be called in order to determine whether or not to raise a | None |
| body | Optional[Union[Dict[str, Any], str, bytes]] | The payload to send to the target website. | None |
| headers | Optional[Dict[str, str]] | Custom HTTP headers to forward. | None |
| payload_type | PayloadType | Dictates how the client encodes the | 'json' |
| r_timeout | Union[TimeoutTypes, UseClientDefault] | Request-specific timeout override. | USE_CLIENT_DEFAULT |
| extensions | Optional[RequestExtensions] | Advanced HTTPX extensions. | None |
| **api_kwargs | Unpack[RequestParametersDict] | Scrape.do API configuration parameters. | {} |
Raises:
| Type | Description |
|---|---|
| ValueError | If configuration constraints are violated. |
| APIConnectionError | If the underlying network transport drops entirely (e.g., DNS failure). |
| RotatedSessionError | If a |
Returns:
| Type | Description |
|---|---|
| ScrapeDoResponse | The |
AsyncClientEventHooks
Bases: TypedDict
Configuration dictionary for async-native lifecycle hooks.
The async counterpart of SyncClientEventHooks. Each hook must be an
async-callable returning Awaitable[None] so it can perform I/O
(logging to an async sink, posting telemetry, awaiting locks) while
the request executes.
request
instance-attribute
Fires exactly once per logical execution, immediately before the retry
loop begins. Receives the PreparedScrapeDoRequest object that will be
used to execute the request. Useful for logging the request being
executed.
response
instance-attribute
Fires exactly once per logical execution, immediately after the proxy
returns a response and the session_validator (if any) passes.
Receives the request's ScrapeDoResponse object. Useful for logging
only the final response after all retries, which can be either a
successful response, a non-retryable error, or a final retryable error
after max_retries has been exhausted.
retry
instance-attribute
retry: List[
    Callable[
        [
            int,
            PreparedScrapeDoRequest,
            Optional[ScrapeDoResponse],
            Optional[Exception],
        ],
        Awaitable[None],
    ]
]
Fires inside the execution loop ONLY when a proxy gateway error
(or an httpx.RequestError) occurs and the SDK decides to retry.
Receives the current attempt number, the prepared request, and either
the failed response (if it exists) or the httpx.RequestError that
caused the retry. Useful for tracking proxy instability or manually
raising an exception to abort the retry loop.
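A retry hook matching the signature above might look like the following sketch. The request and response arguments are typed as `Any` so the example runs without the SDK installed; `retry_log`, `MAX_TOLERATED_ATTEMPTS`, and `on_retry` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Any, List, Optional, Tuple

retry_log: List[Tuple[int, str]] = []
MAX_TOLERATED_ATTEMPTS = 2  # hypothetical threshold for this sketch


async def on_retry(
    attempt: int,
    request: Any,  # PreparedScrapeDoRequest in the real SDK
    response: Optional[Any],  # ScrapeDoResponse, if one was received
    error: Optional[Exception],  # the httpx.RequestError, if one was raised
) -> None:
    """Record each retry and abort once the tolerated attempts are exceeded."""
    reason = type(error).__name__ if error is not None else "gateway error"
    retry_log.append((attempt, reason))
    if attempt > MAX_TOLERATED_ATTEMPTS:
        # Raising from the hook aborts the retry loop, as described above.
        raise RuntimeError(f"giving up after {attempt} attempts")


# The retry entry is a list, so several hooks can be registered together.
event_hooks = {"retry": [on_retry]}
```

A dictionary of this shape would be passed as the client's `event_hooks` argument.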
AsyncSessionValidator
module-attribute
Defines the expected signature of the custom async function meant to be
passed to the AsyncScrapeDoClient.execute method's session_validator
argument.
Mirrors SyncSessionValidator but the callable must return
Awaitable[bool] so the validator can perform I/O (e.g., a follow-up
request to confirm session liveness) before deciding whether to raise
RotatedSessionError.