Async Client

async_client

Asynchronous HTTP client for the Scrape.do API.

Defines AsyncScrapeDoClient, the asyncio-native version of ScrapeDoClient. Mirrors the sync client's surface — smart routing, retry strategy, session validation, and event hooks — via await-based methods backed by httpx.AsyncClient.

Hooks and session validators on this client are async-only. Their type aliases (AsyncClientEventHooks and AsyncSessionValidator) declare callables that return Awaitable[None] and Awaitable[bool] respectively, so hooks and validators can perform I/O while the request executes.

AsyncScrapeDoClient

Asynchronous HTTP client for executing Scrape.do API requests.

The asyncio-native version of ScrapeDoClient, backed by httpx.AsyncClient. It mirrors the sync client's surface (smart routing, retry strategy, session validation, and event hooks), but every I/O-bound method is a coroutine and must be awaited.

Features
  • Local API parameter validation via the RequestParameters Pydantic model.

  • Status code error parsing and customisable retry intervals for rate-limited requests.

  • Strongly-typed interface for responses via the ScrapeDoResponse Pydantic model.

Concurrency Limit and Server Errors

This client intercepts Scrape.do's gateway errors (429, 502, 510) and automatically applies a customisable retry strategy before the error reaches the application. The sleep between retries is non-blocking: the client awaits asyncio.sleep(...) where the sync client calls time.sleep(...).
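The retry behaviour described above can be pictured with a minimal, self-contained sketch. It is not the SDK's implementation: `send_with_retries`, `fake_send`, and the fixed `delay` are hypothetical names introduced for illustration, and the only details taken from this page are the retryable status codes and the use of `asyncio.sleep`.

```python
import asyncio
from typing import Awaitable, Callable, Tuple

# Gateway errors this page names as retryable.
RETRYABLE_STATUSES = {429, 502, 510}


async def send_with_retries(
    send: Callable[[], Awaitable[int]],
    max_retries: int = 3,
    delay: float = 0.01,
) -> int:
    """Call `send` until it yields a non-retryable status or retries run out."""
    attempt = 0
    while True:
        status = await send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status
        # Non-blocking pause: other tasks on the event loop keep running.
        await asyncio.sleep(delay)
        attempt += 1


async def demo() -> Tuple[int, int]:
    """Simulate two gateway errors followed by a success."""
    statuses = iter([429, 502, 200])
    calls = 0

    async def fake_send() -> int:  # hypothetical stand-in for a network call
        nonlocal calls
        calls += 1
        return next(statuses)

    final_status = await send_with_retries(fake_send)
    return final_status, calls


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the pause is awaited rather than blocking, many such loops can wait out a rate limit concurrently on one event loop.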

SDK Event Hooks (event_hooks)

This client implements SDK-specific async event hooks. See AsyncClientEventHooks for available lifecycle hooks and their required signatures. Hooks must be async-callable (returning Awaitable[None]).

Additional httpx.AsyncClient Configuration

The following httpx.AsyncClient parameters can be provided as keyword arguments and will be passed directly to the underlying object.

  • verify
  • cert
  • http1
  • http2
  • timeout
  • limits
  • transport
  • default_encoding

Additionally, the following httpx.AsyncClient.request parameters can be provided as keyword arguments during request execution.

  • timeout (exposed as r_timeout)
  • extensions

For more information on their behaviour and default values, please consult the official httpx documentation.

Unsupported HTTPX Client Arguments

The instance strictly manages the underlying httpx.AsyncClient object to prevent invalid configurations from being sent to the Scrape.do API. For this reason, arguments not listed in the previous section are intentionally blocked.

Parameters:

Name Type Description Default
api_token Optional[str]

The Scrape.do API key. If omitted, the client will attempt to load it from the 'SCRAPE_DO_API_KEY' environment variable.

None
max_retries int

The maximum number of retry attempts for retryable Scrape.do gateway errors (HTTP 429, 502, and 510).

3
retry_backoff Union[float, Callable[[int], float]]

The strategy used to calculate the delay between retries. Can be a static float (seconds) or a callable that accepts the current attempt number (0-indexed) and returns the delay in seconds as a float. When left as None, a jittered exponential backoff is used. (Shared with the sync client.)

None
event_hooks Optional[AsyncClientEventHooks]

A dictionary of SDK-native async hooks to execute during different points of the request lifecycle.

None
verify Union[SSLContext, str, bool]

Configures SSL certificate verification. Defaults to True (secure).

True
cert Optional[CertTypes]

Client-side certificates for mutual TLS authentication.

None
http1 bool

Enable HTTP/1.1 support.

True
http2 bool

Enable HTTP/2 multiplexing for higher concurrency.

False
timeout TimeoutTypes

The default timeout (in seconds) applied to all network phases. Defaults to 60s, raised from httpx's 5s default to accommodate Scrape.do proxy round-trips (browser rendering, geo-routing, fingerprinting).

60.0
limits Limits

Configuration for maximum connection pool sizes.

DEFAULT_LIMITS
transport Optional[AsyncBaseTransport]

A completely custom async transport engine.

None
default_encoding Union[str, Callable[[bytes], str]]

The fallback text encoding used if a target website omits a charset header.

'utf-8'
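The two accepted forms of retry_backoff in the table above (a static float, or a callable taking the 0-indexed attempt number) can be sketched as follows. The exact formula of the SDK's default is not stated on this page, so `jittered_exponential` is only one common "full jitter" formulation, and `resolve_delay`, `base`, and `cap` are hypothetical names used for illustration.

```python
import random
from typing import Callable, Union

# Mirrors the documented type of `retry_backoff` (without the None default).
Backoff = Union[float, Callable[[int], float]]


def jittered_exponential(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Delay for a 0-indexed attempt: a random value in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0.0, ceiling)


def resolve_delay(backoff: Backoff, attempt: int) -> float:
    """Turn either form of backoff into a delay in seconds."""
    if callable(backoff):
        return backoff(attempt)
    return float(backoff)


if __name__ == "__main__":
    print(resolve_delay(2.0, attempt=0))                   # static delay
    print(resolve_delay(jittered_exponential, attempt=3))  # somewhere in [0, 4]
```

A callable of this shape could be passed wherever a retry_backoff value is expected; a static float is the simpler choice when predictable spacing matters more than spreading out retries.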

aclose() async

Closes the underlying HTTPX async connection pool.

It is recommended to use the client as an async context manager to ensure resources are released automatically.

__aenter__() async

Async context manager entry.

Returns:

Type Description
Self

The client instance, with an open HTTPX async connection pool.

__aexit__(exc_type, exc_val, exc_tb) async

Calls aclose to close the underlying HTTPX async connection pool without swallowing any exceptions.

Parameters:

Name Type Description Default
exc_type Optional[type[BaseException]]

The type of the exception.

required
exc_val Optional[BaseException]

The instance of the exception.

required
exc_tb Optional[TracebackType]

The traceback information.

required

Returns:

Type Description
Literal[False]

False, since no exceptions are swallowed.
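The aclose / __aenter__ / __aexit__ contract documented above follows Python's standard async context manager protocol. The sketch below shows that protocol on a hypothetical stand-in class (`PooledResource` is not part of the SDK): the resource is closed on exit, and returning False lets any exception propagate.

```python
import asyncio
from types import TracebackType
from typing import Optional, Tuple, Type


class PooledResource:
    """Hypothetical stand-in for a client that owns a connection pool."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        # A real client would release its connection pool here.
        self.closed = True

    async def __aenter__(self) -> "PooledResource":
        return self

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> bool:
        await self.aclose()
        return False  # False means the exception, if any, is not swallowed


async def demo() -> Tuple[bool, bool]:
    """Raise inside the block and report (closed, exception_propagated)."""
    resource = PooledResource()
    propagated = False
    try:
        async with resource:
            raise RuntimeError("boom")
    except RuntimeError:
        propagated = True
    return resource.closed, propagated


if __name__ == "__main__":
    print(asyncio.run(demo()))
```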

execute(request, session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Executes a fully prepared and validated Scrape.do request asynchronously.

Async counterpart of ScrapeDoClient.execute. Acts as the core execution funnel, applying the retry backoff logic, evaluating gateway errors and sessions, and isolating cookies between sequential executions. Sleeps between retries are non-blocking (await asyncio.sleep(...)).

Intended Usage

Use this method if you have manually constructed a PreparedScrapeDoRequest object for bulk routing, custom configurations, or task reusability.

Sessions (sessionId)

If you configure a request with a session_id, Scrape.do will attempt to route your traffic through the same proxy address. However, it can still silently rotate this address for various reasons. If it rotates during a multi-step scraping task, any target-specific WAF state or cookies accumulated will be lost, which may cause the task to fail.

Validating Sessions (session_validator)
  • To prevent unexpected errors caused by dropped sessions, pass a custom async function as the session_validator argument of the client's execute method.

  • This function will be await-ed internally by the client after each stateful request (sessionId is not None) to determine whether or not a RotatedSessionError exception should be raised to signal that this session is no longer valid.

  • The function should take the current request's ScrapeDoResponse object as its only argument and return Awaitable[bool].

  • If the awaited value is True, this method will raise the RotatedSessionError instead of returning the response object. Otherwise, no additional action is taken.
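The validator contract in the list above can be illustrated with a small, self-contained sketch. Everything here is a stand-in: `FakeResponse` and its `status_code` / `text` fields are assumptions (the real ScrapeDoResponse attributes may differ), the local `RotatedSessionError` only mimics the SDK exception of the same name, and `finish_stateful_request` is a hypothetical helper that reproduces the documented rule "raise if the awaited validator returns True".

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class FakeResponse:
    """Hypothetical stand-in; the real ScrapeDoResponse fields may differ."""
    status_code: int
    text: str


class RotatedSessionError(Exception):
    """Local stand-in for the SDK exception of the same name."""


async def looks_logged_out(response: FakeResponse) -> bool:
    """Return True when the target appears to have lost our session state."""
    return response.status_code in (401, 403) or "Sign in" in response.text


async def finish_stateful_request(
    response: FakeResponse,
    validator: Optional[Callable[[FakeResponse], Awaitable[bool]]],
) -> FakeResponse:
    """Raise if the awaited validator reports a dropped session."""
    if validator is not None and await validator(response):
        raise RotatedSessionError("session is no longer valid")
    return response


if __name__ == "__main__":
    ok = FakeResponse(200, "Welcome back")
    print(asyncio.run(finish_stateful_request(ok, looks_logged_out)))
```

Since the validator is awaited, it could also perform its own I/O before answering, at the cost of extra latency on every stateful request.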

Parameters:

Name Type Description Default
request PreparedScrapeDoRequest

The validated request payload.

required
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception.

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions for this specific request.

None

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

Raises:

Type Description
APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

execute_from_url(method, full_url, headers=None, body=None, payload_type='json', session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Executes an async request using a raw, pre-configured api.scrape.do URL.

Async counterpart of ScrapeDoClient.execute_from_url.

Intended Usage

This method is designed for scenarios where you have generated a Scrape.do URL elsewhere and simply need to execute it. It parses the URL to extract and validate the parameters, and then passes the PreparedScrapeDoRequest to the execute method.

URL Format

The api.scrape.do URL may be URL-encoded or not. In both cases its parameters are extracted and re-encoded before the request is sent.
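The extract-and-re-encode step can be approximated with the standard library. This is a sketch under assumptions, not the SDK's parser: `normalise_api_url` is a hypothetical helper, and the query layout (`token`, `url`, `render`) is used only as an example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalise_api_url(full_url: str) -> str:
    """Decode the query parameters of a URL and re-encode them consistently."""
    parts = urlsplit(full_url)
    # parse_qsl percent-decodes values, so encoded and raw inputs converge.
    params = parse_qsl(parts.query, keep_blank_values=True)
    return urlunsplit(parts._replace(query=urlencode(params)))


if __name__ == "__main__":
    raw = "https://api.scrape.do/?token=T&url=https://example.com/page&render=true"
    encoded = "https://api.scrape.do/?token=T&url=https%3A%2F%2Fexample.com%2Fpage&render=true"
    print(normalise_api_url(raw))
    print(normalise_api_url(raw) == normalise_api_url(encoded))
```

One caveat of this simple approach: if an unencoded target URL carries its own `&`-separated query string, those pairs are indistinguishable from top-level parameters, so encoding the target URL up front is the safer habit.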

Parameters:

Name Type Description Default
method HttpMethod

The HTTP method to forward to the target website.

required
full_url str

The complete, pre-formatted api.scrape.do endpoint.

required
headers Optional[Dict[str, str]]

Custom HTTP headers to forward to the target.

None
body Optional[Union[Dict[str, Any], str, bytes]]

The payload to send to the target website.

None
payload_type PayloadType

Dictates how the client encodes the body (e.g., 'json', 'data').

'json'
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception. (See AsyncScrapeDoClient.execute docstring for more information.)

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None

Raises:

Type Description
APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

request(method, target_url, params=None, session_validator=None, *, headers=None, body=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs) async

Async interface for building and executing a Scrape.do request.

Async counterpart of ScrapeDoClient.request. Depending on the parameter configuration it either constructs a PreparedScrapeDoRequest object and passes it to the execute method, or calls the execute_from_url method on the target_url.

Parameter Configuration

This method provides smart routing based on the arguments provided. You can configure the request in three distinct ways:

  • Keyword Arguments (Default) : Pass the target URL as target_url and the Scrape.do parameters directly as **api_kwargs (e.g., render=True, geoCode="us").

  • Pre-built Parameters : Pass a fully validated RequestParameters object via the params argument.

  • Raw Scrape.do URL : Pass a full api.scrape.do URL as the target_url.

Parameter Restrictions

To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.

  • When using the default Keyword Arguments (**api_kwargs) configuration, passing a value to the params argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.

  • When using the Pre-built Parameters (params) configuration, passing any **api_kwargs argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.

  • When using the Raw Scrape.do URL configuration, passing any **api_kwargs argument, or a value to the params argument, will raise a ValueError.
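The mutual-exclusion rule in the list above can be sketched as a small routing check. This is an illustration of the documented behaviour, not the SDK's code: `pick_configuration` and `is_api_url` are hypothetical helpers, and detecting a raw URL by hostname is an assumption about how such a check might work.

```python
from typing import Any, Optional
from urllib.parse import urlsplit


def is_api_url(url: str) -> bool:
    """Assumed heuristic: a raw Scrape.do URL is one hosted on api.scrape.do."""
    return urlsplit(url).hostname == "api.scrape.do"


def pick_configuration(
    target_url: str, params: Optional[object] = None, **api_kwargs: Any
) -> str:
    """Return the single configuration in use, or raise ValueError if mixed."""
    used = [
        name
        for name, active in (
            ("raw_url", is_api_url(target_url)),
            ("params", params is not None),
            ("api_kwargs", bool(api_kwargs)),
        )
        if active
    ]
    if len(used) > 1:
        raise ValueError(f"Conflicting configurations: {', '.join(used)}")
    # With nothing extra supplied, fall back to the default keyword mode.
    return used[0] if used else "api_kwargs"


if __name__ == "__main__":
    print(pick_configuration("https://example.com", render=True))
    print(pick_configuration("https://api.scrape.do/?token=T&url=https://example.com"))
```

Failing fast with a ValueError keeps one source of parameters from silently overriding another.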

Pre-built Parameters Configuration

When passing an already constructed RequestParameters instance to the params argument, its url attribute will be ignored and replaced by the provided target_url.

Parameters:

Name Type Description Default
method HttpMethod

The HTTP method to forward to the target website.

required
target_url str

The destination website URL (or a raw Scrape.do endpoint).

required
params Optional[RequestParameters]

A pre-validated parameter object.

None
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception. (See AsyncScrapeDoClient.execute docstring for more information.)

None
headers Optional[Dict[str, str]]

Custom HTTP headers to forward to the target.

None
body Optional[Union[Dict[str, Any], str, bytes]]

The payload to send to the target website.

None
payload_type PayloadType

Dictates how the client encodes the body.

'json'
r_timeout Union[TimeoutTypes, UseClientDefault]

Request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None
**api_kwargs Unpack[RequestParametersDict]

Scrape.do API configuration parameters (e.g., render=True).

{}

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

Raises:

Type Description
ValueError

If configuration constraints are violated.

APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

get(url, params=None, session_validator=None, *, headers=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs) async

Async wrapper for executing a GET request.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

Name Type Description Default
url str

The target website URL (or raw Scrape.do URL).

required
params Optional[RequestParameters]

A pre-validated parameter object.

None
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception. (See AsyncScrapeDoClient.execute docstring for more information.)

None
headers Optional[Dict[str, str]]

Custom HTTP headers to forward.

None
r_timeout Union[TimeoutTypes, UseClientDefault]

Request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None
**api_kwargs Unpack[RequestParametersDict]

Scrape.do API configuration parameters.

{}

Raises:

Type Description
ValueError

If configuration constraints are violated.

APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

post(url, params=None, session_validator=None, *, body=None, headers=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs) async

Async wrapper for executing a POST request.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

Name Type Description Default
url str

The target website URL (or raw Scrape.do URL).

required
params Optional[RequestParameters]

A pre-validated parameter object.

None
session_validator Optional[AsyncSessionValidator]

A custom async function to be called in order to determine whether or not to raise a RotatedSessionError exception. (See AsyncScrapeDoClient.execute docstring for more information.)

None
body Optional[Union[Dict[str, Any], str, bytes]]

The payload to send to the target website.

None
headers Optional[Dict[str, str]]

Custom HTTP headers to forward.

None
payload_type PayloadType

Dictates how the client encodes the body.

'json'
r_timeout Union[TimeoutTypes, UseClientDefault]

Request-specific timeout override.

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions.

None
**api_kwargs Unpack[RequestParametersDict]

Scrape.do API configuration parameters.

{}

Raises:

Type Description
ValueError

If configuration constraints are violated.

APIConnectionError

If the underlying network transport drops entirely (e.g., DNS failure).

RotatedSessionError

If a session_validator is provided, the request was made with a session_id argument, and the awaited session_validator returned True.

Returns:

Type Description
ScrapeDoResponse

The ScrapeDoResponse object containing the target's data.

AsyncClientEventHooks

Bases: TypedDict

Configuration dictionary for async-native lifecycle hooks.

The async counterpart of SyncClientEventHooks. Each hook must be an async-callable returning Awaitable[None] so it can perform I/O (logging to an async sink, posting telemetry, awaiting locks) while the request executes.

request instance-attribute

request: List[
    Callable[[PreparedScrapeDoRequest], Awaitable[None]]
]

Fires exactly once per logical execution, immediately before the retry loop begins. Receives the PreparedScrapeDoRequest object that will be used to execute the request. Useful for logging the request being executed.

response instance-attribute

response: List[
    Callable[[ScrapeDoResponse], Awaitable[None]]
]

Fires exactly once per logical execution, immediately after the proxy returns a response and the session_validator (if any) passes. Receives the request's ScrapeDoResponse object. Useful for logging only the final response after all retries, which can be a successful response, a non-retryable error, or a final retryable error after max_retries has been exhausted.

retry instance-attribute

retry: List[
    Callable[
        [
            int,
            PreparedScrapeDoRequest,
            Optional[ScrapeDoResponse],
            Optional[Exception],
        ],
        Awaitable[None],
    ]
]

Fires inside the execution loop ONLY when a proxy gateway error (or an httpx.RequestError) occurs and the SDK decides to retry. Receives the current attempt number, the prepared request, and either the failed response (if it exists) or the httpx.RequestError that caused the retry. Useful for tracking proxy instability or manually raising an exception to abort the retry loop.
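The three hook lists and their signatures can be sketched without the SDK. In this illustration the requests and responses are plain strings and integers instead of PreparedScrapeDoRequest / ScrapeDoResponse objects, and `fire` is a hypothetical dispatcher; awaiting hooks one after another in list order is an assumption, not something this page states.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

events: List[str] = []


async def log_request(request: Any) -> None:
    events.append(f"request:{request}")


async def log_retry(
    attempt: int, request: Any, response: Optional[Any], error: Optional[Exception]
) -> None:
    # Either a failed response or the transport error is present.
    cause = type(error).__name__ if error is not None else f"status {response}"
    events.append(f"retry {attempt}: {cause}")


async def log_response(response: Any) -> None:
    events.append(f"response:{response}")


# Same keys as AsyncClientEventHooks; each maps to a list of async callables.
hooks: Dict[str, List[Callable[..., Awaitable[None]]]] = {
    "request": [log_request],
    "retry": [log_retry],
    "response": [log_response],
}


async def fire(name: str, *args: Any) -> None:
    """Hypothetical dispatcher: await every hook registered under `name`."""
    for hook in hooks.get(name, []):
        await hook(*args)


async def demo() -> List[str]:
    """Simulate one logical execution with a single retry."""
    events.clear()
    await fire("request", "GET https://example.com")
    await fire("retry", 0, "GET https://example.com", 502, None)
    await fire("response", 200)
    return list(events)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```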

AsyncSessionValidator module-attribute

AsyncSessionValidator = Callable[
    [ScrapeDoResponse], Awaitable[bool]
]

Defines the expected signature of the custom async function meant to be passed to the AsyncScrapeDoClient.execute method's session_validator argument.

Mirrors SyncSessionValidator but the callable must return Awaitable[bool] so the validator can perform I/O (e.g., a follow-up request to confirm session liveness) before deciding whether to raise RotatedSessionError.