Client

client

Synchronous HTTP client for the Scrape.do API.

Defines the primary ScrapeDoClient used for executing proxy requests. Handles automatic error routing, customisable retry strategies, telemetry tracking, and secure, isolated connection pooling.

ScrapeDoClient

Synchronous HTTP client for executing Scrape.do API requests.

Facilitates interaction with the Scrape.do API by managing an internal httpx.Client instance, providing strict type checking for request parameters, custom error parsing, and session tracking while keeping network configuration as flexible as possible.

Features
  • Local API parameter validation via the RequestParameters Pydantic model.

  • Status code error parsing and customisable retry intervals for rate-limited requests.

  • Strongly-typed interface for responses via the ScrapeDoResponse Pydantic model.

Concurrency Limit and Server Errors

This client intercepts and manages Scrape.do's specific gateway errors (429, 502, 510), automatically applying a customisable retry strategy before the error can reach the application.
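For example, the retry delay can be tuned by supplying a callable as retry_backoff; this sketch implements a simple linear schedule (the delay values are illustrative):

```python
def linear_backoff(attempt: int) -> float:
    """retry_backoff callable: takes the 0-indexed attempt, returns seconds to wait."""
    # Wait 2 s before the first retry, 4 s before the second, and so on.
    return 2.0 * (attempt + 1)
```

The callable would then be supplied at construction time, e.g. ScrapeDoClient(retry_backoff=linear_backoff), alongside max_retries.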

SDK Event Hooks (event_hooks)

This client implements SDK-specific event hooks mimicking the structure of httpx native event hooks. See SyncClientEventHooks for available lifecycle hooks and their required signatures.

Additional httpx.Client Configuration

The following httpx.Client parameters can be provided as keyword arguments and will be passed directly to the underlying object.

  • verify
  • cert
  • http1
  • http2
  • timeout
  • limits
  • transport
  • default_encoding

Additionally, the following httpx.Client.request parameters can be provided as keyword arguments during request execution.

  • timeout (r_timeout)
  • extensions

For more information on their behaviour and default values, please consult the official httpx documentation.

Unsupported HTTPX Client Arguments

The underlying httpx.Client object is strictly managed by the instance to prevent invalid configurations from being sent to the Scrape.do API. For this reason, arguments not listed in the previous section are intentionally blocked and cannot be overridden.

Parameters:

  • api_token (Optional[str], default None): The Scrape.do API key. If omitted, the client attempts to load it from the SCRAPE_DO_API_KEY environment variable.

  • max_retries (int, default 3): The maximum number of retry attempts for retryable Scrape.do gateway errors (HTTP 429, 502, and 510).

  • retry_backoff (Union[float, Callable[[int], float]], default None): The strategy used to calculate the delay between retries. Either a static float (seconds) or a callable that accepts the current attempt number (0-indexed) and returns a float. Defaults to a jittered exponential backoff when set to None.

  • event_hooks (Optional[SyncClientEventHooks], default None): A dictionary of SDK-native hooks to execute at different points of the request lifecycle.

  • verify (Union[SSLContext, str, bool], default True): Configures SSL certificate verification. Defaults to True (secure).

  • cert (Optional[CertTypes], default None): Client-side certificates for mutual TLS authentication.

  • http1 (bool, default True): Enable HTTP/1.1 support.

  • http2 (bool, default False): Enable HTTP/2 multiplexing for higher concurrency.

  • timeout (TimeoutTypes, default 60.0): The default timeout (in seconds) applied to all network phases. Raised from httpx's 5 s default to accommodate Scrape.do proxy round-trips (browser rendering, geo-routing, fingerprinting).

  • limits (Limits, default DEFAULT_LIMITS): Configuration for maximum connection pool sizes.

  • transport (Optional[BaseTransport], default None): A completely custom transport engine.

  • default_encoding (Union[str, Callable[[bytes], str]], default 'utf-8'): The fallback text encoding used if a target website omits a charset header.

close()

Closes the underlying HTTPX connection pool.

It is recommended to use the client as a context manager to ensure resources are released automatically.

__enter__()

Initializes the HTTPX connection pool and returns the context manager object.

Returns:

  • Self: The ScrapeDoClient instance with an opened HTTPX connection pool.

__exit__(exc_type, exc_val, exc_tb)

Calls the close method to close the underlying HTTPX connection pool without swallowing any exceptions.

Parameters:

  • exc_type (Optional[type[BaseException]], required): The type of the exception.

  • exc_val (Optional[BaseException], required): The instance of the exception.

  • exc_tb (Optional[TracebackType], required): The traceback information.

Returns:

  • Literal[False]: Always False, since no exceptions are swallowed.

execute(request, session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)

Executes a fully prepared and validated Scrape.do request.

Acts as the core execution funnel, applying the retry backoff logic, evaluating gateway errors and sessions, and isolating cookies between sequential executions.

Intended Usage

Use this method if you have manually constructed a PreparedScrapeDoRequest object for bulk routing, custom configurations, or task reusability.

Sessions (sessionId)

If you configure a request with a session_id, Scrape.do will attempt to route your traffic through the same proxy address. However, it can still silently rotate this address for various reasons. If it rotates during a multi-step scraping task, any target-specific WAF state or cookies accumulated will be lost, which may cause the task to fail.

Validating Sessions (session_validator)
  • To guard against unexpected errors from dropped sessions, pass a custom function to the execute method's session_validator argument.

  • The client calls this function internally after each stateful request (i.e., when sessionId is not None) to determine whether a RotatedSessionError exception should be raised to signal that the session is no longer valid.

  • The function should take the current request's ScrapeDoResponse object as its only argument and return a single bool value.

  • If the function returns True, this method raises RotatedSessionError instead of returning the response object. (The request's ScrapeDoResponse object can still be accessed later via the exception's response attribute.) Otherwise, no additional action is taken.
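As an illustration, a validator that treats an authentication bounce as a rotated session might look like this (a sketch; the status_code and url attributes used here are assumptions about ScrapeDoResponse's surface, and the stand-in objects exist only for demonstration):

```python
from types import SimpleNamespace

def rotated_session_detected(response) -> bool:
    """session_validator: return True to make execute raise RotatedSessionError."""
    # Assumption: ScrapeDoResponse exposes `status_code` and `url` attributes.
    return response.status_code in (401, 403) or "/login" in str(response.url)

# Duck-typed stand-ins for ScrapeDoResponse, used here only for illustration.
healthy = SimpleNamespace(status_code=200, url="https://example.com/account")
dropped = SimpleNamespace(status_code=200, url="https://example.com/login?next=account")
```

Passed as execute(request, session_validator=rotated_session_detected), the dropped case would raise RotatedSessionError while the healthy case returns normally.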

Parameters:

  • request (PreparedScrapeDoRequest, required): The validated request payload.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): A request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions for this specific request.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

Raises:

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

execute_from_url(method, full_url, headers=None, body=None, payload_type='json', session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)

Executes a request using a raw, pre-configured api.scrape.do URL.

Intended Usage

This method is designed for scenarios where you have generated a Scrape.do URL elsewhere and simply need to execute it. It parses the URL to extract and validate the parameters, and then passes the PreparedScrapeDoRequest to the execute method.

URL Format

The api.scrape.do URL may be URL-encoded or not. In both cases its parameters are extracted and properly re-encoded before the request is sent.
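For instance, the standard library illustrates what is recovered from a raw URL; percent-encoded parameters decode back to their original values (the token and query keys below are illustrative):

```python
from urllib.parse import parse_qs, urlsplit

# A pre-formatted api.scrape.do URL with a percent-encoded target address.
full_url = (
    "https://api.scrape.do/?token=YOUR_TOKEN"
    "&url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dbooks&render=true"
)

params = parse_qs(urlsplit(full_url).query)
# params["url"] decodes back to the original target address.
```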

Parameters:

  • method (HttpMethod, required): The HTTP method to forward to the target website.

  • full_url (str, required): The complete, pre-formatted api.scrape.do endpoint.

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward to the target.

  • body (Optional[Union[Dict[str, Any], str, bytes]], default None): The payload to send to the target website.

  • payload_type (PayloadType, default 'json'): Dictates how the client encodes the body (e.g., 'json', 'data').

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): A request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

Raises:

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

request(method, target_url, params=None, session_validator=None, *, headers=None, body=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)

Interface for building and executing a Scrape.do request.

Depending on the parameter configuration it either constructs a PreparedScrapeDoRequest object and passes it to the execute method, or calls the execute_from_url method on the target_url.

Parameter Configuration

This method provides smart routing based on the arguments provided. You can configure the request in three distinct ways:

  • Keyword Arguments (default): Pass the target URL as target_url and the Scrape.do parameters directly as **api_kwargs (e.g., render=True, geoCode="us").

  • Pre-built Parameters: Pass a fully validated RequestParameters object via the params argument.

  • Raw Scrape.do URL: Pass a full api.scrape.do URL as the target_url.

Parameter Restrictions

To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.

  • When using the default Keyword Arguments (**api_kwargs) configuration, passing a value to the params argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.

  • When using the Pre-built Parameters (params) configuration, passing any **api_kwargs argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.

  • When using the Raw Scrape.do URL configuration, passing any **api_kwargs argument, or a value to the params argument, will raise a ValueError.

Pre-built Parameters Configuration

When passing an already constructed RequestParameters instance to the params argument, its url attribute will be ignored and replaced by the provided target_url.
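The one-configuration-at-a-time rule can be pictured as a single exclusive choice. This standalone sketch mirrors the rule (an illustration only, not the SDK's actual implementation):

```python
def resolve_configuration(target_url: str, params=None, **api_kwargs) -> str:
    """Return which configuration style applies, enforcing mutual exclusion."""
    used = [name for name, active in (
        ("api_kwargs", bool(api_kwargs)),
        ("params", params is not None),
        ("raw_url", target_url.startswith("https://api.scrape.do")),
    ) if active]
    if len(used) > 1:
        raise ValueError(f"Conflicting request configurations: {used}")
    return used[0] if used else "api_kwargs"  # keyword arguments are the default
```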

Parameters:

  • method (HttpMethod, required): The HTTP method to forward to the target website.

  • target_url (str, required): The destination website URL (or a raw Scrape.do endpoint).

  • params (Optional[RequestParameters], default None): A pre-validated parameter object.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward to the target.

  • body (Optional[Union[Dict[str, Any], str, bytes]], default None): The payload to send to the target website.

  • payload_type (PayloadType, default 'json'): Dictates how the client encodes the body.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): Request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

  • **api_kwargs (Unpack[RequestParametersDict], default {}): Scrape.do API configuration parameters (e.g., render=True).

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

Raises:

  • ValueError: If configuration constraints are violated.

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

get(url, params=None, session_validator=None, *, headers=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)

Wrapper for executing a GET request.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

  • url (str, required): The target website URL (or raw Scrape.do URL).

  • params (Optional[RequestParameters], default None): A pre-validated parameter object.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): Request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

  • **api_kwargs (Unpack[RequestParametersDict], default {}): Scrape.do API configuration parameters.

Raises:

  • ValueError: If configuration constraints are violated.

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

post(url, params=None, session_validator=None, *, body=None, headers=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)

Wrapper for executing a POST request.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

  • url (str, required): The target website URL (or raw Scrape.do URL).

  • params (Optional[RequestParameters], default None): A pre-validated parameter object.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • body (Optional[Union[Dict[str, Any], str, bytes]], default None): The payload to send to the target website.

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward.

  • payload_type (PayloadType, default 'json'): Dictates how the client encodes the body.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): Request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

  • **api_kwargs (Unpack[RequestParametersDict], default {}): Scrape.do API configuration parameters.

Raises:

  • ValueError: If configuration constraints are violated.

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

default_backoff_strategy(attempt)

Calculates a jittered exponential backoff for rate-limit retries.

This is the default function used by the ScrapeDoClient to determine how long to wait before retrying a rate-limited request when the retry_backoff parameter is set to None.

Parameters:

  • attempt (int, required): The number of retries made so far, starting from 0.
Additional Information

The jitter here is a random number between 0.1 and 1 generated by the random.uniform function.

Returns:

  • float: The number of seconds to sleep, calculated as (2^attempt) + jitter.
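The documented formula corresponds roughly to this implementation (a sketch based on the description above, not the SDK's source):

```python
import random

def default_backoff_strategy(attempt: int) -> float:
    """Jittered exponential backoff: (2 ** attempt) plus uniform jitter in [0.1, 1]."""
    return float(2 ** attempt) + random.uniform(0.1, 1.0)
```

The jitter spreads out retries so that many clients rate-limited at the same moment do not all hit the gateway again in lockstep.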

SyncClientEventHooks

Bases: TypedDict

Configuration dictionary for SDK-native lifecycle hooks.

Unlike native HTTPX event hooks, which fire on every transport-level execution (and can corrupt telemetry during automatic retries), these SDK hooks map cleanly to the logical request lifecycle.

request instance-attribute

request: List[Callable[[PreparedScrapeDoRequest], None]]

Fires exactly once per logical execution, immediately before the retry loop begins. Receives the PreparedScrapeDoRequest object that will be used to execute the request. Useful for logging the request being executed.

response instance-attribute

response: List[Callable[[ScrapeDoResponse], None]]

Fires exactly once per logical execution, immediately after the proxy returns a response and the session_validator (if any) passes. Receives the request's ScrapeDoResponse object. Useful for logging only the final response after all retries, which can be a successful response, a non-retryable error, or a final retryable error once max_retries has been exhausted.

retry instance-attribute

retry: List[
    Callable[
        [
            int,
            PreparedScrapeDoRequest,
            Optional[ScrapeDoResponse],
            Optional[Exception],
        ],
        None,
    ]
]

Fires inside the execution loop ONLY when a proxy gateway error (or an httpx.RequestError) occurs and the SDK decides to retry. Receives the current attempt number, the prepared request, and either the failed response (if it exists) or the httpx.RequestError that caused the retry. Useful for tracking proxy instability or manually raising an exception to abort the retry loop.
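Put together, a hooks dictionary wires one list of callables per lifecycle event. This sketch follows the signatures documented above; the string placeholders stand in for the real PreparedScrapeDoRequest and ScrapeDoResponse objects:

```python
events = []

def on_request(prepared_request) -> None:
    """request hook: fires once, before the retry loop begins."""
    events.append(("request", prepared_request))

def on_retry(attempt, prepared_request, response, error) -> None:
    """retry hook: fires on each retried gateway error or httpx.RequestError."""
    events.append(("retry", attempt))

def on_response(response) -> None:
    """response hook: fires once, with the final response."""
    events.append(("response", response))

hooks = {"request": [on_request], "retry": [on_retry], "response": [on_response]}

# Simulate one logical execution that retries once before succeeding.
on_request("prepared")
on_retry(0, "prepared", None, TimeoutError("gateway 429"))
on_response("final")
```

The dictionary would then be passed at construction time, e.g. ScrapeDoClient(event_hooks=hooks).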

SyncSessionValidator module-attribute

SyncSessionValidator = Callable[[ScrapeDoResponse], bool]

Defines the expected signature of the custom function passed to the ScrapeDoClient.execute method's session_validator argument.