Client

client

Synchronous HTTP client for the Scrape.do API.

Defines the primary ScrapeDoClient used for executing proxy requests. Handles automatic error routing, customisable retry strategies, telemetry tracking, and secure, isolated connection pooling.

ScrapeDoClient

Synchronous HTTP client for executing Scrape.do API requests.

Facilitates interaction with the Scrape.do API by managing an internal httpx.Client instance, providing strict type checking for request parameters, custom error parsing, and session tracking while keeping network configuration as flexible as possible.

Features
  • Local API parameter validation via the RequestParameters Pydantic model.

  • Status code error parsing and customisable retry intervals for rate-limited requests.

  • Strongly-typed interface for responses via the ScrapeDoResponse Pydantic model.

Concurrency Limit and Server Errors

This client intercepts and manages Scrape.do's specific gateway errors (429, 502, 510), automatically applying a customisable retry strategy before the error can reach the application.
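For example, the retry delay can be tuned by supplying a callable as retry_backoff; this sketch implements a simple linear schedule (the delay values are illustrative):

```python
def linear_backoff(attempt: int) -> float:
    """retry_backoff callable: takes the 0-indexed attempt, returns seconds to wait."""
    # Wait 2 s before the first retry, 4 s before the second, and so on.
    return 2.0 * (attempt + 1)
```

The callable would then be supplied at construction time, e.g. ScrapeDoClient(retry_backoff=linear_backoff), alongside max_retries.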

SDK Event Hooks (event_hooks)

This client implements SDK-specific event hooks mimicking the structure of httpx native event hooks. See SyncClientEventHooks for available lifecycle hooks and their required signatures.

Additional httpx.Client Configuration

The following httpx.Client parameters can be provided as keyword arguments and will be passed directly to the underlying object.

  • verify
  • cert
  • http1
  • http2
  • timeout
  • limits
  • transport
  • default_encoding

Additionally, the following httpx.Client.request parameters can be provided as keyword arguments during request execution.

  • timeout (r_timeout)
  • extensions

For more information on their behaviour and default values, please consult the official httpx documentation.

Unsupported HTTPX Client Arguments

The underlying httpx.Client object is strictly managed by the instance to prevent invalid configurations from being sent to the Scrape.do API. For this reason, arguments not listed in the previous section are intentionally blocked and cannot be overridden.

Parameters:

  • api_token (Optional[str], default None): The Scrape.do API key. If omitted, the client attempts to load it from the SCRAPE_DO_API_KEY environment variable.

  • max_retries (int, default 3): The maximum number of retry attempts for retryable Scrape.do gateway errors (HTTP 429, 502, and 510).

  • retry_backoff (Union[float, Callable[[int], float]], default None): The strategy used to calculate the delay between retries. Either a static float (seconds) or a callable that accepts the current attempt number (0-indexed) and returns a float. Defaults to a jittered exponential backoff when set to None.

  • event_hooks (Optional[SyncClientEventHooks], default None): A dictionary of SDK-native hooks to execute at different points of the request lifecycle.

  • verify (Union[SSLContext, str, bool], default True): Configures SSL certificate verification. Defaults to True (secure).

  • cert (Optional[CertTypes], default None): Client-side certificates for mutual TLS authentication.

  • http1 (bool, default True): Enable HTTP/1.1 support.

  • http2 (bool, default False): Enable HTTP/2 multiplexing for higher concurrency.

  • timeout (TimeoutTypes, default 60.0): The default timeout (in seconds) applied to all network phases. Raised from httpx's 5 s default to accommodate Scrape.do proxy round-trips (browser rendering, geo-routing, fingerprinting).

  • limits (Limits, default DEFAULT_LIMITS): Configuration for maximum connection pool sizes.

  • transport (Optional[BaseTransport], default None): A completely custom transport engine.

  • default_encoding (Union[str, Callable[[bytes], str]], default 'utf-8'): The fallback text encoding used if a target website omits a charset header.

close()

Closes the underlying HTTPX connection pool.

It is recommended to use the client as a context manager to ensure resources are released automatically.

__enter__()

Initializes the HTTPX connection pool and returns the context manager object.

Returns:

  • Self: The ScrapeDoClient instance with an opened HTTPX connection pool.

__exit__(exc_type, exc_val, exc_tb)

Calls the close method to close the underlying HTTPX connection pool without swallowing any exceptions.

Parameters:

  • exc_type (Optional[type[BaseException]], required): The type of the exception.

  • exc_val (Optional[BaseException], required): The instance of the exception.

  • exc_tb (Optional[TracebackType], required): The traceback information.

Returns:

  • Literal[False]: Always False, since no exceptions are swallowed.

execute(request, session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)

Executes a fully prepared and validated Scrape.do request.

Acts as the core execution funnel, applying the retry backoff logic, evaluating gateway errors and sessions, and isolating cookies between sequential executions.

Intended Usage

Use this method if you have manually constructed a PreparedScrapeDoRequest object for bulk routing, custom configurations, or task reusability.

Sessions (sessionId)

If you configure a request with a session_id, Scrape.do will attempt to route your traffic through the same proxy address. However, it can still silently rotate this address for various reasons. If it rotates during a multi-step scraping task, any target-specific WAF state or cookies accumulated will be lost, which may cause the task to fail.

Validating Sessions (session_validator)
  • To guard against unexpected errors from dropped sessions, pass a custom function to the execute method's session_validator argument.

  • The client calls this function internally after each stateful request (i.e., when sessionId is not None) to determine whether a RotatedSessionError exception should be raised to signal that the session is no longer valid.

  • The function should take the current request's ScrapeDoResponse object as its only argument and return a single bool value.

  • If the function returns True, this method raises RotatedSessionError instead of returning the response object. (The request's ScrapeDoResponse object can still be accessed later via the exception's response attribute.) Otherwise, no additional action is taken.
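As an illustration, a validator that treats an authentication bounce as a rotated session might look like this (a sketch; the status_code and url attributes used here are assumptions about ScrapeDoResponse's surface, and the stand-in objects exist only for demonstration):

```python
from types import SimpleNamespace

def rotated_session_detected(response) -> bool:
    """session_validator: return True to make execute raise RotatedSessionError."""
    # Assumption: ScrapeDoResponse exposes `status_code` and `url` attributes.
    return response.status_code in (401, 403) or "/login" in str(response.url)

# Duck-typed stand-ins for ScrapeDoResponse, used here only for illustration.
healthy = SimpleNamespace(status_code=200, url="https://example.com/account")
dropped = SimpleNamespace(status_code=200, url="https://example.com/login?next=account")
```

Passed as execute(request, session_validator=rotated_session_detected), the dropped case would raise RotatedSessionError while the healthy case returns normally.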

Parameters:

  • request (PreparedScrapeDoRequest, required): The validated request payload.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): A request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions for this specific request.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

Raises:

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

execute_from_url(method, full_url, headers=None, body=None, payload_type='json', session_validator=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)

Executes a request using a raw, pre-configured api.scrape.do URL.

Intended Usage

This method is designed for scenarios where you have generated a Scrape.do URL elsewhere and simply need to execute it. It parses the URL to extract and validate the parameters, and then passes the PreparedScrapeDoRequest to the execute method.

URL Format

The api.scrape.do URL may be URL-encoded or not. In both cases its parameters are extracted and properly re-encoded before the request is sent.
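For instance, the standard library illustrates what is recovered from a raw URL; percent-encoded parameters decode back to their original values (the token and query keys below are illustrative):

```python
from urllib.parse import parse_qs, urlsplit

# A pre-formatted api.scrape.do URL with a percent-encoded target address.
full_url = (
    "https://api.scrape.do/?token=YOUR_TOKEN"
    "&url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dbooks&render=true"
)

params = parse_qs(urlsplit(full_url).query)
# params["url"] decodes back to the original target address.
```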

Parameters:

  • method (HttpMethod, required): The HTTP method to forward to the target website.

  • full_url (str, required): The complete, pre-formatted api.scrape.do endpoint.

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward to the target.

  • body (Optional[Union[Dict[str, Any], str, bytes]], default None): The payload to send to the target website.

  • payload_type (PayloadType, default 'json'): Dictates how the client encodes the body (e.g., 'json', 'data').

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): A request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

Raises:

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

request(method, target_url, params=None, session_validator=None, *, headers=None, body=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)

Interface for building and executing a Scrape.do request.

Depending on the parameter configuration it either constructs a PreparedScrapeDoRequest object and passes it to the execute method, or calls the execute_from_url method on the target_url.

Parameter Configuration

This method provides smart routing based on the arguments provided. You can configure the request in three distinct ways:

  • Keyword Arguments (default): Pass the target URL as target_url and the Scrape.do parameters directly as **api_kwargs (e.g., render=True, geoCode="us").

  • Pre-built Parameters: Pass a fully validated RequestParameters object via the params argument.

  • Raw Scrape.do URL: Pass a full api.scrape.do URL as the target_url.

Parameter Restrictions

To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.

  • When using the default Keyword Arguments (**api_kwargs) configuration, passing a value to the params argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.

  • When using the Pre-built Parameters (params) configuration, passing any **api_kwargs argument, or an api.scrape.do URL to the target_url argument, will raise a ValueError.

  • When using the Raw Scrape.do URL configuration, passing any **api_kwargs argument, or a value to the params argument, will raise a ValueError.

Pre-built Parameters Configuration

When passing an already constructed RequestParameters instance to the params argument, its url attribute will be ignored and replaced by the provided target_url.
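The one-configuration-at-a-time rule can be pictured as a single exclusive choice. This standalone sketch mirrors the rule (an illustration only, not the SDK's actual implementation):

```python
def resolve_configuration(target_url: str, params=None, **api_kwargs) -> str:
    """Return which configuration style applies, enforcing mutual exclusion."""
    used = [name for name, active in (
        ("api_kwargs", bool(api_kwargs)),
        ("params", params is not None),
        ("raw_url", target_url.startswith("https://api.scrape.do")),
    ) if active]
    if len(used) > 1:
        raise ValueError(f"Conflicting request configurations: {used}")
    return used[0] if used else "api_kwargs"  # keyword arguments are the default
```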

Parameters:

  • method (HttpMethod, required): The HTTP method to forward to the target website.

  • target_url (str, required): The destination website URL (or a raw Scrape.do endpoint).

  • params (Optional[RequestParameters], default None): A pre-validated parameter object.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward to the target.

  • body (Optional[Union[Dict[str, Any], str, bytes]], default None): The payload to send to the target website.

  • payload_type (PayloadType, default 'json'): Dictates how the client encodes the body.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): Request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

  • **api_kwargs (Unpack[RequestParametersDict], default {}): Scrape.do API configuration parameters (e.g., render=True).

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

Raises:

  • ValueError: If configuration constraints are violated.

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

get(url, params=None, session_validator=None, *, headers=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)

Wrapper for executing a GET request.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

  • url (str, required): The target website URL (or raw Scrape.do URL).

  • params (Optional[RequestParameters], default None): A pre-validated parameter object.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): Request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

  • **api_kwargs (Unpack[RequestParametersDict], default {}): Scrape.do API configuration parameters.

Raises:

  • ValueError: If configuration constraints are violated.

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

post(url, params=None, session_validator=None, *, body=None, headers=None, payload_type='json', r_timeout=USE_CLIENT_DEFAULT, extensions=None, **api_kwargs)

Wrapper for executing a POST request.

Inherits the smart routing logic, parameter validation, and execution constraints of the base request method.

Parameters:

  • url (str, required): The target website URL (or raw Scrape.do URL).

  • params (Optional[RequestParameters], default None): A pre-validated parameter object.

  • session_validator (Optional[SyncSessionValidator], default None): A custom function called to determine whether to raise a RotatedSessionError exception. (See the ScrapeDoClient.execute docstring for more information.)

  • body (Optional[Union[Dict[str, Any], str, bytes]], default None): The payload to send to the target website.

  • headers (Optional[Dict[str, str]], default None): Custom HTTP headers to forward.

  • payload_type (PayloadType, default 'json'): Dictates how the client encodes the body.

  • r_timeout (Union[TimeoutTypes, UseClientDefault], default USE_CLIENT_DEFAULT): Request-specific timeout override.

  • extensions (Optional[RequestExtensions], default None): Advanced HTTPX extensions.

  • **api_kwargs (Unpack[RequestParametersDict], default {}): Scrape.do API configuration parameters.

Raises:

  • ValueError: If configuration constraints are violated.

  • APIConnectionError: If the underlying network transport drops entirely (e.g., DNS failure).

  • RotatedSessionError: If a session_validator is provided, the request was made with a session_id argument, and the session_validator returned True.

Returns:

  • ScrapeDoResponse: The ScrapeDoResponse object containing the target's data.

default_backoff_strategy(attempt)

Calculates a jittered exponential backoff for rate-limit retries.

This is the default function used by the ScrapeDoClient to determine how long to wait before retrying a rate-limited request when the retry_backoff parameter is set to None.

Parameters:

  • attempt (int, required): The number of retries made so far, starting from 0.
Additional Information

The jitter here is a random number between 0.1 and 1 generated by the random.uniform function.

Returns:

  • float: The number of seconds to sleep, calculated as (2^attempt) + jitter.
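The documented formula corresponds roughly to this implementation (a sketch based on the description above, not the SDK's source):

```python
import random

def default_backoff_strategy(attempt: int) -> float:
    """Jittered exponential backoff: (2 ** attempt) plus uniform jitter in [0.1, 1]."""
    return float(2 ** attempt) + random.uniform(0.1, 1.0)
```

The jitter spreads out retries so that many clients rate-limited at the same moment do not all hit the gateway again in lockstep.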

SyncClientEventHooks

Bases: TypedDict

Configuration dictionary for SDK-native lifecycle hooks.

Unlike native HTTPX event hooks, which fire on every transport-level execution (and can corrupt telemetry during automatic retries), these SDK hooks map cleanly to the logical request lifecycle.

request instance-attribute

request: List[Callable[[PreparedScrapeDoRequest], None]]

Fires exactly once per logical execution, immediately before the retry loop begins. Receives the PreparedScrapeDoRequest object that will be used to execute the request. Useful for logging the request being executed.

response instance-attribute

response: List[Callable[[ScrapeDoResponse], None]]

Fires exactly once per logical execution, immediately after the proxy returns a response and the session_validator (if any) passes. Receives the request's ScrapeDoResponse object. Useful for logging only the final response after all retries, which can be a successful response, a non-retryable error, or a final retryable error once max_retries has been exhausted.

retry instance-attribute

retry: List[
    Callable[
        [
            int,
            PreparedScrapeDoRequest,
            Optional[ScrapeDoResponse],
            Optional[Exception],
        ],
        None,
    ]
]

Fires inside the execution loop ONLY when a proxy gateway error (or an httpx.RequestError) occurs and the SDK decides to retry. Receives the current attempt number, the prepared request, and either the failed response (if it exists) or the httpx.RequestError that caused the retry. Useful for tracking proxy instability or manually raising an exception to abort the retry loop.
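Put together, a hooks dictionary wires one list of callables per lifecycle event. This sketch follows the signatures documented above; the string placeholders stand in for the real PreparedScrapeDoRequest and ScrapeDoResponse objects:

```python
events = []

def on_request(prepared_request) -> None:
    """request hook: fires once, before the retry loop begins."""
    events.append(("request", prepared_request))

def on_retry(attempt, prepared_request, response, error) -> None:
    """retry hook: fires on each retried gateway error or httpx.RequestError."""
    events.append(("retry", attempt))

def on_response(response) -> None:
    """response hook: fires once, with the final response."""
    events.append(("response", response))

hooks = {"request": [on_request], "retry": [on_retry], "response": [on_response]}

# Simulate one logical execution that retries once before succeeding.
on_request("prepared")
on_retry(0, "prepared", None, TimeoutError("gateway 429"))
on_response("final")
```

The dictionary would then be passed at construction time, e.g. ScrapeDoClient(event_hooks=hooks).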

SyncSessionValidator module-attribute

SyncSessionValidator = Callable[[ScrapeDoResponse], bool]

Defines the expected signature of the custom function passed to the ScrapeDoClient.execute method's session_validator argument.