Async Client

async_client

Asynchronous client for the Scrape.do Async API

Defines the AsyncScrapeDoAsyncAPIClient, an asynchronous wrapper over httpx.AsyncClient configured against q.scrape.do. Mirrors the synchronous ScrapeDoAsyncAPIClient surface (endpoint methods, polling helpers, error routing, retry strategy, event hooks) but every IO-bound method is async / await and sleeps between retries are non-blocking (await asyncio.sleep(...))

Endpoint Mapping

AsyncScrapeDoAsyncAPIClient

Asynchronous client for the Scrape.do Async API on q.scrape.do

asyncio-native version of ScrapeDoAsyncAPIClient, backed by httpx.AsyncClient. Mirrors the sync client's surface (endpoint methods, polling helpers, error routing, retry strategy, event hooks), but every IO-bound method is async / await and sleeps between retries are non-blocking (await asyncio.sleep(...))

Features
Concurrency Limit and Server Errors

This client intercepts the gateway errors specific to the Scrape.do Async API (429 / 502 / 503 / 504) and automatically applies a customisable retry strategy before the error can reach the application. The sleep between retries is non-blocking (await asyncio.sleep(...) rather than the sync client's time.sleep(...))

SDK Event Hooks (event_hooks)

This client implements SDK-specific async event hooks modelled on the structure of httpx's native event hooks. See AsyncAPIAsyncEventHooks for the available lifecycle hooks and their required signatures. Hooks must be async-callable (returning Awaitable[None])

Additional httpx.AsyncClient Configuration

The following httpx.AsyncClient parameters can be provided as keyword arguments and will be passed directly to the underlying object

  • verify
  • cert
  • http1
  • http2
  • timeout
  • limits
  • transport
  • default_encoding

Additionally, the following httpx.AsyncClient.request parameters can be provided as keyword arguments during request execution

  • timeout (r_timeout)
  • extensions

For more information on their behaviour and default values, please consult the official httpx documentation

Unsupported HTTPX Client Arguments

The underlying httpx.AsyncClient object is strictly managed by the instance to prevent invalid configurations from being sent to the Scrape.do Async API. For this reason, arguments not listed in the previous section are intentionally blocked and shouldn't be changed
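The allowlist behaviour described above can be pictured with a minimal sketch. The helper name filter_client_kwargs and the choice of TypeError are assumptions made for illustration; the SDK's actual blocking mechanism is not shown in this reference.

```python
from typing import Any, Dict

# Keyword arguments this page lists as forwarded to httpx.AsyncClient.
ALLOWED_CLIENT_KWARGS = frozenset(
    {"verify", "cert", "http1", "http2", "timeout", "limits", "transport", "default_encoding"}
)


def filter_client_kwargs(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: reject any keyword argument outside the allowlist."""
    blocked = sorted(set(kwargs) - ALLOWED_CLIENT_KWARGS)
    if blocked:
        # TypeError is an assumption; the SDK may signal this differently.
        raise TypeError(f"Unsupported httpx.AsyncClient arguments: {', '.join(blocked)}")
    return dict(kwargs)


accepted = filter_client_kwargs(http2=True, timeout=30.0)
```

An argument such as base_url, which would point the client away from q.scrape.do, is the kind of setting this sketch rejects.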

Parameters:

Name Type Description Default
api_token Optional[str]

The Scrape.do API key. If omitted, falls back to the SCRAPE_DO_API_KEY environment variable

None
max_retries int

Maximum retry attempts on transient gateway errors (429 / 502 / 503 / 504)

3
retry_backoff Optional[Union[float, Callable[[int], float]]]

The strategy used to calculate the delay between retries. Can be a static float (seconds) or a callable that accepts the current attempt number (0-indexed) and returns a float. Defaults to a jittered exponential backoff when set to None

None
event_hooks Optional[AsyncAPIAsyncEventHooks]

A dictionary of SDK-native async hooks to execute during different points of the Async-API request lifecycle

None
verify Union[SSLContext, str, bool]

Configures SSL certificate verification. Defaults to True (secure)

True
cert Optional[CertTypes]

Client-side certificates for mutual TLS authentication

None
http1 bool

Enable HTTP/1.1

True
http2 bool

Enable HTTP/2 multiplexing

False
timeout TimeoutTypes

Default timeout in seconds applied to every network phase

60.0
limits Limits

Connection pool limits

DEFAULT_LIMITS
transport Optional[AsyncBaseTransport]

Custom async transport engine

None
default_encoding Union[str, Callable[[bytes], str]]

The fallback text encoding used if a target website omits a charset header

'utf-8'
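As an illustration of the callable form of retry_backoff, the sketch below defines two functions that take the 0-indexed attempt number and return a delay in seconds. The function names, base delay, and cap are assumptions; the formula behind the SDK's default jittered exponential backoff is not documented here and may differ.

```python
import random


def capped_exponential(attempt: int) -> float:
    """Exponential delay (0.5s, 1s, 2s, ...) capped at 10s; attempt is 0-indexed."""
    return min(0.5 * (2 ** attempt), 10.0)


def jittered_exponential(attempt: int) -> float:
    """Full-jitter variant: a random delay between 0 and the capped exponential value."""
    return random.uniform(0.0, capped_exponential(attempt))


# Either callable, or a plain float such as 1.5, could be supplied as retry_backoff.
delays = [capped_exponential(attempt) for attempt in range(6)]
```

Jitter spreads out retries from many concurrent coroutines so they do not all hit the gateway at the same instant.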

aclose() async

Closes the underlying HTTPX async connection pool.

It is recommended to use the client as an async context manager to ensure resources are released automatically.

__aenter__() async

Initializes the HTTPX async connection pool and returns the context manager object

Returns:

Type Description
Self

The AsyncScrapeDoAsyncAPIClient instance with an opened HTTPX async connection pool

__aexit__(exc_type, exc_val, exc_tb) async

Calls the aclose method to close the underlying HTTPX async connection pool without swallowing any exceptions

Parameters:

Name Type Description Default
exc_type Optional[Type[BaseException]]

The type of the exception

required
exc_val Optional[BaseException]

The instance of the exception

required
exc_tb Optional[TracebackType]

The traceback information

required

Returns:

Type Description
Literal[False]

False, since no exceptions are swallowed
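The context-manager contract described above (close on exit, never swallow the exception) can be sketched with a stand-in class. FakeAsyncClient is hypothetical and models only the protocol, not the real client or its connection pool.

```python
import asyncio
from types import TracebackType
from typing import Literal, Optional, Tuple, Type


class FakeAsyncClient:
    """Stand-in for the real client; only the context-manager protocol is modelled."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "FakeAsyncClient":
        return self

    async def __aexit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_val: Optional[BaseException],
        exc_tb: Optional[TracebackType],
    ) -> Literal[False]:
        await self.aclose()
        return False  # returning False lets any exception propagate


async def demo() -> Tuple[bool, str]:
    client = FakeAsyncClient()
    caught = ""
    try:
        async with client:
            raise RuntimeError("boom")
    except RuntimeError as exc:
        caught = str(exc)
    return client.closed, caught


result = asyncio.run(demo())
```

The exception raised inside the async with block still reaches the caller, and the client is closed by the time it does.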

_sleep(attempt) async

Sleeps for the duration dictated by self.retry_backoff, without blocking the event loop

Parameters:

Name Type Description Default
attempt int

The current zero-indexed attempt number

required

_request(method, path, *, json_body=None, params=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Sends an HTTP request to a Scrape.do Async API endpoint with retry on transient gateway errors

Usage

This method is used internally by all of the client's endpoint-specific methods

Execution
  • Applies the customisable retry_backoff strategy on retryable statuses, sleeping via await asyncio.sleep(...) so the event loop is never blocked

  • Awaits the configured AsyncAPIAsyncEventHooks (request / response / retry)

  • Uses the _raise_for_status function to raise exceptions on network and API response errors, ensuring that the returned httpx.Response is successful
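The execution steps above can be sketched as a small retry loop. This is a simplified illustration, not the SDK's implementation: FakeResponse, request_with_retry, and the hook signature used here are assumptions, and the final status-code routing done by _raise_for_status is omitted.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Tuple

RETRYABLE_STATUSES = {429, 502, 503, 504}


@dataclass
class FakeResponse:
    """Stand-in for httpx.Response; only the status code is modelled."""
    status_code: int


async def request_with_retry(
    send: Callable[[], Awaitable[FakeResponse]],
    max_retries: int,
    backoff: Callable[[int], float],
    on_retry: List[Callable[[int, FakeResponse], Awaitable[None]]],
) -> FakeResponse:
    for attempt in range(max_retries + 1):
        response = await send()
        if response.status_code in RETRYABLE_STATUSES and attempt < max_retries:
            for hook in on_retry:
                await hook(attempt, response)
            await asyncio.sleep(backoff(attempt))  # yields to the event loop
            continue
        return response  # success, non-retryable status, or retries exhausted
    raise RuntimeError("max_retries must be >= 0")


async def demo() -> Tuple[int, List[Tuple[int, int]]]:
    statuses = iter([503, 429, 200])
    retried: List[Tuple[int, int]] = []

    async def send() -> FakeResponse:
        return FakeResponse(next(statuses))

    async def record(attempt: int, response: FakeResponse) -> None:
        retried.append((attempt, response.status_code))

    final = await request_with_retry(send, 3, lambda attempt: 0.0, [record])
    return final.status_code, retried


result = asyncio.run(demo())
```

Because the delay is awaited rather than slept synchronously, other coroutines on the same event loop keep running while one request backs off.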

Parameters:

Name Type Description Default
method HttpMethod

HTTP method

required
path str

Endpoint path relative to API_PATH

required
json_body Optional[Any]

Optional JSON body for POST

None
params Optional[QueryParamsType]

Optional query parameters for httpx client's request method

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions for this specific request

None

Returns:

Type Description
Response

The successful httpx.Response

Raises:

Type Description
AsyncAPIResponseError

Any typed Async-API error routed by status code

APIConnectionError

If the underlying network transport fails for max_retries + 1 consecutive attempts

create_job(request=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **job_kwargs) async

Creates a new Async API job

Parameter Configuration

This method provides smart routing based on the arguments provided. You can configure the request in two ways

job_kwargs Additional Configuration

Since JobCreationRequest accepts a nested Pydantic model for its render attribute, job_kwargs offers two ways to configure it

Parameter Restrictions

To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.

  • When using the Pre-Built Parameters configuration, passing any job_kwargs keyword-argument will raise a ValueError

  • When using the job_kwargs configuration, passing a JobCreationRequest to the request argument will raise a ValueError

  • When using the job_kwargs configuration, providing a pre-built RenderParameters instance via job_kwargs["render"] AND any other job_kwargs keyword-argument in RENDER_PARAMETER_FIELDS at the same time raises a ValueError
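The three restrictions above amount to a short argument check, sketched below. The function name, the placeholder field names, and the example keyword are assumptions for illustration only; the real contents of RENDER_PARAMETER_FIELDS and the SDK's validation code are not reproduced in this reference.

```python
from typing import Any, Optional

# Placeholder names; the SDK's real RENDER_PARAMETER_FIELDS is not shown on this page.
RENDER_PARAMETER_FIELDS = frozenset({"example_render_field_a", "example_render_field_b"})


def check_job_arguments(request: Optional[Any] = None, **job_kwargs: Any) -> str:
    """Hypothetical validator mirroring the restrictions listed above."""
    if request is not None and job_kwargs:
        raise ValueError("Pass either request or **job_kwargs, not both")
    if request is not None:
        return "pre-built"
    flat_render = RENDER_PARAMETER_FIELDS & set(job_kwargs)
    if job_kwargs.get("render") is not None and flat_render:
        raise ValueError(f"render conflicts with flat render fields: {sorted(flat_render)}")
    return "kwargs"


route = check_job_arguments(example_field="value")
```

Failing fast with a ValueError avoids the situation where one configuration silently overwrites the other.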

Parameters:

Name Type Description Default
request Optional[JobCreationRequest]

Pre-built job creation body. Mutually exclusive with **job_kwargs

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions

None
**job_kwargs Unpack[JobCreationRequestDict]

Flat kwargs-based configuration

{}

Returns:

Type Description
JobCreationResponse

The parsed JobCreationResponse containing the assigned job_id and per-task task_ids

Raises:

Type Description
ValueError

If both request and **job_kwargs are provided, or if the flat render fields conflict with a pre-built render in job_kwargs

AsyncAPIBadRequestError

On HTTP 400

AsyncAPIAuthError

On HTTP 401

AsyncAPIRateLimitError

On HTTP 429 once retries are exhausted

AsyncAPIServerError

On HTTP 5xx once retries are exhausted

AsyncAPIUnparsableResponseError

If the SDK can't parse a successful response from q.scrape.do

get_job(job_id, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Fetches the current state of an Async API job

Parameters:

Name Type Description Default
job_id str

UUID of the job to fetch

required
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions

None

Returns:

Type Description
JobDetails

The parsed JobDetails model

Raises:

Type Description
AsyncAPINotFoundError

If the job doesn't exist or has expired

AsyncAPIAuthError

On HTTP 401

AsyncAPIServerError

On HTTP 5xx once retries are exhausted

AsyncAPIUnparsableResponseError

If the SDK can't parse a successful response from q.scrape.do

list_jobs(query=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **query_kwargs) async

Lists Async API jobs filtered / sorted by query

Parameter Configuration

This method provides smart routing based on the arguments provided. You can configure the request in two ways

Parameter Restrictions

To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.

  • When using the Pre-Built Parameters configuration, passing any query_kwargs keyword-argument will raise a ValueError

  • When using the query_kwargs configuration, passing a JobListQueryParameters to the query argument will raise a ValueError

Parameters:

Name Type Description Default
query Optional[JobListQueryParameters]

Pre-built filter / sort / pagination shape. Mutually exclusive with **query_kwargs

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions

None
**query_kwargs Unpack[JobListQueryParametersDict]

Flat kwargs-based configuration

{}

Returns:

Type Description
JobsListResponse

The parsed JobsListResponse model

Raises:

Type Description
ValueError

If both query and **query_kwargs are provided

AsyncAPIServerError

On HTTP 5xx once retries are exhausted

AsyncAPIUnparsableResponseError

If the SDK can't parse a successful response from q.scrape.do

get_task(job_id, task_id, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Fetches the full details of a single task within a job

Parameters:

Name Type Description Default
job_id str

UUID of the parent job

required
task_id str

UUID of the task to fetch

required
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions

None

Returns:

Type Description
TaskDetails

The parsed TaskDetails model

Raises:

Type Description
AsyncAPINotFoundError

If the job / task doesn't exist or has expired

AsyncAPIServerError

On HTTP 5xx once retries are exhausted

AsyncAPIUnparsableResponseError

If the SDK can't parse a successful response from q.scrape.do

cancel_job(job_id, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Cancels an in-flight Async API job

Parameters:

Name Type Description Default
job_id str

UUID of the job to cancel

required
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions

None

Returns:

Type Description
CancelJobResponse

The parsed CancelJobResponse (same shape as JobDetails, with Canceled=True)

Raises:

Type Description
AsyncAPINotFoundError

If the job doesn't exist or has expired

AsyncAPINotAcceptableError

If the job is already in a terminal state and can no longer be canceled

AsyncAPIServerError

On HTTP 5xx once retries are exhausted

AsyncAPIUnparsableResponseError

If the SDK can't parse a successful response from q.scrape.do

get_user_info(*, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Fetches the current user / account information

Parameters:

Name Type Description Default
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions

None

Returns:

Type Description
UserInformation

The parsed UserInformation model

Raises:

Type Description
AsyncAPIAuthError

On HTTP 401

AsyncAPIServerError

On HTTP 5xx once retries are exhausted

AsyncAPIUnparsableResponseError

If the SDK can't parse a successful response from q.scrape.do

wait_for_job(job_id, *, strategy=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None) async

Polls a job until it reaches a terminal status, sleeping between attempts without blocking the event loop

Strategy Argument
  • None (default) → Uses PollingStrategy() with its documented defaults

  • Custom PollingStrategy Instance → Uses the provided instance with its custom configuration

  • PollingFunction → Uses the provided callable to calculate sleep times between attempts and decide whether or not to stop polling before the job reaches a terminal status

Additional strategy Information
  • For more information on how the default polling strategy works and how to customise it, see the PollingStrategy docstring

  • For more information on how to define a custom polling function, see the PollingFunction docstring
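The general shape of a polling loop driven by a custom function can be sketched as follows. Everything here is an assumption made for illustration: the real PollingFunction signature is defined in its own docstring, the status names are placeholders, and the built-in TimeoutError stands in for the SDK's JobTimeoutError.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def poll_until_terminal(
    fetch_status: Callable[[], Awaitable[str]],
    is_terminal: Callable[[str], bool],
    next_delay: Callable[[int, str], Optional[float]],
) -> str:
    """Poll until is_terminal(status) is true; next_delay returning None stops early."""
    attempt = 0
    while True:
        status = await fetch_status()
        if is_terminal(status):
            return status
        delay = next_delay(attempt, status)
        if delay is None:
            raise TimeoutError(f"gave up after {attempt + 1} polls (last status: {status})")
        await asyncio.sleep(delay)  # non-blocking wait between attempts
        attempt += 1


async def demo() -> str:
    statuses = iter(["queued", "running", "done"])  # placeholder status names

    async def fetch_status() -> str:
        return next(statuses)

    return await poll_until_terminal(
        fetch_status,
        is_terminal=lambda status: status == "done",
        next_delay=lambda attempt, status: 0.0 if attempt < 5 else None,
    )


final_status = asyncio.run(demo())
```

The single callable decides both how long to wait and when to stop, which matches the role the strategy argument plays in the description above.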

Parameters:

Name Type Description Default
job_id str

UUID of the job to poll

required
strategy Optional[Union[PollingStrategy, PollingFunction]]

How to poll for the job

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override applied to every get_job call

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions applied to every get_job call

None

Returns:

Type Description
JobDetails

The terminal JobDetails snapshot

Raises:

Type Description
JobTimeoutError

If the polling strategy raises it

submit_and_wait(request=None, *, strategy=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **job_kwargs) async

Submits a job, polls until terminal, and fetches every task

Parameter Configuration
  • This method reuses the same smart-routing as the client's job-creation method

  • For more information, see the create_job method's docstring

Polling Configuration
  • This method passes the strategy argument unchanged to the client's polling helper

  • For more information, see the wait_for_job method's docstring

Parameters:

Name Type Description Default
request Optional[JobCreationRequest]

Pre-built job creation body. Mutually exclusive with **job_kwargs

None
strategy Optional[Union[PollingStrategy, PollingFunction]]

How to poll for the job

None
r_timeout Union[TimeoutTypes, UseClientDefault]

A request-specific timeout override applied to every underlying HTTP call (create / poll / fetch)

USE_CLIENT_DEFAULT
extensions Optional[RequestExtensions]

Advanced HTTPX extensions applied to every underlying HTTP call

None
**job_kwargs Unpack[JobCreationRequestDict]

Flat kwargs-based configuration

{}

Returns:

Type Description
JobResult

A JobResult bundling the terminal JobDetails with the fetched List[TaskDetails] (one per task, in input order)
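One way to fetch every task while keeping input order is asyncio.gather, which returns results in the order of its arguments. This is only a sketch: fake_get_task is a stand-in for get_task, the identifiers are made up, and this reference does not state whether the SDK fetches tasks concurrently or sequentially.

```python
import asyncio
from typing import Dict, List


async def fake_get_task(job_id: str, task_id: str) -> Dict[str, str]:
    """Stand-in for get_task; later tasks finish first to show that order is kept."""
    await asyncio.sleep(0.01 * (3 - int(task_id[-1])))
    return {"job_id": job_id, "task_id": task_id}


async def fetch_all_tasks(job_id: str, task_ids: List[str]) -> List[Dict[str, str]]:
    # gather preserves argument order regardless of which coroutine completes first.
    return list(await asyncio.gather(*(fake_get_task(job_id, t) for t in task_ids)))


tasks = asyncio.run(fetch_all_tasks("job-1", ["task-0", "task-1", "task-2"]))
```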

Raises:

Type Description
ValueError

If both request and **job_kwargs are provided, or if mixed render configurations are detected

JobFailedError

If the job reaches terminal status error

JobCanceledError

If the job reaches terminal status canceled

JobTimeoutError

If the polling strategy raises it

BASE_URL class-attribute instance-attribute

BASE_URL = 'https://q.scrape.do'

Base URL for the Scrape.do Async API

API_PATH class-attribute instance-attribute

API_PATH = '/api/v1'

API path prefix appended after BASE_URL
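For illustration, the two attributes combine with an endpoint path as sketched below. The helper build_url is hypothetical, and the UUID is a made-up example value.

```python
BASE_URL = "https://q.scrape.do"
API_PATH = "/api/v1"


def build_url(path: str) -> str:
    """Hypothetical helper: join BASE_URL, API_PATH and an endpoint path."""
    return f"{BASE_URL}{API_PATH}/{path.lstrip('/')}"


job_url = build_url("jobs/123e4567-e89b-12d3-a456-426614174000")
```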

AsyncAPIAsyncEventHooks

Bases: TypedDict

Configuration dictionary for SDK-native async Async-API lifecycle hooks

Differences From The Sync Client's Hooks
  • Modeled after AsyncAPIEventHooks, but adapted to the asynchronous Async-API request lifecycle

  • Each hook must be an async-callable returning Awaitable[None] so it can perform I/O while the request executes

poll
  • The poll event hooks are the only ones that receive a custom response model (JobDetails) instead of a raw httpx object

  • This is because all other hooks run for every endpoint method and therefore see differing request / response structures, while poll hooks only run for /api/v1/jobs/{jobID} requests

poll instance-attribute

poll: List[Callable[[int, 'JobDetails'], Awaitable[None]]]

Fires on each non-terminal polling iteration of wait_for_job or submit_and_wait. Receives the zero-indexed attempt counter and the latest JobDetails snapshot returned by get_job. Useful for surfacing polling progress
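A poll hook matching the signature above might look like the following sketch. The hook name and the simulated loop are assumptions; plain strings stand in for JobDetails snapshots, and no JobDetails attributes are assumed.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

progress_log: List[str] = []


async def log_progress(attempt: int, job: Any) -> None:
    """Poll hook: record one line per non-terminal polling iteration."""
    # `job` would be a JobDetails snapshot; only its repr is used here.
    progress_log.append(f"poll #{attempt}: {job!r}")


hooks: Dict[str, List[Callable[..., Awaitable[None]]]] = {"poll": [log_progress]}


async def demo() -> None:
    # Simulates how a polling loop might await each registered hook.
    for attempt, snapshot in enumerate(["snapshot-a", "snapshot-b"]):
        for hook in hooks["poll"]:
            await hook(attempt, snapshot)


asyncio.run(demo())
```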

request instance-attribute

request: List[Callable[[Request], Awaitable[None]]]

Fires immediately before each HTTP call leaves the client. Receives the prepared httpx.Request. Useful for logging the raw Async-API call about to be sent

response instance-attribute

response: List[Callable[[Response], Awaitable[None]]]

Fires immediately after each HTTP call returns, before the status-code error routing in _raise_for_status runs. Receives the raw httpx.Response. Useful for logging every Async-API response (including those that are about to trigger an exception)

retry instance-attribute

retry: List[
    Callable[
        [
            int,
            Request,
            Optional[Response],
            Optional[Exception],
        ],
        Awaitable[None],
    ]
]

Fires inside the execution loop ONLY when an Async-API gateway error (429 / 502 / 503 / 504) or an httpx.RequestError occurs and the SDK decides to retry. Receives the current attempt number, the prepared httpx.Request that was retried, and either the failed httpx.Response (when the gateway returned a retryable status) or the underlying Exception that caused the retry. Useful for tracking gateway instability
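To track gateway instability, a retry hook with the four-argument signature above could tally retries by cause, as in this sketch. The hook name and the counter are assumptions, and SimpleNamespace objects stand in for httpx.Request and httpx.Response so the example runs without httpx installed.

```python
import asyncio
from collections import Counter
from types import SimpleNamespace
from typing import Any, Optional

retry_reasons: Counter = Counter()


async def count_retry(
    attempt: int,
    request: Any,             # httpx.Request in the real hook
    response: Optional[Any],  # httpx.Response, or None on a transport failure
    error: Optional[Exception],
) -> None:
    """Retry hook: tally retries by status code or exception type."""
    if response is not None:
        retry_reasons[f"status {response.status_code}"] += 1
    elif error is not None:
        retry_reasons[type(error).__name__] += 1


hooks = {"retry": [count_retry]}


async def demo() -> None:
    # Simulated retry events: one gateway status, one transport failure.
    await count_retry(0, object(), SimpleNamespace(status_code=503), None)
    await count_retry(1, object(), None, ConnectionError("reset"))


asyncio.run(demo())
```

Since exactly one of response and error is expected to be set on each call, the hook branches on whichever is present.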