Async Client
async_client
Asynchronous client for the Scrape.do Async API.
Defines the `AsyncScrapeDoAsyncAPIClient`, an asynchronous wrapper over `httpx.AsyncClient` configured against `q.scrape.do`. It mirrors the synchronous `ScrapeDoAsyncAPIClient` surface (endpoint methods, polling helpers, error routing, retry strategy, event hooks), but every IO-bound method is a coroutine that must be awaited, and sleeps between retries are non-blocking (`await asyncio.sleep(...)`).
Endpoint Mapping
- `POST /api/v1/jobs` → `create_job`
- `GET /api/v1/jobs/{jobID}` → `get_job`
- `GET /api/v1/jobs/{jobID}/{taskID}` → `get_task`
- `GET /api/v1/jobs` → `list_jobs`
- `DELETE /api/v1/jobs/{jobID}` → `cancel_job`
- `GET /api/v1/me` → `get_user_info`
AsyncScrapeDoAsyncAPIClient
Asynchronous client for the Scrape.do Async API on `q.scrape.do`.
An asyncio-native version of `ScrapeDoAsyncAPIClient`, backed by `httpx.AsyncClient`. It mirrors the sync client's surface (endpoint methods, polling helpers, error routing, retry strategy, event hooks), but every IO-bound method is a coroutine that must be awaited, and sleeps between retries are non-blocking (`await asyncio.sleep(...)`).
Features
- Pre-flight payload validation via the `JobCreationRequest` and `JobListQueryParameters` models
- Status code error routing to specific exceptions (`400`/`401`/`404`/`406`/`429`/`5xx`)
- Customisable retry intervals on transient gateway errors
- Polling helpers (`wait_for_job` and `submit_and_wait`) with either a built-in `PollingStrategy` or a user-supplied `PollingFunction`
Concurrency Limit and Server Errors
This client intercepts the gateway errors specific to Scrape.do's Async API (`429` / `502` / `503` / `504`) and automatically applies a customisable retry strategy before the error can reach the application. The sleep between retries is non-blocking (`await asyncio.sleep(...)` rather than the sync client's `time.sleep(...)`).
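The retry behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the SDK's implementation: `send_with_retries`, `make_flaky_send`, and the `(status, body)` tuple are hypothetical stand-ins for the real transport call, and only the list of retryable statuses comes from the text.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Tuple

# Statuses the text names as retryable gateway errors.
RETRYABLE_STATUSES = frozenset({429, 502, 503, 504})


async def send_with_retries(
    send: Callable[[], Awaitable[Tuple[int, str]]],  # hypothetical transport call
    max_retries: int = 3,
    delay: float = 0.01,
) -> Tuple[int, str]:
    """Re-issue `send` while it returns a retryable status."""
    attempt = 0
    while True:
        status, body = await send()
        if status not in RETRYABLE_STATUSES or attempt >= max_retries:
            return status, body
        # Non-blocking sleep: other tasks on the event loop keep running.
        await asyncio.sleep(delay)
        attempt += 1


def make_flaky_send(statuses: Iterable[int]) -> Callable[[], Awaitable[Tuple[int, str]]]:
    """Build a fake transport that yields the given statuses in order."""
    remaining = iter(statuses)

    async def send() -> Tuple[int, str]:
        return next(remaining), "payload"

    return send


result = asyncio.run(send_with_retries(make_flaky_send([503, 429, 200])))
print(result)
```

Because the wait uses `asyncio.sleep`, a retrying request does not stall other coroutines scheduled on the same loop, which is the practical difference from the sync client's `time.sleep`.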
SDK Event Hooks (event_hooks)
This client implements SDK-specific async event hooks that mimic the structure of httpx's native event hooks. See `AsyncAPIAsyncEventHooks` for the available lifecycle hooks and their required signatures. Hooks must be async-callable (returning `Awaitable[None]`).
Additional httpx.AsyncClient Configuration
The following `httpx.AsyncClient` parameters can be provided as keyword arguments and will be passed directly to the underlying object:
- `verify`
- `cert`
- `http1`
- `http2`
- `timeout`
- `limits`
- `transport`
- `default_encoding`

Additionally, the following `httpx.AsyncClient.request` parameters can be provided as keyword arguments during request execution:
- `timeout` (`r_timeout`)
- `extensions`

For more information on their behaviour and default values, please consult the official httpx documentation.
Unsupported HTTPX Client Arguments
The underlying `httpx.AsyncClient` object is strictly managed by the instance to prevent invalid configurations from being sent to the Scrape.do Async API. For this reason, arguments not listed in the previous section are intentionally blocked and should not be changed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `api_token` | `Optional[str]` | The Scrape.do API key. If omitted, falls back to the | `None` |
| `max_retries` | `int` | Maximum retry attempts on transient gateway errors ( | `3` |
| `retry_backoff` | `Optional[Union[float, Callable[[int], float]]]` | The strategy used to calculate the delay between retries. Can be a static | `None` |
| `event_hooks` | `Optional[AsyncAPIAsyncEventHooks]` | A dictionary of SDK-native async hooks to execute during different points of the Async-API request lifecycle | `None` |
| `verify` | `Union[SSLContext, str, bool]` | Configures SSL certificate verification. Defaults to True (secure) | `True` |
| `cert` | `Optional[CertTypes]` | Client-side certificates for mutual TLS authentication | `None` |
| `http1` | `bool` | Enable HTTP/1.1 | `True` |
| `http2` | `bool` | Enable HTTP/2 multiplexing | `False` |
| `timeout` | `TimeoutTypes` | Default timeout in seconds applied to every network phase | `60.0` |
| `limits` | `Limits` | Connection pool limits | `DEFAULT_LIMITS` |
| `transport` | `Optional[AsyncBaseTransport]` | Custom async transport engine | `None` |
| `default_encoding` | `Union[str, Callable[[bytes], str]]` | The fallback text encoding used if a target website omits a charset header | `'utf-8'` |
aclose()
async
Closes the underlying HTTPX async connection pool.
It is recommended to use the client as an async context manager to ensure resources are released automatically.
__aenter__()
async
Initializes the HTTPX async connection pool and returns the context manager object
Returns:
| Type | Description |
|---|---|
| `Self` | The |
__aexit__(exc_type, exc_val, exc_tb)
async
Calls the aclose method to close the underlying HTTPX
async connection pool without swallowing any exceptions
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `exc_type` | `Optional[Type[BaseException]]` | The type of the exception | required |
| `exc_val` | `Optional[BaseException]` | The instance of the exception | required |
| `exc_tb` | `Optional[TracebackType]` | The traceback information | required |

Returns:
| Type | Description |
|---|---|
| `Literal[False]` | |
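The context-manager contract above (open on `__aenter__`, close on `__aexit__`, return `False` so exceptions propagate) can be illustrated with a small stand-in. `FakePool` is a hypothetical class, not part of the SDK; it only mirrors the three documented methods.

```python
import asyncio
from typing import Tuple


class FakePool:
    """Hypothetical stand-in that mirrors aclose / __aenter__ / __aexit__."""

    def __init__(self) -> None:
        self.opened = False
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True

    async def __aenter__(self) -> "FakePool":
        self.opened = True
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> bool:
        await self.aclose()
        return False  # False means the exception is not swallowed


async def demo() -> Tuple[bool, bool]:
    pool = FakePool()
    propagated = False
    try:
        async with pool:
            raise RuntimeError("boom")
    except RuntimeError:
        propagated = True  # the error reached the caller
    return pool.closed, propagated


closed, propagated = asyncio.run(demo())
print(closed, propagated)
```

Returning `False` from `__aexit__` is what guarantees that the pool is released while the original exception still reaches the caller.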
_sleep(attempt)
async
Sleeps for the duration dictated by self.retry_backoff,
without blocking the event loop
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `attempt` | `int` | The current zero-indexed attempt number | required |
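Since `retry_backoff` is typed as either a static `float` or a `Callable[[int], float]`, the delay for a given attempt could be resolved roughly as follows. This is a sketch under assumptions: `resolve_delay`, `capped_exponential`, `sleep_for_attempt`, and `DEFAULT_DELAY` are hypothetical names, and the behaviour for `None` is a guess because the source does not state it.

```python
import asyncio
from typing import Callable, Optional, Union

Backoff = Optional[Union[float, Callable[[int], float]]]

# Assumption: the source does not say what `None` resolves to.
DEFAULT_DELAY = 1.0


def resolve_delay(retry_backoff: Backoff, attempt: int) -> float:
    """Turn a static or callable backoff into a non-negative delay in seconds."""
    if retry_backoff is None:
        return DEFAULT_DELAY
    if callable(retry_backoff):
        return max(0.0, float(retry_backoff(attempt)))
    return max(0.0, float(retry_backoff))


def capped_exponential(attempt: int) -> float:
    """Example callable: 0.5s doubling per zero-indexed attempt, capped at 8s."""
    return min(0.5 * (2 ** attempt), 8.0)


async def sleep_for_attempt(retry_backoff: Backoff, attempt: int) -> float:
    """Sleep for the resolved delay without blocking the event loop."""
    delay = resolve_delay(retry_backoff, attempt)
    await asyncio.sleep(delay)
    return delay


delays = [resolve_delay(capped_exponential, a) for a in range(6)]
print(delays)
```

A callable of this shape could be passed as `retry_backoff` to grow the wait on each attempt, while a plain `float` keeps the interval constant.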
_request(method, path, *, json_body=None, params=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Sends an HTTP request to a Scrape.do Async API endpoint with
retry on transient gateway errors
Usage
This method is used internally by all of the client's endpoint-specific methods
Execution
- Applies the customisable `retry_backoff` strategy on retryable statuses, sleeping non-blockingly via `await asyncio.sleep(...)`
- Awaits the configured `AsyncAPIAsyncEventHooks` (`request`/`response`/`retry`)
- Uses the `_raise_for_status` function to raise exceptions on network and API response errors, ensuring that the returned `httpx.Response` is successful
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `method` | `HttpMethod` | HTTP method | required |
| `path` | `str` | Endpoint path relative to | required |
| `json_body` | `Optional[Any]` | Optional JSON body for | `None` |
| `params` | `Optional[QueryParamsType]` | Optional query parameters for | `None` |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions for this specific request | `None` |

Returns:
| Type | Description |
|---|---|
| `Response` | The successful |

Raises:
| Type | Description |
|---|---|
| `AsyncAPIResponseError` | Any typed Async-API error routed by status code |
| `APIConnectionError` | If the underlying network transport fails for |
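Status-code error routing of the kind described here can be sketched as a lookup table plus a `5xx` fallback. The classes below are local stand-ins, not the SDK's exceptions, and the code-to-class pairing is inferred from the exception names and the status list (`400/401/404/406/429/5xx`) rather than stated explicitly in the source.

```python
from typing import Dict, Type


class ApiError(Exception):
    """Stand-in base error carrying the HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {message}")
        self.status_code = status_code


class BadRequestError(ApiError): ...
class AuthError(ApiError): ...
class NotFoundError(ApiError): ...
class NotAcceptableError(ApiError): ...
class RateLimitError(ApiError): ...
class ServerError(ApiError): ...


# Assumed pairing, inferred from the exception names.
EXACT_ROUTES: Dict[int, Type[ApiError]] = {
    400: BadRequestError,
    401: AuthError,
    404: NotFoundError,
    406: NotAcceptableError,
    429: RateLimitError,
}


def raise_for_status(status_code: int, message: str = "") -> None:
    """Return silently on 2xx, otherwise raise the most specific error."""
    if 200 <= status_code < 300:
        return
    error_cls = EXACT_ROUTES.get(status_code)
    if error_cls is None:
        # Any 5xx falls through to the server error; the rest stay generic.
        error_cls = ServerError if status_code >= 500 else ApiError
    raise error_cls(status_code, message)
```

Routing by exact code first and by range second lets callers catch one narrow exception type, or the shared base class when they want to handle every API failure the same way.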
create_job(request=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **job_kwargs)
async
Creates a new Async API job
Parameter Configuration
This method provides smart routing based on the arguments provided. You can configure the request in two ways:
- `job_kwargs` → Build a `JobCreationRequest` implicitly by passing the keyword-arguments accepted by the `JobCreationRequestDict` TypedDict
- Pre-Built Parameters → Pass a validated `JobCreationRequest` instance directly to the `request` argument

job_kwargs Additional Configuration
Since `JobCreationRequest` accepts a nested Pydantic model for its `render` attribute, `job_kwargs` offers two ways to configure it:
- Implicit Construction → Pass every field accepted by the nested model as a flat keyword-argument
- Explicit Construction → Pass a validated `RenderParameters` instance to the `render` keyword-argument

Parameter Restrictions
To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.
- When using the Pre-Built Parameters configuration, passing any `job_kwargs` keyword-argument will raise a `ValueError`
- When using the `job_kwargs` configuration, passing a `JobCreationRequest` to the `request` argument will raise a `ValueError`
- When using the `job_kwargs` configuration, providing a pre-built `RenderParameters` instance via `job_kwargs["render"]` AND any other `job_kwargs` keyword-argument in `RENDER_PARAMETER_FIELDS` at the same time raises a `ValueError`
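The mutual-exclusion rules above can be sketched as a small resolver. This is an illustration only: plain dictionaries stand in for the Pydantic models, and `resolve_job_config`, `RENDER_FIELDS`, and the field names `targets`, `wait_until`, and `block_resources` are hypothetical, not taken from the SDK.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for RENDER_PARAMETER_FIELDS; real field names may differ.
RENDER_FIELDS = frozenset({"wait_until", "block_resources"})


def resolve_job_config(
    request: Optional[Dict[str, Any]] = None, **job_kwargs: Any
) -> Dict[str, Any]:
    """Accept either a pre-built request or flat kwargs, never both."""
    if request is not None and job_kwargs:
        raise ValueError("pass either `request` or keyword arguments, not both")
    if request is not None:
        return dict(request)

    flat_render = RENDER_FIELDS.intersection(job_kwargs)
    if "render" in job_kwargs and flat_render:
        raise ValueError("pass either `render` or flat render fields, not both")

    # Non-render kwargs are kept as-is; flat render fields are nested.
    config = {k: v for k, v in job_kwargs.items() if k not in RENDER_FIELDS}
    if flat_render:
        config["render"] = {name: job_kwargs[name] for name in sorted(flat_render)}
    return config


built = resolve_job_config(targets=["https://example.com"], wait_until="load")
print(built)
```

Failing fast with `ValueError` avoids the case where one configuration silently overrides the other.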
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `request` | `Optional[JobCreationRequest]` | Pre-built job creation body. Mutually exclusive with | `None` |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions | `None` |
| `**job_kwargs` | `Unpack[JobCreationRequestDict]` | Flat kwargs-based configuration | `{}` |

Returns:
| Type | Description |
|---|---|
| `JobCreationResponse` | The parsed |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If both |
| `AsyncAPIBadRequestError` | On |
| `AsyncAPIAuthError` | On |
| `AsyncAPIRateLimitError` | On |
| `AsyncAPIServerError` | On |
| `AsyncAPIUnparsableResponseError` | If the SDK can't parse a successful response to |
get_job(job_id, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Fetches the current state of an Async API job
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `job_id` | `str` | UUID of the job to fetch | required |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions | `None` |

Returns:
| Type | Description |
|---|---|
| `JobDetails` | The parsed |

Raises:
| Type | Description |
|---|---|
| `AsyncAPINotFoundError` | If the job doesn't exist or has expired |
| `AsyncAPIAuthError` | On |
| `AsyncAPIServerError` | On |
| `AsyncAPIUnparsableResponseError` | If the SDK can't parse a successful response to |
list_jobs(query=None, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **query_kwargs)
async
Lists Async API jobs filtered / sorted by query
Parameter Configuration
This method provides smart routing based on the arguments provided. You can configure the request in two ways:
- `query_kwargs` → Build a `JobListQueryParameters` implicitly by passing the keyword-arguments accepted by the `JobListQueryParametersDict` TypedDict
- Pre-Built Parameters → Pass a validated `JobListQueryParameters` instance directly to the `query` argument

Parameter Restrictions
To prevent silent overwrites and routing ambiguity, the client enforces that only one of the parameter configurations can be used at a time.
- When using the Pre-Built Parameters configuration, passing any `query_kwargs` keyword-argument will raise a `ValueError`
- When using the `query_kwargs` configuration, passing a `JobListQueryParameters` to the `query` argument will raise a `ValueError`
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `query` | `Optional[JobListQueryParameters]` | Pre-built filter / sort / pagination shape. Mutually exclusive with | `None` |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions | `None` |
| `**query_kwargs` | `Unpack[JobListQueryParametersDict]` | Flat kwargs-based configuration | `{}` |

Returns:
| Type | Description |
|---|---|
| `JobsListResponse` | The parsed |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If both |
| `AsyncAPIServerError` | On |
| `AsyncAPIUnparsableResponseError` | If the SDK can't parse a successful response to |
get_task(job_id, task_id, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Fetches the full details of a single task within a job
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `job_id` | `str` | UUID of the parent job | required |
| `task_id` | `str` | UUID of the task to fetch | required |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions | `None` |

Returns:
| Type | Description |
|---|---|
| `TaskDetails` | The parsed |

Raises:
| Type | Description |
|---|---|
| `AsyncAPINotFoundError` | If the job / task doesn't exist or has expired |
| `AsyncAPIServerError` | On |
| `AsyncAPIUnparsableResponseError` | If the SDK can't parse a successful response to |
cancel_job(job_id, *, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Cancels an in-flight Async API job
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `job_id` | `str` | UUID of the job to cancel | required |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions | `None` |

Returns:
| Type | Description |
|---|---|
| `CancelJobResponse` | The parsed |

Raises:
| Type | Description |
|---|---|
| `AsyncAPINotFoundError` | If the job doesn't exist or has expired |
| `AsyncAPINotAcceptableError` | If the job is already in a terminal state and can no longer be canceled |
| `AsyncAPIServerError` | On |
| `AsyncAPIUnparsableResponseError` | If the SDK can't parse a successful response to |
get_user_info(*, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Fetches the current user / account information
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions | `None` |

Returns:
| Type | Description |
|---|---|
| `UserInformation` | The parsed |

Raises:
| Type | Description |
|---|---|
| `AsyncAPIAuthError` | On |
| `AsyncAPIServerError` | On |
| `AsyncAPIUnparsableResponseError` | If the SDK can't parse a successful response to |
wait_for_job(job_id, *, strategy=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None)
async
Polls a job until it reaches a terminal status, sleeping non-blockingly between attempts
Strategy Argument
- None (default) → Uses `PollingStrategy()` with its documented defaults
- Custom PollingStrategy Instance → Uses the supplied `PollingStrategy` instance with its custom configuration
- PollingFunction → Uses the provided callable to calculate sleep times between attempts and to decide whether to stop polling before the job reaches a terminal status

Additional strategy Information
- For more information on how the default polling strategy works and how to customise it, see the `PollingStrategy` docstring
- For more information on how to define a custom polling function, see the `PollingFunction` docstring
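A polling loop driven by a user-supplied callable might look like the sketch below. Everything here is an assumption made for illustration: the callable shape `(attempt, job) -> Optional[float]` with `None` meaning "stop", the status names, and the use of the built-in `TimeoutError` in place of `JobTimeoutError`. The real `PollingFunction` signature is defined in its own docstring.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

# Hypothetical terminal status names; the real values live in the SDK's models.
TERMINAL_STATUSES = frozenset({"success", "failed", "canceled"})

# Hypothetical callable shape: seconds to sleep, or None to stop polling.
PollFn = Callable[[int, Dict[str, Any]], Optional[float]]


async def poll_until_terminal(
    get_job: Callable[[], Awaitable[Dict[str, Any]]], poll_fn: PollFn
) -> Dict[str, Any]:
    """Fetch the job repeatedly until it is terminal or poll_fn gives up."""
    attempt = 0
    while True:
        job = await get_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        delay = poll_fn(attempt, job)
        if delay is None:
            raise TimeoutError(f"stopped after {attempt + 1} polls")
        await asyncio.sleep(delay)  # non-blocking wait between attempts
        attempt += 1


def linear_until_five(attempt: int, job: Dict[str, Any]) -> Optional[float]:
    """Wait a little longer each time and give up on the sixth poll."""
    return None if attempt >= 5 else 0.01 * (attempt + 1)


def make_fake_get_job(statuses: List[str]) -> Callable[[], Awaitable[Dict[str, Any]]]:
    """Fake fetcher that walks through statuses, then repeats the last one."""
    remaining = list(statuses)

    async def get_job() -> Dict[str, Any]:
        status = remaining.pop(0) if len(remaining) > 1 else remaining[0]
        return {"status": status}

    return get_job


final_job = asyncio.run(
    poll_until_terminal(make_fake_get_job(["queued", "running", "success"]), linear_until_five)
)
print(final_job)
```

Letting one callable decide both the delay and the stop condition keeps custom polling policies in a single place.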
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `job_id` | `str` | UUID of the job to poll | required |
| `strategy` | `Optional[Union[PollingStrategy, PollingFunction]]` | How to poll for the job | `None` |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override applied to every | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions applied to every | `None` |

Returns:
| Type | Description |
|---|---|
| `JobDetails` | The terminal |

Raises:
| Type | Description |
|---|---|
| `JobTimeoutError` | If |
submit_and_wait(request=None, *, strategy=None, r_timeout=USE_CLIENT_DEFAULT, extensions=None, **job_kwargs)
async
Submits a job, polls until terminal, and fetches every task
Parameter Configuration
- This method reuses the same smart-routing as the client's job-creation method
- For more information, see the `create_job` method's docstring

Polling Configuration
- This method passes the `strategy` argument unchanged to the client's polling helper
- For more information, see the `wait_for_job` method's docstring
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `request` | `Optional[JobCreationRequest]` | Pre-built job creation body. Mutually exclusive with | `None` |
| `strategy` | `Optional[Union[PollingStrategy, PollingFunction]]` | How to poll for the job | `None` |
| `r_timeout` | `Union[TimeoutTypes, UseClientDefault]` | A request-specific timeout override applied to every underlying HTTP call (create / poll / fetch) | `USE_CLIENT_DEFAULT` |
| `extensions` | `Optional[RequestExtensions]` | Advanced HTTPX extensions applied to every underlying HTTP call | `None` |
| `**job_kwargs` | `Unpack[JobCreationRequestDict]` | Flat kwargs-based configuration | `{}` |

Returns:
| Type | Description |
|---|---|
| `JobResult` | A |

Raises:
| Type | Description |
|---|---|
| `ValueError` | If both |
| `JobFailedError` | If the job reaches terminal status |
| `JobCanceledError` | If the job reaches terminal status |
| `JobTimeoutError` | If |
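The create, poll, fetch sequence that `submit_and_wait` performs can be sketched against an in-memory stand-in. `FakeClient`, `submit_and_wait_sketch`, and the dictionary keys (`job_id`, `status`, `task_ids`) are hypothetical, and fetching tasks concurrently with `asyncio.gather` is a design choice of this sketch, not something the source confirms about the SDK.

```python
import asyncio
from typing import Any, Dict


class FakeClient:
    """In-memory stand-in exposing the three calls the helper composes."""

    async def create_job(self, **job_kwargs: Any) -> Dict[str, Any]:
        return {"job_id": "job-1"}

    async def wait_for_job(self, job_id: str) -> Dict[str, Any]:
        return {"job_id": job_id, "status": "success", "task_ids": ["t1", "t2"]}

    async def get_task(self, job_id: str, task_id: str) -> Dict[str, Any]:
        return {"task_id": task_id, "content": f"body of {task_id}"}


async def submit_and_wait_sketch(client: FakeClient, **job_kwargs: Any) -> Dict[str, Any]:
    """Create a job, wait for a terminal state, then fetch every task."""
    created = await client.create_job(**job_kwargs)
    job = await client.wait_for_job(created["job_id"])
    if job["status"] != "success":
        # Stand-in for the failed / canceled errors the real helper raises.
        raise RuntimeError(f"job ended with status {job['status']}")
    # gather returns results in the order the awaitables were passed.
    tasks = await asyncio.gather(
        *(client.get_task(job["job_id"], task_id) for task_id in job["task_ids"])
    )
    return {"job": job, "tasks": list(tasks)}


outcome = asyncio.run(submit_and_wait_sketch(FakeClient(), targets=["https://example.com"]))
print([task["task_id"] for task in outcome["tasks"]])
```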
BASE_URL
class-attribute
instance-attribute
Base URL for the Scrape.do Async API
API_PATH
class-attribute
instance-attribute
API path prefix appended after BASE_URL
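As an illustration of how the two attributes combine, the sketch below builds the job endpoints from the mapping at the top of this page. The values are assumptions: the host `q.scrape.do` and the `/api/v1` prefix appear in the text, but the `https` scheme and the helper `jobs_url` are not taken from the source.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://q.scrape.do"  # assumption: https scheme; host named in the text
API_PATH = "/api/v1"  # prefix shared by every endpoint in the mapping


def jobs_url(job_id: Optional[str] = None, task_id: Optional[str] = None) -> str:
    """Build /jobs, /jobs/{jobID} or /jobs/{jobID}/{taskID} under the API prefix."""
    if task_id is not None and job_id is None:
        raise ValueError("task_id requires job_id")
    # Percent-encode each identifier so it stays a single path segment.
    segments = ["jobs"] + [
        quote(part, safe="") for part in (job_id, task_id) if part is not None
    ]
    return f"{BASE_URL}{API_PATH}/" + "/".join(segments)


print(jobs_url("abc", "t1"))
```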
AsyncAPIAsyncEventHooks
Bases: TypedDict
Configuration dictionary for SDK-native async Async-API lifecycle hooks
Differences From The Sync Client's Hooks
- Modeled after `AsyncAPIEventHooks`, but adapted to the asynchronous Async-API request lifecycle
- Each hook must be an async-callable returning `Awaitable[None]` so it can perform I/O while the request executes

poll
- The `poll` event hooks are the only ones that receive a custom response model (`JobDetails`) instead of a raw `httpx` object
- This is because all other hooks are executed for all endpoint methods and have distinct `request`/`response` structures, while `poll` hooks are only executed for `/api/v1/jobs/{jobID}` requests
poll
instance-attribute
Fires on each non-terminal polling iteration of wait_for_job or
submit_and_wait. Receives the zero-indexed attempt counter and
the latest JobDetails snapshot returned
by get_job. Useful for surfacing polling progress
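A `poll` hook for surfacing progress could be written as below. This is a sketch under assumptions: the hook is taken to receive `(attempt, job)` as the description states, the `status` attribute on the snapshot is hypothetical, and registering hooks as a list under the `poll` key is assumed by analogy with the `retry` entry.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List

progress_log: List[str] = []


async def log_progress(attempt: int, job: Any) -> None:
    """Record one line per non-terminal polling iteration."""
    # `status` is a hypothetical attribute on the job snapshot.
    status = getattr(job, "status", "unknown")
    progress_log.append(f"poll #{attempt}: {status}")


# Assumed shape: a list of async callables under the "poll" key.
hooks = {"poll": [log_progress]}


async def fire_poll_hooks(attempt: int, job: Any) -> None:
    """Await each registered poll hook in order, as a client might."""
    for hook in hooks["poll"]:
        await hook(attempt, job)


asyncio.run(fire_poll_hooks(0, SimpleNamespace(status="running")))
print(progress_log)
```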
request
instance-attribute
Fires immediately before each HTTP call leaves the client.
Receives the prepared httpx.Request. Useful for logging the raw
Async-API call about to be sent
response
instance-attribute
Fires immediately after each HTTP call returns, before the
status-code error routing in _raise_for_status runs. Receives
the raw httpx.Response. Useful for logging every Async-API
response (including ones that are about to be raised on)
retry
instance-attribute
retry: List[
Callable[
[
int,
Request,
Optional[Response],
Optional[Exception],
],
Awaitable[None],
]
]
Fires inside the execution loop ONLY when an Async-API gateway
error (429 / 502 / 503 / 504) or an httpx.RequestError
occurs and the SDK decides to retry. Receives the current attempt
number, the prepared httpx.Request that was retried, and either
the failed httpx.Response (when the gateway returned a retryable
status) or the underlying Exception that caused the retry. Useful
for tracking gateway instability
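A hook matching the `retry` signature above might record why each retry happened. The sketch uses `Any` in place of the `httpx` types so it runs without the library, and `SimpleNamespace` objects stand in for a real request and response; `record_retry` and `retry_events` are hypothetical names.

```python
import asyncio
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple

retry_events: List[Tuple[int, str]] = []


async def record_retry(
    attempt: int,
    request: Any,  # httpx.Request in the real signature
    response: Optional[Any],  # httpx.Response when a retryable status came back
    exc: Optional[Exception],  # the underlying error when the transport failed
) -> None:
    """Store the attempt number and the cause of the retry."""
    if response is not None:
        cause = f"status {response.status_code}"
    else:
        cause = type(exc).__name__
    retry_events.append((attempt, cause))


# The `retry` entry is a list of async callables, per the signature above.
hooks = {"retry": [record_retry]}


async def demo() -> None:
    request = SimpleNamespace(method="GET", url="/api/v1/jobs")
    for hook in hooks["retry"]:
        await hook(0, request, SimpleNamespace(status_code=503), None)
        await hook(1, request, None, ConnectionError("reset"))


asyncio.run(demo())
print(retry_events)
```

Checking `response` before `exc` follows the description: exactly one of the two is expected to be set for any given retry.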