Polling
polling
Polling helpers for the Scrape.do Async API.
Defines a configurable default `PollingStrategy` used by `wait_for_job` and `submit_and_wait`. It also defines the `PollingFunction` type alias for callers who need full control over the polling cadence.
PollingStrategy
Bases: BaseModel
Configurable backoff strategy for `wait_for_job` and `submit_and_wait`.
Uniform Call Shape
- Both `PollingStrategy.__call__` and `PollingFunction` share the same signature.
- The client polling loop calls the strategy uniformly and expects either a sleep duration in seconds or a raised `JobTimeoutError`.
Termination
If both `max_attempts` and `max_wait` are set to `None`, the client only stops polling when the job reaches a terminal status.
Defaults
- `initial_interval=1.0` s
- `max_interval=30.0` s
- `multiplier=2.0`
- `jitter=True`
- `max_wait=600.0` s (10 minutes)
- `max_attempts=None`
Backoff Math
```python
from scrape_do.async_api import PollingStrategy

strategy = PollingStrategy(
    initial_interval=1.0,  # sleep 1s before the first re-check
    max_interval=30.0,     # never sleep more than 30s per attempt
    multiplier=2.0,        # double the sleep after every attempt
    jitter=False,          # disable jitter so the math is exact
    max_wait=600.0,        # give up after 10 minutes total
    max_attempts=None,     # don't cap by number of attempts
)

# Resulting sleep times
# attempt=0 -> min(1.0 * 2.0**0, 30.0) = 1.0s
# attempt=1 -> min(1.0 * 2.0**1, 30.0) = 2.0s
# attempt=2 -> min(1.0 * 2.0**2, 30.0) = 4.0s
# attempt=3 -> min(1.0 * 2.0**3, 30.0) = 8.0s
# attempt=4 -> min(1.0 * 2.0**4, 30.0) = 16.0s
# attempt=5 -> min(1.0 * 2.0**5, 30.0) = 30.0s (capped)
# attempt=6 -> 30.0s (capped from here on)

# With `jitter=True` (default) each value above is then
# multiplied by a fresh `random.uniform(0.5, 1.5)` per attempt so
# multiple clients polling the same job don't synchronize their
# `get_job` requests.

# `max_wait` is wall-clock, not summed sleep. With the schedule above,
# a 600s budget allows at most 23 sleeps (1 + 2 + 4 + 8 + 16 = 31s,
# then 30s each) before raising `JobTimeoutError`; `get_job` request
# latency lowers that number further.
```
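To double-check how many sleeps fit into a given `max_wait`, the schedule can be replayed offline. The helper below, `count_sleeps_within`, is hypothetical and not part of the library. It assumes `jitter=False`, treats each `get_job` call as taking zero time, and applies the rule described for `__call__`: a sleep is refused if it would push cumulative elapsed time past `max_wait`.

```python
def count_sleeps_within(
    initial_interval: float,
    max_interval: float,
    multiplier: float,
    max_wait: float,
) -> int:
    """Hypothetical helper: count the sleeps that fit inside max_wait.

    Assumes jitter is disabled and get_job calls take no time, so the
    elapsed time is simply the sum of the sleeps taken so far.
    """
    elapsed = 0.0
    attempt = 0
    while True:
        try:
            delay = min(initial_interval * multiplier ** attempt, max_interval)
        except OverflowError:
            # multiplier ** attempt no longer fits in a float
            delay = max_interval
        if elapsed + delay > max_wait:
            # This attempt would raise JobTimeoutError instead of sleeping
            return attempt
        elapsed += delay
        attempt += 1


print(count_sleeps_within(1.0, 30.0, 2.0, 600.0))  # 23
```

Real runs can only fit fewer sleeps, because `elapsed` is wall-clock time and also includes request latency.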
Attributes:

| Name | Type | Description |
|---|---|---|
| `initial_interval` | `float` | Seconds to sleep before the first re-check after a non-terminal status. Must be … |
| `max_interval` | `float` | Cap on the per-attempt sleep duration in seconds. Must be … |
| `multiplier` | `float` | Exponential growth factor between consecutive non-terminal statuses. Must be … |
| `jitter` | `bool` | When `True`, each sleep is multiplied by a fresh `random.uniform(0.5, 1.5)` factor. |
| `max_wait` | `Optional[float]` | Maximum wall-clock time, in seconds, to spend polling before raising. |
| `max_attempts` | `Optional[int]` | Maximum number of polling attempts before raising. |
next_interval(attempt)
Returns the sleep duration in seconds for the given attempt.
Backoff Math
- Each attempt sleeps `min(initial_interval * (multiplier ** attempt), max_interval)`.
- When `self.jitter=True`, the resulting value is multiplied by `random.uniform(0.5, 1.5)`.
Overflow Safety
- When `multiplier > 1`, `multiplier ** attempt` can overflow `float` if `attempt` is large enough.
- This method catches the `OverflowError` and caps the result to `max_interval`, so polling loops with `max_wait=None` and `max_attempts=None` keep working until the job reaches a terminal status.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `attempt` | `int` | Zero-indexed attempt counter. | required |
Returns:

| Type | Description |
|---|---|
| `float` | The sleep duration in seconds, capped at `max_interval` before jitter is applied. |
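The backoff formula, the jitter step, and the overflow fallback can be sketched as one standalone function. This is an illustration under assumptions, not the library's source: it takes the strategy's fields as keyword arguments (using the documented defaults), and the `rng` parameter is a hypothetical addition that makes jitter reproducible.

```python
import random
from typing import Optional


def next_interval(
    attempt: int,
    initial_interval: float = 1.0,
    max_interval: float = 30.0,
    multiplier: float = 2.0,
    jitter: bool = True,
    rng: Optional[random.Random] = None,
) -> float:
    """Standalone sketch of the documented backoff formula."""
    try:
        # min(initial_interval * multiplier ** attempt, max_interval)
        delay = min(initial_interval * multiplier ** attempt, max_interval)
    except OverflowError:
        # multiplier ** attempt no longer fits in a float: use the cap
        delay = max_interval
    if jitter:
        # Scale by a fresh factor in [0.5, 1.5] so concurrent pollers drift apart
        delay *= (rng or random).uniform(0.5, 1.5)
    return delay


print(next_interval(3, jitter=False))       # 8.0
print(next_interval(10_000, jitter=False))  # 30.0, via the OverflowError path
```

Because jitter is applied after the cap, a jittered sleep can reach up to 1.5 times `max_interval`.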
__call__(attempt, elapsed, job)
Returns the next sleep duration or raises `JobTimeoutError`.
Polling stops when the job reaches a terminal status or when `self.max_wait` or `self.max_attempts` is exhausted.
Termination
- When `max_attempts` is set and `attempt + 1 >= max_attempts`, raises `JobTimeoutError`.
- When `max_wait` is set and the next sleep would push cumulative `elapsed` time past `max_wait`, raises `JobTimeoutError`.
- Otherwise, returns `self.next_interval(attempt)`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `attempt` | `int` | Zero-indexed attempt counter. | required |
| `elapsed` | `float` | Cumulative seconds spent polling so far. | required |
| `job` | `JobDetails` | The latest `JobDetails` snapshot returned by `get_job`. | required |
Returns:

| Type | Description |
|---|---|
| `float` | The sleep duration in seconds for the next polling cycle. |
Raises:

| Type | Description |
|---|---|
| `JobTimeoutError` | When the configured `max_wait` or `max_attempts` budget is exhausted. |
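The two termination rules can be sketched as a small dataclass. Everything below is hypothetical: `StrategySketch` is not the library class, the local `JobTimeoutError` is a bare stand-in that only takes a message, and running the attempt check before the time check is an assumption about ordering.

```python
import random
from dataclasses import dataclass
from typing import Optional


class JobTimeoutError(Exception):
    """Hypothetical stand-in for the library's JobTimeoutError."""


@dataclass
class StrategySketch:
    """Hypothetical re-creation of PollingStrategy's termination rules."""

    initial_interval: float = 1.0
    max_interval: float = 30.0
    multiplier: float = 2.0
    jitter: bool = True
    max_wait: Optional[float] = 600.0
    max_attempts: Optional[int] = None

    def next_interval(self, attempt: int) -> float:
        try:
            delay = min(
                self.initial_interval * self.multiplier ** attempt,
                self.max_interval,
            )
        except OverflowError:
            delay = self.max_interval
        return delay * random.uniform(0.5, 1.5) if self.jitter else delay

    def __call__(self, attempt: int, elapsed: float, job: object) -> float:
        # Attempt budget: attempt is zero-indexed, so attempt + 1 polls happened
        if self.max_attempts is not None and attempt + 1 >= self.max_attempts:
            raise JobTimeoutError(f"gave up after {attempt + 1} attempts")
        delay = self.next_interval(attempt)
        # Time budget: refuse a sleep that would overshoot max_wait
        if self.max_wait is not None and elapsed + delay > self.max_wait:
            raise JobTimeoutError(f"gave up after {elapsed:.1f}s")
        return delay


strategy = StrategySketch(jitter=False, max_attempts=3)
print(strategy(0, 0.0, None))  # 1.0
print(strategy(1, 1.0, None))  # 2.0
# strategy(2, 3.0, None) raises JobTimeoutError because attempt + 1 == 3
```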
PollingFunction
module-attribute
Defines the signature of a custom polling backoff function that can be passed to the client's `wait_for_job` and `submit_and_wait` methods.
Arguments
- `attempt: int` → Zero-indexed attempt counter: `attempt + 1` is the number of non-terminal `get_job` calls completed so far, including the one that produced `job`.
- `elapsed: float` → Cumulative wall-clock seconds spent polling since the loop started.
- `job: JobDetails` → The latest snapshot returned by `get_job`.
Return Value
The function should return the number of seconds to sleep before the next `get_job` call, and nothing else.
The `job` Argument
The function receives the `JobDetails` response for two reasons:
- It lets the next sleep duration depend on the current job status or on other information returned by the server. For example, you might want a shorter delay when `status=rotating` than when `status=queued`.
- It lets you raise a detailed `JobTimeoutError` when breaking the polling loop, or pass the job's information to your own custom exceptions.
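As a sketch of the first point, the function below picks its base delay from `job.status`. `JobDetails` here is a hypothetical dataclass stand-in with only `job_id` and `status`, the `PollingFunction` alias is re-declared with `typing.Callable` to mirror the documented call shape, and the status strings and delay values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobDetails:
    """Hypothetical stand-in exposing only the fields this sketch reads."""

    job_id: str
    status: str


# Mirrors the documented call shape: (attempt, elapsed, job) -> seconds to sleep
PollingFunction = Callable[[int, float, JobDetails], float]

# Hypothetical per-status base delays in seconds (status names are assumptions)
BASE_DELAY = {"rotating": 0.5, "queued": 2.0}


def status_aware_delay(attempt: int, elapsed: float, job: JobDetails) -> float:
    # Unknown statuses fall back to a 1s base delay
    base = BASE_DELAY.get(job.status, 1.0)
    # Grow 1.5x per attempt; clamping the exponent keeps the power from
    # overflowing, and the result never exceeds 20s
    return min(base * 1.5 ** min(attempt, 20), 20.0)


poll: PollingFunction = status_aware_delay
print(poll(0, 0.0, JobDetails("job-1", "rotating")))  # 0.5
print(poll(2, 3.0, JobDetails("job-1", "queued")))    # 4.5
```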
Termination
- The default `PollingStrategy` used by the client relies on the optional `max_wait` and `max_attempts` values to break the polling loop BEFORE the job reaches a terminal status, so the loop cannot run indefinitely if the job hangs.
- To mimic this behaviour in a custom `PollingFunction`, raise an exception, such as `JobTimeoutError`, when your own stop condition is met.
- That exception is surfaced unchanged by the client's `wait_for_job` and `submit_and_wait` methods.
- If the custom `PollingFunction` never raises, the polling loop only ends when the job reaches a terminal status.
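A minimal sketch of how a client loop might drive a `PollingFunction` and surface its exceptions. `poll_until_terminal` is hypothetical and not the client's implementation. It assumes a synchronous loop, a zero-argument `get_job` callable, and the terminal status names `success` and `error`. It shows both termination paths: an exception raised by the polling function propagates unchanged, and a function that never raises lets the loop run until a terminal status.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobDetails:
    """Hypothetical stand-in for the client's job snapshot."""

    job_id: str
    status: str


TERMINAL = {"success", "error"}  # assumed terminal status names


def poll_until_terminal(
    get_job: Callable[[], JobDetails],
    polling_function: Callable[[int, float, JobDetails], float],
    sleep: Callable[[float], None] = time.sleep,
) -> JobDetails:
    """Hypothetical sketch of a client loop driving a PollingFunction."""
    start = time.monotonic()
    attempt = 0
    while True:
        job = get_job()
        if job.status in TERMINAL:
            return job
        # elapsed is wall-clock time, so it includes get_job latency and sleeps.
        # Any exception raised here propagates to the caller unchanged.
        delay = polling_function(attempt, time.monotonic() - start, job)
        sleep(delay)
        attempt += 1


# Usage with a fake get_job that finishes on the third call
statuses = iter(["queued", "rotating", "success"])
result = poll_until_terminal(
    lambda: JobDetails("job-1", next(statuses)),
    lambda attempt, elapsed, job: 0.0,  # never raises: only a terminal status ends the loop
    sleep=lambda seconds: None,         # skip real sleeping in this demo
)
print(result.status)  # success
```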
Example of a Custom Polling Function
```python
from scrape_do.async_api.models.response import JobDetails
from scrape_do.async_api.exceptions import JobTimeoutError


def custom_polling_function(
    attempt: int,
    elapsed: float,
    job: JobDetails,
) -> float:
    # Stop polling after 2 minutes or 10 attempts
    # (attempt is 0-indexed, so attempt=9 is the 10th attempt)
    if elapsed >= 120 or attempt >= 9:
        raise JobTimeoutError(
            job_id=job.job_id,
            last_status=job.status,
            elapsed=elapsed,
            attempts=attempt + 1,
        )

    # Sleep longer if the job is queuing or rotating
    base = 3.0 if job.status in ("queuing", "rotating") else 2.0

    # Never sleep past the 2-minute budget; without this cap a late
    # attempt (e.g. 3.0 ** 8 = 6561s) would overshoot it by hours
    return min(base ** attempt, 120 - elapsed)
```