Polling

`polling`

Polling helpers for the Scrape.do Async API

Defines PollingStrategy, the configurable default backoff used by wait_for_job and submit_and_wait. Also defines the PollingFunction type alias for callers who need full control over the polling cadence

`PollingStrategy`

Bases: BaseModel

Configurable backoff strategy for wait_for_job and submit_and_wait

Uniform Call Shape

Both PollingStrategy.__call__ and PollingFunction share the same signature
The client polling loop calls the strategy uniformly and expects either a sleep duration in seconds or a raised JobTimeoutError

Termination

If both max_attempts and max_wait are set to None, the client will only stop polling when the job reaches a terminal status

Defaults

initial_interval=1.0s
max_interval=30.0s
multiplier=2.0
jitter=True
max_wait=600.0s (10 minutes)
max_attempts=None

Backoff Math

from scrape_do.async_api import PollingStrategy

strategy = PollingStrategy(
    initial_interval=1.0,  # sleep 1s after the first non-terminal status
    max_interval=30.0,     # never sleep more than 30s per attempt (before jitter)
    multiplier=2.0,        # double the sleep after every attempt
    jitter=False,          # disable jitter so the math is exact
    max_wait=600.0,        # give up after 10 minutes of wall-clock time
    max_attempts=None,     # don't cap by number of attempts
)

# Resulting Sleep Times
#   attempt=0 -> min(1.0 * 2.0**0, 30.0) = 1.0s
#   attempt=1 -> min(1.0 * 2.0**1, 30.0) = 2.0s
#   attempt=2 -> min(1.0 * 2.0**2, 30.0) = 4.0s
#   attempt=3 -> min(1.0 * 2.0**3, 30.0) = 8.0s
#   attempt=4 -> min(1.0 * 2.0**4, 30.0) = 16.0s
#   attempt=5 -> min(1.0 * 2.0**5, 30.0) = 30.0s  (capped)
#   attempt=6 -> 30.0s  (capped from here on)

# With `jitter=True` (default) each value above is then
# multiplied by a fresh `random.uniform(0.5, 1.5)` per attempt so
# multiple clients polling the same job don't synchronize their
# `get_job` requests. Jitter is applied after the cap, so a capped
# 30.0s sleep becomes anywhere from 15.0s to 45.0s

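# `max_wait` is wall-clock, not summed sleep. With the schedule
# above and instant `get_job` calls, a 600s budget allows at most
# 24 `get_job` calls before `JobTimeoutError` is raised: the sleeps
# add up to 1 + 2 + 4 + 8 + 16 + 18 * 30 = 571s, and the next 30s
# sleep would pass 600s. Real request latency lowers that count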
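The schedule and the `max_wait` arithmetic can be re-derived with a few lines of plain Python. This is a standalone sketch of the documented formula, not the library's implementation: `backoff` and `calls_within_budget` are hypothetical helpers, jitter is left out, and every `get_job` call is assumed to return instantly.

```python
def backoff(attempt: int, initial: float = 1.0, multiplier: float = 2.0,
            cap: float = 30.0) -> float:
    # Documented formula without jitter: min(initial * multiplier**attempt, cap)
    return min(initial * multiplier ** attempt, cap)


def calls_within_budget(max_wait: float = 600.0) -> int:
    # Simulate the loop with zero-latency get_job calls: after call number
    # attempt + 1, stop if the next sleep would push elapsed past max_wait
    elapsed = 0.0
    attempt = 0
    while True:
        sleep = backoff(attempt)
        if elapsed + sleep > max_wait:
            return attempt + 1  # get_job calls made before the timeout
        elapsed += sleep
        attempt += 1


print([backoff(a) for a in range(7)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
print(calls_within_budget())           # 24
```

Any real request latency counts toward the wall-clock budget, so it can only lower the simulated count.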

Attributes:

Name	Type	Description
`initial_interval`	`float`	Seconds to sleep before the first re-check after a non-terminal status. Must be `>= 0.1`
`max_interval`	`float`	Cap on the per-attempt sleep duration in seconds. Must be `>= 0.1`
`multiplier`	`float`	Exponential growth factor between consecutive non-terminal statuses. Must be `>= 1.0`
`jitter`	`bool`	When `True`, multiplies the computed interval by `random.uniform(0.5, 1.5)` to avoid synchronized polling across clients
`max_wait`	`Optional[float]`	Maximum wall-clock seconds to spend polling before raising `JobTimeoutError`. `None` disables the budget check
`max_attempts`	`Optional[int]`	Maximum number of `get_job` calls before raising `JobTimeoutError`. `None` disables the attempt-count check
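For reference, the documented defaults and lower bounds can be mirrored in a plain dataclass. This is an illustrative stand-in only: the real PollingStrategy is a pydantic BaseModel, and the class name `StrategySettings` and its error messages are hypothetical. No bounds are enforced on `max_wait` or `max_attempts` because the table above does not state any.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StrategySettings:
    # Hypothetical mirror of PollingStrategy's documented fields and defaults
    initial_interval: float = 1.0
    max_interval: float = 30.0
    multiplier: float = 2.0
    jitter: bool = True
    max_wait: Optional[float] = 600.0
    max_attempts: Optional[int] = None

    def __post_init__(self) -> None:
        # Enforce the lower bounds listed in the attributes table
        if self.initial_interval < 0.1:
            raise ValueError("initial_interval must be >= 0.1")
        if self.max_interval < 0.1:
            raise ValueError("max_interval must be >= 0.1")
        if self.multiplier < 1.0:
            raise ValueError("multiplier must be >= 1.0")
```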

`next_interval(attempt)`

Returns the sleep duration in seconds for the given attempt

Backoff Math

Each attempt sleeps min(initial_interval * (multiplier ** attempt), max_interval)
The resulting value is multiplied by random.uniform(0.5, 1.5) when self.jitter=True

Overflow Safety

When multiplier > 1, multiplier ** attempt can overflow float if attempt is large enough
This method catches the OverflowError and caps the result at max_interval, so polling loops with max_wait=None and max_attempts=None keep working until the job reaches a terminal status
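A minimal standalone sketch of the same math, including the overflow fallback, is shown below. `sketch_next_interval` is a hypothetical function written from the description above, not the library's source; in CPython, raising a float to a large enough power throws `OverflowError` rather than returning infinity, which is why the fallback is needed.

```python
import random


def sketch_next_interval(
    attempt: int,
    initial_interval: float = 1.0,
    max_interval: float = 30.0,
    multiplier: float = 2.0,
    jitter: bool = False,
) -> float:
    # Documented math: min(initial_interval * multiplier**attempt, max_interval)
    try:
        interval = min(initial_interval * multiplier ** attempt, max_interval)
    except OverflowError:
        # multiplier**attempt no longer fits in a float; use the cap instead
        interval = max_interval
    if jitter:
        # Scale by a random factor in [0.5, 1.5] to desynchronize clients
        interval *= random.uniform(0.5, 1.5)
    return interval


print(sketch_next_interval(3))       # 8.0
print(sketch_next_interval(10_000))  # 30.0, because 2.0 ** 10_000 overflows
```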

Parameters:

Name	Type	Description	Default
`attempt`	`int`	Zero-indexed attempt counter	required

Returns:

Type	Description
`float`	The sleep duration in seconds, capped at `max_interval` before the optional jitter is applied (a jittered value can reach `1.5 * max_interval`)

`__call__(attempt, elapsed, job)`

Returns the next sleep duration or raises JobTimeoutError

Polling stops when the job reaches a terminal status or when self.max_wait / self.max_attempts is exhausted

Termination

When max_attempts is set and attempt + 1 >= max_attempts, raises JobTimeoutError
When max_wait is set and the next sleep would push cumulative elapsed time past max_wait, raises JobTimeoutError
Otherwise returns self.next_interval(attempt)
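The termination rules can be sketched as a standalone check. `check_budget` and `BudgetExhausted` are hypothetical stand-ins (the real method raises `JobTimeoutError` and computes the next sleep itself), and the strict `>` comparison for `max_wait` is an assumption based on the wording "push ... past".

```python
from typing import Optional


class BudgetExhausted(Exception):
    """Hypothetical stand-in for JobTimeoutError."""


def check_budget(attempt: int, elapsed: float, next_sleep: float,
                 max_wait: Optional[float] = 600.0,
                 max_attempts: Optional[int] = None) -> float:
    # attempt is zero-indexed, so attempt + 1 get_job calls have been made
    if max_attempts is not None and attempt + 1 >= max_attempts:
        raise BudgetExhausted(f"{attempt + 1} attempts used")
    # Raise if sleeping would carry total elapsed time past max_wait
    if max_wait is not None and elapsed + next_sleep > max_wait:
        raise BudgetExhausted(f"{elapsed:.1f}s elapsed, next sleep {next_sleep:.1f}s")
    return next_sleep
```

With both limits set to `None`, the check always returns the sleep, which matches the "only stop on a terminal status" behaviour described earlier.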

Parameters:

Name	Type	Description	Default
`attempt`	`int`	Zero-indexed attempt counter	required
`elapsed`	`float`	Cumulative wall-clock seconds spent polling so far	required
`job`	`JobDetails`	The latest `JobDetails` snapshot returned by `get_job`	required

Returns:

Type	Description
`float`	The sleep duration in seconds for the next polling cycle

Raises:

Type	Description
`JobTimeoutError`	When the configured `max_attempts` or `max_wait` budget is exhausted

PollingFunction `module-attribute`

PollingFunction = Callable[
    [int, float, "JobDetails"], float
]

Defines the signature of the custom polling backoff function that can be passed to the client's wait_for_job and submit_and_wait methods
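Because the alias only fixes the call shape, any callable with that shape should qualify, whether a plain function or an object with `__call__` (the same shape `PollingStrategy.__call__` uses). The sketch below substitutes `Any` for `JobDetails` so it runs without the library; `PollingFn`, `fixed_interval`, and `Linear` are hypothetical names.

```python
from typing import Any, Callable, List

# Stand-in for the documented alias; the real one names JobDetails, not Any
PollingFn = Callable[[int, float, Any], float]


def fixed_interval(attempt: int, elapsed: float, job: Any) -> float:
    # Plain function: ignores its arguments and always waits 5 seconds
    return 5.0


class Linear:
    # Callable object: __call__ has the same (attempt, elapsed, job) shape
    def __init__(self, step: float) -> None:
        self.step = step

    def __call__(self, attempt: int, elapsed: float, job: Any) -> float:
        # Wait step, 2 * step, 3 * step, ... seconds
        return self.step * (attempt + 1)


policies: List[PollingFn] = [fixed_interval, Linear(2.0)]
print([policy(3, 12.0, None) for policy in policies])  # [5.0, 8.0]
```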

Arguments

attempt: int → Zero-indexed attempt counter. When the function is called, attempt + 1 get_job calls have returned a non-terminal status
elapsed: float → Cumulative wall-clock seconds spent polling since the loop started
job: JobDetails → The latest snapshot returned by get_job

Return Value

The function should return the number of seconds to sleep before the next get_job call and leave the sleeping itself to the client's polling loop

The job Argument

The function receives the JobDetails snapshot for two reasons:

It lets the next sleep duration depend on the current job status or on other information returned by the server. For example, you might want a shorter delay when status=rotating than when status=queued
It gives you the job's details at the moment you break the polling loop, so you can raise a detailed JobTimeoutError or pass that information to your own custom exceptions

Termination

The default PollingStrategy used by the client relies on the optional max_wait and max_attempts values to break the polling loop BEFORE the job reaches a terminal status, so that a hung job cannot keep the loop running indefinitely
To mimic this behaviour, a custom PollingFunction should raise an exception, such as JobTimeoutError, once its own stopping condition is met
That exception is surfaced unchanged by the client's wait_for_job and submit_and_wait methods
If a custom PollingFunction never raises, the polling loop ends only when the job reaches a terminal status
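The loop below is a minimal sketch of how a client could drive a PollingFunction. It is not the library's wait_for_job: `wait_for_job_sketch`, its `get_job`/`is_terminal`/`sleep` parameters, and the `"done"` terminal status are hypothetical. It shows the two exits described above: a terminal status returns the job, and an exception raised by the polling callable propagates unchanged.

```python
import time
from typing import Any, Callable


def wait_for_job_sketch(
    get_job: Callable[[], Any],
    is_terminal: Callable[[Any], bool],
    polling: Callable[[int, float, Any], float],
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    # Hypothetical client loop: fetch the job, return it once it is terminal,
    # otherwise ask the polling callable how long to wait
    start = time.monotonic()
    attempt = 0
    while True:
        job = get_job()
        if is_terminal(job):
            return job
        # An exception raised here (e.g. a timeout) propagates unchanged
        delay = polling(attempt, time.monotonic() - start, job)
        sleep(delay)
        attempt += 1


# Usage with fake statuses; "done" stands in for a terminal status
statuses = iter(["queuing", "rotating", "done"])
slept: list = []
result = wait_for_job_sketch(
    get_job=lambda: next(statuses),
    is_terminal=lambda status: status == "done",
    polling=lambda attempt, elapsed, job: 2.0 ** attempt,
    sleep=slept.append,
)
print(result, slept)  # done [1.0, 2.0]
```

Injecting `sleep` keeps the sketch testable without real waiting; the elapsed value passed to the polling callable is measured with `time.monotonic`, so it is wall-clock time.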

Example of a Custom Polling Function

from scrape_do.async_api.models.response import JobDetails
from scrape_do.async_api.exceptions import JobTimeoutError

def custom_polling_function(
    attempt: int,
    elapsed: float,
    job: JobDetails,
) -> float:
    # Stop polling after 2 minutes or 10 get_job calls
    # (attempt is 0-indexed, so attempt=9 is the 10th call)
    if elapsed >= 120 or attempt >= 9:
        raise JobTimeoutError(
            job_id=job.job_id,
            last_status=job.status,
            elapsed=elapsed,
            attempts=attempt + 1,
        )

    # Back off more aggressively while the job is queuing or rotating
    if job.status in ("queuing", "rotating"):
        return 3.0 ** attempt
    return 2.0 ** attempt
