Polling

polling

Polling helpers for the Scrape.do Async API

Defines PollingStrategy, the configurable default backoff strategy used by wait_for_job and submit_and_wait, and the PollingFunction type alias for callers who need full control over the polling cadence

PollingStrategy

Bases: BaseModel

Configurable backoff strategy for wait_for_job and submit_and_wait

Uniform Call Shape
  • Both PollingStrategy.__call__ and PollingFunction share the same signature

  • The client polling loop calls the strategy uniformly and expects either a sleep duration in seconds or a raised JobTimeoutError
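The uniform call shape can be pictured with a minimal sketch of a polling loop. This is not the client's actual implementation: FakeJob, sketch_poll_loop, the injected sleep callable and the "done" terminal status are hypothetical stand-ins used only to show how a strategy object and a plain function are called the same way

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeJob:
    """Hypothetical stand-in for JobDetails."""
    status: str


# Placeholder terminal status for this sketch, not taken from the API
TERMINAL_STATUSES = {"done"}


def sketch_poll_loop(
    get_job: Callable[[], FakeJob],
    poll: Callable[[int, float, FakeJob], float],
    sleep: Callable[[float], None] = time.sleep,
) -> FakeJob:
    """Poll until a terminal status, or until `poll` raises."""
    started = time.monotonic()
    attempt = 0
    while True:
        job = get_job()
        if job.status in TERMINAL_STATUSES:
            return job
        # Same call for a strategy object or a plain function.
        # Anything raised here propagates to the caller unchanged.
        delay = poll(attempt, time.monotonic() - started, job)
        sleep(delay)
        attempt += 1


# Canned responses and a no-op sleep so the demo finishes instantly
responses = iter([FakeJob("queued"), FakeJob("rotating"), FakeJob("done")])
seen_attempts = []


def fixed_delay(attempt: int, elapsed: float, job: FakeJob) -> float:
    seen_attempts.append(attempt)
    return 0.5


final = sketch_poll_loop(lambda: next(responses), fixed_delay, sleep=lambda s: None)
```

The polling callable is only consulted for non-terminal snapshots, so in this sketch it sees attempts 0 and 1 before the third snapshot ends the loop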

Termination

If both max_attempts and max_wait are set to None, the client will only stop polling when the job reaches a terminal status

Defaults
  • initial_interval=1.0s
  • max_interval=30.0s
  • multiplier=2.0
  • jitter=True
  • max_wait=600.0s (10 minutes)
  • max_attempts=None
Backoff Math
from scrape_do.async_api import PollingStrategy

strategy = PollingStrategy(
    initial_interval=1.0,  # sleep 1s after the first non-terminal status
    max_interval=30.0,     # never sleep more than 30s per attempt
    multiplier=2.0,        # double the sleep after every attempt
    jitter=False,          # disable jitter so the math is exact
    max_wait=600.0,        # give up after 10 minutes total
    max_attempts=None,     # don't cap by number of attempts
)

# Resulting Sleep Times
#   attempt=0 -> min(1.0 * 2.0**0, 30.0) = 1.0s
#   attempt=1 -> min(1.0 * 2.0**1, 30.0) = 2.0s
#   attempt=2 -> min(1.0 * 2.0**2, 30.0) = 4.0s
#   attempt=3 -> min(1.0 * 2.0**3, 30.0) = 8.0s
#   attempt=4 -> min(1.0 * 2.0**4, 30.0) = 16.0s
#   attempt=5 -> min(1.0 * 2.0**5, 30.0) = 30.0s  (capped)
#   attempt=6 -> 30.0s  (capped from here on)

# With `jitter=True` (default) each value above is then
# multiplied by a fresh `random.uniform(0.5, 1.5)` per attempt so
# multiple clients polling the same job don't synchronize their
# `get_job` requests

# `max_wait` is wall-clock, not summed sleep. With the schedule
# above, the sleeps alone reach 571s after 23 attempts, so a 600s
# value allows at most ~23 sleeps before raising `JobTimeoutError`
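A quick way to estimate how many sleeps a given max_wait allows is to sum the un-jittered schedule until the next sleep would exceed the budget. The helper below is a hypothetical sketch (sleeps_within_budget is not part of the library) that ignores request latency and jitter, so the real number of attempts will be somewhat lower

```python
def sleeps_within_budget(
    max_wait: float = 600.0,
    initial_interval: float = 1.0,
    max_interval: float = 30.0,
    multiplier: float = 2.0,
) -> int:
    """Count the sleeps that fit in `max_wait` using the documented formula."""
    elapsed = 0.0
    attempt = 0
    while True:
        interval = min(initial_interval * multiplier ** attempt, max_interval)
        # Stop as soon as the next sleep would push past the budget
        if elapsed + interval > max_wait:
            return attempt
        elapsed += interval
        attempt += 1


budget = sleeps_within_budget()
```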

Attributes:

Name Type Description
initial_interval float

Seconds to sleep before the first re-check after a non-terminal status. Must be >= 0.1

max_interval float

Cap on the per-attempt sleep duration in seconds. Must be >= 0.1

multiplier float

Exponential growth factor between consecutive non-terminal statuses. Must be >= 1.0

jitter bool

When True, multiplies the computed interval by random.uniform(0.5, 1.5) to avoid synchronized polling across clients

max_wait Optional[float]

Maximum wall-clock time in seconds to spend polling before raising JobTimeoutError. None disables the budget check

max_attempts Optional[int]

Maximum number of get_job calls before raising JobTimeoutError. None disables the attempt-count check

next_interval(attempt)

Returns the sleep duration in seconds for the given attempt

Backoff Math
  • Each attempt sleeps min(initial_interval * (multiplier ** attempt), max_interval)

  • The resulting value is multiplied by random.uniform(0.5, 1.5) when self.jitter=True

Overflow Safety
  • When multiplier > 1, multiplier ** attempt can overflow float if attempt is large enough

  • This method catches the OverflowError and caps the result to max_interval so that polling loops with max_wait=None and max_attempts=None can keep working until the job reaches a terminal status
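The backoff formula and the overflow guard can be sketched as a standalone function. This is an illustration of the documented behaviour under assumed defaults, not the library's source: sketch_next_interval is a hypothetical name, and jitter is applied after the cap, following the order the bullets above describe

```python
import random


def sketch_next_interval(
    attempt: int,
    initial_interval: float = 1.0,
    max_interval: float = 30.0,
    multiplier: float = 2.0,
    jitter: bool = False,
) -> float:
    """Illustrative stand-in for the documented backoff formula."""
    try:
        raw = initial_interval * (multiplier ** attempt)
    except OverflowError:
        # float ** int raises OverflowError once the result leaves the
        # float range, so fall back to the cap instead of failing
        raw = max_interval
    interval = min(raw, max_interval)
    if jitter:
        # Jitter scales the capped value, so a jittered sleep in this
        # sketch can land anywhere in [0.5, 1.5] * interval
        interval *= random.uniform(0.5, 1.5)
    return interval


schedule = [sketch_next_interval(a) for a in range(7)]
huge = sketch_next_interval(5000)  # 2.0 ** 5000 overflows a float
```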

Parameters:

Name Type Description Default
attempt int

Zero-indexed attempt counter

required

Returns:

Type Description
float

The sleep duration in seconds capped at max_interval, optionally jittered

__call__(attempt, elapsed, job)

Returns the next sleep duration or raises JobTimeoutError

Polling stops when the job reaches a terminal status or when self.max_wait / self.max_attempts is exhausted

Termination
  • When max_attempts is set and attempt + 1 >= max_attempts, raises JobTimeoutError

  • When max_wait is set and the next sleep would push cumulative elapsed time past max_wait, raises JobTimeoutError

  • Otherwise returns self.next_interval(attempt)
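The three termination rules above can be sketched as a single check. SketchTimeout and sketch_check are hypothetical stand-ins for JobTimeoutError and the strategy's __call__, and treating "past max_wait" as a strict greater-than comparison is an assumption of this sketch

```python
from typing import Optional


class SketchTimeout(Exception):
    """Hypothetical stand-in for JobTimeoutError."""


def sketch_check(
    attempt: int,
    elapsed: float,
    next_sleep: float,
    max_wait: Optional[float] = 600.0,
    max_attempts: Optional[int] = None,
) -> float:
    """Return `next_sleep`, or raise when a configured budget is exhausted."""
    # Rule 1: attempt is zero-indexed, so attempt + 1 calls were made
    if max_attempts is not None and attempt + 1 >= max_attempts:
        raise SketchTimeout(f"gave up after {attempt + 1} attempts")
    # Rule 2: refuse a sleep that would overshoot the time budget
    if max_wait is not None and elapsed + next_sleep > max_wait:
        raise SketchTimeout(f"gave up after {elapsed:.1f}s")
    # Rule 3: otherwise keep polling
    return next_sleep


def raises_timeout(**kwargs) -> bool:
    """Small helper for the demo: True when sketch_check raises."""
    try:
        sketch_check(**kwargs)
    except SketchTimeout:
        return True
    return False
```

With both max_wait=None and max_attempts=None neither rule can fire, which matches the note that polling then ends only on a terminal status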

Parameters:

Name Type Description Default
attempt int

Zero-indexed attempt counter

required
elapsed float

Cumulative seconds spent polling so far

required
job JobDetails

The latest JobDetails snapshot returned by get_job

required

Returns:

Type Description
float

The sleep duration in seconds for the next polling cycle

Raises:

Type Description
JobTimeoutError

When the configured max_attempts or max_wait budget is exhausted

PollingFunction module-attribute

PollingFunction = Callable[
    [int, float, "JobDetails"], float
]

Defines the signature of the custom polling backoff function that can be passed to the client's wait_for_job and submit_and_wait methods

Arguments
  • attempt: int → Zero-indexed attempt counter. Represents the number of non-terminal get_job calls completed so far

  • elapsed: float → Cumulative wall-clock seconds spent polling since the loop started

  • job: JobDetails → The latest snapshot returned by get_job

Return Value

The function should only return the number of seconds to sleep before the next get_job call

The job Argument

The function accepts the JobDetails response for two reasons

  • It lets the next sleep duration depend on the current job status or on other information returned by the server. For example, you might want a shorter delay when status=rotating than when status=queued

  • It allows you to raise the detailed JobTimeoutError when breaking the polling loop, or to pass the job's information to your own custom exceptions

Termination
  • The default PollingStrategy used by the client relies on the optional max_wait and max_attempts values to break the polling loop before the job reaches a terminal status, which prevents the loop from running indefinitely if the job hangs

  • To mimic this behaviour in a custom PollingFunction, raise an exception such as JobTimeoutError when your own stop condition is met

  • The exception is surfaced unchanged to the caller of the client's wait_for_job or submit_and_wait method

  • If the custom PollingFunction never raises, the polling loop will only be broken when the job reaches a terminal status

Example of a Custom Polling Function
from scrape_do.async_api.models.response import JobDetails
from scrape_do.async_api.exceptions import JobTimeoutError

def custom_polling_function(
    attempt: int,
    elapsed: float,
    job: JobDetails
) -> float:

    # Stop polling after 2 minutes or 10 attempts
    # (attempt is zero-indexed, so attempt 9 is the tenth call)
    if elapsed >= 120 or attempt >= 9:
        raise JobTimeoutError(
            job_id=job.job_id,
            last_status=job.status,
            elapsed=elapsed,
            attempts=attempt + 1,
        )

    # Sleep for a bit longer if the job is queuing or rotating
    if job.status in ("queuing", "rotating"):
        return 3.0 ** attempt
    return 2.0 ** attempt