Parameters

parameters

Outbound request shapes for the Scrape.do Async API

Defines the pydantic models the ScrapeDoAsyncAPIClient uses to construct the JSON bodies and query strings for q.scrape.do

Validation Parity

Every cross-field rule enforced on RequestParameters is mirrored here via the shared helpers in scrape_do.models.validators

JobCreationRequestDict

Bases: TypedDict

Strict IDE autocomplete + static type-checking for **kwargs dictionaries that build a JobCreationRequest model

Usage
  • Consumed by the client's smart-routing kwargs path

  • Callers can use either an explicit JobCreationRequest instance or unpacked kwargs

Flat Render Fields
  • Every field on the nested RenderParameters sub-model is exposed at the top level here

  • When the client receives any of those keys it auto-builds a RenderParameters instance and assigns it to JobCreationRequest.render

targets instance-attribute

URLs to scrape (mutually exclusive with plugin)

method instance-attribute

HTTP method for each task (default GET)

body instance-attribute

HTTP request body for POST / PUT / PATCH

headers instance-attribute

Custom HTTP headers

geo_code instance-attribute

ISO 3166-1 alpha-2 country code

regional_geo_code instance-attribute

Regional code (requires super=True)

super instance-attribute

Use residential / mobile proxies

forward_headers instance-attribute

Forward only the provided headers without merging Scrape.do defaults

session_id instance-attribute

Sticky session ID (0-1000000)

device instance-attribute

Device class to emulate (desktop, mobile, tablet)

set_cookies instance-attribute

Cookies to attach to each task (cannot be combined with headers)

timeout instance-attribute

Total request timeout in milliseconds (5000-120000)

retry_timeout instance-attribute

Per-curl retry timeout in milliseconds (5000-55000; mutually exclusive with any render-related field)

disable_retry instance-attribute

Disable Scrape.do's internal retry on non-2xx target responses

transparent_response instance-attribute

Forward the target's actual status code instead of wrapping non-2xx in a 502

disable_redirection instance-attribute

Disable following redirects

output instance-attribute

Output format (raw or markdown; default raw)

webhook_url instance-attribute

Webhook URL to deliver results to

webhook_headers instance-attribute

Additional headers to send with the webhook delivery

plugin instance-attribute

Plugin job spec (mutually exclusive with targets)

block_resources instance-attribute

Render: Block loading of resources (cannot combine with play_with_browser or screenshots)

wait_until instance-attribute

Render: Event to wait for during page load

custom_wait instance-attribute

Render: Custom wait time in milliseconds (0-35000)

wait_selector instance-attribute

Render: CSS selector to wait for

play_with_browser instance-attribute

Render: Sequence of browser actions to run after the page loads

return_json instance-attribute

Render: Return the response as a JSON envelope rather than the raw target body

show_websocket_requests instance-attribute

Render: Include captured WebSocket requests (requires return_json)

show_frames instance-attribute

Render: Include iframe metadata (requires return_json)

screenshot instance-attribute

Render: Capture a screenshot of the visible viewport

full_screenshot instance-attribute

Render: Capture a screenshot of the entire scrollable page

particular_screenshot instance-attribute

Render: CSS selector targeting a specific element to screenshot

render instance-attribute

Pre-built RenderParameters model. When set, any flat RenderParameters field is ignored

RenderParameters

Bases: BaseModel

Nested Render object on a Scrape.do Async API job creation payload

Source

Field list and bounds come from the official TypeScript definition for POST /api/v1/jobs

Attributes:

Name Type Description
block_resources Optional[bool]

Block loading of resources (cannot combine with play_with_browser or screenshots)

wait_until Optional[WaitUntilType]

Event to wait for during page load (domcontentloaded, networkidle0, networkidle2, load)

custom_wait Optional[int]

Custom wait time in milliseconds (0-35000)

wait_selector Optional[str]

CSS selector to wait for

play_with_browser Optional[List[BrowserAction]]

Sequence of browser actions to run after the page loads

return_json Optional[bool]

Return the response as a JSON envelope rather than the raw target body

show_websocket_requests Optional[bool]

Include captured WebSocket requests in the response (requires return_json)

show_frames Optional[bool]

Include iframe metadata in the response (requires return_json)

screenshot Optional[bool]

Capture a screenshot of the visible viewport

full_screenshot Optional[bool]

Capture a screenshot of the entire scrollable page

particular_screenshot Optional[str]

CSS selector targeting a specific element to screenshot

_validate_render_compatibility()

Cross-validates the screenshot / return-json / play-with-browser dependencies inside the Render object

Returns:

Type Description
Self

The validated instance from which the method was called

Raises:

Type Description
ValueError

If any of the documented mutual-exclusion or dependency rules is violated
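
The rules named in the field descriptions above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: check_render_rules and SCREENSHOT_FIELDS are hypothetical names, the render options are modelled as a plain mapping instead of a pydantic model, and only the rules stated on this page are covered (the real validator may enforce more).

```python
from typing import Any, Mapping

# Assumption: "screenshots" in the block_resources description means these three fields.
SCREENSHOT_FIELDS = ("screenshot", "full_screenshot", "particular_screenshot")


def check_render_rules(render: Mapping[str, Any]) -> None:
    """Raise ValueError when a documented render rule is violated (hypothetical helper)."""
    # block_resources cannot be combined with play_with_browser or any screenshot field.
    if render.get("block_resources"):
        clashes = [
            name
            for name in ("play_with_browser", *SCREENSHOT_FIELDS)
            if render.get(name)
        ]
        if clashes:
            raise ValueError(
                "block_resources cannot be combined with: " + ", ".join(clashes)
            )

    # show_websocket_requests and show_frames both require return_json.
    for name in ("show_websocket_requests", "show_frames"):
        if render.get(name) and not render.get("return_json"):
            raise ValueError(f"{name} requires return_json=True")


# A compatible combination passes silently.
check_render_rules({"return_json": True, "show_frames": True})
```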

JobCreationRequest

Bases: BaseModel

Body for POST /api/v1/jobs

Driver
  • Exactly ONE of targets or plugin must be set

  • Setting both or neither raises ValueError at construction time

Set Cookies vs Headers
  • set_cookies cannot be combined with headers per the official TypeScript definition

  • Setting both raises ValueError

Plugin Geocode
  • When plugin.params includes any entry with a geocode field set, the top-level geo_code MUST be None

  • The per-task geocode wins server-side and a top-level value would conflict

Attributes:

Name Type Description
targets Optional[List[HttpUrl]]

URLs to scrape (mutually exclusive with plugin)

method Optional[HttpMethod]

HTTP method for each task (default GET)

body Optional[str]

HTTP request body for POST / PUT / PATCH

headers Optional[Dict[str, str]]

Custom HTTP headers

geo_code Optional[str]

ISO 3166-1 alpha-2 country code

regional_geo_code Optional[RegionCodeType]

Regional code (requires super=True)

super Optional[bool]

Use residential / mobile proxies

forward_headers Optional[bool]

Forward only the provided headers without merging Scrape.do defaults

session_id Optional[int]

Sticky session ID (0-1000000). Serialized to a string on outbound JSON to match the documented wire shape

device Optional[DeviceType]

Device class to emulate (desktop, mobile, tablet)

set_cookies Optional[str]

Cookies to attach to each task (cannot be combined with headers)

timeout Optional[int]

Total request timeout in milliseconds (5000-120000)

retry_timeout Optional[int]

Per-curl retry timeout in milliseconds (5000-55000; mutually exclusive with render)

disable_retry Optional[bool]

Disable Scrape.do's internal retry on non-2xx target responses

transparent_response Optional[bool]

Forward the target's actual status code instead of wrapping non-2xx in a 502

disable_redirection Optional[bool]

Disable following redirects

output Optional[OutputType]

Output format (raw or markdown; default raw)

render Optional[RenderParameters]

Browser rendering options

webhook_url Optional[HttpUrl]

Webhook URL to deliver results to

webhook_headers Optional[Dict[str, str]]

Additional headers to send with the webhook delivery

plugin Optional[AsyncPlugin]

Plugin job spec (mutually exclusive with targets)

_validate_geo_code(v, info) classmethod

Delegates to check_geo_code

Parameters:

Name Type Description Default
v Optional[str]

The value provided to the geo_code argument

required
info ValidationInfo

The data already validated for the model so far

required

Returns:

Type Description
Optional[str]

The validated (lowercased) geo_code

Raises:

Type Description
ValueError

If the country code is not supported by the selected proxy tier

_serialize_session_id(value)

Coerces session_id to a string on outbound JSON

String Coercion
  • The Async API's TypeScript definition documents SessionID as a string

  • The server still expects an integer between 0 and 1000000 in string form, so the Python type stays int and this serializer coerces it into a string

Parameters:

Name Type Description Default
value Optional[int]

The value provided to session_id during initialization

required

Returns:

Type Description
Optional[str]

The coerced session_id string, or None if session_id was None
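
The coercion described above amounts to a None-preserving str() call. The sketch below is a standalone stand-in, assuming nothing about how the real serializer is registered on the model; serialize_session_id is a hypothetical name.

```python
from typing import Optional


def serialize_session_id(value: Optional[int]) -> Optional[str]:
    """Coerce an integer session ID to its string wire form (hypothetical stand-in)."""
    # None is passed through so an unset session_id stays unset.
    if value is None:
        return None
    return str(value)


print(serialize_session_id(42))    # 42 (as a str)
print(serialize_session_id(None))  # None
```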

_validate_compatibility()

Cross-validates parameter dependencies before any network round trip

Driver

Exactly one of targets / plugin must be set

Headers vs Set Cookies

set_cookies cannot be combined with headers

Render vs Retry Timeout

retry_timeout and render are mutually exclusive (same rule as on RequestParameters)

Geo Exclusion

geo_code and regional_geo_code are mutually exclusive

Regional Requires Super

regional_geo_code requires super=True

Plugin Geocode

When plugin.params includes any entry with a geocode field set, the top-level geo_code must be None

Returns:

Type Description
Self

The validated instance from which the method was called

Raises:

Type Description
ValueError

If any of the documented rules is violated
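
The six rules listed above can be expressed as plain checks over a mapping. This is a rough sketch under stated assumptions, not the model's actual validator: check_job_rules is a hypothetical name, the job is a plain dict keyed by the Python field names, and plugin is assumed to be a dict whose "params" entry is a list of dicts that may carry a "geocode" key.

```python
from typing import Any, Mapping


def check_job_rules(job: Mapping[str, Any]) -> None:
    """Raise ValueError when a documented cross-field rule is violated (hypothetical helper)."""

    def is_set(name: str) -> bool:
        return job.get(name) is not None

    # Driver: exactly one of targets / plugin.
    if is_set("targets") == is_set("plugin"):
        raise ValueError("exactly one of targets or plugin must be set")
    # Headers vs Set Cookies.
    if is_set("set_cookies") and is_set("headers"):
        raise ValueError("set_cookies cannot be combined with headers")
    # Render vs Retry Timeout.
    if is_set("retry_timeout") and is_set("render"):
        raise ValueError("retry_timeout and render are mutually exclusive")
    # Geo Exclusion.
    if is_set("geo_code") and is_set("regional_geo_code"):
        raise ValueError("geo_code and regional_geo_code are mutually exclusive")
    # Regional Requires Super.
    if is_set("regional_geo_code") and not job.get("super"):
        raise ValueError("regional_geo_code requires super=True")
    # Plugin Geocode (assumed shape: plugin["params"] is a list of dicts).
    params = (job.get("plugin") or {}).get("params") or []
    if is_set("geo_code") and any(p.get("geocode") is not None for p in params):
        raise ValueError("geo_code must be None when a plugin param sets geocode")


# A targets-only job passes silently.
check_job_rules({"targets": ["https://example.com"], "geo_code": "us"})
```

Running every check before any request is sent keeps the failure local and cheap, which matches the stated intent of validating before a network round trip.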

JobListQueryParametersDict

Bases: TypedDict

Strict IDE autocomplete + static type-checking for **kwargs dictionaries that build a JobListQueryParameters model

Usage
  • Consumed by the client's smart-routing kwargs path on list_jobs

  • Callers can use either an explicit JobListQueryParameters instance or unpacked kwargs

page_size instance-attribute

Number of jobs per page (1-100, default 10)

page instance-attribute

Page number (>= 1, default 1)

status instance-attribute

Filter results to one JobStatus

start_from instance-attribute

Return jobs with start_time >= start_from. Serialized as RFC3339 with uppercase T and Z (UTC)

start_to instance-attribute

Return jobs with start_time <= start_to. Same RFC3339 formatting

sort instance-attribute

Sort order (start_time_asc or start_time_desc)

JobListQueryParameters

Bases: BaseModel

Query parameters for GET /api/v1/jobs

Query Parameter Casing

Unlike the JSON body fields (which use PascalCase), the server requires list-jobs query parameters to be lower-snake-case

Attributes:

Name Type Description
page_size Optional[int]

Number of jobs per page (1-100)

page Optional[int]

Page number (>= 1)

status Optional[JobStatus]

Filter results to one JobStatus

start_from Optional[datetime]

Return jobs with start_time >= start_from. Serialized as RFC3339 with uppercase T and Z (UTC)

start_to Optional[datetime]

Return jobs with start_time <= start_to. Same RFC3339 formatting

sort Optional[JobListQuerySortType]

Sort order (start_time_asc or start_time_desc)

_format_datetime(value) staticmethod

Renders a datetime as RFC3339 UTC with uppercase T / Z

Timezone
  • datetime objects that don't contain timezone information are treated as UTC

  • datetime objects that contain timezone information are converted to UTC

Parameters:

Name Type Description Default
value datetime

The datetime to format. Naive values are treated as UTC. Aware values are converted to UTC

required

Returns:

Type Description
str

The formatted timestamp string
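
The timezone handling described above can be reproduced with the standard datetime module. This is a sketch, not the method's actual body: format_rfc3339_utc is a hypothetical name, and whole-second precision is an assumption, since this page does not say whether fractional seconds are emitted.

```python
from datetime import datetime, timedelta, timezone


def format_rfc3339_utc(value: datetime) -> str:
    """Format a datetime as RFC3339 UTC with uppercase T and Z (hypothetical stand-in)."""
    if value.tzinfo is None or value.utcoffset() is None:
        # Naive values are treated as already being in UTC.
        value = value.replace(tzinfo=timezone.utc)
    else:
        # Aware values are converted to UTC.
        value = value.astimezone(timezone.utc)
    # Assumption: second precision, no fractional part.
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


naive = datetime(2024, 1, 2, 3, 4, 5)
aware = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=2)))
print(format_rfc3339_utc(naive))  # 2024-01-02T03:04:05Z
print(format_rfc3339_utc(aware))  # 2024-01-02T01:04:05Z
```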

to_query_dict()

Serializes the model into a flat dict suitable for the httpx.request params argument

Serialization Rules
  • None values are dropped
  • datetime values are formatted as RFC3339 UTC strings

Returns:

Type Description
Dict[str, Any]

A flat Dict[str, Any] ready to pass as params=...
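
The two serialization rules can be illustrated over a plain mapping of field names to values. The sketch below is an approximation under assumptions: to_query_dict here is a free function rather than the model method, datetimes are rendered at second precision, and any other value (including enum members) is passed through unchanged.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Mapping


def to_query_dict(fields: Mapping[str, Any]) -> Dict[str, Any]:
    """Build a flat params dict from field values (hypothetical stand-in)."""
    query: Dict[str, Any] = {}
    for name, value in fields.items():
        # Rule 1: None values are dropped.
        if value is None:
            continue
        # Rule 2: datetime values become RFC3339 UTC strings.
        if isinstance(value, datetime):
            if value.tzinfo is None or value.utcoffset() is None:
                value = value.replace(tzinfo=timezone.utc)
            else:
                value = value.astimezone(timezone.utc)
            value = value.strftime("%Y-%m-%dT%H:%M:%SZ")
        query[name] = value
    return query


params = to_query_dict(
    {"page": 2, "status": None, "start_from": datetime(2024, 1, 1, tzinfo=timezone.utc)}
)
print(params)  # {'page': 2, 'start_from': '2024-01-01T00:00:00Z'}
```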

RENDER_PARAMETER_FIELDS module-attribute

RENDER_PARAMETER_FIELDS = frozenset(
    {
        "block_resources",
        "wait_until",
        "custom_wait",
        "wait_selector",
        "play_with_browser",
        "return_json",
        "show_websocket_requests",
        "show_frames",
        "screenshot",
        "full_screenshot",
        "particular_screenshot",
    }
)

Set of field names in JobCreationRequestDict that belong to the nested RenderParameters sub-model

Usage

Consumed by the client's smart-routing kwargs path to separate render-specific fields from the rest of the JobCreationRequest body
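
A rough sketch of how such a set can partition flat kwargs follows. It is illustrative only: split_render_kwargs is a hypothetical helper, not the client's routing code, and it returns plain dicts where the client would build a RenderParameters instance. The handling of an explicit render key follows the documented rule that flat render fields are ignored when render is set.

```python
from typing import Any, Dict, Tuple

RENDER_PARAMETER_FIELDS = frozenset(
    {
        "block_resources", "wait_until", "custom_wait", "wait_selector",
        "play_with_browser", "return_json", "show_websocket_requests",
        "show_frames", "screenshot", "full_screenshot", "particular_screenshot",
    }
)


def split_render_kwargs(kwargs: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate flat render fields from the remaining job fields (hypothetical helper)."""
    render = {k: v for k, v in kwargs.items() if k in RENDER_PARAMETER_FIELDS}
    rest = {k: v for k, v in kwargs.items() if k not in RENDER_PARAMETER_FIELDS}
    # A pre-built render value takes precedence, so flat render fields are discarded.
    if rest.get("render") is not None:
        render = {}
    return rest, render


rest, render = split_render_kwargs(
    {"targets": ["https://example.com"], "geo_code": "us", "screenshot": True}
)
print(rest)    # {'targets': ['https://example.com'], 'geo_code': 'us'}
print(render)  # {'screenshot': True}
```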