Parameters
parameters
Outbound request shapes for the Scrape.do Async API
Defines the pydantic models the ScrapeDoAsyncAPIClient uses to
construct the JSON bodies and query strings for q.scrape.do
Validation Parity
Every cross-field rule enforced on RequestParameters is mirrored here via the shared helpers in scrape_do.models.validators
JobCreationRequestDict
Bases: TypedDict
Strict IDE autocomplete + static type-checking for **kwargs
dictionaries that build a JobCreationRequest model
Usage
- Consumed by the client's smart-routing kwargs path
- Callers can use either an explicit JobCreationRequest instance or unpacked kwargs
Flat Render Fields
- Every field on the nested RenderParameters sub-model is exposed at the top level here
- When the client receives any of those keys it auto-builds a RenderParameters instance and assigns it to JobCreationRequest.render
Attributes (all instance attributes):
| Name | Description |
|---|---|
| targets | URLs to scrape (mutually exclusive with plugin) |
| method | HTTP method for each task (default GET) |
| body | HTTP request body for POST / PUT / PATCH |
| headers | Custom HTTP headers |
| geo_code | ISO 3166-1 alpha-2 country code |
| regional_geo_code | Regional code (requires super=True) |
| super | Use residential / mobile proxies |
| forward_headers | Forward only the provided headers without merging Scrape.do defaults |
| session_id | Sticky session ID (0-1000000) |
| device | Device class to emulate (desktop, mobile, tablet) |
| set_cookies | Cookies to attach to each task (cannot be combined with headers) |
| timeout | Total request timeout in milliseconds (5000-120000) |
| retry_timeout | Per-curl retry timeout in milliseconds (5000-55000); mutually exclusive with any render-related field |
| disable_retry | Disable Scrape.do's internal retry on non-2xx target responses |
| transparent_response | Forward the target's actual status code instead of wrapping non-2xx in a 502 |
| disable_redirection | Disable following redirects |
| output | Output format (raw or markdown, default raw) |
| webhook_url | Webhook URL to deliver results to |
| webhook_headers | Additional headers to send with the webhook delivery |
| plugin | Plugin job spec (mutually exclusive with targets) |
| block_resources | Render: Block loading of resources (cannot combine with play_with_browser or screenshots) |
| wait_until | Render: Event to wait for during page load |
| custom_wait | Render: Custom wait time in milliseconds (0-35000) |
| wait_selector | Render: CSS selector to wait for |
| play_with_browser | Render: Sequence of browser actions to run after the page loads |
| return_json | Render: Return the response as a JSON envelope rather than the raw target body |
| show_websocket_requests | Render: Include captured WebSocket requests (requires return_json) |
| show_frames | Render: Include iframe metadata (requires return_json) |
| screenshot | Render: Capture a screenshot of the visible viewport |
| full_screenshot | Render: Capture a screenshot of the entire scrollable page |
| particular_screenshot | Render: CSS selector targeting a specific element to screenshot |
| render | Pre-built RenderParameters model. When set, any flat RenderParameters field is ignored |
RenderParameters
Bases: BaseModel
Nested Render object on a Scrape.do Async API job creation
payload
Source
Field list and bounds come from the official TypeScript
definition for POST /api/v1/jobs
Attributes:
| Name | Type | Description |
|---|---|---|
| block_resources | Optional[bool] | Block loading of resources (cannot combine with play_with_browser or screenshots) |
| wait_until | Optional[WaitUntilType] | Event to wait for during page load |
| custom_wait | Optional[int] | Custom wait time in milliseconds (0-35000) |
| wait_selector | Optional[str] | CSS selector to wait for |
| play_with_browser | Optional[List[BrowserAction]] | Sequence of browser actions to run after the page loads |
| return_json | Optional[bool] | Return the response as a JSON envelope rather than the raw target body |
| show_websocket_requests | Optional[bool] | Include captured WebSocket requests in the response (requires return_json) |
| show_frames | Optional[bool] | Include iframe metadata in the response (requires return_json) |
| screenshot | Optional[bool] | Capture a screenshot of the visible viewport |
| full_screenshot | Optional[bool] | Capture a screenshot of the entire scrollable page |
| particular_screenshot | Optional[str] | CSS selector targeting a specific element to screenshot |
_validate_render_compatibility()
Cross-validates the screenshot / return-json / play-with-browser
dependencies inside the Render object
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called |
Raises:
| Type | Description |
|---|---|
| ValueError | If any of the documented mutual-exclusion or dependency rules is violated |
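The dependency rules stated in the field descriptions on this page can be sketched with the standard library alone. This is an illustration under assumptions, not the library's validator: `check_render_compatibility` and `SCREENSHOT_FIELDS` are hypothetical names, the input is a plain mapping rather than a RenderParameters instance, and only the rules spelled out on this page are covered (the real validator may enforce more).

```python
from typing import Any, Mapping

# Hypothetical grouping of the three screenshot-related fields
SCREENSHOT_FIELDS = ("screenshot", "full_screenshot", "particular_screenshot")


def check_render_compatibility(render: Mapping[str, Any]) -> None:
    """Raise ValueError when a render rule documented on this page is violated."""
    # show_websocket_requests and show_frames both require return_json
    for dependent in ("show_websocket_requests", "show_frames"):
        if render.get(dependent) and not render.get("return_json"):
            raise ValueError(f"{dependent} requires return_json=True")

    # block_resources cannot be combined with play_with_browser or screenshots
    if render.get("block_resources"):
        conflicts = [
            name
            for name in ("play_with_browser", *SCREENSHOT_FIELDS)
            if render.get(name)
        ]
        if conflicts:
            raise ValueError(
                "block_resources cannot be combined with " + ", ".join(conflicts)
            )
```

Checking the whole mapping in one pass, after all fields are known, is what lets a rule such as "requires return_json" see both sides of the dependency.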
JobCreationRequest
Bases: BaseModel
Body for POST /api/v1/jobs
Driver
- Exactly ONE of targets or plugin must be set
- Setting both or neither raises ValueError at construction time
Set Cookies vs Headers
- set_cookies cannot be combined with headers per the official TypeScript definition
- Setting both raises ValueError
Plugin Geocode
- When plugin.params includes any entry with a geocode field set, the top-level geo_code MUST be None
- The per-task geocode wins server-side and a top-level value would conflict
Attributes:
| Name | Type | Description |
|---|---|---|
| targets | Optional[List[HttpUrl]] | URLs to scrape (mutually exclusive with plugin) |
| method | Optional[HttpMethod] | HTTP method for each task (default GET) |
| body | Optional[str] | HTTP request body for POST / PUT / PATCH |
| headers | Optional[Dict[str, str]] | Custom HTTP headers |
| geo_code | Optional[str] | ISO 3166-1 alpha-2 country code |
| regional_geo_code | Optional[RegionCodeType] | Regional code (requires super=True) |
| super | Optional[bool] | Use residential / mobile proxies |
| forward_headers | Optional[bool] | Forward only the provided headers without merging Scrape.do defaults |
| session_id | Optional[int] | Sticky session ID (0-1000000) |
| device | Optional[DeviceType] | Device class to emulate (desktop, mobile, tablet) |
| set_cookies | Optional[str] | Cookies to attach to each task (cannot be combined with headers) |
| timeout | Optional[int] | Total request timeout in milliseconds (5000-120000) |
| retry_timeout | Optional[int] | Per-curl retry timeout in milliseconds (5000-55000) |
| disable_retry | Optional[bool] | Disable Scrape.do's internal retry on non-2xx target responses |
| transparent_response | Optional[bool] | Forward the target's actual status code instead of wrapping non-2xx in a 502 |
| disable_redirection | Optional[bool] | Disable following redirects |
| output | Optional[OutputType] | Output format (raw or markdown, default raw) |
| render | Optional[RenderParameters] | Browser rendering options |
| webhook_url | Optional[HttpUrl] | Webhook URL to deliver results to |
| webhook_headers | Optional[Dict[str, str]] | Additional headers to send with the webhook delivery |
| plugin | Optional[AsyncPlugin] | Plugin job spec (mutually exclusive with targets) |
_validate_geo_code(v, info)
classmethod
Delegates to check_geo_code
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | Optional[str] | The value provided to the geo_code field | required |
| info | ValidationInfo | The data already validated for the model so far | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The validated (lowercased) geo_code |
Raises:
| Type | Description |
|---|---|
| ValueError | If the country code is not supported by the selected proxy tier |
_serialize_session_id(value)
Coerces session_id to a string on outbound JSON
String Coercion
- The Async API's TypeScript definition documents SessionID as a string
- The server still expects an integer between 0 and 1000000 in string form, so the Python type stays int and this serializer coerces it into a string
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | Optional[int] | The value provided to session_id | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The coerced session_id |
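The coercion described above can be illustrated with a small stdlib-only function. This is a sketch, not the library's serializer: `serialize_session_id` is a hypothetical name, and the range check is an assumption added here for illustration (in the real model the bounds presumably live on the field, not in the serializer).

```python
import json
from typing import Optional


def serialize_session_id(value: Optional[int]) -> Optional[str]:
    """Keep the Python-side type as int, but emit a string on the wire."""
    if value is None:
        # An unset session ID stays unset
        return None
    # ASSUMPTION: bounds taken from the documented 0-1000000 range
    if not 0 <= value <= 1_000_000:
        raise ValueError("session_id must be between 0 and 1000000")
    return str(value)


payload = json.dumps({"SessionID": serialize_session_id(42)})
print(payload)  # {"SessionID": "42"}
```

Keeping `int` on the Python side means range validation stays numeric, while the string form only appears at serialization time.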
_validate_compatibility()
Cross-validates parameter dependencies before any network round trip
Driver
Exactly one of targets / plugin must be set
Headers vs Set Cookies
set_cookies cannot be combined with headers
Render vs Retry Timeout
retry_timeout and render are mutually exclusive (same
rule as on RequestParameters)
Geo Exclusion
geo_code and regional_geo_code are mutually exclusive
Regional Requires Super
regional_geo_code requires super=True
Plugin Geocode
When plugin.params includes any entry with a geocode
field set, the top-level geo_code must be None
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called |
Raises:
| Type | Description |
|---|---|
| ValueError | If any of the documented rules is violated |
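The six rules listed for this validator can be sketched as plain checks over a mapping. This is a hedged illustration, not the model's code: `check_job_compatibility` is a hypothetical name, the job is a plain dict rather than a JobCreationRequest, and the shape of `plugin` (a mapping whose `params` is a list of mappings) is an assumption inferred from the rule's wording.

```python
from typing import Any, Mapping


def check_job_compatibility(job: Mapping[str, Any]) -> None:
    """Raise ValueError when a documented cross-field rule is violated."""
    has_targets = job.get("targets") is not None
    has_plugin = job.get("plugin") is not None

    # Driver: exactly one of targets / plugin must be set
    if has_targets == has_plugin:
        raise ValueError("exactly one of targets or plugin must be set")

    # Headers vs Set Cookies
    if job.get("set_cookies") is not None and job.get("headers") is not None:
        raise ValueError("set_cookies cannot be combined with headers")

    # Render vs Retry Timeout
    if job.get("retry_timeout") is not None and job.get("render") is not None:
        raise ValueError("retry_timeout and render are mutually exclusive")

    # Geo Exclusion
    if job.get("geo_code") is not None and job.get("regional_geo_code") is not None:
        raise ValueError("geo_code and regional_geo_code are mutually exclusive")

    # Regional Requires Super
    if job.get("regional_geo_code") is not None and job.get("super") is not True:
        raise ValueError("regional_geo_code requires super=True")

    # Plugin Geocode (ASSUMPTION: plugin["params"] is a list of mappings)
    params = (job.get("plugin") or {}).get("params") or []
    if job.get("geo_code") is not None and any(
        entry.get("geocode") is not None for entry in params
    ):
        raise ValueError("geo_code must be None when a plugin entry sets geocode")
```

Running these checks before any network round trip means a malformed job fails locally with a specific message instead of as a server-side rejection.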
JobListQueryParametersDict
Bases: TypedDict
Strict IDE autocomplete + static type-checking for **kwargs
dictionaries that build a JobListQueryParameters model
Usage
- Consumed by the client's smart-routing kwargs path on list_jobs
- Callers can use either an explicit JobListQueryParameters instance or unpacked kwargs
Attributes (all instance attributes):
| Name | Description |
|---|---|
| page_size | Number of jobs per page (1-100, default 10) |
| page | Page number (>= 1, default 1) |
| status | Filter results to one JobStatus |
| start_from | Return jobs with start_time >= start_from. Serialized as RFC3339 with uppercase T and Z (UTC) |
| start_to | Return jobs with start_time <= start_to. Same RFC3339 formatting |
| sort | Sort order (start_time_asc or start_time_desc) |
JobListQueryParameters
Bases: BaseModel
Query parameters for GET /api/v1/jobs
Query Parameter Casing
Unlike the JSON body fields (which use PascalCase), the
server requires list-jobs query parameters to be lower-snake-case
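As a rough illustration of the casing rule, the query string can be assembled with `urllib.parse.urlencode` while leaving the keys in lower-snake-case. `build_list_jobs_query` is a hypothetical helper, not part of the client; the bounds are copied from the field descriptions on this page, and dropping unset values is an assumption about how optional filters are omitted.

```python
from typing import Any, Mapping
from urllib.parse import urlencode


def build_list_jobs_query(params: Mapping[str, Any]) -> str:
    """Encode list-jobs filters, keeping keys in lower-snake-case."""
    # ASSUMPTION: unset (None) filters are simply left out of the query
    present = {key: value for key, value in params.items() if value is not None}

    # Bounds taken from the field descriptions on this page
    if "page_size" in present and not 1 <= present["page_size"] <= 100:
        raise ValueError("page_size must be between 1 and 100")
    if "page" in present and present["page"] < 1:
        raise ValueError("page must be >= 1")

    return urlencode(present)


query = build_list_jobs_query(
    {"page_size": 25, "page": 2, "status": None, "sort": "start_time_desc"}
)
print(query)  # page_size=25&page=2&sort=start_time_desc
```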
Attributes:
| Name | Type | Description |
|---|---|---|
| page_size | Optional[int] | Number of jobs per page (1-100, default 10) |
| page | Optional[int] | Page number (>= 1, default 1) |
| status | Optional[JobStatus] | Filter results to one JobStatus |
| start_from | Optional[datetime] | Return jobs with start_time >= start_from |
| start_to | Optional[datetime] | Return jobs with start_time <= start_to |
| sort | Optional[JobListQuerySortType] | Sort order (start_time_asc or start_time_desc) |
_format_datetime(value)
staticmethod
Renders a datetime as RFC3339 UTC with uppercase T / Z
Timezone
- datetime objects that don't contain timezone information are treated as UTC
- datetime objects that contain timezone information are converted to UTC
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | datetime | The datetime to format. Naive values are treated as UTC. Aware values are converted to UTC | required |
Returns:
| Type | Description |
|---|---|
| str | The formatted timestamp string |
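The naive-versus-aware handling can be sketched with the standard `datetime` module. `format_rfc3339_utc` is a hypothetical stand-in for the static method, and whole-second precision is an assumption: this page does not say whether fractional seconds are kept.

```python
from datetime import datetime, timedelta, timezone


def format_rfc3339_utc(value: datetime) -> str:
    """Render a datetime as RFC3339 in UTC with uppercase T and Z."""
    if value.tzinfo is None or value.utcoffset() is None:
        # Naive value: treat the wall-clock time as already being UTC
        value = value.replace(tzinfo=timezone.utc)
    else:
        # Aware value: convert to UTC
        value = value.astimezone(timezone.utc)
    # ASSUMPTION: fractional seconds are dropped
    return value.strftime("%Y-%m-%dT%H:%M:%SZ")


naive = datetime(2024, 1, 2, 3, 4, 5)
aware = datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone(timedelta(hours=2)))
print(format_rfc3339_utc(naive))  # 2024-01-02T03:04:05Z
print(format_rfc3339_utc(aware))  # 2024-01-02T01:04:05Z
```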
RENDER_PARAMETER_FIELDS
module-attribute
RENDER_PARAMETER_FIELDS = frozenset(
{
"block_resources",
"wait_until",
"custom_wait",
"wait_selector",
"play_with_browser",
"return_json",
"show_websocket_requests",
"show_frames",
"screenshot",
"full_screenshot",
"particular_screenshot",
}
)
Set of field names in JobCreationRequestDict that
belong to the nested RenderParameters sub-model
Usage
Consumed by the client's smart-routing kwargs path to separate
render-specific fields from the rest of the
JobCreationRequest body
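A minimal sketch of how such a set can drive the split, assuming plain dicts stand in for the pydantic models: `route_render_kwargs` is a hypothetical name, and the real client builds a RenderParameters instance instead of a nested dict. The precedence of a pre-built `render` value follows the description of the `render` key in JobCreationRequestDict.

```python
from typing import Any, Dict

RENDER_PARAMETER_FIELDS = frozenset(
    {
        "block_resources",
        "wait_until",
        "custom_wait",
        "wait_selector",
        "play_with_browser",
        "return_json",
        "show_websocket_requests",
        "show_frames",
        "screenshot",
        "full_screenshot",
        "particular_screenshot",
    }
)


def route_render_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Move flat render keys under a nested "render" entry."""
    flat_render = {k: v for k, v in kwargs.items() if k in RENDER_PARAMETER_FIELDS}
    body = {k: v for k, v in kwargs.items() if k not in RENDER_PARAMETER_FIELDS}

    # A pre-built render value wins: flat render fields are ignored
    if body.get("render") is None and flat_render:
        body["render"] = flat_render
    return body


body = route_render_kwargs(
    {"targets": ["https://example.com"], "wait_selector": "#app", "return_json": True}
)
print(body["render"])  # {'wait_selector': '#app', 'return_json': True}
```

A frozenset gives constant-time membership checks and cannot be mutated by accident, which suits a lookup table shared across calls.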