Models

models

Data models and API contracts

This module uses Pydantic V2 to enforce Scrape.do's routing rules, parameter dependencies, and geographical targeting constraints locally, so that invalid configurations are caught before any network request is made.

ClickAction

Bases: BaseModel

Executes a click event on a specified CSS selector.

Attributes:

Name Type Description
action Literal['Click']

The literal action identifier.

selector str

The CSS selector of the target element.

WaitAction

Bases: BaseModel

Pauses browser execution for a specific duration.

Attributes:

Name Type Description
action Literal['Wait']

The literal action identifier.

timeout int

Number of milliseconds to wait.

WaitSelectorAction

Bases: BaseModel

Pauses browser execution until a specific element appears in the DOM.

Attributes:

Name Type Description
action Literal['WaitSelector']

The literal action identifier.

wait_selector str

The CSS selector to wait for.

timeout int

Maximum time to wait in milliseconds. Defaults to None.

ScrollXAction

Bases: BaseModel

Scrolls the viewport horizontally.

Attributes:

Name Type Description
action Literal['ScrollX']

The literal action identifier.

value int

Number of pixels to scroll along the X-axis.

ScrollYAction

Bases: BaseModel

Scrolls the viewport vertically.

Attributes:

Name Type Description
action Literal['ScrollY']

The literal action identifier.

value int

Number of pixels to scroll along the Y-axis.

ScrollToAction

Bases: BaseModel

Scrolls the viewport until a specific element is visible.

Attributes:

Name Type Description
action Literal['ScrollTo']

The literal action identifier.

selector str

The CSS selector of the element to scroll to.

FillAction

Bases: BaseModel

Types a specified value into an input field.

Attributes:

Name Type Description
action Literal['Fill']

The literal action identifier.

selector str

The CSS selector of the input element.

value str

The text string to type into the element.

ExecuteAction

Bases: BaseModel

Executes arbitrary JavaScript within the browser context.

Attributes:

Name Type Description
action Literal['Execute']

The literal action identifier.

execute str

The raw JavaScript code to evaluate.

ScreenShotAction

Bases: BaseModel

Captures a screenshot during the execution of browser actions.

Attributes:

Name Type Description
action Literal['ScreenShot']

The literal action identifier.

full_screenshot bool

If True, captures the entire scrollable page.

particular_screenshot str

CSS selector of a specific element to capture.

validate_screenshot_logic()

Ensures mutually exclusive screenshot targeting parameters are not combined.

Capturing Full Screenshot And Particular Screenshot

A single screenshot action can either capture the entire scrollable page OR a specific DOM element, but not both simultaneously. To capture both, provide two separate ScreenShotAction objects in the play_with_browser list.

Returns:

Type Description
Self

The validated instance from which the method was called

Raises:

Type Description
ValueError

If both full_screenshot and particular_screenshot are active.
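The rule above can be illustrated with a minimal, standard-library sketch. `ScreenShotSketch` is a hypothetical stand-in written for this example, not the module's Pydantic model; it only mirrors the documented behaviour of rejecting the two targets when both are active.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScreenShotSketch:
    """Hypothetical stand-in for ScreenShotAction (not the real Pydantic model)."""

    full_screenshot: Optional[bool] = None
    particular_screenshot: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the documented rule: the two targets are mutually exclusive.
        if self.full_screenshot and self.particular_screenshot:
            raise ValueError(
                "full_screenshot and particular_screenshot cannot be combined; "
                "use two separate screenshot actions instead"
            )


# Capturing both targets means two separate actions in the list.
# The selector "#pricing-table" is an arbitrary example value.
play_with_browser = [
    ScreenShotSketch(full_screenshot=True),
    ScreenShotSketch(particular_screenshot="#pricing-table"),
]
```

Splitting the capture into two actions keeps each object valid on its own, which is the workaround the note above recommends.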

WaitForRequestCompletionAction

Bases: BaseModel

Pauses execution until network requests matching a specific pattern complete.

Attributes:

Name Type Description
action Literal['WaitForRequestCompletion']

The literal action identifier.

url_pattern str

The regex or string pattern of the URL to wait for.

timeout int

Maximum time to wait in milliseconds before failing.

RequestParameters

Bases: BaseModel

The strict data contract for the request parameters accepted by Scrape.do's API.

This model enforces all parameter dependencies, mutually exclusive rules, and geographical targeting constraints locally before a network request is generated.

Attributes:

Name Type Description
url HttpUrl

The absolute destination URL you wish to scrape.

super bool

Activates Residential/Mobile IP proxies.

render bool

Executes the request using a headless browser.

device DeviceType

Specifies the device type (desktop, mobile, or tablet).

session_id int

Reuses the same IP address across requests that share this session ID.

geo_code str

ISO 3166-1 alpha-2 country code for IP targeting.

regional_geo_code RegionCodeType

Targets a broader geographical region. Requires super=True.

postal_code str

Targets a specific zip code. Requires super=True and a supported geo_code.

wait_until WaitUntilType

Controls when the browser considers the page loaded.

custom_wait int

Sets how long the browser waits on the target web page after the content has loaded.

wait_selector str

CSS selector to wait for in the target web page.

width int

Custom viewport width.

height int

Custom viewport height.

return_json bool

Returns response body as base64-encoded JSON instead of raw HTML.

block_resources bool

Blocks CSS, images, and fonts on the target web page.

screenshot bool

Captures the visible viewport.

full_screenshot bool

Captures the entire scrollable page.

particular_screenshot str

Captures a specific DOM element by selector.

play_with_browser List[BrowserAction]

A sequence of automated interactions to perform.

show_frames bool

Returns all iframe content from the target web page. Requires render=True and return_json=True.

show_websocket_requests bool

Captures WebSocket network traffic. Requires render=True and return_json=True.

custom_headers bool

Replaces Scrape.do's default headers with your provided headers.

extra_headers bool

Appends your provided headers to Scrape.do's default headers.

forward_headers bool

Forwards all headers exactly as sent by your client.

set_cookies str

Injects specific cookies into the request.

disable_redirection bool

Prevents the proxy from following 3xx HTTP redirects.

timeout int

Total API connection timeout in milliseconds.

retry_timeout int

Internal proxy retry duration in milliseconds. Cannot be used with render=True.

disable_retry bool

Fails immediately on target error without rotating IPs.

output OutputType

Output format parser.

transparent_response bool

Returns the unmodified response from the target web page, without Scrape.do processing.

pure_cookies bool

Returns the original Set-Cookie headers from the target website.

validate_compatibility()

Cross-validates parameter dependencies to prevent invalid API requests locally.

Headless Browser Dependencies (render=True)
  • wait_until
  • wait_selector
  • custom_wait
  • width
  • height
  • return_json
  • block_resources
  • screenshot
  • full_screenshot
  • particular_screenshot
  • play_with_browser
  • show_frames
  • show_websocket_requests
ReturnJSON Dependencies (render=True + return_json=True)
  • screenshot
  • full_screenshot
  • particular_screenshot
  • show_frames
  • show_websocket_requests
Super Proxy Dependencies (super=True)
  • regional_geo_code
Screenshot Parameters
  • Only one of the screenshot parameters can be set at a time.

  • In addition to render=True and return_json=True, all screenshot parameters require block_resources to be set to False.

Header Parameters
  • Only one of the header parameters can be set at a time.

  • None of the header parameters can be set to True when the set_cookies parameter is used.

Mutually Exclusive Parameters
  • The play_with_browser and particular_screenshot parameters cannot be used simultaneously.

  • The retry_timeout and render parameters cannot be used simultaneously.

  • The regional_geo_code and geo_code parameters cannot be used simultaneously.

Returns:

Type Description
Self

The validated instance from which the method was called

Raises:

Type Description
ValueError

If mutually exclusive parameters are combined or if dependent parameters are provided without their required prerequisites.
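The rules listed for validate_compatibility() can be sketched as one plain function over a dictionary of parameters. This is a standard-library approximation, not the module's code. `find_conflicts` and the constant names are hypothetical, and the sketch returns a list of messages where the documented validator raises ValueError. It also assumes a parameter counts as "set" when its value is truthy.

```python
from typing import Any, Dict, List

# Field groups transcribed from the dependency lists above.
RENDER_ONLY = {
    "wait_until", "wait_selector", "custom_wait", "width", "height",
    "return_json", "block_resources", "screenshot", "full_screenshot",
    "particular_screenshot", "play_with_browser", "show_frames",
    "show_websocket_requests",
}
RETURN_JSON_ONLY = {
    "screenshot", "full_screenshot", "particular_screenshot",
    "show_frames", "show_websocket_requests",
}
SCREENSHOT_FIELDS = ("screenshot", "full_screenshot", "particular_screenshot")
HEADER_FLAGS = ("custom_headers", "extra_headers", "forward_headers")
EXCLUSIVE_PAIRS = [
    ("play_with_browser", "particular_screenshot"),
    ("retry_timeout", "render"),
    ("regional_geo_code", "geo_code"),
]


def find_conflicts(params: Dict[str, Any]) -> List[str]:
    """Return one message per violated rule; an empty list means no conflict."""
    # Assumption: a parameter counts as "set" when its value is truthy.
    active = {name for name, value in params.items() if value}
    problems: List[str] = []

    if "render" not in active:
        problems += [f"{name} requires render=True"
                     for name in sorted(active & RENDER_ONLY)]
    if "return_json" not in active:
        problems += [f"{name} requires return_json=True"
                     for name in sorted(active & RETURN_JSON_ONLY)]
    if "regional_geo_code" in active and "super" not in active:
        problems.append("regional_geo_code requires super=True")

    shots = [name for name in SCREENSHOT_FIELDS if name in active]
    if len(shots) > 1:
        problems.append("only one screenshot parameter can be set: " + ", ".join(shots))
    # The documented rule asks for block_resources to be explicitly False.
    if shots and params.get("block_resources") is not False:
        problems.append("screenshot parameters require block_resources=False")

    flags = [name for name in HEADER_FLAGS if name in active]
    if len(flags) > 1:
        problems.append("only one header parameter can be set: " + ", ".join(flags))
    if flags and "set_cookies" in active:
        problems.append("header parameters cannot be combined with set_cookies")

    for first, second in EXCLUSIVE_PAIRS:
        if first in active and second in active:
            problems.append(f"{first} and {second} cannot be used together")
    return problems
```

For instance, passing only `{"screenshot": True}` reports the missing render, return_json, and block_resources prerequisites in one pass, which is easier to act on than fixing one error at a time.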

validate_geo_code(v, info) classmethod

Validates the country code against the allowed proxy pools.

Parameters:

Name Type Description Default
cls Type[RequestParameters]

The RequestParameters model class

required
v str

The geo_code provided during initialization

required
info ValidationInfo

The data already validated for the model so far

required

Returns:

Type Description
Optional[str]

The validated geo_code parameter

Raises:

Type Description
ValueError

If the country code is not supported by the selected proxy tier.
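A tier-dependent lookup of this kind might look like the sketch below. Everything in it is illustrative: the country sets, the tier labels, the `check_geo_code` name, and the case-insensitive comparison are assumptions made for the example. The real pools and the exact normalization are defined by the module and are not shown in this reference.

```python
from typing import Optional

# Hypothetical pools for illustration only; the real lists live in the module.
DATACENTER_CODES = {"us", "gb", "de"}
SUPER_CODES = DATACENTER_CODES | {"tr", "jp"}


def check_geo_code(geo_code: Optional[str], super_enabled: bool) -> Optional[str]:
    """Return the code if the selected tier supports it, else raise ValueError."""
    if geo_code is None:
        # Nothing to validate when no country was requested.
        return None
    # Assumption: codes are compared case-insensitively.
    code = geo_code.strip().lower()
    pool = SUPER_CODES if super_enabled else DATACENTER_CODES
    if code not in pool:
        tier = "super" if super_enabled else "datacenter"
        raise ValueError(f"geo_code {geo_code!r} is not available on the {tier} tier")
    return code
```

In the documented validator, the value of `super` would presumably be read from the `info` argument, since it carries the fields validated so far.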

validate_postal_code(v, info) classmethod

Validates postal codes based on specific regional formats.

Parameters:

Name Type Description Default
cls Type[RequestParameters]

The RequestParameters model class

required
v str

The postal_code provided during initialization

required
info ValidationInfo

The data already validated for the model so far

required

Returns:

Type Description
Optional[str]

The validated postal_code parameter

Raises:

Type Description
ValueError

If dependencies are missing or the format does not match the regional regex.
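The two failure modes named above, a missing dependency and a format mismatch, can be sketched as follows. The pattern table is a small hypothetical sample (US ZIP codes and five-digit German codes), and `check_postal_code` is a name invented for this example. The module's own table of regional regexes is not reproduced here.

```python
import re
from typing import Optional

# Hypothetical sample of regional formats; not the module's actual table.
POSTAL_PATTERNS = {
    "us": re.compile(r"\d{5}(-\d{4})?"),
    "de": re.compile(r"\d{5}"),
}


def check_postal_code(
    postal_code: Optional[str],
    geo_code: Optional[str],
    super_enabled: bool,
) -> Optional[str]:
    """Validate a postal code against its dependencies and regional format."""
    if postal_code is None:
        return None
    # Dependency 1: postal targeting requires super=True.
    if not super_enabled:
        raise ValueError("postal_code requires super=True")
    # Dependency 2: a geo_code with a known postal format must be present.
    pattern = POSTAL_PATTERNS.get(geo_code or "")
    if pattern is None:
        raise ValueError(f"postal_code is not supported for geo_code {geo_code!r}")
    # fullmatch rejects partial matches such as "10001abc".
    if not pattern.fullmatch(postal_code):
        raise ValueError(f"{postal_code!r} does not match the format for {geo_code!r}")
    return postal_code
```

Checking dependencies before the regex keeps the error message about the most basic problem first.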

to_api_params()

Serializes the model into a dictionary formatted for httpx query parameters.

This method automatically drops unassigned fields, maps snake_case variables to their camelCase API equivalents, and stringifies nested JSON objects as required by Scrape.do.

Returns:

Type Description
Dict[str, Any]

A sanitized dictionary ready to be passed to httpx.
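The three steps described for to_api_params() (drop unassigned fields, rename to camelCase, stringify nested objects) can be sketched with the standard library. `to_camel`, `build_query`, and `NAME_OVERRIDES` are hypothetical names, and the sketch operates on a plain dictionary instead of the model. The override table exists because a generic converter would produce `returnJson`, while Scrape.do's query parameter is spelled `returnJSON`.

```python
import json
from typing import Any, Dict

# Names that a generic snake_case -> camelCase conversion would get wrong.
NAME_OVERRIDES = {"return_json": "returnJSON"}


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, honouring the override table."""
    if name in NAME_OVERRIDES:
        return NAME_OVERRIDES[name]
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_query(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Drop unset fields, rename keys, and JSON-encode nested values."""
    query: Dict[str, Any] = {}
    for name, value in fields.items():
        if value is None:
            continue  # unassigned fields are not sent at all
        if isinstance(value, (list, dict)):
            # Nested structures travel as a compact JSON string.
            value = json.dumps(value, separators=(",", ":"))
        query[to_camel(name)] = value
    return query


params = build_query({
    "url": "https://example.com",
    "render": True,
    "geo_code": None,
    "return_json": True,
    # Keys follow the model's field names; the wire-format casing of action
    # keys is not specified in this reference.
    "play_with_browser": [{"action": "Wait", "timeout": 1000}],
})
```

The resulting dictionary has no `geoCode` key, because the field was left unassigned.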

PreparedScrapeDoRequest

Bases: BaseModel

Represents a fully validated, ready-to-execute API call.

Payload Type
  • If payload_type='json', the body will be sent to httpx.request() through the json parameter

  • If payload_type='raw', the body will be sent to httpx.request() through the content parameter

  • If payload_type='form', the body will be sent to httpx.request() through the data parameter
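The mapping in the list above amounts to choosing which httpx.request() keyword carries the body. A minimal sketch, using a hypothetical helper named `body_kwargs` and no dependency on httpx itself:

```python
from typing import Any, Dict, Union

# payload_type value -> keyword argument of httpx.request()
PAYLOAD_KWARG = {"json": "json", "raw": "content", "form": "data"}


def body_kwargs(
    body: Union[Dict[str, Any], str, bytes, None],
    payload_type: str = "json",
) -> Dict[str, Any]:
    """Return the single keyword argument that should carry the body."""
    if body is None:
        return {}  # no body, so no body-related keyword at all
    try:
        keyword = PAYLOAD_KWARG[payload_type]
    except KeyError:
        raise ValueError(f"unknown payload_type: {payload_type!r}") from None
    return {keyword: body}
```

The returned dictionary can be merged into the other keyword arguments and unpacked with `**`, for example `httpx.request(method, url, **body_kwargs(body, "form"))`.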

Attributes:

Name Type Description
api_params RequestParameters

Validated parameters to pass to the API

method HttpMethod

HTTP method to forward to the target website

headers Dict[str, str]

Custom HTTP headers to forward

body Union[Dict[str, Any], str, bytes]

Payload to send to the target website (JSON dict, string, or bytes)

payload_type PayloadType

Dictates how httpx should encode the body. Defaults to 'json'.

cross_validate_http_components()

Cross-references the standard HTTP request components (method, headers, body) against the Scrape.do-specific parameters to ensure the proxy network will respect the configuration.

Headers
  • Raises a ValueError if custom headers are provided but none of the header flags is set to true in RequestParameters

  • Raises a ValueError if one of the header flags is set to true in RequestParameters but no custom headers are provided

  • Raises a ValueError if RequestParameters.extra_headers is set to true and any of the provided headers does not start with the required sd- prefix

Method
  • Raises a ValueError if RequestParameters.render is set to true and method=HEAD
Body
  • Emits a UserWarning if a body is provided and method=GET or method=HEAD

Returns:

Type Description
Self

The validated instance from which the method was called

Raises:

Type Description
ValueError

If any of the validation steps fails
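The header, method, and body checks described for cross_validate_http_components() can be sketched as a single function. `check_http_components` is a hypothetical name, the flags are passed as plain keyword arguments where the documented method reads them from RequestParameters, and matching the sd- prefix case-insensitively is an assumption of this sketch.

```python
import warnings
from typing import Any, Dict, Optional


def check_http_components(
    method: str,
    headers: Optional[Dict[str, str]],
    body: Any,
    *,
    custom_headers: bool = False,
    extra_headers: bool = False,
    forward_headers: bool = False,
    render: bool = False,
) -> None:
    """Raise ValueError on an inconsistent request; warn on a suspicious one."""
    any_flag = custom_headers or extra_headers or forward_headers
    if headers and not any_flag:
        raise ValueError("headers were provided but no header flag is enabled")
    if any_flag and not headers:
        raise ValueError("a header flag is enabled but no headers were provided")
    if extra_headers:
        # Assumption: the sd- prefix is matched case-insensitively.
        bad = [name for name in headers if not name.lower().startswith("sd-")]
        if bad:
            raise ValueError(f"extra_headers requires the sd- prefix: {bad}")

    verb = method.upper()
    if render and verb == "HEAD":
        raise ValueError("render=True cannot be combined with method=HEAD")
    if body is not None and verb in {"GET", "HEAD"}:
        # A body on GET/HEAD is unusual, so this warns without failing.
        warnings.warn(f"a body was provided with method={verb}", UserWarning)
```

Only the body check is a warning. Every other rule stops the request from being built.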

to_httpx_kwargs()

Packages the validated object into a dictionary ready for httpx unpacking.

Returns:

Type Description
Dict[str, Any]

Keyword arguments strictly formatted for httpx.request().