Models
Data models and API contracts
This module uses Pydantic V2 to enforce Scrape.do's routing rules, parameter dependencies, and geographical targeting constraints locally, so that invalid configurations are caught before any network request is made.
ClickAction
WaitAction
WaitSelectorAction
ScrollXAction
ScrollYAction
ScrollToAction
FillAction
ExecuteAction
ScreenShotAction
Bases: BaseModel
Captures a screenshot during the execution of browser actions.
Attributes:
| Name | Type | Description |
|---|---|---|
| action | Literal['ScreenShot'] | The literal action identifier. |
| full_screenshot | bool | If True, captures the entire scrollable page. |
| particular_screenshot | str | CSS selector of a specific element to capture. |
validate_screenshot_logic()
Ensures mutually exclusive screenshot targeting parameters are not combined.
Capturing Full Screenshot And Particular Screenshot
A single screenshot action can either capture the entire scrollable
page OR a specific DOM element, but not both simultaneously.
To capture both, provide two separate ScreenShotAction objects in
the play_with_browser list.
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called. |
Raises:
| Type | Description |
|---|---|
| ValueError | If both |
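The mutual-exclusion rule described above can be sketched with a Pydantic V2 `model_validator`. This is a minimal illustration, not the library's source: the class name `ScreenShotActionSketch`, the `None` defaults, the error message, and the `is_valid` helper are all assumptions made for the example.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError, model_validator


class ScreenShotActionSketch(BaseModel):
    """Hypothetical stand-in for ScreenShotAction; field defaults are assumed."""

    action: Literal["ScreenShot"] = "ScreenShot"
    full_screenshot: Optional[bool] = None
    particular_screenshot: Optional[str] = None

    @model_validator(mode="after")
    def validate_screenshot_logic(self) -> "ScreenShotActionSketch":
        # Reject a full-page capture combined with an element selector.
        if self.full_screenshot and self.particular_screenshot is not None:
            raise ValueError(
                "full_screenshot and particular_screenshot are mutually exclusive"
            )
        return self


def is_valid(**fields) -> bool:
    """Helper for the example: True if the fields pass validation."""
    try:
        ScreenShotActionSketch(**fields)
    except ValidationError:
        return False
    return True
```

Under these assumptions, capturing both a full page and a single element means building two separate action objects, one with `full_screenshot=True` and one with `particular_screenshot` set.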
WaitForRequestCompletionAction
Bases: BaseModel
Pauses execution until network requests matching a specific pattern complete.
Attributes:
| Name | Type | Description |
|---|---|---|
| action | Literal['WaitForRequestCompletion'] | The literal action identifier. |
| url_pattern | str | The regex or string pattern of the URL to wait for. |
| timeout | int | Maximum time to wait in milliseconds before failing. |
RequestParameters
Bases: BaseModel
The strict data contract for the request parameters accepted by Scrape.do's API.
This model enforces all parameter dependencies, mutually exclusive rules, and geographical targeting constraints locally before a network request is generated.
Attributes:
| Name | Type | Description |
|---|---|---|
| url | HttpUrl | The absolute destination URL you wish to scrape. |
| super | bool | Activates Residential/Mobile IP proxies. |
| render | bool | Executes the request using a headless browser. |
| device | DeviceType | Specifies the device type (desktop, mobile, tablet). |
| session_id | int | Reuses the same IP address across requests within a session. |
| geo_code | str | ISO 3166-1 alpha-2 country code for IP targeting. |
| regional_geo_code | RegionCodeType | Targets a broader geographical region. Requires super=True. |
| postal_code | str | Targets a specific zip code. Requires super=True and a supported geo_code. |
| wait_until | WaitUntilType | Controls when the browser considers the page loaded. |
| custom_wait | int | Sets how long the browser waits on the target web page after the content has loaded. |
| wait_selector | str | CSS selector to wait for in the target web page. |
| width | int | Custom viewport width. |
| height | int | Custom viewport height. |
| return_json | bool | Returns response body as base64-encoded JSON instead of raw HTML. |
| block_resources | bool | Blocks CSS, images, and fonts on the target web page. |
| screenshot | bool | Captures the visible viewport. |
| full_screenshot | bool | Captures the entire scrollable page. |
| particular_screenshot | str | Captures a specific DOM element by selector. |
| play_with_browser | List[BrowserAction] | A sequence of automated interactions to perform. |
| show_frames | bool | Returns all iframe content from the target webpage. Requires render=True and return_json=True. |
| show_websocket_requests | bool | Captures WebSocket network traffic. Requires render=True and return_json=True. |
| custom_headers | bool | Replaces Scrape.do's default headers with your provided headers. |
| extra_headers | bool | Appends your provided headers to Scrape.do's default headers. |
| forward_headers | bool | Forwards all headers exactly as sent by your client. |
| set_cookies | str | Injects specific cookies into the request. |
| disable_redirection | bool | Prevents the proxy from following 3xx HTTP redirects. |
| timeout | int | Total API connection timeout in milliseconds. |
| retry_timeout | int | Internal proxy retry duration in milliseconds. Cannot be used with render=True. |
| disable_retry | bool | Fails immediately on target error without rotating IPs. |
| output | OutputType | Output format parser. |
| transparent_response | bool | Returns the pure response from the target web page without Scrape.do processing. |
| pure_cookies | bool | Returns the original Set-Cookie headers from the target website. |
validate_compatibility()
Cross-validates parameter dependencies to prevent invalid API requests locally.
Headless Browser Dependencies (render=True)
wait_until, wait_selector, custom_wait, width, height, return_json, block_resources, screenshot, full_screenshot, particular_screenshot, play_with_browser, show_frames, show_websocket_requests
ReturnJSON Dependencies (render=True + return_json=True)
screenshot, full_screenshot, particular_screenshot, show_frames, show_websocket_requests
Super Proxy Dependencies (super=True)
regional_geo_code
Screenshot Parameters
- Only one of the screenshot parameters can be set at a time.
- In addition to render=True and return_json=True, all screenshot parameters require blockResources to be set to False.
Header Parameters
- Only one of the header parameters can be set at a time.
- None of the header parameters can be set to True when using the setCookies parameter.
Mutually Exclusive Parameters
- The playWithBrowser and particular_screenshot parameters cannot be used simultaneously.
- The retryTimeout and render parameters cannot be used simultaneously.
- The regional_geo_code and geo_code parameters cannot be used simultaneously.
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called. |
Raises:
| Type | Description |
|---|---|
| ValueError | If mutually exclusive parameters are combined or if dependent parameters are provided without their required prerequisites. |
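The dependency rules listed for validate_compatibility() can be expressed as a small table-driven check. The sketch below is a plain-Python illustration that works on a dictionary of snake_case fields; the function name `find_conflicts`, the message wording, and the choice to collect messages rather than raise ValueError on the first problem are assumptions made for the example, not the library's behavior.

```python
from typing import Any, Dict, List

# Fields the documentation lists as requiring render=True.
RENDER_ONLY = (
    "wait_until", "wait_selector", "custom_wait", "width", "height",
    "return_json", "block_resources", "screenshot", "full_screenshot",
    "particular_screenshot", "play_with_browser", "show_frames",
    "show_websocket_requests",
)
# Fields that additionally require return_json=True.
RETURN_JSON_ONLY = (
    "screenshot", "full_screenshot", "particular_screenshot",
    "show_frames", "show_websocket_requests",
)
SCREENSHOT_FIELDS = ("screenshot", "full_screenshot", "particular_screenshot")
HEADER_FLAGS = ("custom_headers", "extra_headers", "forward_headers")
MUTUALLY_EXCLUSIVE = (
    ("play_with_browser", "particular_screenshot"),
    ("retry_timeout", "render"),
    ("regional_geo_code", "geo_code"),
)


def find_conflicts(params: Dict[str, Any]) -> List[str]:
    """Return one message per violated rule (hypothetical helper)."""

    def is_set(name: str) -> bool:
        # Treat missing, None, and False as "not set".
        value = params.get(name)
        return value is not None and value is not False

    problems: List[str] = []
    for name in RENDER_ONLY:
        if is_set(name) and not is_set("render"):
            problems.append(f"{name} requires render=True")
    for name in RETURN_JSON_ONLY:
        if is_set(name) and not is_set("return_json"):
            problems.append(f"{name} requires return_json=True")
    if is_set("regional_geo_code") and not is_set("super"):
        problems.append("regional_geo_code requires super=True")

    shots = [name for name in SCREENSHOT_FIELDS if is_set(name)]
    if len(shots) > 1:
        problems.append("only one screenshot parameter may be set")
    # Literal reading of the rule: block_resources must be explicitly False.
    if shots and params.get("block_resources") is not False:
        problems.append("screenshots require block_resources=False")

    flags = [name for name in HEADER_FLAGS if is_set(name)]
    if len(flags) > 1:
        problems.append("only one header parameter may be set")
    if flags and is_set("set_cookies"):
        problems.append("header parameters cannot be combined with set_cookies")

    for first, second in MUTUALLY_EXCLUSIVE:
        if is_set(first) and is_set(second):
            problems.append(f"{first} and {second} cannot be used together")
    return problems
```

Keeping the rules in tuples makes each documented constraint visible in one place; a validator built this way could raise a single ValueError that joins every message instead of stopping at the first.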
validate_geo_code(v, info) classmethod
Validates the country code against the allowed proxy pools.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type[RequestParameters] | The RequestParameters model class | required |
| v | str | The | required |
| info | ValidationInfo | The data already validated for the model so far | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The validated |
Raises:
| Type | Description |
|---|---|
| ValueError | If the country code is not supported by the selected proxy tier. |
validate_postal_code(v, info) classmethod
Validates postal codes based on specific regional formats.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cls | Type[RequestParameters] | The RequestParameters model class | required |
| v | str | The | required |
| info | ValidationInfo | The data already validated for the model so far | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The validated |
Raises:
| Type | Description |
|---|---|
| ValueError | If dependencies are missing or the format does not match the regional regex. |
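As a rough illustration of dependency-aware postal code validation, the sketch below checks the prerequisites first and the regional format second. The patterns, the set of supported countries, and the function name `check_postal_code` are hypothetical; the formats the library actually accepts are not shown on this page. In a Pydantic field validator, the `super` and `geo_code` values would presumably be read from `info.data` rather than passed as arguments.

```python
import re
from typing import Optional

# Hypothetical formats for illustration only.
POSTAL_PATTERNS = {
    "us": re.compile(r"\d{5}"),
    "gb": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
}


def check_postal_code(
    postal_code: str, *, super_proxy: bool, geo_code: Optional[str]
) -> str:
    """Validate a postal code against its prerequisites and regional format."""
    if not super_proxy:
        raise ValueError("postal_code requires super=True")
    pattern = POSTAL_PATTERNS.get((geo_code or "").lower())
    if pattern is None:
        raise ValueError("postal_code requires a supported geo_code")
    # fullmatch rejects values with leading or trailing extra characters.
    if not pattern.fullmatch(postal_code.upper()):
        raise ValueError(f"postal_code does not match the {geo_code} format")
    return postal_code
```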
to_api_params()
Serializes the model into a dictionary formatted for httpx query parameters.
This method automatically drops unassigned fields, maps snake_case variables to their camelCase API equivalents, and stringifies nested JSON objects as required by Scrape.do.
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | A sanitized dictionary ready to be passed to httpx. |
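The three steps described for to_api_params() (dropping unassigned fields, renaming to camelCase, and stringifying nested objects) can be sketched as a standalone function over a plain dictionary. The helper names and the `OVERRIDES` table are assumptions; only the `returnJSON` spelling is taken from this page, and the real method may rely on Pydantic field aliases instead.

```python
import json
from typing import Any, Dict

# API names that a mechanical snake_case -> camelCase conversion gets wrong.
OVERRIDES = {"return_json": "returnJSON"}


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. geo_code -> geoCode."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_api_params(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Build a query-parameter dictionary from snake_case model fields."""
    params: Dict[str, Any] = {}
    for name, value in fields.items():
        if value is None:
            continue  # unassigned field: leave it out of the query string
        key = OVERRIDES.get(name, to_camel(name))
        if isinstance(value, (list, dict)):
            value = json.dumps(value)  # nested objects travel as JSON strings
        params[key] = value
    return params
```

For example, `to_api_params({"url": "https://example.com", "geo_code": "us", "wait_selector": None})` yields a dictionary with the keys `url` and `geoCode` only.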
PreparedScrapeDoRequest
Bases: BaseModel
Represents a fully validated, ready-to-execute API call.
Payload Type
- If payload_type='json', the body will be sent to httpx.request() through the json parameter.
- If payload_type='raw', the body will be sent to httpx.request() through the content parameter.
- If payload_type='form', the body will be sent to httpx.request() through the data parameter.
Attributes:
| Name | Type | Description |
|---|---|---|
| api_params | RequestParameters | Validated parameters to pass to the API. |
| method | HttpMethod | HTTP method to forward to the target website. |
| headers | Dict[str, str] | Custom HTTP headers to forward. |
| body | Union[Dict[str, Any], str, bytes] | Payload to send to the target website (JSON dict, string, or bytes). |
| payload_type | PayloadType | Dictates how httpx should encode the body. Defaults to 'json'. |
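The payload_type rule amounts to choosing which keyword argument of httpx.request() receives the body. A minimal sketch of that dispatch follows; the function name `body_kwargs` and the handling of a missing body are assumptions for the example.

```python
from typing import Any, Dict, Optional, Union

Body = Union[Dict[str, Any], str, bytes]

# payload_type -> keyword argument accepted by httpx.request()
KWARG_BY_PAYLOAD_TYPE = {"json": "json", "raw": "content", "form": "data"}


def body_kwargs(body: Optional[Body], payload_type: str = "json") -> Dict[str, Any]:
    """Return the keyword arguments that carry the body to httpx.request()."""
    if body is None:
        return {}  # nothing to send
    try:
        keyword = KWARG_BY_PAYLOAD_TYPE[payload_type]
    except KeyError:
        raise ValueError(f"unsupported payload_type: {payload_type!r}") from None
    return {keyword: body}
```

The result can be unpacked into the call, for instance `httpx.request(method, api_url, params=params, **body_kwargs(body, payload_type))`, where `api_url` and `params` stand for values prepared elsewhere.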
cross_validate_http_components()
Cross-references the standard HTTP request components (method, headers, body) against the Scrape.do-specific parameters to ensure the configuration will be respected by the proxy network.
Headers
- Raises a ValueError if none of the header flags is set to true in RequestParameters and custom headers are provided.
- Raises a ValueError if one of the header flags is set to true in RequestParameters and no custom headers are provided.
- Raises a ValueError if RequestParameters.extra_headers is set to true and any of the provided headers doesn't start with the required sd- prefix.
Method
- Raises a ValueError if RequestParameters.render is set to true and method=HEAD.
Body
- Emits a UserWarning if a body is provided and method=GET or method=HEAD.
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called. |
Raises:
| Type | Description |
|---|---|
| ValueError | If any of the validation steps fails. |
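The header, method, and body checks above can be sketched as one function. This is an illustration under stated assumptions: the name `check_http_components`, the keyword-only signature, the message wording, and the case-insensitive match on the sd- prefix are not taken from the library.

```python
import warnings
from typing import Any, Dict, Optional


def check_http_components(
    *,
    method: str,
    headers: Optional[Dict[str, str]] = None,
    body: Any = None,
    render: bool = False,
    custom_headers: bool = False,
    extra_headers: bool = False,
    forward_headers: bool = False,
) -> None:
    """Raise ValueError or warn according to the documented rules."""
    headers = headers or {}
    flag_set = custom_headers or extra_headers or forward_headers

    if headers and not flag_set:
        raise ValueError("headers were provided but no header flag is enabled")
    if flag_set and not headers:
        raise ValueError("a header flag is enabled but no headers were provided")
    # Assumption: the prefix is matched case-insensitively.
    if extra_headers and any(
        not name.lower().startswith("sd-") for name in headers
    ):
        raise ValueError("extra_headers requires every header to start with sd-")

    if render and method.upper() == "HEAD":
        raise ValueError("render=True cannot be combined with method=HEAD")

    if body is not None and method.upper() in {"GET", "HEAD"}:
        warnings.warn(
            f"a body was provided with method={method.upper()}", UserWarning
        )
```

Raising on header mismatches while only warning on a GET or HEAD body mirrors the split in the rules: the former would be silently ignored by the proxy, whereas the latter is unusual but still a legal HTTP request.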