Parameters
Core validation engine and configuration contracts.
Validates request data before it reaches the network layer, so invalid configurations are caught locally instead of wasting network requests. Pydantic V2 models enforce Scrape.do's parameter dependencies and interactions.
RequestParametersDict
Bases: TypedDict
Provides strict IDE autocomplete and static type checking for **kwargs dictionaries meant for the RequestParameters model.
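As a minimal sketch of the idea, the snippet below defines a trimmed-down TypedDict and spreads it into a function as **kwargs. `ScrapeKwargs` and `summarize` are hypothetical names used only for illustration; they are not part of the library, and only four of the documented keys are reproduced.

```python
from typing import Any, Dict, TypedDict


class ScrapeKwargs(TypedDict, total=False):
    """Hypothetical stand-in for RequestParametersDict with a few keys."""

    super: bool
    render: bool
    geo_code: str
    timeout: int


def summarize(url: str, **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical consumer that merges the URL with the keyword arguments."""
    return {"url": url, **kwargs}


# total=False makes every key optional, so a partial dictionary is valid.
# A static type checker would flag {"render": "yes"} or an unknown key here.
params: ScrapeKwargs = {"render": True, "geo_code": "us"}
summary = summarize("https://example.com", **params)
```

At runtime a TypedDict is an ordinary dict, so the checking happens only in the editor or type checker; the model itself is still responsible for validating the values.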
super
instance-attribute
Activates Residential/Mobile IP proxies.

render
instance-attribute
Executes the request using a headless browser.

device
instance-attribute
Specify the device type (desktop, mobile, tablet).

session_id
instance-attribute
Use the same IP address continuously with a session.

geo_code
instance-attribute
ISO 3166-1 alpha-2 country code for IP targeting.

regional_geo_code
instance-attribute
Targets a broader geographical region. Requires super=True.

postal_code
instance-attribute
Targets a specific zip code. Requires super=True and a supported geo_code.

wait_until
instance-attribute
Control when the browser considers the page loaded.

custom_wait
instance-attribute
Set the browser wait time on the target web page after content loaded.

wait_selector
instance-attribute
CSS selector to wait for in the target web page.

width
instance-attribute
Custom viewport width.

height
instance-attribute
Custom viewport height.

return_json
instance-attribute
Returns response body as base64-encoded JSON instead of raw HTML.

block_resources
instance-attribute
Block CSS, images, and fonts on your target web page.

screenshot
instance-attribute
Captures the visible viewport.

full_screenshot
instance-attribute
Captures the entire scrollable page.

particular_screenshot
instance-attribute
Captures a specific DOM element by selector.

play_with_browser
instance-attribute
A sequence of automated interactions to perform.

show_frames
instance-attribute
Returns all iframe content from the target webpage. Requires render=true and returnJSON=true.

show_websocket_requests
instance-attribute
Captures WebSocket network traffic. Requires render=true and returnJSON=true.

custom_headers
instance-attribute
Replaces Scrape.do's default headers with your provided headers.

extra_headers
instance-attribute
Appends your provided headers to Scrape.do's default headers.

forward_headers
instance-attribute
Forwards all headers exactly as sent by your client.

set_cookies
instance-attribute
Injects specific cookies into the request.

disable_redirection
instance-attribute
Prevents the proxy from following 3xx HTTP redirects.

timeout
instance-attribute
Total API connection timeout in milliseconds.

retry_timeout
instance-attribute
Internal proxy retry duration in milliseconds. Cannot be used with render=True.

disable_retry
instance-attribute
Fails immediately on target error without rotating IPs.

output
instance-attribute
Output format parser.

transparent_response
instance-attribute
Return pure response from target web page without Scrape.do processing.

pure_cookies
instance-attribute
Returns the original Set-Cookie headers from the target website.

RequestParameters
Bases: BaseModel
The strict data contract for the request parameters accepted by Scrape.do's API.
This model enforces all parameter dependencies, mutually exclusive rules, and geographical targeting constraints locally before a network request is generated.
Attributes:
| Name | Type | Description |
|---|---|---|
| url | HttpUrl | The absolute destination URL you wish to scrape. |
| super | Optional[bool] | Activates Residential/Mobile IP proxies. |
| render | Optional[bool] | Executes the request using a headless browser. |
| device | Optional[DeviceType] | Specify the device type (desktop, mobile, tablet). |
| session_id | Optional[int] | Use the same IP address continuously with a session. |
| geo_code | Optional[str] | ISO 3166-1 alpha-2 country code for IP targeting. |
| regional_geo_code | Optional[RegionCodeType] | Targets a broader geographical region. Requires super=True. |
| postal_code | Optional[str] | Targets a specific zip code. Requires super=True and a supported geo_code. |
| wait_until | Optional[WaitUntilType] | Control when the browser considers the page loaded. |
| custom_wait | Optional[int] | Set the browser wait time on the target web page after content loaded. |
| wait_selector | Optional[str] | CSS selector to wait for in the target web page. |
| width | Optional[int] | Custom viewport width. |
| height | Optional[int] | Custom viewport height. |
| return_json | Optional[bool] | Returns response body as base64-encoded JSON instead of raw HTML. |
| block_resources | Optional[bool] | Block CSS, images, and fonts on your target web page. |
| screenshot | Optional[bool] | Captures the visible viewport. |
| full_screenshot | Optional[bool] | Captures the entire scrollable page. |
| particular_screenshot | Optional[str] | Captures a specific DOM element by selector. |
| play_with_browser | Optional[List[BrowserAction]] | A sequence of automated interactions to perform. |
| show_frames | Optional[bool] | Returns all iframe content from the target webpage. Requires render=true and returnJSON=true. |
| show_websocket_requests | Optional[bool] | Captures WebSocket network traffic. Requires render=true and returnJSON=true. |
| custom_headers | Optional[bool] | Replaces Scrape.do's default headers with your provided headers. |
| extra_headers | Optional[bool] | Appends your provided headers to Scrape.do's default headers. |
| forward_headers | Optional[bool] | Forwards all headers exactly as sent by your client. |
| set_cookies | Optional[str] | Injects specific cookies into the request. |
| disable_redirection | Optional[bool] | Prevents the proxy from following 3xx HTTP redirects. |
| timeout | Optional[int] | Total API connection timeout in milliseconds. |
| retry_timeout | Optional[int] | Internal proxy retry duration in milliseconds. Cannot be used with render=True. |
| disable_retry | Optional[bool] | Fails immediately on target error without rotating IPs. |
| output | Optional[OutputType] | Output format parser. |
| transparent_response | Optional[bool] | Return pure response from target web page without Scrape.do processing. |
| pure_cookies | Optional[bool] | Returns the original Set-Cookie headers from the target website. |
validate_compatibility()
Cross-validates parameter dependencies to prevent invalid API requests locally.
Headless Browser Dependencies (render=True)
wait_until, wait_selector, custom_wait, width, height, return_json, block_resources, screenshot, full_screenshot, particular_screenshot, play_with_browser, show_frames, show_websocket_requests
ReturnJSON Dependencies (render=True + return_json=True)
screenshot, full_screenshot, particular_screenshot, show_frames, show_websocket_requests
Super Proxy Dependencies (super=True)
regional_geo_code
Screenshot Parameters
- Only one of the screenshot parameters can be set at a time.
- In addition to render=True and return_json=True, all screenshot parameters require blockResources to be set to False.
Header Parameters
- Only one of the header parameters can be set at a time.
- None of the header parameters can be set to True when using the setCookies parameter.
Mutually Exclusive Parameters
- The playWithBrowser and particular_screenshot parameters cannot be used simultaneously.
- The retryTimeout and render parameters cannot be used simultaneously.
- The regional_geo_code and geo_code parameters cannot be used simultaneously.
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called |
Raises:
| Type | Description |
|---|---|
| ValueError | If mutually exclusive parameters are combined or if dependent parameters are provided without their required prerequisites. |
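The rules listed for validate_compatibility can be approximated in plain Python as follows. This is a sketch under stated assumptions, not the model's actual validator: it works on a plain dictionary instead of a Pydantic model, `check_compatibility` and `is_set` are hypothetical names, the error messages are invented, and a parameter is assumed to count as "set" when it is neither None nor False.

```python
from typing import Any, Dict

RENDER_ONLY = (
    "wait_until", "wait_selector", "custom_wait", "width", "height",
    "return_json", "block_resources", "screenshot", "full_screenshot",
    "particular_screenshot", "play_with_browser", "show_frames",
    "show_websocket_requests",
)
RETURN_JSON_ONLY = (
    "screenshot", "full_screenshot", "particular_screenshot",
    "show_frames", "show_websocket_requests",
)
SCREENSHOTS = ("screenshot", "full_screenshot", "particular_screenshot")
HEADERS = ("custom_headers", "extra_headers", "forward_headers")
EXCLUSIVE_PAIRS = (
    ("play_with_browser", "particular_screenshot"),
    ("retry_timeout", "render"),
    ("regional_geo_code", "geo_code"),
)


def is_set(params: Dict[str, Any], name: str) -> bool:
    """Assumption: missing, None, and False all mean 'not set'."""
    value = params.get(name)
    return value is not None and value is not False


def check_compatibility(params: Dict[str, Any]) -> Dict[str, Any]:
    """Raise ValueError on the first violated rule, else return params."""
    for name in RENDER_ONLY:
        if is_set(params, name) and not is_set(params, "render"):
            raise ValueError(f"{name} requires render=True")
    for name in RETURN_JSON_ONLY:
        if is_set(params, name) and not is_set(params, "return_json"):
            raise ValueError(f"{name} requires return_json=True")
    if is_set(params, "regional_geo_code") and not is_set(params, "super"):
        raise ValueError("regional_geo_code requires super=True")

    shots = [name for name in SCREENSHOTS if is_set(params, name)]
    if len(shots) > 1:
        raise ValueError("only one screenshot parameter can be set")
    if shots and params.get("block_resources") is not False:
        raise ValueError("screenshots require block_resources=False")

    headers = [name for name in HEADERS if is_set(params, name)]
    if len(headers) > 1:
        raise ValueError("only one header parameter can be set")
    if headers and is_set(params, "set_cookies"):
        raise ValueError("header parameters cannot be used with set_cookies")

    for left, right in EXCLUSIVE_PAIRS:
        if is_set(params, left) and is_set(params, right):
            raise ValueError(f"{left} and {right} cannot be used together")
    return params


valid = check_compatibility(
    {"render": True, "return_json": True, "screenshot": True,
     "block_resources": False}
)
try:
    check_compatibility({"retry_timeout": 5000, "render": True})
except ValueError as exc:
    error_message = str(exc)
```

Keeping the rules in data tables (tuples of names and pairs) rather than in nested conditionals makes it easier to compare the checks against the documented lists.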
validate_geo_code(v, info)
classmethod
Validates the country code against the allowed proxy pools.
Delegates to check_geo_code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | Optional[str] | The | required |
| info | ValidationInfo | The data already validated for the model so far | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The validated (lowercased) |
Raises:
| Type | Description |
|---|---|
| ValueError | If the country code is not supported by the selected proxy tier. |
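The shape of this check can be illustrated with a stand-alone function. Everything specific here is an assumption: `check_geo` is a hypothetical name (the real helper, check_geo_code, is not shown in this page), the two pools are placeholder sets rather than Scrape.do's real country lists, and the proxy tier is passed in directly instead of being read from the validation info.

```python
from typing import Optional

# Placeholder pools for illustration only; NOT the real supported countries.
DATACENTER_POOL = {"us", "gb", "de"}
SUPER_POOL = DATACENTER_POOL | {"br", "jp"}


def check_geo(code: Optional[str], use_super: bool) -> Optional[str]:
    """Lowercase the code and check it against the pool of the chosen tier."""
    if code is None:
        return None  # the field is optional, so nothing to validate
    normalized = code.lower()
    pool = SUPER_POOL if use_super else DATACENTER_POOL
    if normalized not in pool:
        raise ValueError(f"geo_code {normalized!r} is not available")
    return normalized


geo = check_geo("US", use_super=False)
```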
validate_postal_code(v, info)
classmethod
Validates postal codes based on specific regional formats.
Delegates to check_postal_code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| v | Optional[str] | The | required |
| info | ValidationInfo | The data already validated for the model so far | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | The validated |
Raises:
| Type | Description |
|---|---|
| ValueError | If dependencies are missing or the format does not match the regional regex. |
to_api_params()
Serializes the model into a dictionary formatted for httpx query parameters.
This method automatically drops unassigned fields, maps snake_case variables to their camelCase API equivalents, and stringifies nested JSON objects as required by Scrape.do.
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | A sanitized dictionary ready to be passed to httpx. |
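The three steps described for to_api_params (drop unassigned fields, rename to camelCase, stringify nested objects) can be sketched with the standard library alone. `serialize`, `to_camel`, and `SPECIAL_NAMES` are hypothetical names, the override table is an assumption based on the returnJSON spelling used elsewhere in this page, and the browser action shown is illustrative only.

```python
import json
from typing import Any, Dict

# Assumption: names whose API spelling is not a plain camelCase conversion.
SPECIAL_NAMES = {"return_json": "returnJSON"}


def to_camel(name: str) -> str:
    """Convert snake_case to camelCase, e.g. geo_code -> geoCode."""
    head, *tail = name.split("_")
    return head + "".join(part.capitalize() for part in tail)


def serialize(fields: Dict[str, Any]) -> Dict[str, Any]:
    """Drop None values, rename keys, and JSON-encode lists and dicts."""
    params: Dict[str, Any] = {}
    for name, value in fields.items():
        if value is None:
            continue  # unassigned field
        key = SPECIAL_NAMES.get(name, to_camel(name))
        if isinstance(value, (list, dict)):
            value = json.dumps(value)  # nested objects travel as strings
        params[key] = value
    return params


api_params = serialize(
    {
        "url": "https://example.com",
        "render": True,
        "return_json": True,
        "geo_code": None,
        "play_with_browser": [{"Action": "Click", "Selector": "#btn"}],
    }
)
```

How the real method encodes booleans is not stated here, so the sketch leaves them untouched.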
from_url(api_url)
classmethod
Instantiates a RequestParameters instance by parsing a raw Scrape.do API URL string.
Accepted URLs
This method accepts both raw and encoded URLs, using the urllib.parse.parse_qs and urllib.parse.unquote_plus functions to normalize encoded URLs.
Browser Actions (playWithBrowser)
When providing a URL that contains the playWithBrowser parameter, use the json.dumps function to stringify the list of dictionaries containing the entries. Both the raw and encoded URLs can then be passed to this method.
API Token
This method ignores the &token= parameter containing the Scrape.do API key, since its insertion is meant to be handled by the ScrapeDoClient using either an initialization parameter or the SCRAPE_DO_API_KEY environment variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| api_url | str | The full Scrape.do endpoint ( | required |
Raises:
| Type | Description |
|---|---|
| ValueError | If the value found in the |
Returns:
| Type | Description |
|---|---|
| RequestParameters | The |
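The parsing behaviour described above (decode the query string, ignore the token, decode the stringified playWithBrowser list) can be sketched as follows. This is an illustration under assumptions, not the method's implementation: `parse_api_url` is a hypothetical name, the host `api.example.test` is a placeholder, the action dictionary is illustrative, and the result is a plain dictionary of strings rather than a validated RequestParameters instance.

```python
import json
from typing import Any, Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def parse_api_url(api_url: str) -> Dict[str, Any]:
    """Extract query parameters from an API URL, skipping the token."""
    query = parse_qs(urlsplit(api_url).query)  # percent-decoding included
    # parse_qs yields a list per key; keep the last value of each.
    params: Dict[str, Any] = {key: values[-1] for key, values in query.items()}
    params.pop("token", None)  # the API key is handled by the client
    if "playWithBrowser" in params:
        # json.JSONDecodeError is a ValueError subclass, so malformed
        # input surfaces as a ValueError.
        params["playWithBrowser"] = json.loads(params["playWithBrowser"])
    return params


actions = [{"Action": "Wait", "Timeout": 1000}]
api_url = "https://api.example.test/?" + urlencode(
    {
        "token": "PLACEHOLDER",
        "url": "https://example.com/?q=1",
        "render": "true",
        "playWithBrowser": json.dumps(actions),
    }
)
parsed = parse_api_url(api_url)
```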
validate_proxy_params()
Cross-validates specific Proxy Mode parameter dependencies to prevent invalid proxy configurations from being set.
Intended Usage
Since Scrape.do's Proxy Mode has a few configuration quirks, this method can be called on an already validated RequestParameters instance to ensure that the current parameter configuration is also valid when using it.
Proxy Mode Additional Validation
- Proxy Mode can be used with customHeaders=false, but unlike when using the API, it automatically sets this parameter to true if no value is provided. Therefore, setting either forwardHeaders=true or extra_headers=true without explicitly setting customHeaders=false would result in a configuration error where more than one header parameter is provided.
- Additionally, the customHeaders parameter must be explicitly set to false when setCookies=true. Not doing so would result in a configuration error where customHeaders and setCookies are used simultaneously.
- Apart from these, all other parameter validations are already ensured by the validate_compatibility model validator.
render=True
When using Proxy Mode with browser-automation tools, Scrape.do does not recommend setting render=true, so in this case, this method emits a UserWarning.
Raises:
| Type | Description |
|---|---|
| ValueError | If any of the |
Returns:
| Type | Description |
|---|---|
| Self | The validated instance from which the method was called |
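The Proxy Mode rules above come down to treating an unset customHeaders as true. A minimal sketch of that logic, assuming a plain dictionary of model field names, with `check_proxy_params` as a hypothetical function name and invented error and warning messages:

```python
import warnings
from typing import Any, Dict


def check_proxy_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the extra Proxy Mode header rules to already validated params."""
    custom = params.get("custom_headers")
    # In Proxy Mode an unset customHeaders behaves as if it were true.
    effective_custom = True if custom is None else bool(custom)

    if effective_custom and (
        params.get("forward_headers") or params.get("extra_headers")
    ):
        raise ValueError(
            "set custom_headers=False before enabling another header parameter"
        )
    if effective_custom and params.get("set_cookies"):
        raise ValueError("set custom_headers=False when using set_cookies")
    if params.get("render"):
        warnings.warn(
            "render=True is not recommended in Proxy Mode", UserWarning
        )
    return params


proxy_ok = check_proxy_params({"custom_headers": False, "extra_headers": True})
```

Calling the same function with only `{"extra_headers": True}` raises, because the implicit customHeaders=true then counts as a second header parameter.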
to_proxy_url()
Generates a Scrape.do Proxy connection string template.
Proxy Mode
Since Scrape.do's Proxy Mode accepts API parameters as the proxy password, this method serializes the current configuration into a valid connection string template that can later be formatted with a valid API token.
Intended Usage
- The Proxy Clients use this method to generate the proxy connection URL.
- It can also be used to construct Proxy Mode URLs for browser automation libraries like Playwright or Selenium while still benefiting from the strict validation provided by the RequestParameters model.
Formatting The Result
To turn the resulting template into a valid Proxy Mode URL, it needs to be formatted with a Scrape.do API token, for example
Proxy Mode Validation
This method calls validate_proxy_params to ensure that a misconfigured Proxy Mode URL is not generated.
Raises:
| Type | Description |
|---|---|
| ValueError | If any of the |
Returns:
| Type | Description |
|---|---|
| str | A string template formatted as: |