Validators

validators

Reusable validation helpers for Scrape.do parameter contracts

This module exposes the cross-field and field-level rules enforced by RequestParameters as pure functions so they can be reused by other parameter models without inheriting from RequestParameters itself

Design Notes
  • Every helper is a free function

  • Callers pass in exactly the values each rule needs

  • Helpers either return the validated value on success or raise ValueError with a message on rule violation

check_geo_code(value, *, super_set)

Normalizes an ISO 3166-1 alpha-2 country code and validates it against the set of countries Scrape.do allows for the selected proxy tier (one set applies when super=True, another when super=False)

Lowercases the input and checks it against _SUPER_SUPPORTED_COUNTRIES or _DATACENTER_SUPPORTED_COUNTRIES depending on the value of super_set

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | Optional[str] | Raw geo_code (any case). None short-circuits the check and is returned unchanged | required |
| super_set | bool | Whether super=True was provided alongside this geo_code | required |

Returns:

| Type | Description |
| --- | --- |
| Optional[str] | The lowercased geo_code on success, or None if value was None |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If the country code is not supported by the selected proxy tier. The message distinguishes between not supported at all and supported only on super |
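
The lookup described above could be sketched as follows. This is an illustration, not the library's source: the name check_geo_code_sketch and both country sets are placeholders, because the real contents of _SUPER_SUPPORTED_COUNTRIES and _DATACENTER_SUPPORTED_COUNTRIES are not listed on this page. The sketch also assumes that every datacenter country is available on super.

```python
from typing import Optional

# Placeholder sets (assumption): stand-ins for _SUPER_SUPPORTED_COUNTRIES
# and _DATACENTER_SUPPORTED_COUNTRIES, whose real contents are not shown.
SUPER_COUNTRIES = frozenset({"us", "gb", "de", "jp"})
DATACENTER_COUNTRIES = frozenset({"us", "gb"})


def check_geo_code_sketch(value: Optional[str], *, super_set: bool) -> Optional[str]:
    """Lowercase the code and check it against the tier-specific set."""
    if value is None:
        return None  # None short-circuits the check
    code = value.lower()
    allowed = SUPER_COUNTRIES if super_set else DATACENTER_COUNTRIES
    if code in allowed:
        return code
    # Distinguish "needs super" from "not supported at all".
    if not super_set and code in SUPER_COUNTRIES:
        raise ValueError(f"geo_code {code!r} is supported only with super=True")
    raise ValueError(f"geo_code {code!r} is not supported")
```

With these placeholder sets, "US" normalizes to "us" on either tier, while "jp" is accepted only when super_set is True.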

check_postal_code(value, *, super_set, geo_code)

Validates a postal code against the list of countries for which Scrape.do supports this parameter, then uses a regular expression to check that the value matches that country's postal code format

Validation Logic
  • Postal-code targeting requires both super=True AND a previously validated geo_code that belongs to _ZIPCODE_FORMATS

  • The value is stripped of surrounding whitespace before format matching

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| value | Optional[str] | Raw postal_code. None short-circuits the check and is returned unchanged | required |
| super_set | bool | Whether super=True was provided alongside this postal_code | required |
| geo_code | Optional[str] | The (already-validated) ISO 3166-1 alpha-2 country code accompanying this request | required |

Returns:

| Type | Description |
| --- | --- |
| Optional[str] | The stripped postal_code on success, or None if value was None |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If super=False, if geo_code is missing, if the country has no zip-code format defined, or if the format regex does not match |
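
A minimal sketch of the four failure conditions listed above, in the order they are documented. The name check_postal_code_sketch is hypothetical, and ZIPCODE_FORMATS is a one-entry placeholder for _ZIPCODE_FORMATS (whose real contents are not shown here); the US pattern is only an example format. Using a full match rather than a prefix match is also an assumption.

```python
import re
from typing import Optional

# Placeholder (assumption): stands in for _ZIPCODE_FORMATS.
ZIPCODE_FORMATS = {"us": r"\d{5}(-\d{4})?"}


def check_postal_code_sketch(
    value: Optional[str], *, super_set: bool, geo_code: Optional[str]
) -> Optional[str]:
    """Return the stripped postal code, or raise ValueError on a violation."""
    if value is None:
        return None  # None short-circuits the check
    if not super_set:
        raise ValueError("postal_code requires super=True")
    if geo_code is None:
        raise ValueError("postal_code requires a geo_code")
    pattern = ZIPCODE_FORMATS.get(geo_code)
    if pattern is None:
        raise ValueError(f"no postal-code format is defined for {geo_code!r}")
    stripped = value.strip()  # surrounding whitespace is ignored
    if re.fullmatch(pattern, stripped) is None:
        raise ValueError(f"{stripped!r} is not a valid postal code for {geo_code!r}")
    return stripped
```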

check_geo_exclusion(geo_code, regional_geo_code)

Enforces mutual exclusivity between geo_code and regional_geo_code

Logic Behind Validation

Scrape.do's gateway rejects requests that specify both a country code and a regional code at the same time

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| geo_code | Optional[str] | The ISO 3166-1 alpha-2 country code on this request, or None | required |
| regional_geo_code | Optional[RegionCodeType] | The RegionCodeType value on this request, or None | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If both arguments are non-None |
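
As a rough illustration, the rule reduces to a single None check on both arguments. The name check_geo_exclusion_sketch is hypothetical, and a plain string stands in for RegionCodeType, whose definition is not shown on this page.

```python
from typing import Optional


def check_geo_exclusion_sketch(
    geo_code: Optional[str], regional_geo_code: Optional[str]
) -> None:
    """Reject requests that carry both a country code and a regional code."""
    # Only the "both set" combination is rejected; either one alone is fine.
    if geo_code is not None and regional_geo_code is not None:
        raise ValueError("geo_code and regional_geo_code are mutually exclusive")
```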

check_regional_requires_super(super_set, regional_geo_code)

Enforces that regional_geo_code is only used together with super=True

Logic Behind Validation

Scrape.do only routes regional codes through the premium proxy pool, so super must be explicitly enabled when a regional code is supplied

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| super_set | bool | Whether super=True was provided | required |
| regional_geo_code | Optional[Any] | The RegionCodeType value on this request, or None | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If regional_geo_code is set while super=False |
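
One way this dependency could be expressed, as a sketch under assumptions: check_regional_requires_super_sketch is a hypothetical name, and "set" is interpreted here as "not None".

```python
from typing import Any, Optional


def check_regional_requires_super_sketch(
    super_set: bool, regional_geo_code: Optional[Any]
) -> None:
    """Regional routing is only available on the premium (super) pool."""
    if regional_geo_code is not None and not super_set:
        raise ValueError("regional_geo_code requires super=True")
```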

check_screenshot_mutual_exclusion(screenshot, full_screenshot, particular_screenshot)

Enforces that at most one of the three screenshot parameters is set per request

Logic Behind Validation

Scrape.do doesn't allow setting more than one screenshot parameter per request

Multiple Screenshots

To capture multiple screenshots in a single request, pass a list containing ScreenShotAction objects through the play_with_browser parameter

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| screenshot | Optional[bool] | True to capture the visible viewport | required |
| full_screenshot | Optional[bool] | True to capture the entire scrollable page | required |
| particular_screenshot | Optional[str] | CSS selector targeting a specific element to screenshot | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If more than one of the three is truthy |
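
The "at most one truthy value" rule can be sketched by collecting the names of the active parameters and counting them. The function name and the error wording are hypothetical. Because the check is on truthiness, False, None, and an empty selector string all count as unset.

```python
from typing import Optional


def check_screenshot_mutual_exclusion_sketch(
    screenshot: Optional[bool],
    full_screenshot: Optional[bool],
    particular_screenshot: Optional[str],
) -> None:
    """Allow zero or one active screenshot parameter."""
    candidates = {
        "screenshot": screenshot,
        "full_screenshot": full_screenshot,
        "particular_screenshot": particular_screenshot,
    }
    active = [name for name, value in candidates.items() if value]
    if len(active) > 1:
        raise ValueError(
            "only one screenshot parameter may be set, got: " + ", ".join(active)
        )
```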

check_screenshot_blocks_resources(screenshot, full_screenshot, particular_screenshot, block_resources)

Enforces that screenshots run with block_resources=False

Logic Behind Validation
  • Any active screenshot parameter requires that resource blocking be disabled so that images, CSS, and fonts are actually loaded before capture

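  • Combining a screenshot with block_resources=True could yield an empty or partially rendered image, so Scrape.do forces block_resources to False on such requests regardless of the value sent. This helper rejects the combination up front rather than letting the value be silently overridden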

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| screenshot | Optional[bool] | Visible-viewport screenshot flag | required |
| full_screenshot | Optional[bool] | Full-page screenshot flag | required |
| particular_screenshot | Optional[str] | Element-selector screenshot | required |
| block_resources | Optional[bool] | True to block CSS / images / fonts | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If any screenshot parameter is truthy while block_resources is also True |
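
A short sketch of the check, assuming a hypothetical function name. Only an explicit truthy block_resources conflicts with a screenshot; None and False both pass.

```python
from typing import Optional


def check_screenshot_blocks_resources_sketch(
    screenshot: Optional[bool],
    full_screenshot: Optional[bool],
    particular_screenshot: Optional[str],
    block_resources: Optional[bool],
) -> None:
    """Reject a screenshot request that also blocks page resources."""
    wants_screenshot = any((screenshot, full_screenshot, particular_screenshot))
    if wants_screenshot and block_resources:
        raise ValueError("screenshots require block_resources=False")
```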

check_return_json_dependencies(screenshot, full_screenshot, particular_screenshot, show_frames, show_websocket_requests, return_json)

Enforces the return_json=True dependency for response-only artifacts

Logic Behind Validation

Screenshots, iframe content, and websocket traces are delivered inside the structured JSON envelope, so they require both render=True AND return_json=True

Render Dependencies
  • The render-side requirement is enforced separately by check_render_dependencies

  • This helper only enforces the JSON-envelope side

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| screenshot | Optional[bool] | Visible-viewport screenshot flag | required |
| full_screenshot | Optional[bool] | Full-page screenshot flag | required |
| particular_screenshot | Optional[str] | Element-selector screenshot | required |
| show_frames | Optional[bool] | Include all iframe content in the response | required |
| show_websocket_requests | Optional[bool] | Capture websocket traffic in the response | required |
| return_json | Optional[bool] | Whether return_json=True was set | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If any JSON-envelope-dependent field is truthy while return_json is not set |
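
The JSON-envelope side of the rule might look like the sketch below. The function name and message are hypothetical, and "not set" is interpreted as "not truthy", which is an assumption about how the real helper treats return_json=False.

```python
from typing import Optional


def check_return_json_dependencies_sketch(
    screenshot: Optional[bool],
    full_screenshot: Optional[bool],
    particular_screenshot: Optional[str],
    show_frames: Optional[bool],
    show_websocket_requests: Optional[bool],
    return_json: Optional[bool],
) -> None:
    """Require return_json=True for artifacts delivered in the JSON envelope."""
    dependents = {
        "screenshot": screenshot,
        "full_screenshot": full_screenshot,
        "particular_screenshot": particular_screenshot,
        "show_frames": show_frames,
        "show_websocket_requests": show_websocket_requests,
    }
    active = [name for name, value in dependents.items() if value]
    if active and not return_json:
        raise ValueError("return_json=True is required by: " + ", ".join(active))
```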

check_play_with_browser_vs_particular_screenshot(play_with_browser, particular_screenshot)

Enforces mutual exclusivity between play_with_browser and particular_screenshot

Logic Behind Validation

Element-selector screenshots are taken after navigation but cannot coexist with a scripted browser-action sequence in the same request

Use Particular Screenshot With playWithBrowser

To capture a particular_screenshot while using the playWithBrowser parameter, use a ScreenShotAction object with its particular_screenshot field set to the CSS selector you want to capture

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| play_with_browser | Optional[Sequence[BrowserAction]] | Sequence of BrowserAction entries or None | required |
| particular_screenshot | Optional[str] | CSS selector targeting a specific element to screenshot, or None | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If both arguments are non-None |
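
A sketch of the check, with a hypothetical function name and plain dicts standing in for BrowserAction objects (their real fields are not shown on this page). Since the documented condition is "non-None" rather than "truthy", an empty action list would still conflict with a selector under this reading.

```python
from typing import Any, Optional, Sequence


def check_play_with_browser_vs_particular_screenshot_sketch(
    play_with_browser: Optional[Sequence[Any]],
    particular_screenshot: Optional[str],
) -> None:
    """Reject an action sequence combined with a top-level element screenshot."""
    if play_with_browser is not None and particular_screenshot is not None:
        raise ValueError(
            "play_with_browser and particular_screenshot are mutually exclusive"
        )
```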

check_render_dependencies(render, dependent_fields)

Enforces the render=True dependency for headless-browser parameters

Logic Behind Validation

The parameters wait_until, custom_wait, wait_selector, width, height, return_json, block_resources, screenshot, full_screenshot, particular_screenshot, play_with_browser, show_frames, and show_websocket_requests all require the headless-browser pipeline to be active

Usage

The caller must pass the actual field values keyed by name so that the error message can name the offending fields verbatim

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| render | Optional[bool] | Whether render=True was set | required |
| dependent_fields | Dict[str, Any] | Map of render-dependent field names → their current values. Any field whose value is non-None is considered in use | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If any of the dependent fields is non-None while render is not truthy. The message lists every offending field name |
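
The name-keyed contract can be sketched as below; the function name and message format are hypothetical. Note that the documented test is "non-None", so a value such as block_resources=False still counts as in use.

```python
from typing import Any, Dict, Optional


def check_render_dependencies_sketch(
    render: Optional[bool], dependent_fields: Dict[str, Any]
) -> None:
    """Require render=True whenever a render-dependent field is in use."""
    in_use = [name for name, value in dependent_fields.items() if value is not None]
    if in_use and not render:
        raise ValueError("render=True is required by: " + ", ".join(in_use))


# The caller passes values keyed by field name so the message can name them.
fields = {"width": 1280, "wait_selector": None, "block_resources": False}
check_render_dependencies_sketch(True, fields)  # passes: render is enabled
```

Calling the sketch with render=None and the same dict would raise a ValueError naming width and block_resources, but not wait_selector.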

check_retry_timeout_vs_render(render, retry_timeout)

Enforces mutual exclusivity between retry_timeout and render=True

Logic Behind Validation

According to the Scrape.do official documentation, the retry_timeout parameter does not work when set simultaneously with render=true

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| render | Optional[bool] | Whether render=True was set | required |
| retry_timeout | Optional[int] | Internal proxy retry duration in milliseconds, or None | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If render=True and retry_timeout is not None |
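
A sketch of the condition in the Raises entry above, using a hypothetical function name. The timeout is compared against None rather than tested for truthiness, so a value of 0 would also be rejected when render is enabled.

```python
from typing import Optional


def check_retry_timeout_vs_render_sketch(
    render: Optional[bool], retry_timeout: Optional[int]
) -> None:
    """Reject retry_timeout when the headless browser is enabled."""
    if render and retry_timeout is not None:
        raise ValueError("retry_timeout cannot be combined with render=True")
```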

check_header_mutual_exclusion(custom_headers, extra_headers, forward_headers)

Enforces that at most one of the three header-control parameters is set

Logic Behind Validation
  • custom_headers, extra_headers, and forward_headers are mutually exclusive header-handling modes

  • Scrape.do's gateway rejects any request that enables more than one of them

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| custom_headers | Optional[bool] | Replace Scrape.do's default headers with the user-provided ones | required |
| extra_headers | Optional[bool] | Append the user-provided headers to Scrape.do's defaults | required |
| forward_headers | Optional[bool] | Forward all headers exactly as sent by the client | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If more than one of the three is truthy |
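
Since the three modes are plain flags, one possible sketch sums their truthiness. The function name and message are hypothetical.

```python
from typing import Optional


def check_header_mutual_exclusion_sketch(
    custom_headers: Optional[bool],
    extra_headers: Optional[bool],
    forward_headers: Optional[bool],
) -> None:
    """Allow zero or one header-control mode per request."""
    # bool() maps None and False to 0, True to 1.
    enabled = sum(bool(flag) for flag in (custom_headers, extra_headers, forward_headers))
    if enabled > 1:
        raise ValueError(
            "custom_headers, extra_headers and forward_headers are mutually exclusive"
        )
```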

check_headers_vs_set_cookies(used_header_fields, set_cookies)

Enforces incompatibility between any header-control mode and set_cookies

Logic Behind Validation

Scrape.do does not accept setCookies alongside any of the customHeaders / extraHeaders / forwardHeaders flags

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| used_header_fields | List[str] | Names of header-control fields that are currently truthy. Pass [] if none are set | required |
| set_cookies | Optional[str] | The set_cookies parameter value | required |

Raises:

| Type | Description |
| --- | --- |
| ValueError | If used_header_fields is non-empty AND set_cookies is non-None |
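
The sketch below shows one way a caller might derive used_header_fields from its own flags before invoking the check. The function name, the flags dict, and the error wording are all hypothetical.

```python
from typing import List, Optional


def check_headers_vs_set_cookies_sketch(
    used_header_fields: List[str], set_cookies: Optional[str]
) -> None:
    """Reject set_cookies when any header-control mode is active."""
    if used_header_fields and set_cookies is not None:
        raise ValueError(
            "set_cookies cannot be combined with: " + ", ".join(used_header_fields)
        )


# Caller side: collect the names of the header flags that are truthy.
flags = {"custom_headers": True, "extra_headers": None, "forward_headers": None}
used = [name for name, value in flags.items() if value]
check_headers_vs_set_cookies_sketch(used, None)  # passes: no cookies supplied
```

Passing the names rather than the flag values keeps the helper independent of any particular model and lets the error message name the conflicting fields.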