Response
response
Custom data models for the Scrape.do API's HTTP response
Encapsulates the httpx.Response object to provide a strongly-typed interface for the response data sent back by the Scrape.do API. It parses nested JSON payloads, extracts proxy telemetry, and attempts to determine whether non-2xx responses come from the target website or from a Scrape.do gateway failure
ScrapeDoNetworkRequest
Bases: BaseModel
Represents an intercepted HTTP network request made by the headless browser
When rendering JavaScript, the browser makes subsequent requests to fetch
CSS, images, and background API data, which Scrape.do returns in the
networkRequests field when returnJSON=true
url
The URL is stored as a plain string (not HttpUrl) because
real-world pages embed iframes with technically-valid-but-quirky
URLs that pydantic-core's URL parser rejects
Attributes:
| Name | Type | Description |
|---|---|---|
| url | str | The absolute URL of the requested resource, as reported by Scrape.do |
| method | str | The HTTP method used (e.g., GET, POST) |
| status | int | The HTTP status code returned by the resource server |
| request_headers | Dict[str, str] | The headers sent by the headless browser |
| request_body | Optional[str] | The payload sent with the request, if any |
| response_body | Optional[str] | The payload returned by the server, if captured |
| response_headers | Dict[str, str] | The headers returned by the resource server |
ScrapeDoWebSocketFrame
Bases: BaseModel
Represents the underlying payload of an intercepted WebSocket message
Attributes:
| Name | Type | Description |
|---|---|---|
| opcode | int | The WebSocket frame operation code (1 for text, 2 for binary) |
| mask | bool | Indicates if the payload data is masked. |
| payload_data | str | The actual message content transferred over the socket |
ScrapeDoWebSocketEvent
Bases: BaseModel
Represents the Chrome DevTools Protocol (CDP) event metadata for a WebSocket
Attributes:
| Name | Type | Description |
|---|---|---|
| request_id | str | The unique identifier for this specific WebSocket connection |
| timestamp | float | The exact epoch timestamp when the event occurred. |
| response | ScrapeDoWebSocketFrame | The underlying frame containing the payload |
ScrapeDoWebsocketRequest
Bases: BaseModel
Represents a complete WebSocket message intercepted during rendering
Attributes:
| Name | Type | Description |
|---|---|---|
| type | str | The direction of the traffic (e.g., "sent" or "received") |
| event | ScrapeDoWebSocketEvent | The raw DevTools Protocol event data |
ScrapeDoActionResult
Bases: BaseModel
Represents the execution outcome of a specific programmatic browser action
Attributes:
| Name | Type | Description |
|---|---|---|
| action | str | The name of the action executed (e.g., "Click", "Wait") |
| index | int | The sequence index of this action in the original request array |
| success | bool | Indicates whether the action completed without throwing an error |
| error | Optional[str] | The error message if the action failed |
| response | Optional[Union[Dict[str, Any], str]] | Data returned by the action, typically populated when using the |
ScrapeDoScreenshot
Bases: BaseModel
Represents a captured screenshot generated during the scraping process
Attributes:
| Name | Type | Description |
|---|---|---|
| screenshot_type | str | The configuration used (e.g., "FullScreenShot") |
| b64_image | Optional[str] | The Base64 encoded string of the PNG image data |
| error | Optional[str] | The failure reason if the screenshot could not be captured |
to_bytes()
Convenience method to convert the b64_image string into a bytes
object using Python's standard base64 library
Raises:
| Type | Description |
|---|---|
| ValueError | If the instance's |
Returns:
| Type | Description |
|---|---|
| bytes | bytes object returned by |
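The decoding step can be sketched with the standard library alone. This is a minimal illustration, not the SDK's implementation: the helper name screenshot_to_bytes is hypothetical, and raising ValueError when the string is None is an assumption about the condition the Raises table refers to.

```python
import base64
from typing import Optional


def screenshot_to_bytes(b64_image: Optional[str]) -> bytes:
    """Decode a Base64 screenshot string into raw bytes (hypothetical helper)."""
    if b64_image is None:
        # Assumption: a missing image is reported as a ValueError.
        raise ValueError("b64_image is None; nothing to decode")
    return base64.b64decode(b64_image)


# The first eight bytes of every PNG file form a fixed signature.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_signature).decode("ascii")
decoded = screenshot_to_bytes(encoded)
```

Decoding the encoded signature yields the original eight bytes, which is a quick way to check that a decoded screenshot really is PNG data.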
to_file(path)
Convenience method to save the base64-encoded screenshot
File Type
Scrape.do returns base64-encoded .png image data, so path
should end in /file_name.png
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Union[str, PathLike] | Image file will be saved to this path | required |
Returns:
| Type | Description |
|---|---|
| Path | resolved |
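A rough sketch of what saving the screenshot could look like, assuming the method decodes the Base64 data, writes it to disk, and returns the resolved path. The helper name save_screenshot is hypothetical and only standard-library calls are used.

```python
import base64
import os
import tempfile
from pathlib import Path
from typing import Union


def save_screenshot(b64_image: str, path: Union[str, "os.PathLike[str]"]) -> Path:
    """Decode Base64 image data and write it to `path` (hypothetical helper)."""
    target = Path(path)
    target.write_bytes(base64.b64decode(b64_image))
    # Return an absolute path so callers know exactly where the file landed.
    return target.resolve()


png_signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_signature).decode("ascii")
with tempfile.TemporaryDirectory() as tmp_dir:
    saved = save_screenshot(encoded, Path(tmp_dir) / "page.png")
    saved_name = saved.name
    saved_bytes = saved.read_bytes()
```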
ScrapeDoFrame
Bases: BaseModel
Represents an isolated, cross-origin iframe discovered on the target webpage.
url
The URL is stored as a plain string (not HttpUrl) because
real-world pages embed iframes with technically-valid-but-quirky
URLs that pydantic-core's URL parser rejects
Attributes:
| Name | Type | Description |
|---|---|---|
| url | str | The absolute source URL of the iframe, as reported by Scrape.do |
| content | Optional[str] | The rendered HTML content inside the iframe. |
ScrapeDoResponse
A unified data model for all HTTP responses returned by the Scrape.do API
This model encapsulates the underlying HTTPX network response to provide a flexible, strongly-typed interface
Different Response Types
Because Scrape.do alters its response format based on the request parameters, this model attempts to route property access to the correct underlying data source
Additional Information
The following are some of the parameters that change the format of the HTTP response returned by Scrape.do
- return_json=True: Returns a JSON string containing information about the request instead of the target website's raw HTML
- transparent_response=True: Causes the HTTP response returned by Scrape.do to mirror the exact status code of the HTTP response it got from the target website
- pure_cookies=True: Tells Scrape.do to return the original Set-Cookie headers it got from the target website instead of bundling them into its scrape.do-cookies response header
is_proxy_error
cached
property
Heuristic to determine whether a non-2xx status code error is coming directly from the target website, or whether it's coming from the Scrape.do gateway
Additional Information
Scrape.do usually sends JSON error messages when there's an
infrastructure error, so we try to parse the response's payload
as JSON regardless of whether or not return_json=True
- If the payload is parsable JSON:
    - Check if the returned JSON contains one of the standard error keys (message, Error, detail, Message, or errorMessage). If it does, then the error is coming from Scrape.do, so return True
    - Otherwise, check if the returned JSON contains the statusCode key. If it does, and its value matches the status code returned by the original httpx response, then the error is probably coming from the target website, so return False
    - If the value doesn't match or the statusCode key is missing, fall back to the "payload is not parsable JSON" logic
- If the payload is not parsable JSON:
    - Scrape.do sends telemetry headers when a request is successfully completed, so if the response has the scrape.do-initial-status-code header and its value is not empty, the error is probably coming from the target website, so return False. Otherwise, it's probably a Scrape.do error, so return True
transparent_response=True
When transparent_response=True, Scrape.do can still send its
own error status codes when there's an infrastructure failure, so
we can't rely on scrape_do_status_code to determine where
the error is coming from. With this in mind, this property aims
to provide a solution by analysing the response's structure as a
whole
Returns:
| Type | Description |
|---|---|
| bool | |
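The heuristic described above can be sketched as a standalone function. This is an illustration under stated assumptions rather than the SDK's code: the function name looks_like_proxy_error is hypothetical, headers are modelled as a plain mapping with lowercase keys, and the telemetry header is assumed to be spelled scrape.do-initial-status-code.

```python
import json
from typing import Mapping

# Error keys listed in the description above.
ERROR_KEYS = ("message", "Error", "detail", "Message", "errorMessage")


def looks_like_proxy_error(status_code: int, body: str, headers: Mapping[str, str]) -> bool:
    """Return True when a non-2xx response appears to come from the gateway."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = None

    if isinstance(payload, dict):
        # A standard error key indicates a gateway error message.
        if any(key in payload for key in ERROR_KEYS):
            return True
        # A matching statusCode means the envelope describes the target's reply.
        if payload.get("statusCode") == status_code:
            return False

    # Fallback: an empty or missing telemetry header suggests a gateway error.
    return not headers.get("scrape.do-initial-status-code")


gateway_error = looks_like_proxy_error(401, '{"Message": "invalid token"}', {})
target_error = looks_like_proxy_error(
    404, "<html>Not Found</html>", {"scrape.do-initial-status-code": "404"}
)
```

The first call is classified as a gateway error because of the Message key; the second is attributed to the target because the telemetry header is present.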
httpx_response
property
Exposes the raw, underlying HTTPX response
Intended Usage
Accessing this bypasses all SDK normalization. It's provided as an escape hatch for specific use cases where the original response object is needed
Returns:
| Type | Description |
|---|---|
| Response | The raw httpx response object |
status_code
property
Convenience accessor for the underlying HTTPX response status code
Equivalent to response.httpx_response.status_code. Distinct from
target_status_code and scrape_do_status_code, which interpret the
Scrape.do response envelope
Returns:
| Type | Description |
|---|---|
| int | The HTTP status code of the response received from |
request
property
Exposes the original, validated request configuration
Returns:
| Type | Description |
|---|---|
| PreparedScrapeDoRequest | The |
scrape_do_status_code
property
target_status_code
property
The HTTP status code returned by the destination website
Additional Information
- If self.is_proxy_error=True, the target website was never reached, so return None
- If transparent_response=True, the original status code from the httpx response is returned
- If return_json=True, the statusCode field from the response's JSON is returned
- If it's not a proxy error, and both parameters are set to false, the ScrapeDoResponse.initial_status_code property value is returned
Returns:
| Type | Description |
|---|---|
| Optional[int] | The target website's status code (e.g., 200, 403, 404). |
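The resolution order listed above can be written as a small decision function. This is a sketch only: resolve_target_status and its parameters are hypothetical names standing in for values the real property reads from the response and request objects.

```python
from typing import Optional


def resolve_target_status(
    is_proxy_error: bool,
    transparent_response: bool,
    return_json: bool,
    http_status: int,
    json_status: Optional[int],
    initial_status: Optional[int],
) -> Optional[int]:
    """Pick the target's status code using the documented order of checks."""
    if is_proxy_error:
        return None  # the target website was never reached
    if transparent_response:
        return http_status  # Scrape.do mirrored the target's status code
    if return_json:
        return json_status  # statusCode field of the JSON envelope
    return initial_status  # value of the initial status code telemetry


mirrored = resolve_target_status(False, True, False, 403, None, None)
from_envelope = resolve_target_status(False, False, True, 200, 404, None)
```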
text
property
The primary textual payload of the target website
Additional Information
Depending on the request parameters, this will return
either the raw HTML byte stream or the extracted content string
from within Scrape.do's JSON wrapper
Returns:
| Type | Description |
|---|---|
| str | The HTML or JSON string payload from the target |
target_headers
property
The HTTP headers returned by the destination server
Additional Information
This property automatically filters out all internal scrape.do- prefixed
proxy telemetry headers, providing a clean representation of
the target's response
Returns:
| Type | Description |
|---|---|
| Headers | The filtered headers from the target website |
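The filtering can be approximated with a dictionary comprehension. The sketch below uses a plain dict instead of httpx.Headers, the helper name filter_target_headers is hypothetical, and the case-insensitive prefix match is an assumption.

```python
from typing import Dict, Mapping

TELEMETRY_PREFIX = "scrape.do-"


def filter_target_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Drop every header whose name starts with the telemetry prefix."""
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(TELEMETRY_PREFIX)
    }


raw_headers = {
    "content-type": "text/html",
    "Scrape.do-Request-Cost": "1",
    "scrape.do-rid": "node-7",
}
clean_headers = filter_target_headers(raw_headers)
```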
scrape_do_headers
property
request_cost
property
initial_status_code
property
request_id
property
resolved_url
property
target_url
property
auth
property
rate
property
remaining_credits
property
rid
property
The specific proxy node Routing ID utilized for this connection
Session ID
If session_id was provided in the parameters,
this Routing ID is used by the ScrapeDoClient to verify that
sticky sessions are maintaining the same node
Returns:
| Type | Description |
|---|---|
| Optional[str] | The internal routing identifier, or None if the |
cookies
property
Extracts and parses cookies returned by the target server
Additional Information
If pure_cookies=True is active, it returns the httpx response's
cookies attribute. Otherwise, it decodes the custom
scrape.do-cookies string into a httpx.Cookies object
Returns:
| Type | Description |
|---|---|
| Optional[Cookies] | A |
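As a rough illustration of the decoding branch, the sketch below splits a cookie string into name/value pairs. It assumes the scrape.do-cookies header is a semicolon-separated list of name=value pairs, which is not confirmed here, and it returns a plain dict instead of an httpx.Cookies object. The helper name parse_scrape_do_cookies is hypothetical.

```python
from typing import Dict


def parse_scrape_do_cookies(header_value: str) -> Dict[str, str]:
    """Split a 'name=value; name2=value2' string into a dict (assumed format)."""
    cookies: Dict[str, str] = {}
    for pair in header_value.split(";"):
        name, separator, value = pair.strip().partition("=")
        # Skip empty fragments and entries without an '=' sign.
        if name and separator:
            cookies[name] = value
    return cookies


parsed = parse_scrape_do_cookies("session=abc123; theme=dark")
```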
frames
property
Extracts isolated cross-origin iframes discovered during page rendering
Prerequisites
Requires render=True, return_json=True, and show_frames=True
Returns:
| Type | Description |
|---|---|
| Optional[List[ScrapeDoFrame]] | A list of typed Pydantic models representing frames. |
network_requests
property
Intercepts background network traffic triggered by the headless browser
Prerequisites
Requires render=True and return_json=True
Returns:
| Type | Description |
|---|---|
| Optional[List[ScrapeDoNetworkRequest]] | A list of typed models detailing HTTP calls |
websocket_requests
property
Intercepts bidirectional WebSocket traffic initiated by the target website
Prerequisites
Requires render=True, return_json=True, and
show_websocket_requests=True
Returns:
| Type | Description |
|---|---|
| Optional[List[ScrapeDoWebsocketRequest]] | A list of typed models detailing socket events |
action_results
property
Details the success or failure of programmatic DOM interactions
Returns:
| Type | Description |
|---|---|
| Optional[List[ScrapeDoActionResult]] | A list of typed models mapping sequentially to the actions defined in the |
screenshots
property
Extracts generated Base64 screenshots from the JSON payload
Prerequisites
Requires render=True, return_json=True, and a valid screenshot
parameter (e.g., full_screenshot=True)
Returns:
| Type | Description |
|---|---|
| Optional[List[ScrapeDoScreenshot]] | A list of typed models containing the image data |
__repr__()
Compact identifier for REPL inspection and log output
Not Reconstructable
- ScrapeDoResponse wraps an httpx.Response and a PreparedScrapeDoRequest from a network exchange that can't be replayed, so a strict eval-able repr isn't realistic
- Python's docs explicitly endorse the angle-bracket shorthand for that case
Returns:
| Type | Description |
|---|---|
| str | A short one-line string of the form |
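For readers unfamiliar with the angle-bracket convention, here is a generic sketch of a non-reconstructable repr. The class ExchangeSummary and the exact fields shown are hypothetical; the real string produced by ScrapeDoResponse may contain different information.

```python
class ExchangeSummary:
    """Hypothetical stand-in for an object that cannot be rebuilt from its repr."""

    def __init__(self, status_code: int, url: str) -> None:
        self.status_code = status_code
        self.url = url

    def __repr__(self) -> str:
        # Angle brackets signal that this is a description, not eval-able code.
        return f"<{type(self).__name__} [{self.status_code}] {self.url}>"


summary_repr = repr(ExchangeSummary(200, "https://example.com"))
```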
to_dict()
Flat dict of every public field on this response
Excluded Fields
- The underlying httpx.Response (not JSON-serializable and recoverable via self.httpx_response)
- The originating PreparedScrapeDoRequest (recoverable via self.request)
Nested Model Serialization
- frames, network_requests, websocket_requests, action_results, and screenshots are serialized via each item's .model_dump()
- Empty lists are rendered as None so absent sections aren't confused with empty ones
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | Dict mapping every public property name to its current value |
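The nested serialization rule can be sketched as a helper that dumps each item and collapses empty sections to None. The names dump_section and FakeFrame are hypothetical; FakeFrame only imitates the model_dump() method of a Pydantic model so the example runs without third-party packages.

```python
from typing import Any, Dict, List, Optional


class FakeFrame:
    """Stand-in exposing model_dump() the way a Pydantic model does."""

    def __init__(self, url: str) -> None:
        self.url = url

    def model_dump(self) -> Dict[str, Any]:
        return {"url": self.url}


def dump_section(items: Optional[List[Any]]) -> Optional[List[Dict[str, Any]]]:
    """Serialize each item, rendering missing or empty sections as None."""
    if not items:
        return None
    return [item.model_dump() for item in items]


frames_dump = dump_section([FakeFrame("https://example.com/embed")])
empty_dump = dump_section([])
```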
to_json(**kwargs)
Uses json.dumps to serialize the dictionary returned by to_dict
into a JSON string
Default Kwargs
- indent=2
- ensure_ascii=False

Override either by passing them explicitly
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | Any | Keyword-arguments to forward to | {} |
Returns:
| Type | Description |
|---|---|
| str | |
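The default-plus-override behaviour can be sketched as follows. The function name dumps_with_defaults is hypothetical, and the merge strategy (caller-supplied keyword arguments win over the defaults) is an assumption based on the description above.

```python
import json
from typing import Any, Dict


def dumps_with_defaults(data: Dict[str, Any], **kwargs: Any) -> str:
    """Serialize `data`, applying the documented defaults unless overridden."""
    options: Dict[str, Any] = {"indent": 2, "ensure_ascii": False}
    options.update(kwargs)  # explicit arguments take precedence
    return json.dumps(data, **options)


pretty = dumps_with_defaults({"title": "café"})
compact = dumps_with_defaults({"title": "café"}, indent=None, ensure_ascii=True)
```

With the defaults, non-ASCII characters are written as-is across indented lines; the overridden call produces a single line with the accented character escaped.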
json(raw_response=True, **kwargs)
Decodes the JSON response body (if any) as a Python object
Return Value
- When the raw_response parameter is set to True, this method acts as a shortcut for ScrapeDoResponse.httpx_response.json()
- When it is set to False and the response contains a Scrape.do JSON envelope, it passes the content key of that envelope to json.loads
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| raw_response | bool | When set to | True |
| **kwargs | Any | Additional keyword arguments to pass to | {} |
Raises:
| Type | Description |
|---|---|
| JSONDecodeError | If the response contains unparsable JSON |
Returns:
| Type | Description |
|---|---|
| Any | Dict, list, etc. Depending on what's in the response |
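The difference between the two modes can be illustrated with a plain string body. The function decode_body is a hypothetical stand-in, and returning the decoded body unchanged when no envelope is present is an assumption, since that case is not described above.

```python
import json
from typing import Any


def decode_body(body: str, raw_response: bool = True) -> Any:
    """Decode a response body, optionally unwrapping the envelope's content."""
    decoded = json.loads(body)
    if raw_response:
        return decoded
    if isinstance(decoded, dict) and "content" in decoded:
        # The envelope stores the target's payload as a JSON string.
        return json.loads(decoded["content"])
    return decoded  # assumption: no envelope, so return the body as decoded


envelope_body = json.dumps({"statusCode": 200, "content": json.dumps({"items": [1, 2]})})
whole_envelope = decode_body(envelope_body)
target_payload = decode_body(envelope_body, raw_response=False)
```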
raise_for_status()
Evaluates the response and raises a mapped exception if the request failed
Additional Information
Utilizes the is_proxy_error heuristic to determine if
the failure originated from the Scrape.do proxy infrastructure or
from the target website
Returns:
| Type | Description |
|---|---|
| Self | The current |
Raises:
| Type | Description |
|---|---|
| TargetError | If the proxy succeeded, but the target website returned an error code (e.g., a 403 Cloudflare block or a 404 Not Found) |
| BadRequestError | If the request was malformed (HTTP 400 from Scrape.do) |
| AuthenticationError | If your Scrape.do API token is invalid (HTTP 401) |
| AuthenticationThrottleError | If your specific token has been temporarily locked by the Scrape.do authentication server to prevent abuse (HTTP 401) |
| RateLimitError | If you exceed your account's concurrent request limit (HTTP 429) |
| ServerError | If the Scrape.do gateway experiences an issue (HTTP 502/510) |
| APIResponseError | A generic fallback for unmapped Scrape.do proxy errors |
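The mapping in the table can be sketched as a lookup keyed by status code. Everything below is a simplified stand-in: the exception classes are redefined locally so the example is self-contained, check_status returns None instead of Self, and AuthenticationThrottleError is omitted because the table does not say how it is told apart from AuthenticationError when both arrive as HTTP 401.

```python
from typing import Dict, Optional, Type


class TargetError(Exception):
    """Stand-in: the target website returned an error status."""


class BadRequestError(Exception):
    """Stand-in: HTTP 400 from the gateway."""


class AuthenticationError(Exception):
    """Stand-in: HTTP 401 from the gateway."""


class RateLimitError(Exception):
    """Stand-in: HTTP 429 from the gateway."""


class ServerError(Exception):
    """Stand-in: HTTP 502 or 510 from the gateway."""


class APIResponseError(Exception):
    """Stand-in: fallback for unmapped gateway errors."""


PROXY_ERRORS: Dict[int, Type[Exception]] = {
    400: BadRequestError,
    401: AuthenticationError,
    429: RateLimitError,
    502: ServerError,
    510: ServerError,
}


def check_status(status_code: int, is_proxy_error: bool) -> None:
    """Raise the mapped exception for a failed request; do nothing on 2xx."""
    if 200 <= status_code < 300:
        return
    if not is_proxy_error:
        raise TargetError(f"target website returned HTTP {status_code}")
    error_type = PROXY_ERRORS.get(status_code, APIResponseError)
    raise error_type(f"Scrape.do gateway returned HTTP {status_code}")


def raised_type(status_code: int, is_proxy_error: bool) -> Optional[Type[Exception]]:
    """Return the exception class raised for the given inputs, or None."""
    try:
        check_status(status_code, is_proxy_error)
    except Exception as exc:
        return type(exc)
    return None
```

For example, raised_type(403, False) yields TargetError, while raised_type(429, True) yields RateLimitError.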