
Response

response

Inbound response shapes for the Scrape.do Async API

Defines the pydantic models the ScrapeDoAsyncAPIClient parses from q.scrape.do JSON responses

Plugin Content
  • TaskDetails.content is an opaque string for now

  • Per-plugin structured response models ship with the plugin clients themselves in 0.4 / 0.5

CancelJobResponse = JobDetails module-attribute

Alias for JobDetails

DELETE /api/v1/jobs/{jobID} returns the same shape as the corresponding GET, with Canceled=true

WebhookPayload = TaskDetails module-attribute

Alias for TaskDetails

The Scrape.do Async API posts a TaskDetails-shaped JSON body to the configured webhook URL when each task reaches a terminal status

JobCreationResponse

Bases: BaseModel

Response body returned by POST /api/v1/jobs

Attributes:

Name Type Description
job_id str

Server-assigned UUID for the newly created job

task_ids List[str]

UUIDs for each task spawned from this job (one per Targets[] entry or one per Plugin.Params[] entry)

message Optional[str]

Human-readable acknowledgment message

UndetailedTaskResponse

Bases: BaseModel

Per-task summary nested inside JobDetails.tasks

Attributes:

Name Type Description
task_id str

UUID of the task

url str

The target URL for this task

status JobStatus

Current lifecycle status of the task

JobDetails

Bases: BaseModel

Response body returned by GET /api/v1/jobs/{jobID} and DELETE /api/v1/jobs/{jobID}

Attributes:

Name Type Description
job_id str

UUID of the job

task_ids List[str]

UUIDs of every task in this job

status JobStatus

Current lifecycle status

start_time Optional[datetime]

When the job started executing (RFC3339)

end_time Optional[datetime]

When the job reached a terminal status (RFC3339)

acquired_concurrency int

Number of concurrent requests currently in use by this job (per Scrape.do's Async API response schema)

limit_concurrency int

Maximum number of concurrent requests allowed for this job. 0 indicates no per-job ceiling beyond the account-wide async pool

canceled bool

True if the job was explicitly canceled

tasks List[UndetailedTaskResponse]

Per-task summary entries

is_terminal property

Whether the job has reached any terminal status

Terminal Statuses
  • success
  • error
  • canceled

Returns:

Type Description
bool

True if status is one of the documented terminal statuses, False otherwise

is_success property

Whether the job completed successfully

Returns:

Type Description
bool

True if status == "success", False otherwise

is_terminal_failure property

Whether the job has reached a terminal failure state

Status Routing
  • error / canceled → True
  • success → False
  • Non-Terminal Statuses → False

Returns:

Type Description
bool

True if status is error or canceled, False otherwise
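The three status properties reduce to set-membership checks on the status value. The following is a minimal sketch, not the library's implementation. It assumes JobStatus values compare equal to the lowercase strings shown (as a str-based Enum would), and the module-level function names are hypothetical stand-ins for the properties:

```python
# Assumption: JobStatus values compare equal to these lowercase strings.
TERMINAL_STATUSES = frozenset({"success", "error", "canceled"})
FAILURE_STATUSES = frozenset({"error", "canceled"})


def is_terminal(status: str) -> bool:
    """Any documented terminal status: success, error, or canceled."""
    return status in TERMINAL_STATUSES


def is_success(status: str) -> bool:
    """Only the success status counts; error and canceled do not."""
    return status == "success"


def is_terminal_failure(status: str) -> bool:
    """error / canceled -> True; success and non-terminal statuses -> False."""
    return status in FAILURE_STATUSES


# "running" is a hypothetical non-terminal status used only for illustration.
for status in ("running", "success", "error", "canceled"):
    print(status, is_terminal(status), is_success(status), is_terminal_failure(status))
```

For any terminal status, exactly one of is_success and is_terminal_failure is True. For a non-terminal status, both are False.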

duration property

Wall-clock duration the job spent running

Availability
  • Both start_time and end_time are guaranteed to be populated only once the job reaches a terminal status (end_time stays unset until then)

  • Before then, this property returns None

Returns:

Type Description
Optional[timedelta]

end_time - start_time when both are set, None otherwise
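The duration rule amounts to subtracting two parsed RFC3339 timestamps and short-circuiting to None when either is missing. The sketch below uses only the standard library. The parse_rfc3339 helper is hypothetical, because the real model parses timestamps through pydantic. The helper assumes fractional seconds have 3 or 6 digits, which is what datetime.fromisoformat accepts before Python 3.11:

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_rfc3339(value: Optional[str]) -> Optional[datetime]:
    """Parse an RFC3339 timestamp; rewrite a trailing 'Z' for Python < 3.11."""
    if value is None:
        return None
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def duration(start: Optional[datetime], end: Optional[datetime]) -> Optional[timedelta]:
    # Mirrors the documented rule: end_time - start_time when both are set.
    if start is None or end is None:
        return None
    return end - start


start = parse_rfc3339("2024-05-01T12:00:00Z")
end = parse_rfc3339("2024-05-01T12:00:42.500Z")
print(duration(start, end))  # 0:00:42.500000
print(duration(start, None))  # None (job not yet terminal)
```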

raise_for_status()

Raises the appropriate terminal-state exception for the current status

Exception Mapping
  • status == "error" → raises JobFailedError
  • status == "canceled" → raises JobCanceledError
  • Success / Non-Terminal → no-op

Intended Usage

Raises:

Type Description
JobFailedError

When status == "error"

JobCanceledError

When status == "canceled"
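The exception mapping can be sketched as two early raises followed by a no-op fall-through. The exception classes below are local stand-ins that borrow the documented names. The real JobFailedError and JobCanceledError may take different constructor arguments, which this page does not document:

```python
class JobFailedError(Exception):
    """Local stand-in for the library's JobFailedError."""


class JobCanceledError(Exception):
    """Local stand-in for the library's JobCanceledError."""


def raise_for_status(job_id: str, status: str) -> None:
    # Mirrors the documented mapping; success and non-terminal statuses are a no-op.
    if status == "error":
        raise JobFailedError(f"job {job_id} failed")
    if status == "canceled":
        raise JobCanceledError(f"job {job_id} was canceled")


raise_for_status("job-123", "success")  # no-op
try:
    raise_for_status("job-123", "canceled")
except JobCanceledError as exc:
    print(f"giving up: {exc}")  # giving up: job job-123 was canceled
```

Because non-terminal statuses are a no-op, a caller can invoke raise_for_status after every poll and need only check is_terminal separately to decide when to stop.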

TaskDetails

Bases: BaseModel

Response body returned by GET /api/v1/jobs/{jobID}/{taskID}

Attributes:

Name Type Description
task_id str

UUID of the task

job_id str

UUID of the parent job

url str

The target URL the task was fetching

status JobStatus

Current lifecycle status; always terminal (success, error, or canceled) when delivered via webhook

start_time Optional[datetime]

When the task started

end_time Optional[datetime]

When the task reached its terminal status

update_time Optional[datetime]

Last update timestamp

expires_at Optional[datetime]

When the task's content expires server-side and becomes unretrievable

base64_encoded_content bool

Whether content is base64-encoded

status_code int

HTTP status code returned by the target

response_headers Dict[str, str]

Headers returned by the target

scrape_do_metadata Dict[str, str]

Scrape.do's telemetry block. Parsed from the literal key Scrape.do in the wire JSON

content Optional[str]

Response body. Base64-decoded via decoded_content() when base64_encoded_content=True

error_message Optional[str]

Task-level error message on failure

is_terminal property

Whether the task has reached any terminal status

Terminal Statuses
  • success
  • error
  • canceled

Returns:

Type Description
bool

True if status is one of the documented terminal statuses, False otherwise

is_success property

Whether the task completed successfully

Lifecycle Status vs Target Response
  • This checks the task's lifecycle status, not the target's HTTP Status Code

  • A task with status == "success" and status_code == 404 means Scrape.do successfully fetched the target, which itself returned 404

Returns:

Type Description
bool

True if status == "success", False otherwise
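Because is_success reflects only the task lifecycle, a caller that wants "fetched and the target answered 2xx" has to check both fields. The sketch below uses a hypothetical TaskOutcome dataclass that holds only the two relevant TaskDetails fields. Treating 2xx as the success range is a choice made for this sketch:

```python
from dataclasses import dataclass


@dataclass
class TaskOutcome:
    # Hypothetical stand-in holding just the two TaskDetails fields that matter here.
    status: str       # task lifecycle status, e.g. "success"
    status_code: int  # HTTP status the *target* returned


def fetched_ok(task: TaskOutcome) -> bool:
    """True only when Scrape.do finished the task AND the target answered 2xx."""
    return task.status == "success" and 200 <= task.status_code < 300


print(fetched_ok(TaskOutcome("success", 200)))  # True
print(fetched_ok(TaskOutcome("success", 404)))  # False: lifecycle ok, target 404
```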

is_terminal_failure property

Whether the task has reached a terminal failure state

Status Routing
  • error / canceled → True
  • success → False
  • Non-Terminal Statuses → False

Returns:

Type Description
bool

True if status is error or canceled, False otherwise

is_expired property

Whether the task's content has expired server-side

Definition
  • Returns True only when expires_at is populated and strictly before datetime.now(timezone.utc)

  • Returns False only when expires_at is populated and strictly after datetime.now(timezone.utc)

  • Returns None when expires_at is not set

Returns:

Type Description
Optional[bool]

True if expires_at is set and in the past, False if expires_at is set and in the future, None if expires_at is not set
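is_expired is tri-state, so None means "unknown" and not "still retrievable". The sketch below is a standalone function, not the model property. It treats the exact boundary instant (expires_at equal to now) as not expired, because the definition above does not cover that case and this choice is an assumption:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_expired(expires_at: Optional[datetime], now: Optional[datetime] = None) -> Optional[bool]:
    """Tri-state expiry: True (past), False (future), None (expires_at unset)."""
    if expires_at is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    # Both datetimes must be timezone-aware; mixing naive and aware raises TypeError.
    # Assumption: the boundary instant (expires_at == now) counts as not expired.
    return expires_at < now


now = datetime.now(timezone.utc)
for expires_at in (None, now - timedelta(minutes=5), now + timedelta(minutes=5)):
    state = is_expired(expires_at)
    # Compare against None explicitly: `if not state` would conflate
    # "unknown" with "still retrievable".
    if state is None:
        print("no expiry reported")
    elif state:
        print("expired; content is no longer retrievable")
    else:
        print("still retrievable")
```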

duration property

Wall-clock duration the task spent running

Availability
  • Both start_time and end_time are populated only once the task reaches a terminal status

  • Before then, this property returns None

Returns:

Type Description
Optional[timedelta]

end_time - start_time when both are set, None otherwise

decoded_content()

Returns content as bytes, decoding base64 when applicable

Behavior
  • When base64_encoded_content=True, decodes content via base64.b64decode
  • When base64_encoded_content=False, returns content.encode("utf-8") so callers always get bytes
  • Returns None when content is None

Returns:

Type Description
Optional[bytes]

The decoded bytes, or None if there is no content

Raises:

Type Description
ValueError

If base64_encoded_content=True but content is not valid base64
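The documented behavior of decoded_content() can be reproduced with the standard library. This standalone sketch passes validate=True to base64.b64decode so that non-alphabet characters raise an error instead of being silently discarded. Whether the library itself validates this strictly is an assumption. The raised binascii.Error is a subclass of ValueError, which matches the documented Raises entry:

```python
import base64
from typing import Optional


def decoded_content(content: Optional[str], base64_encoded: bool) -> Optional[bytes]:
    """Always return bytes (or None), mirroring the documented behavior."""
    if content is None:
        return None
    if base64_encoded:
        # validate=True rejects characters outside the base64 alphabet;
        # the resulting binascii.Error subclasses ValueError.
        return base64.b64decode(content, validate=True)
    return content.encode("utf-8")


print(decoded_content("aGVsbG8=", True))  # b'hello'
print(decoded_content("héllo", False))    # b'h\xc3\xa9llo'
```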

raise_for_status()

Raises the appropriate terminal-state exception for the current status

Exception Mapping
  • status == "error" → raises TaskFailedError
  • status == "canceled" → raises TaskCanceledError
  • Success / Non-Terminal → no-op

Mirrors JobDetails.raise_for_status

Equivalent to JobDetails.raise_for_status on the parent job, but for the task's own lifecycle status

Raises:

Type Description
TaskFailedError

When status == "error". The task's error_message (when set) gives a more specific reason

TaskCanceledError

When status == "canceled"

UserInformation

Bases: BaseModel

Response body returned by GET /api/v1/me

AvaliableCredits
  • The credit-balance field is spelled AvaliableCredits in Scrape.do's official documentation and is the live server spelling

  • Verified against /api/v1/me and locked here as the alias

Attributes:

Name Type Description
total_concurrency int

Total concurrency limit on the account

free_concurrency int

Currently available concurrency slots

active_jobs int

Number of jobs currently running

available_credits int

Remaining account credits. Mapped from the AvaliableCredits server field
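The alias amounts to a wire-key to field-name mapping. In pydantic this is presumably declared with Field(alias="AvaliableCredits"). The stdlib sketch below makes the mapping explicit. Only the AvaliableCredits key is confirmed above. The other three PascalCase keys are assumptions made for illustration:

```python
import json
from dataclasses import dataclass


@dataclass
class UserInformation:
    total_concurrency: int
    free_concurrency: int
    active_jobs: int
    available_credits: int


# Wire key -> field name. Only "AvaliableCredits" (sic) is confirmed by the
# documentation; the other three keys are assumptions.
WIRE_KEYS = {
    "TotalConcurrency": "total_concurrency",
    "FreeConcurrency": "free_concurrency",
    "ActiveJobs": "active_jobs",
    "AvaliableCredits": "available_credits",  # server's spelling, kept on purpose
}


def parse_user_information(body: str) -> UserInformation:
    raw = json.loads(body)
    return UserInformation(**{field: raw[key] for key, field in WIRE_KEYS.items()})


info = parse_user_information(
    '{"TotalConcurrency": 10, "FreeConcurrency": 7, "ActiveJobs": 1, "AvaliableCredits": 5000}'
)
print(info.available_credits)  # 5000
```

Correcting the spelling to "AvailableCredits" on the wire side would make the lookup fail, which is why the misspelling is pinned in one place.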

JobsListResponse

Bases: BaseModel

Response body returned by GET /api/v1/jobs

Per-Job Entries
  • Each entry in jobs is parsed as a JobDetails model

  • Unlike GET /api/v1/jobs/{jobID}, the listing payload omits the AcquiredConcurrency, LimitConcurrency, Canceled, and Tasks attributes, so those fall back to their model defaults (0, 0, False, and [], respectively)

  • The actual values can always be fetched via get_job
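One consequence of the defaulting is that canceled=False on a listing entry is a placeholder, not a fact. The status field is still present in the listing, so it is the more reliable signal. The dataclass below is a hypothetical stand-in for the relevant subset of JobDetails. It uses default_factory so that entries never share one mutable tasks list:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ListedJob:
    # Hypothetical stand-in for the JobDetails fields discussed above.
    job_id: str
    status: str
    acquired_concurrency: int = 0
    limit_concurrency: int = 0
    canceled: bool = False
    tasks: List[dict] = field(default_factory=list)  # fresh list per instance


# A listing entry for a canceled job: Canceled is omitted, so the default applies.
entry = ListedJob(job_id="j-1", status="canceled")
print(entry.canceled, entry.tasks)  # False []
print(entry.status == "canceled")   # True: trust status, not the defaulted flag
```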

Attributes:

Name Type Description
jobs List[JobDetails]

Per-job entries on this page

total_count int

Total jobs matching the query (across all pages)

page_size int

Page size used to compute the response

page_number int

1-indexed page number of the current response

total_pages int

Total pages available for the query
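Given the 1-indexed page_number and total_pages fields, walking every page is a short loop. In this sketch, fetch_page is a hypothetical callable that returns one decoded JobsListResponse body as a dict keyed by the model's field names. It is not a documented client method:

```python
from typing import Callable, Dict, Iterator


def iter_all_jobs(fetch_page: Callable[[int], Dict]) -> Iterator[Dict]:
    """Yield every job entry across pages, starting at page 1."""
    page = 1
    while True:
        body = fetch_page(page)
        yield from body["jobs"]
        # Stop on the last page; total_pages == 0 (no jobs) also ends the loop.
        if page >= body["total_pages"]:
            break
        page += 1


# In-memory fake pages for illustration.
fake_pages = {
    1: {"jobs": [{"job_id": "a"}, {"job_id": "b"}], "total_pages": 2},
    2: {"jobs": [{"job_id": "c"}], "total_pages": 2},
}
print([job["job_id"] for job in iter_all_jobs(fake_pages.__getitem__)])  # ['a', 'b', 'c']
```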

JobResult

Bases: BaseModel

Bundle of the terminal JobDetails and the fetched per-task TaskDetails

Returned by ScrapeDoAsyncAPIClient.submit_and_wait after a job reaches a terminal status and every task's details have been fetched

Attributes:

Name Type Description
job JobDetails

The terminal JobDetails snapshot

tasks List[TaskDetails]

The fully-fetched TaskDetails for each task_id in job.task_ids, in input order
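This page does not say how submit_and_wait fetches task details. One way to fetch them concurrently while keeping input order is asyncio.gather, which returns results in the order the awaitables were passed, regardless of which one finishes first. In the sketch below, fetch_task is a hypothetical coroutine standing in for a per-task GET:

```python
import asyncio
from typing import Awaitable, Callable, List, TypeVar

T = TypeVar("T")


async def fetch_tasks_in_order(
    task_ids: List[str], fetch_task: Callable[[str], Awaitable[T]]
) -> List[T]:
    # gather preserves argument order, so results line up with task_ids.
    return list(await asyncio.gather(*(fetch_task(tid) for tid in task_ids)))


async def fake_fetch(task_id: str) -> str:
    # Hypothetical fetcher: "a" deliberately finishes last.
    await asyncio.sleep(0.01 if task_id == "a" else 0)
    return f"details:{task_id}"


print(asyncio.run(fetch_tasks_in_order(["a", "b", "c"], fake_fetch)))
# ['details:a', 'details:b', 'details:c']
```

This version launches every fetch at once. For large jobs, wrapping fetch_task in an asyncio.Semaphore would bound the number of in-flight requests.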