Changelog

Pre-1.0 Disclaimer

scrape-do-python follows Semantic Versioning, but 0.x minor versions may contain breaking changes.

Format

The format below is based on Keep a Changelog.


0.3.1 — 2026-05-26

Added

Changed

  • GoogleSearchAiModeParameters is now a strict subset of SERP. The model no longer carries start, cr, lr, time_period, filter, nfpr, or num; AI Mode is documented as a standalone endpoint whose engine rejects those fields with 400. Breaking change for callers that constructed the model with any of them.

  • GoogleSearchParameters.num removed — the current per-endpoint SERP docs no longer list it. Breaking change for callers that relied on it being a typed attribute.

  • GoogleTrendsParameters gains tz (timezone offset minutes from UTC, default 420 server-side) and region (geographic resolution for GEO_MAP / GEO_MAP_0 widgets).

  • GoogleTrendingParameters rewritten with a typed schema: geo is now required, and the previously permissive extra="allow" shell is replaced with explicit hl / hours / cat / sort / status fields backed by literal enums. The "sync-only" warning is dropped because the endpoint is now part of the Async API plugin table.

  • GoogleHotelsDetailParameters sync-only marker dropped — Scrape.do promoted the endpoint to the Async API. Field shape is unchanged.

  • WalmartStoreParameters / LowesStoreParameters now enforce the documented schema from Scrape.do's async-api/plugins page instead of accepting arbitrary extras via extra="allow". Walmart requires url (walmart.com domain) and treats zipcode + storeid as a conditional pair (both or neither). Lowes requires url (lowes.com domain) plus digit-only zipcode and storeid. Both pick up the gateway-side disableretry / transparentresponse / timeout knobs. Breaking change for callers passing undocumented extras through the previous schema-free passthrough.
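
The Walmart "both or neither" rule for zipcode and storeid can be sketched with a plain dataclass. This is only an illustration under stated assumptions, not the SDK's model: the class name WalmartStoreSketch and its error messages are hypothetical, and the hostname check is one assumed reading of "walmart.com domain".

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class WalmartStoreSketch:
    """Hypothetical stand-in for the documented Walmart schema (not the SDK class)."""

    url: str
    zipcode: Optional[str] = None
    storeid: Optional[str] = None

    def __post_init__(self) -> None:
        host = urlparse(self.url).hostname or ""
        # Assumption: "walmart.com domain" means the apex domain or any subdomain.
        if host != "walmart.com" and not host.endswith(".walmart.com"):
            raise ValueError("url must point at walmart.com")
        # Conditional pair: zipcode and storeid are supplied together or not at all.
        if (self.zipcode is None) != (self.storeid is None):
            raise ValueError("zipcode and storeid must be provided together")
```

Comparing the two "is None" results with != flags exactly the cases where only one half of the pair is present, which keeps the rule to a single condition.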

Internal

  • Integration suite standardized around three test categories — content-dependent tests retry on transient Scrape.do gateway failures, shape-dependent tests assert only that the request wasn't rejected (HTTP 400), and error-routing tests are unchanged.

  • Re-introduced google/trends and lowes/store into the live plugin sweep now that the pass criterion tolerates upstream / engine-side transient failures.

  • Plugin integration tests extended with the new endpoints: google/youtube, chatgpt/chat, shein, trip/search, google/play-store, google/shopping/product, google/trending, plus the promoted google/hotels/detail adapter all participate in the parametrized sweep. The shared case list was moved to tests/integration/async_api/conftest.py::_plugin_cases so both test_client.py and test_async_client.py consume the same definitions instead of duplicating them.

  • New unit-test coverage for every new model — happy-path construction + cross-field rule tests for GoogleYouTubeParameters, ChatGPTChatParameters, SheinParameters, TripSearchParameters, TripDetailParameters, the Play Store family, and the Shopping Product family. The shared AsyncPlugin discriminated-union test parametrizes all 13 new adapter keys, and test_google.py / test_chatgpt.py cover each new *AsyncPlugin adapter's default key literal + min_length=1 enforcement.

  • The GoogleSearchAiOverviewAsyncPlugin integration test is omitted for now: the gateway has not yet caught up with the documentation changes for the google/search/ai-overview async endpoint, so requests still require the sync-only session_key parameter.
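
The difference between the content-dependent and shape-dependent pass criteria described above can be sketched as two small helpers. Everything here is hypothetical: the function names, the retry count, and the set of status codes treated as transient are assumptions for illustration, and the real suite's fixtures may look quite different.

```python
import time
from typing import Callable

# Assumed set of statuses treated as transient gateway failures.
TRANSIENT_STATUSES = frozenset({502, 503, 504})


def fetch_with_retry(send: Callable[[], int], attempts: int = 3, pause: float = 0.0) -> int:
    """Content-dependent style: retry while the status looks transient."""
    status = send()
    for _ in range(attempts - 1):
        if status not in TRANSIENT_STATUSES:
            break
        time.sleep(pause)
        status = send()
    return status


def shape_accepted(status: int) -> bool:
    """Shape-dependent style: pass unless the request itself was rejected."""
    return status != 400
```

Under this reading, a shape-dependent test still passes on an upstream 502, because it only verifies that the gateway accepted the parameter schema.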

0.3.0 — 2026-05-24

Added

Changed

Fixed

  • ScrapeDoFrame.url / ScrapeDoNetworkRequest.url relaxed from HttpUrl to str. Real-world iframes and network requests produce technically-valid but quirky URLs (e.g., ?feature=oembed?wmode=transparent) that pydantic-core's URL parser rejected, which blew up the whole response parse.

  • ScrapeDoResponse.cookies regex no longer captures structural whitespace after ; separators. Second-and-later cookie names previously came back with a phantom leading space.

  • ScrapeDoResponse constructor no longer crashes with JSONDecodeError when Scrape.do returns HTML instead of JSON under returnJSON=true — the failure is now properly routed through is_proxy_error.

  • RequestParameters.to_proxy_url now double-encodes the param string so values with URL-reserved characters (notably the JSON-string playWithBrowser payload) survive httpx's transparent decode of the proxy password during Basic auth header construction.

  • Python 3.9 / 3.10 compatibility restored. Source files importing Self / Unpack / TypeAlias from typing (Self and Unpack require 3.11+, TypeAlias requires 3.10+) now use typing_extensions. Previously the package raised ImportError at import time on 3.9 / 3.10 despite the trove classifiers claiming support.
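
The double-encoding fix in to_proxy_url can be illustrated with the standard library alone. The sketch assumes the consumer percent-decodes the proxy password exactly once, as the entry describes for httpx. The parameter string, the action payload, and the variable names are made up for the example, and the SDK's actual encoding routine may differ.

```python
import json
from urllib.parse import quote, unquote

# Illustrative parameter string carrying a JSON value
# (the payload content is invented for this example).
actions = json.dumps([{"Action": "Wait", "Timeout": 1000}])
param_string = "render=true&playWithBrowser=" + quote(actions, safe="")

# Encode the whole parameter string once more before placing it in the
# password slot of the proxy URL.
password = quote(param_string, safe="")

# A single percent-decode (what the entry says happens while the Basic auth
# header is built) yields the singly-encoded parameter string again.
after_one_decode = unquote(password)
```

Without the second quote call, that single decode would expose raw quotes, brackets, and ampersands from the JSON payload inside the parameter string.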

Internal

  • New scrape_do.async_api and scrape_do.plugins sub-package layout. Async-API helpers (_raise_for_status, _parse_response, _build_job_creation_request) live as module-level functions in scrape_do.async_api.client and are shared by both client classes.

  • New unit tests for scrape_do.async_api and models/response.py.

  • Integration coverage expanded from 22 → ~120 tests across the Sync API, Proxy Mode, and Async API surfaces. The new tests/integration/async_api/ suite exercises every endpoint, both client classes, polling helpers, event hooks, the render envelope, a live PlayWithBrowser action sequence, the typed-exception hierarchy, and 12 of the 15 *AsyncPlugin variants. The remaining three (google/trends, walmart/store, lowes/store) are unit-only; they hit upstream- or engine-side failures regardless of input.

  • Integration logging pipeline formalized around pytest.hookimpl-decorated setup / makereport / teardown hooks with per-test tokens stashed on item.stash; _validate_and_log_error_state consolidated into a response_trace fixture.

  • Unit test fixtures consolidated; new shared tests/unit/async_api/conftest.py for the Async-API unit suite plus tests/integration/async_api/conftest.py exposing live client fixtures, a tight fast_polling_strategy, best-effort cancel helpers, and a type-dispatched async_api_response_trace.

  • CI matrix expanded to Python 3.9 / 3.10 / 3.11 / 3.12 / 3.13 (fail-fast: false); lint job (ruff + mypy) split out and pinned to 3.13.

0.2.0 — 2026-05-12

Added

  • ScrapeDoProxyClient and AsyncScrapeDoProxyClient — route requests through Scrape.do's Proxy Mode (proxy.scrape.do:8080). Same request/response surface as the API-mode clients (execute / request / get / post), minus execute_from_url (no equivalent in proxy mode). The async variant is backed by httpx.AsyncClient and uses asyncio.sleep for retry pauses.

  • Per-(api_token, parameters) httpx.Client / httpx.AsyncClient pool with bounded LRU eviction (max_pooled_clients=16 default, configurable). Two requests with the same parameters reuse the same TCP / TLS / HTTP-2 connection; the cookie jar on each pooled client is cleared after every request (Scrape.do owns the cookie lifecycle via setCookies / scrape.do-cookies / sessionId, so pooling is purely a transport concern).

  • PreparedScrapeDoRequest.to_proxy_httpx_kwargs — serializes the same data model into httpx kwargs that target the destination URL directly (the API token and Scrape.do parameters live in the proxy URL's userinfo segment, not the request).

  • RequestParameters.to_proxy_url — generates a Scrape.do Proxy-Mode connection string template (http://{api_token}:<params>@proxy.scrape.do:8080) for use with the proxy clients or with third-party tooling (Playwright / Selenium / curl).

  • RequestParameters.validate_proxy_params — cross-validates Proxy-Mode-specific parameter quirks (customHeaders defaulting to true server-side, setCookies interaction, render-mode discouragement).

  • SCRAPE_DO_CA_PATH and DEFAULT_PROXY_SSL_CONTEXT in scrape_do.constants — the bundled Scrape.do CA cert and an ssl.SSLContext preloaded with system CAs plus the bundled CA. Default verify source for the proxy-mode clients so HTTPS targets validate correctly through Scrape.do's MITM step without disabling TLS verification. VERIFY_X509_STRICT is cleared so chain validation accepts Scrape.do's self-signed root (which omits the optional AKI extension); all other verification checks remain intact.

  • Scrape.do's CA certificate bundled with the wheel under scrape_do.data so the SDK ships everything needed for proxy-mode TLS verification.

  • Public re-exports for ScrapeDoProxyClient and AsyncScrapeDoProxyClient in scrape_do/__init__.py.

  • AsyncScrapeDoClient backed by httpx.AsyncClient. A near-1:1 mirror of the synchronous client (smart routing, retry strategy, session validation, event hooks), with every IO-bound method exposed as a coroutine. Sleeps between retries use asyncio.sleep rather than time.sleep.

  • AsyncClientEventHooks TypedDict and AsyncSessionValidator type alias. Both are async-only — hooks return Awaitable[None] and validators return Awaitable[bool], so they can perform I/O while the request executes.

  • Public re-exports for AsyncScrapeDoClient, AsyncClientEventHooks, and AsyncSessionValidator in scrape_do/__init__.py.

  • ScrapeDoResponse.json(raw_response=True, **kwargs) convenience method. With raw_response=True (default) it shortcuts to httpx_response.json(); with raw_response=False it returns json.loads(self.text, **kwargs) so the post-envelope path is reachable without manual parsing.

  • Example block in the package-level docstring at src/scrape_do/__init__.py showcasing a typical request flow.
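
The bounded LRU eviction mentioned for the client pool can be sketched with collections.OrderedDict. This is a hypothetical illustration and not the SDK's implementation: the class name BoundedClientPool, the factory and closer callables, and the key shape are all assumptions.

```python
from collections import OrderedDict
from typing import Callable, Generic, Hashable, TypeVar

C = TypeVar("C")


class BoundedClientPool(Generic[C]):
    """Hypothetical LRU pool keyed by (api_token, parameters); not the SDK's class."""

    def __init__(
        self,
        factory: Callable[[], C],
        closer: Callable[[C], None],
        max_size: int = 16,
    ) -> None:
        self._factory = factory
        self._closer = closer
        self._max_size = max_size
        self._clients: "OrderedDict[Hashable, C]" = OrderedDict()

    def get(self, key: Hashable) -> C:
        if key in self._clients:
            # Mark as most recently used so it is evicted last.
            self._clients.move_to_end(key)
            return self._clients[key]
        client = self._factory()
        self._clients[key] = client
        if len(self._clients) > self._max_size:
            # Evict the least recently used client and release its resources.
            _, evicted = self._clients.popitem(last=False)
            self._closer(evicted)
        return client
```

With httpx, the factory could build an httpx.Client and the closer could call its close() method, so an evicted entry releases its connections instead of lingering.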

Fixed

  • ScrapeDoClient.post() now forwards the session_validator argument to request(). Previously the argument was accepted but silently ignored on POST calls. get() was unaffected.

0.1.1 — 2026-05-09

Added

  • Curated public re-exports in scrape_do/__init__.py so common imports work as from scrape_do import ScrapeDoClient, RequestParameters, ... rather than digging into submodules.
  • py.typed PEP 561 marker so downstream type-checkers (mypy, pyright) consume the package's type hints.
  • Trove classifiers in package metadata — PyPI's "Python" sidebar and shields.io's pypi/pyversions badge now populate correctly.

Removed

  • Empty scrape_do/namespaces/ placeholder folder (was scaffolding from before the roadmap solidified; will be replaced by plugins/ in 0.4+).

Documentation

  • Planned package layout added to the Roadmap.

0.1.0 — 2026-05-08

Initial release. Synchronous client surface.

Added
