
Architecture

Wholisphere is built on three layers — same idea as a CDN, applied to accessibility.

┌──────────────────────────────────────────────────────────┐
│ BUILD-TIME PIPELINE │
│ • WCAG analyzer (rule-based + AI-augmented) │
│ • Page intent extraction (DOM + screenshot → JSON) │
│ • Remediation patch generator │
│ • Manual record mode + CI capture + production session │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ EDGE CACHE LAYER │
│ • CDN-served per-URL intent + remediation bundle │
│ • Content-hash freshness check │
│ • Multi-region (US, EU, APAC) for data residency │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ RUNTIME CLIENT │
│ • Browser extension OR embed script (one engine) │
│ • Floating Shadow-DOM landmark widget (AAA) │
│ • Local TTS / STT (Web Speech) — sub-50ms response │
│ • Cloud LLM only on cache miss or hard reasoning │
└──────────────────────────────────────────────────────────┘
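The content-hash freshness check in the edge cache layer can be sketched as follows. This is a minimal illustration, not Wholisphere's actual implementation: the `Bundle` shape, the SHA-256 choice, the whitespace normalisation, and the `lookup` helper are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Bundle:
    """Hypothetical shape of a cached per-URL intent + remediation bundle."""
    url: str
    content_hash: str  # hash of the page content the bundle was compiled from
    intent: dict
    patches: list


def content_hash(html: str) -> str:
    # Assumption: collapse whitespace so trivial reformatting does not
    # invalidate the cache, then hash with SHA-256.
    normalised = " ".join(html.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def lookup(cache: Dict[str, Bundle], url: str, live_html: str) -> Optional[Bundle]:
    """Return the cached bundle only if it matches the page as served now."""
    bundle = cache.get(url)
    if bundle is None:
        return None  # cache miss: the caller falls back to the cloud LLM
    if bundle.content_hash != content_hash(live_html):
        return None  # stale: the page changed since the bundle was built
    return bundle


page = "<main><img src='chart.png'></main>"
cache = {
    "https://example.com/": Bundle(
        url="https://example.com/",
        content_hash=content_hash(page),
        intent={"purpose": "report"},
        patches=[],
    )
}
fresh = lookup(cache, "https://example.com/", page)
stale = lookup(cache, "https://example.com/", page + "<p>new</p>")
```

Here `fresh` is the cached bundle and `stale` is `None`, so only the changed page would trigger a cloud call.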

Why three layers

A naive deployment would call the LLM on every user interaction. That costs over $2,000 per month per customer at scale and adds roughly 1–3 seconds of latency, which is unacceptable for screen-reader users who skim at 400 wpm.

By compiling the intent at build time and serving from the edge, we get:

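| Property             | Naive (cloud-on-every-call)              | Wholisphere (compiled cache)             |
|----------------------|------------------------------------------|------------------------------------------|
| Latency              | 800–2500 ms                              | < 50 ms                                  |
| Cost / customer / mo | ~$2,250                                  | ~$40                                     |
| Determinism          | Agent might do different thing each time | Deterministic playback                   |
| Audit trail          | “Agent decided X”                        | “Agent did exactly the patch we shipped” |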

Multi-model LLM routing

Same prompt, multiple providers. Today: Gemini Flash + Claude Sonnet + Mock. Routing strategy:

  • fast / balanced → Gemini Flash (~30× cheaper per image; fine for 80% of calls)
  • accurate / vision → Claude Sonnet (better small-element + low-contrast OCR)

The router falls through to the next provider on retryable errors (HTTP 429 and any 5xx, including 529). Adding OpenAI is a one-file change.
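A minimal sketch of tier-based routing with fall-through is shown below. The provider keys, the `ProviderError` type, and the fallback order within each tier are assumptions for illustration. The source only states which provider each tier prefers and which errors are retryable.

```python
from typing import Callable, Dict, List, Optional


class ProviderError(Exception):
    """Hypothetical error type carrying the provider's HTTP status."""

    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # 429 (rate limited) plus any 5xx, which covers 529 as well.
    return status == 429 or 500 <= status <= 599


# Preferred provider first; the fallback order is an assumption.
ROUTES: Dict[str, List[str]] = {
    "fast": ["gemini-flash", "claude-sonnet"],
    "balanced": ["gemini-flash", "claude-sonnet"],
    "accurate": ["claude-sonnet", "gemini-flash"],
    "vision": ["claude-sonnet", "gemini-flash"],
}

Provider = Callable[[str], str]


def route(tier: str, prompt: str, providers: Dict[str, Provider]) -> str:
    """Try each provider for the tier in order, falling through on retryable errors."""
    last_error: Optional[ProviderError] = None
    for name in ROUTES[tier]:
        try:
            return providers[name](prompt)
        except ProviderError as err:
            if not is_retryable(err.status):
                raise  # e.g. a 400: another provider would not help
            last_error = err
    raise RuntimeError(f"all providers failed for tier {tier!r}") from last_error


def overloaded(prompt: str) -> str:
    raise ProviderError(529)


def mock(prompt: str) -> str:
    return f"mock:{prompt}"


# The preferred provider is overloaded, so the call falls through to the next one.
providers = {"gemini-flash": overloaded, "claude-sonnet": mock}
result = route("fast", "describe image", providers)
```

Keeping the tier-to-provider mapping in a single table is one way a new provider could stay a small, local change: register the callable and add its key to the relevant tiers.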

Coexistence with native AT

The widget is exposed as a landmark: role="complementary" with aria-label="Accessibility tools". JAWS, NVDA, and VoiceOver users find it with their usual landmark navigation (R in JAWS, D in NVDA, the VO+U rotor in VoiceOver). They can:

  • Ignore it entirely and continue using their native AT
  • Engage one specific tool (“describe this image please”)
  • Delegate the whole page to the agent

We never override or fake screen-reader output. The widget is additive, not a replacement.

Audit trail

Every agent action is logged with its timestamp, URL, capability invoked, outcome, and duration. Logs live in D1 (12 months hot) and R2 (5 years cold). Customers can export them as JSON or stream them via webhooks. The result is intended to be a court-defensible record of what the agent did.
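The logged fields and retention tiers could be modelled roughly as below. This is a sketch under assumptions: the field names, the `storage_tier` helper, and the reading that both retention windows are measured from the record's creation time are illustrative and not taken from the actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class AuditRecord:
    """One logged agent action; field names are hypothetical."""
    timestamp: str    # ISO 8601 with UTC offset
    url: str
    capability: str   # capability invoked, e.g. "describe_image"
    outcome: str
    duration_ms: int


HOT_RETENTION = timedelta(days=365)        # about 12 months (hot storage)
COLD_RETENTION = timedelta(days=5 * 365)   # about 5 years (cold storage)


def storage_tier(record: AuditRecord, now: datetime) -> str:
    """Classify a record by age, assuming both windows start at creation."""
    age = now - datetime.fromisoformat(record.timestamp)
    if age <= HOT_RETENTION:
        return "hot"
    if age <= COLD_RETENTION:
        return "cold"
    return "expired"


def export_json(records: List[AuditRecord]) -> str:
    # Plain JSON array, one object per logged action.
    return json.dumps([asdict(r) for r in records], indent=2)


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
recent = AuditRecord(
    "2025-01-15T10:30:00+00:00",
    "https://example.com/",
    "describe_image",
    "ok",
    42,
)
older = AuditRecord(
    "2022-01-15T10:30:00+00:00",
    "https://example.com/",
    "describe_image",
    "ok",
    38,
)
exported = export_json([recent, older])
```

Making the record immutable (`frozen=True`) reflects the audit use case: an entry is written once and never edited afterwards.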