Architecture
Wholisphere is built on three layers — same idea as a CDN, applied to accessibility.
┌──────────────────────────────────────────────────────────┐│ BUILD-TIME PIPELINE ││ • WCAG analyzer (rule-based + AI-augmented) ││ • Page intent extraction (DOM + screenshot → JSON) ││ • Remediation patch generator ││ • Manual record mode + CI capture + production session │└──────────────────────────────────────────────────────────┘ │ ▼┌──────────────────────────────────────────────────────────┐│ EDGE CACHE LAYER ││ • CDN-served per-URL intent + remediation bundle ││ • Content-hash freshness check ││ • Multi-region (US, EU, APAC) for data residency │└──────────────────────────────────────────────────────────┘ │ ▼┌──────────────────────────────────────────────────────────┐│ RUNTIME CLIENT ││ • Browser extension OR embed script (one engine) ││ • Floating Shadow-DOM landmark widget (AAA) ││ • Local TTS / STT (Web Speech) — sub-50ms response ││ • Cloud LLM only on cache miss or hard reasoning │└──────────────────────────────────────────────────────────┘Why three layers
A naive deploy would call the LLM on every user interaction. That’s $2k+/mo per customer at scale and adds 1–3 seconds of latency — unacceptable for screen-reader users who skim at 400 wpm.
By compiling the intent at build time and serving from the edge, we get:
| Property | Naive (cloud-on-every-call) | Wholisphere (compiled cache) |
|---|---|---|
| Latency | 800–2500 ms | < 50 ms |
| Cost / customer / mo | ~$2,250 | ~$40 |
| Determinism | Agent might do different thing each time | Deterministic playback |
| Audit trail | ”Agent decided X" | "Agent did exactly the patch we shipped” |
Multi-model LLM routing
Same prompt, multiple providers. Today: Gemini Flash + Claude Sonnet + Mock. Routing strategy:
fast/balanced→ Gemini Flash (~30× cheaper per image; fine for 80% of calls)accurate/vision→ Claude Sonnet (better small-element + low-contrast OCR)
The router falls through providers on retryable errors (5xx, 429, 529). Adding OpenAI is a one-file change.
Coexistence with native AT
The widget is role="complementary" with aria-label="Accessibility tools". JAWS / NVDA / VoiceOver users find it via the standard landmark navigation key (D in NVDA, VO+U in VoiceOver). They can:
- Ignore it entirely and continue using their native AT
- Engage one specific tool (“describe this image please”)
- Delegate the whole page to the agent
We never override or fake screen-reader output. We’re additive, not replacement.
Audit trail
Every agent action is logged: timestamp, URL, capability invoked, outcome, duration. Logs live in D1 (12 months hot) and R2 (5 years cold). Customers can export as JSON or stream via webhooks. Court-defensible.