Architecture

Wholisphere is built on three layers: the same idea as a CDN, applied to accessibility.

┌──────────────────────────────────────────────────────────┐
│ BUILD-TIME PIPELINE                                       │
│   • WCAG analyzer (rule-based + AI-augmented)             │
│   • Page intent extraction (DOM + screenshot → JSON)      │
│   • Remediation patch generator                           │
│   • Manual record mode + CI capture + production session  │
└──────────────────────────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────┐
│ EDGE CACHE LAYER                                          │
│   • CDN-served per-URL intent + remediation bundle        │
│   • Content-hash freshness check                          │
│   • Multi-region (US, EU, APAC) for data residency        │
└──────────────────────────────────────────────────────────┘
                          │
                          ▼
┌──────────────────────────────────────────────────────────┐
│ RUNTIME CLIENT                                            │
│   • Browser extension OR embed script (one engine)        │
│   • Floating Shadow-DOM landmark widget (AAA)             │
│   • Local TTS / STT (Web Speech) — sub-50ms response      │
│   • Cloud LLM only on cache miss or hard reasoning        │
└──────────────────────────────────────────────────────────┘

Why three layers

A naive deployment would call the LLM on every user interaction. At scale that costs $2k+/mo per customer and adds 1–3 seconds of latency, which is unacceptable for screen-reader users who skim at 400 wpm.

By compiling the intent at build time and serving from the edge, we get:

Property	Naive (cloud-on-every-call)	Wholisphere (compiled cache)
Latency	800–2500 ms	< 50 ms
Cost / customer / mo	~$2,250	~$40
Determinism	Agent may act differently each time	Deterministic playback
Audit trail	“Agent decided X”	“Agent did exactly the patch we shipped”

Multi-model LLM routing

Same prompt, multiple providers. Today: Gemini Flash + Claude Sonnet + Mock. Routing strategy:

fast / balanced → Gemini Flash (~30× cheaper per image; fine for 80% of calls)
accurate / vision → Claude Sonnet (better small-element + low-contrast OCR)

The router falls through to the next provider on retryable errors (429 and any 5xx, including 529). Adding OpenAI is a one-file change.
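A minimal sketch of tier-based routing with fall-through, under stated assumptions: the `Provider` interface, the `ProviderError` class, the provider keys, and the fallback order within each tier are hypothetical and chosen only to illustrate the behavior described above.

```typescript
type Tier = "fast" | "balanced" | "accurate" | "vision";

// Hypothetical provider contract: one prompt in, one completion out.
interface Provider {
  name: string;
  complete(prompt: string): Promise<string>;
}

// Hypothetical error type carrying the HTTP status returned by a provider.
class ProviderError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Preferred provider first, then the fallback (the ordering is an assumption).
const ROUTES: Record<Tier, string[]> = {
  fast: ["gemini-flash", "claude-sonnet"],
  balanced: ["gemini-flash", "claude-sonnet"],
  accurate: ["claude-sonnet", "gemini-flash"],
  vision: ["claude-sonnet", "gemini-flash"],
};

// Retryable means rate-limited (429) or any server-side failure (5xx, which covers 529).
function isRetryable(err: unknown): boolean {
  const status = (err as { status?: unknown } | null)?.status;
  return typeof status === "number" && (status === 429 || (status >= 500 && status <= 599));
}

async function route(
  tier: Tier,
  prompt: string,
  providers: Map<string, Provider>,
): Promise<{ provider: string; text: string }> {
  let lastError: unknown = new Error(`no provider configured for tier "${tier}"`);
  for (const name of ROUTES[tier]) {
    const provider = providers.get(name);
    if (!provider) continue;
    try {
      return { provider: name, text: await provider.complete(prompt) };
    } catch (err) {
      // A non-retryable error (for example 400 or 401) would fail on any provider.
      if (!isRetryable(err)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```

In this layout a new provider means one more `Provider` implementation plus an entry in `ROUTES`, which is in the spirit of the one-file change mentioned above.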

Coexistence with native AT

The widget host has role="complementary" and aria-label="Accessibility tools". JAWS, NVDA, and VoiceOver users find it through their standard landmark navigation (R in JAWS, D in NVDA, the VO+U rotor in VoiceOver). They can:

Ignore it entirely and continue using their native AT
Engage one specific tool (“describe this image please”)
Delegate the whole page to the agent

We never override or fake screen-reader output. We’re additive, not a replacement.
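The landmark setup described above might look roughly like this. It is a sketch under assumptions: `prepareWidgetHost` and the `HostLike` interface are hypothetical names, and the open shadow-root mode is a guess. Only the role and label values come from the text.

```typescript
// Minimal structural type so the sketch compiles without DOM typings.
// In a browser, a real HTMLElement (such as a <div>) satisfies it.
interface HostLike {
  setAttribute(name: string, value: string): void;
  attachShadow(init: { mode: "open" | "closed" }): unknown;
}

// Expose the widget as a labelled landmark and isolate its UI in a shadow root,
// so page styles and the widget's styles cannot interfere with each other.
function prepareWidgetHost<T extends HostLike>(host: T): T {
  host.setAttribute("role", "complementary");
  host.setAttribute("aria-label", "Accessibility tools");
  host.attachShadow({ mode: "open" }); // mode is an assumption
  return host;
}
```

In a page this would be called as `prepareWidgetHost(document.createElement("div"))` before appending the host to `document.body`. Nothing here touches the rest of the page's accessibility tree, which keeps the widget additive.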

Audit trail

Every agent action is logged with its timestamp, URL, capability invoked, outcome, and duration. Logs live in D1 (12 months hot) and R2 (5 years cold). Customers can export them as JSON or stream them via webhooks. Court-defensible.
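As an illustration of the record and retention windows described above, here is a small sketch. The field names, the `outcome` values, and the `storageTier` and `exportJson` helpers are assumptions. The sketch also assumes that both retention periods are measured from the time of logging and that records older than five years count as expired, neither of which is stated above.

```typescript
// Hypothetical record shape covering the five logged fields.
interface AuditRecord {
  timestamp: string;              // ISO 8601, UTC
  url: string;
  capability: string;             // e.g. "describe-image"
  outcome: "success" | "failure"; // assumed value set
  durationMs: number;
}

type StorageTier = "hot" | "cold" | "expired";

// Calendar-month arithmetic in UTC on a copy of the date.
function addMonths(date: Date, months: number): Date {
  const copy = new Date(date.getTime());
  copy.setUTCMonth(copy.getUTCMonth() + months);
  return copy;
}

// Hot for the first 12 months (D1), cold until 5 years (R2), then expired.
function storageTier(record: AuditRecord, now: Date): StorageTier {
  const loggedAt = new Date(record.timestamp);
  if (now.getTime() < addMonths(loggedAt, 12).getTime()) return "hot";
  if (now.getTime() < addMonths(loggedAt, 60).getTime()) return "cold";
  return "expired";
}

// JSON export as a pretty-printed array.
function exportJson(records: AuditRecord[]): string {
  return JSON.stringify(records, null, 2);
}
```

Keeping the tier decision a pure function of the record and the current time makes the retention policy easy to test and to explain to an auditor.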