Rate limits
Rate limits exist to keep the service fast and fair.
Limits per plan
| Plan | Per minute | Per day |
|---|---|---|
| Free | 30 | 1,000 |
| Pro | 300 | 30,000 |
| Platform | 3,000 (configurable) | unlimited |
Limits are per-org, not per-key. All API keys for an org share the same bucket.
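Both limits apply at the same time, and every key in an org draws from the same bucket. For steady, all-day traffic, the daily cap binds long before the per-minute cap does. The sketch below is a rough planning aid. It uses a hypothetical `PLAN_LIMITS` table copied from the numbers above and leaves out Platform, whose limits are configurable. It computes the average gap an org would need between requests, summed across all of its keys, to run indefinitely without hitting either limit.

```python
# Hypothetical table mirroring the plan limits above: (per minute, per day).
# Platform is omitted: its per-minute limit is configurable and it has no daily cap.
PLAN_LIMITS = {
    "free": (30, 1_000),
    "pro": (300, 30_000),
}

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


def min_spacing_seconds(plan: str) -> float:
    """Average gap between requests, across all of an org's keys combined,
    that stays under both the per-minute and the per-day limit indefinitely."""
    per_minute, per_day = PLAN_LIMITS[plan]
    # Whichever limit forces the larger gap is the binding one.
    return max(SECONDS_PER_MINUTE / per_minute, SECONDS_PER_DAY / per_day)


for plan in PLAN_LIMITS:
    print(plan, min_spacing_seconds(plan))
```

On Free, this works out to about one request every 86.4 seconds, even though 30 requests fit in a single minute. On Pro, it is about one every 2.88 seconds. Adding more keys does not change these figures, because the bucket is per org.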
Headers
When you exceed a limit, you get HTTP 429 Too Many Requests with these headers and body:

    retry-after: 60
    content-type: application/json

    {
      "error": {
        "code": "rate_limited",
        "message": "Too many requests in the last minute",
        "traceId": "...",
        "retryable": true
      }
    }

`retry-after` is in seconds. Honor it.
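A client can turn that 429 response into a wait time with a small helper. The sketch below is illustrative: the function name `retry_delay_seconds` and the `fallback` default are assumptions, not part of the API. It assumes `retry-after` carries a number of seconds, as documented here. HTTP also permits an HTTP-date in that header, and the sketch simply falls back to a default in that case.

```python
import json
from typing import Mapping, Optional


def retry_delay_seconds(
    status: int,
    headers: Mapping[str, str],
    body: str,
    fallback: float = 1.0,  # hypothetical default when retry-after is absent or unparsable
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the response should not be retried."""
    if status != 429:
        return None
    try:
        error = json.loads(body).get("error") or {}
    except (json.JSONDecodeError, AttributeError):
        error = {}
    # Respect an explicit "retryable": false; treat a missing flag as retryable.
    if error.get("retryable") is False:
        return None
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    try:
        return max(0.0, float(raw)) if raw is not None else fallback
    except ValueError:
        # Not a number (e.g. the HTTP-date form); use the fallback instead.
        return fallback


body = (
    '{"error": {"code": "rate_limited", "message": "Too many requests in the last minute",'
    ' "traceId": "...", "retryable": true}}'
)
print(retry_delay_seconds(429, {"Retry-After": "60"}, body))
```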
Strategy
Client-side:
- Implement exponential backoff with jitter
- Cache responses where the input is stable (e.g., `describe_image` of the same image)
- Batch when the API supports it (telemetry)
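One way to implement exponential backoff with jitter is the "full jitter" variant. Each retry sleeps for a uniform random time between 0 and `min(cap, base * 2**attempt)`. In this sketch the delay is also never shorter than the server's `retry-after`. The values `base=0.5`, `cap=60.0`, and `max_attempts=5` are illustrative assumptions, not documented recommendations. `send` is a hypothetical callable that performs one request and returns `(status, retry_after_seconds)`.

```python
import random
import time
from typing import Callable, Optional, Tuple


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 60.0,
    retry_after: Optional[float] = None,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter exponential backoff: a uniform draw between 0 and
    min(cap, base * 2**attempt), raised to at least the server's retry-after."""
    ceiling = min(cap, base * (2 ** attempt))
    delay = rng() * ceiling
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay


def call_with_retries(
    send: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call the hypothetical `send` until it returns a non-429 status or attempts run out."""
    status = 429
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429 or attempt == max_attempts - 1:
            break  # success, a non-retryable status, or no attempts left
        sleep(backoff_delay(attempt, retry_after=retry_after))
    return status
```

Taking the maximum with `retry-after` means the client never retries sooner than the server asked. The random jitter keeps many clients in the same org from retrying in lockstep against the shared bucket.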
Server-side:
We use fixed-window counters keyed by (org × minute) and (org × day), stored in Cloudflare KV. Burst capacity equals the per-minute limit.
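The fixed-window scheme can be sketched with one counter per (org, window index). In the sketch below, a plain in-memory dict stands in for Cloudflare KV, and the class and method names are hypothetical. This is an illustration of the technique, not the service's actual code. Two further choices are assumptions: rejected requests are not counted, and expired windows are never evicted. A real store would need them to expire, for example via a TTL.

```python
import time
from typing import Callable, Dict, Optional, Tuple

SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 86_400


class FixedWindowLimiter:
    """Fixed-window counters keyed by (org, minute) and (org, day).

    A dict stands in for Cloudflare KV; old windows are never evicted here.
    """

    def __init__(
        self,
        per_minute: int,
        per_day: Optional[int],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.per_minute = per_minute
        self.per_day = per_day  # None means no daily cap
        self.clock = clock
        self.counts: Dict[Tuple[str, str, int], int] = {}

    def allow(self, org: str) -> bool:
        now = int(self.clock())
        # Each window is identified by the integer index of the minute or day it falls in.
        windows = [((org, "minute", now // SECONDS_PER_MINUTE), self.per_minute)]
        if self.per_day is not None:
            windows.append(((org, "day", now // SECONDS_PER_DAY), self.per_day))
        # Reject if any window is already full; the rejected request is not counted.
        if any(self.counts.get(key, 0) >= limit for key, limit in windows):
            return False
        for key, _ in windows:
            self.counts[key] = self.counts.get(key, 0) + 1
        return True
```

The key is the org, not the API key, so every key in an org increments the same counters. This matches the per-org rule above. Fixed windows have one known trait: a client can spend a full minute's quota just before a window boundary and another full quota just after it. That lets up to twice the per-minute limit arrive within a few seconds, as long as the daily counter still has room.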
Need more?
Email sales@wholisphere.ai and we'll lift limits for legitimate Platform use cases. Free and Pro limits are intentionally tight to keep cloud LLM costs predictable.