
Rate limits

Rate limits exist to keep the service fast and fair.

Limits per plan

Plan      Per minute            Per day
Free      30                    1,000
Pro       300                   30,000
Platform  3,000 (configurable)  unlimited

Limits are per-org, not per-key. All API keys for an org share the same bucket.

Headers

When you exceed a limit, the API returns HTTP 429 Too Many Requests with the following headers and JSON body:

retry-after: 60
content-type: application/json

{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests in the last minute",
    "traceId": "...",
    "retryable": true
  }
}

retry-after is given in seconds. Wait at least that long before retrying.
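As a rough illustration of reading this response, the sketch below turns a status code, headers, and raw body into a wait time. The function name `retry_delay_seconds` and the 60-second fallback for a missing or malformed header are assumptions made for the example, not part of the API.

```python
import json
from typing import Mapping, Optional


def retry_delay_seconds(status: int, headers: Mapping[str, str], body: str) -> Optional[float]:
    """Return how long to wait before retrying, or None if no retry is warranted."""
    if status != 429:
        return None
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and error.get("retryable") is False:
        # The server explicitly marked the error as not retryable.
        return None
    try:
        return max(0.0, float(lowered.get("retry-after", "")))
    except ValueError:
        # Assumption: wait one full minute if the header is missing or malformed.
        return 60.0
```

A caller would sleep for the returned number of seconds and then resend the request, or give up when the result is None.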

Strategy

Client-side:

  • Implement exponential backoff with jitter
  • Cache responses where the input is stable (e.g., describe_image of the same image)
  • Batch when the API supports it (telemetry)
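The first bullet can be sketched roughly as follows. This is one common variant ("full jitter"), not a prescribed implementation. The `send` callable, its `(status, retry_after, payload)` return shape, and the default `base`, `cap`, and `max_attempts` values are all hypothetical choices for the example.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical shape: send() returns (status, retry_after_seconds_or_None, payload).
Response = Tuple[int, Optional[float], object]


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send() until it stops returning 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, retry_after, payload = send()
        if status != 429:
            break
        if attempt == max_attempts - 1:
            break  # Out of attempts: hand the last 429 back to the caller.
        delay = backoff_delay(attempt)
        if retry_after is not None:
            # Never retry sooner than the server asked.
            delay = max(delay, retry_after)
        sleep(delay)
    return status, retry_after, payload
```

The jitter spreads retries from many clients over the window so they do not all hit the API at the same instant once the limit resets.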

Server-side:

We use fixed-window counters, one per (org × minute) and one per (org × day), stored in Cloudflare KV. Burst capacity equals the per-minute limit: an org can spend its entire per-minute quota at once.
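To make the fixed-window idea concrete, here is a minimal in-memory sketch. It is not the service's code: a plain dict stands in for Cloudflare KV, the class and method names are hypothetical, and it assumes rejected requests are not counted against the quota.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """In-memory stand-in for the per-org counters (a dict instead of Cloudflare KV)."""

    def __init__(self, per_minute: int, per_day: int,
                 clock: Callable[[], float] = time.time) -> None:
        # Window size in seconds -> requests allowed per window.
        self.limits = {60: per_minute, 86_400: per_day}
        self.clock = clock
        # Key is (org, window size, window index); this sketch never evicts old windows.
        self.counts: Dict[Tuple[str, int, int], int] = {}

    def allow(self, org_id: str) -> bool:
        now = int(self.clock())
        keys = [(org_id, size, now // size) for size in self.limits]
        if any(self.counts.get(key, 0) >= self.limits[key[1]] for key in keys):
            return False  # Assumption: a rejected request does not consume quota.
        for key in keys:
            self.counts[key] = self.counts.get(key, 0) + 1
        return True
```

With Free-plan numbers, `FixedWindowLimiter(per_minute=30, per_day=1000)` accepts 30 back-to-back calls for one org, rejects the next one, and accepts again once the clock crosses into the next minute.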

Need more

Email sales@wholisphere.ai and we’ll lift limits for legitimate Platform use cases. Free and Pro limits are intentionally tight to keep cloud LLM costs predictable.