Rate limits
How Ray9 limits inbound traffic, what 429 responses carry, and how to retry without making things worse.
Ray9 applies a single rate-limit bucket per organization. The bucket is shared across every authenticated caller hitting the Ray9 API for your org — REST, MCP, CLI, and authenticated dashboard calls back to the API all draw from the same allowance. This is deliberate: quota is a property of your account, not a side-effect of how many API keys you happen to have created.
The default limit
Every org gets 60 requests per fixed 60-second window: the counter resets when the window rolls over, rather than sliding continuously. That's tight enough to catch runaway loops early and generous enough that normal interactive use of an MCP client or CLI never trips it.
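To make "fixed window" concrete, here is a minimal sketch of how such a counter behaves. This illustrates fixed-window semantics only, not Ray9's implementation; the 60-per-60s numbers are the documented values, everything else is assumed:

```ts
// Illustrative fixed-window counter, keyed per org (not Ray9's real code).
const LIMIT = 60;
const WINDOW_MS = 60_000;

const windows = new Map<string, { windowStart: number; count: number }>();

function allow(orgId: string, now = Date.now()): boolean {
  const w = windows.get(orgId);
  // First request, or the 60-second window has rolled over: start fresh.
  if (!w || now - w.windowStart >= WINDOW_MS) {
    windows.set(orgId, { windowStart: now, count: 1 });
    return true;
  }
  if (w.count < LIMIT) {
    w.count += 1;
    return true;
  }
  return false; // request 61 and onward in the same window gets a 429
}
```

The practical consequence of a fixed (rather than sliding) window: once the 60th request lands in a window, everything else in that window is rejected until the window rolls over.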
Why per-org and not per-key
Per-key rate limits would let an org route around its own quota by creating more keys. Per-org keeps the budget honest. If you genuinely need a higher ceiling, the path is a plan upgrade, not more keys.
That also means: a noisy CI job using one key can starve your interactive client's calls under a different key on the same org. If that's a concern, run noisy workloads against a separate org.
Hitting the limit
When the bucket is empty, Ray9 returns:
```http
HTTP/1.1 429 Too Many Requests
Retry-After: 7
x-request-id: req_4f3a2c1b9e8d

{
  "requestId": "req_4f3a2c1b9e8d",
  "error": {
    "code": "rate_limited",
    "message": "Too many requests. Please retry after the indicated delay.",
    "details": {
      "retryAfterMs": 6843
    }
  }
}
```

Two retry-timing fields:
- `Retry-After` header — whole seconds, per the HTTP spec. Use this if you're plumbing through a generic HTTP client.
- `details.retryAfterMs` — milliseconds, for clients that want sub-second precision. Use this when you can.
Honour whichever you're using before sending the next request. Retrying earlier is wasted work — the bucket isn't even consulted on early retries; you'll just 429 again immediately.
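For example, a small helper that prefers the body field and falls back to the header. This is a sketch against the standard fetch API; the response shapes match the example above, and the 1-second default when neither field is present is our own choice:

```ts
// Pull the retry delay out of a 429 response, in milliseconds.
// Prefers details.retryAfterMs; falls back to the Retry-After header.
async function retryDelayMs(res: Response): Promise<number> {
  try {
    const body = await res.json();
    const ms = body?.error?.details?.retryAfterMs;
    if (typeof ms === "number") return ms; // sub-second precision, use when you can
  } catch {
    // Body wasn't JSON; fall through to the header.
  }
  const header = res.headers.get("Retry-After"); // whole seconds, per the HTTP spec
  return header !== null ? Number(header) * 1000 : 1_000; // assume 1s if neither is present
}
```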
Recommended retry strategy
For 429s on a real workload:
- Read `details.retryAfterMs` (preferred) or `Retry-After`.
- Wait at least that long — add jitter of ±50% so multiple clients don't synchronise their retries on the same instant.
- On the second consecutive 429 and after, switch to exponential backoff with a cap. We recommend `min(retryAfterMs * 2^n, 30000)` where `n` is the number of prior consecutive 429s (so `n=1` on the second 429, `n=2` on the third, etc.), capped at 30s.
- After ≥ 5 consecutive 429s, alert. Either the load is sustained above your plan's ceiling, or something's wrong on our end — get in touch at contact@ray9.ai.
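Put together, a minimal sketch of that strategy around fetch. The timing math follows the list above; the jitter helper, the 1-second fallback delay, and the decision to stop (rather than keep retrying) once the alert threshold is hit are our choices:

```ts
const CAP_MS = 30_000; // backoff ceiling: 30s

// ±50% jitter: uniform in [0.5 * ms, 1.5 * ms].
const jitter = (ms: number) => ms * (0.5 + Math.random());

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function requestWithRetry(url: string, init?: RequestInit): Promise<Response> {
  let consecutive429s = 0;
  while (true) {
    const res = await fetch(url, init);
    if (res.status !== 429) return res;

    consecutive429s += 1;
    if (consecutive429s >= 5) {
      // Sustained rate limiting: alert rather than spinning forever.
      throw new Error("5 consecutive 429s; escalate (contact@ray9.ai)");
    }

    // Prefer retryAfterMs from the body; fall back to the Retry-After header.
    const body = await res.json().catch(() => null);
    const headerSecs = Number(res.headers.get("Retry-After") ?? "1");
    const base: number = body?.error?.details?.retryAfterMs ?? headerSecs * 1000;

    // n = number of *prior* consecutive 429s: 0 on the first, 1 on the second, ...
    const n = consecutive429s - 1;
    await sleep(jitter(Math.min(base * 2 ** n, CAP_MS)));
  }
}
```

The jitter is doing real work here: without it, several clients that hit the limit together would all wake on the same instant and immediately re-empty the window.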
The MCP server returns the same `Retry-After` / `details.retryAfterMs` to your agent and lets the agent decide. If you're hitting the REST API directly, wire the strategy above yourself.
What you don't get (yet)
There are no `x-ratelimit-remaining` / `x-ratelimit-limit` headers on success responses. Two reasons:
- Most agents don't pace themselves against quota — they just retry on 429 and let backpressure do its job.
- Exposing remaining-quota on every response makes it cheap for someone with a leaked key to map your usage shape. We'd rather the bucket be opaque from the outside.
If your use-case genuinely needs proactive pacing, raise the request — it's a plausible add for plans where the quota ceiling is high enough that exhaustion is rare and predictable.
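In the meantime, nothing stops you pacing on your side against the published ceiling. A sketch of one way to do it: the rolling-window approach and the Pacer class are our own invention, and only the 60-requests-per-60-seconds budget comes from this page:

```ts
// Client-side pacer: cap ourselves at 60 sends per rolling minute.
// Staying under 60 in every rolling window implies staying under 60 in any
// fixed window, so a single well-behaved client shouldn't see 429s
// (other callers on the same org can still drain the shared bucket).
class Pacer {
  private sent: number[] = []; // timestamps of recent sends

  constructor(private limit = 60, private windowMs = 60_000) {}

  async acquire(): Promise<void> {
    for (;;) {
      const now = Date.now();
      // Forget sends that have aged out of the rolling window.
      this.sent = this.sent.filter((t) => now - t < this.windowMs);
      if (this.sent.length < this.limit) {
        this.sent.push(now);
        return;
      }
      // Budget exhausted: wait for the oldest send to age out.
      await new Promise((r) => setTimeout(r, this.windowMs - (now - this.sent[0])));
    }
  }
}

// Usage: await pacer.acquire() before each API call.
```

Note this paces a single process; the bucket is still per-org, so other keys and clients spend the same allowance.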
Multiple clients, one org
Because the bucket is shared, watch out for:
- CI + interactive use on the same org — a CI job hammering the API will eat into the allowance for whoever's interactively running the CLI / MCP. Use separate orgs (or at least separate keys with documented quota expectations) for high-volume background work.
- Concurrent agents on the same key — fine in principle (they just share the bucket), but if you've spun up 10 agents that each retry on every error, you'll burn through 60 req/min in seconds. Backoff matters.
Errors
A rate-limit hit returns the standard error envelope; see Errors for the full shape. The `code` is always `rate_limited`; the HTTP status is always 429.