Rate limits

How Ray9 limits inbound traffic, what 429 responses carry, and how to retry without making things worse.

Ray9 applies a single rate-limit bucket per organization. The bucket is shared across every authenticated caller hitting the Ray9 API for your org — REST, MCP, CLI, and authenticated dashboard calls back to the API all draw from the same allowance. This is deliberate: quota is a property of your account, not a side-effect of how many API keys you happen to have created.

The default limit

Every org gets 60 requests per fixed 60-second window. That's tight enough to catch runaway loops early and generous enough that normal interactive use of an MCP client or CLI never trips it.
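A fixed window like this can be modelled as a counter that resets when a new window starts. The sketch below illustrates the semantics only; it is not the server's actual implementation:

```python
import time

class FixedWindowBucket:
    """Illustrative model of a fixed-window limiter: `limit` requests
    per `window_s`-second window, counter reset when the window rolls
    over. Defaults mirror the documented 60 req / 60 s allowance."""
    def __init__(self, limit=60, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.window_start = time.monotonic()
        self.count = 0

    def allow(self):
        now = time.monotonic()
        if now - self.window_start >= self.window_s:
            self.window_start = now  # new window: counter resets
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the limit: the server would respond 429
```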

Why per-org and not per-key

Per-key rate limits would let an org route around its own quota by creating more keys. Per-org keeps the budget honest. If you genuinely need a higher ceiling, the path is a plan upgrade, not more keys.

That also means: a noisy CI job using one key can starve your interactive client's calls under a different key on the same org. If that's a concern, run noisy workloads against a separate org.

Hitting the limit

When the bucket is empty, Ray9 returns:

HTTP/1.1 429 Too Many Requests
Retry-After: 7
x-request-id: req_4f3a2c1b9e8d

{
  "requestId": "req_4f3a2c1b9e8d",
  "error": {
    "code": "rate_limited",
    "message": "Too many requests. Please retry after the indicated delay.",
    "details": {
      "retryAfterMs": 6843
    }
  }
}

Two retry-timing fields:

  • Retry-After header — whole seconds, per the HTTP spec. Use this if you're plumbing through a generic HTTP client.
  • details.retryAfterMs — milliseconds, for clients that want sub-second precision. Use this when you can.
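In code, preferring the millisecond field with a header fallback might look like this sketch (it assumes a requests-style response object; retry_delay_s is a hypothetical helper, not part of any Ray9 SDK):

```python
def retry_delay_s(response):
    """Pick the wait (in seconds) before retrying a 429.
    Prefers details.retryAfterMs from the body; falls back to the
    Retry-After header. `response` is assumed to expose .json() and
    .headers like a requests.Response."""
    try:
        ms = response.json()["error"]["details"]["retryAfterMs"]
        return ms / 1000.0
    except (ValueError, KeyError):
        pass  # no parseable body: fall back to the header
    return float(response.headers.get("Retry-After", 1))
```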

Honour whichever you're using before sending the next request. Retrying earlier is wasted work: the window hasn't rolled over yet, so you'll just 429 again immediately.

For 429s on a real workload:

  1. Read details.retryAfterMs (preferred) or Retry-After.
  2. Wait at least that long — add jitter of ±50% so multiple clients don't synchronise their retries on the same instant.
  3. On the second consecutive 429 and after, switch to exponential backoff with a cap. We recommend min(retryAfterMs * 2^n, 30000) where n is the number of prior consecutive 429s (so n=1 on the second 429, n=2 on the third, etc.), capped at 30s.
  4. After ≥ 5 consecutive 429s, alert. Either the load is sustained above your plan's ceiling, or something's wrong on our end — get in touch at contact@ray9.ai.
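The four steps above can be sketched as a retry loop. Here send and base_delay_ms_fn are placeholders for your own request function and delay extraction, not Ray9 SDK calls:

```python
import random
import time

MAX_CONSECUTIVE_429S = 5
BACKOFF_CAP_MS = 30_000

def call_with_backoff(send, base_delay_ms_fn):
    """Sketch of the recommended retry strategy. `send()` issues one
    request and returns a response object with .status_code;
    `base_delay_ms_fn(resp)` returns retryAfterMs (or Retry-After
    converted to ms). Both are assumptions of this sketch."""
    consecutive_429s = 0
    while True:
        resp = send()
        if resp.status_code != 429:
            return resp
        consecutive_429s += 1
        if consecutive_429s >= MAX_CONSECUTIVE_429S:
            # step 4: sustained limiting, stop retrying and alert
            raise RuntimeError("sustained rate limiting; alert and back off")
        base_ms = base_delay_ms_fn(resp)
        # step 3: n = number of *prior* consecutive 429s, capped at 30 s
        n = consecutive_429s - 1
        delay_ms = min(base_ms * (2 ** n), BACKOFF_CAP_MS)
        # step 2: +/-50% jitter so clients don't synchronise retries
        delay_ms *= random.uniform(0.5, 1.5)
        time.sleep(delay_ms / 1000.0)
```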

The MCP server returns the same Retry-After / details.retryAfterMs to your agent and lets the agent decide. If you're hitting the REST API directly, wire the strategy above yourself.

What you don't get (yet)

There are no x-ratelimit-remaining / x-ratelimit-limit headers on success responses. Two reasons:

  • Most agents don't pace themselves against quota — they just retry on 429 and let backpressure do its job.
  • Exposing remaining-quota on every response makes it cheap for someone with a leaked key to map your usage shape. We'd rather the bucket be opaque from the outside.

If your use case genuinely needs proactive pacing, raise the request — it's a plausible add for plans where the quota ceiling is high enough that exhaustion is rare and predictable.

Multiple clients, one org

Because the bucket is shared, watch out for:

  • CI + interactive use on the same org — a CI job hammering the API will eat into the allowance for whoever's interactively running the CLI / MCP. Use separate orgs (or at least separate keys with documented quota expectations) for high-volume background work.
  • Concurrent agents on the same key — fine in principle (they just share the bucket), but if you've spun up 10 agents that each retry on every error, you'll burn through 60 req/min in seconds. Backoff matters.
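If you do want concurrent workers to cooperate rather than race the shared bucket, a client-side pacer that spaces requests across threads is one option. The sketch below is illustrative only; Pacer and its parameters are not part of any Ray9 SDK:

```python
import threading
import time

class Pacer:
    """Client-side pacer: hands out evenly spaced send slots so workers
    sharing one org stay under the 60 req / 60 s bucket. Purely a
    client-side convention; the server only sees the resulting rate."""
    def __init__(self, max_per_window=60, window_s=60.0):
        self.interval = window_s / max_per_window  # seconds between sends
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait_turn(self):
        # Claim the next slot under the lock, then sleep outside it so
        # other workers can claim later slots concurrently.
        with self.lock:
            now = time.monotonic()
            slot = max(self.next_slot, now)
            self.next_slot = slot + self.interval
        time.sleep(max(0.0, slot - now))
```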

Errors

A rate-limit hit is the standard error envelope; see Errors for the full shape. The code is always rate_limited; the HTTP status is always 429.
