Sudo enforces request rate limits to ensure fair usage and platform stability. This page describes the limits, response headers, and error codes you may encounter so you can implement appropriate backoff and retry logic.
Rate Limits
Sudo applies a general request rate of approximately 600 requests per minute (RPM) with short burst capacity. When you exceed the allowed rate, requests may be rejected with HTTP 429 Too Many Requests.
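A client can reduce the chance of hitting this limit by pacing its own requests. The following is a minimal token-bucket sketch, not part of any Sudo SDK. The class name and the `burst` value of 20 are assumptions, since the actual burst capacity is not published here.

```python
import time


class TokenBucket:
    """Client-side pacing: refill `rate_per_minute` tokens per minute, hold at most `burst`."""

    def __init__(self, rate_per_minute=600, burst=20, clock=time.monotonic):
        self.refill_per_second = rate_per_minute / 60.0
        self.capacity = float(burst)  # hypothetical burst size, chosen for illustration
        self.tokens = float(burst)
        self.clock = clock
        self.last = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now

    def try_acquire(self):
        """Take one token if available; return True when a request may be sent now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def seconds_until_available(self):
        """Seconds to wait before the next token exists (0.0 if one is ready)."""
        self._refill()
        if self.tokens >= 1.0:
            return 0.0
        return (1.0 - self.tokens) / self.refill_per_second
```

Call `try_acquire()` before each request and sleep for `seconds_until_available()` when it returns False. Pacing on the client does not replace handling 429 responses, because the server's accounting is authoritative.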
What you’ll see when limited
- HTTP status: 429 Too Many Requests
- Headers: Standard rate-limiting headers are returned, including RateLimit-* headers when applicable, and a Retry-After header in some cases. Use these to determine when to retry.
- Body: A brief message indicating the request was rate limited.
Example handling pattern:
# Pseudocode logic for client retries
if status == 429:
    # Prefer Retry-After; otherwise derive a delay from the RateLimit-* headers
    wait_seconds = response.headers.get('Retry-After') or derive_from_rate_limit_headers()
    sleep(wait_seconds)
    retry()
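The delay lookup in the pattern above could be sketched as follows. This is an illustration under assumptions: the function name `retry_delay_seconds` is hypothetical, `Retry-After` is parsed in both forms HTTP allows (delay in seconds or an HTTP date), and `RateLimit-Reset` is assumed to carry the number of seconds until the window resets, which this page does not confirm.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_delay_seconds(headers, default=1.0, now=None):
    """Return how many seconds to wait before retrying, based on response headers."""
    lowered = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        value = retry_after.strip()
        try:
            return max(0.0, float(value))  # delay-seconds form, e.g. "30"
        except ValueError:
            pass
        try:
            when = parsedate_to_datetime(value)  # HTTP-date form
        except (TypeError, ValueError):
            when = None
        if when is not None:
            if when.tzinfo is None:
                when = when.replace(tzinfo=timezone.utc)
            now = now or datetime.now(timezone.utc)
            return max(0.0, (when - now).total_seconds())

    # Assumption: RateLimit-Reset holds seconds until the limit window resets.
    reset = lowered.get("ratelimit-reset")
    if reset is not None:
        try:
            return max(0.0, float(reset.strip()))
        except ValueError:
            pass

    return default  # no usable header; the caller should fall back to backoff
```

For example, `retry_delay_seconds({"Retry-After": "30"})` yields a 30 second wait, and an empty header mapping yields the `default`.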
Provider-level throttling
In addition to general request limits, upstream model providers (OpenAI, Anthropic, Google, xAI, etc.) may enforce their own quotas. When Sudo temporarily exhausts a provider’s budget, you may receive a response instructing you to retry later.
- HTTP status: 503 Service Unavailable
- Header: Retry-After: <seconds> (indicates when it is safe to retry)
- Body (JSON):
{
"error": {
"type": "upstream_rate_limited",
"scope": "provider:<provider>",
"detail": "Provider budget temporarily exhausted"
}
}
Implement exponential backoff and respect Retry-After before retrying.
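Because platform overload also returns 503, a client that wants to log or route the two cases separately can inspect the JSON body. The sketch below assumes the body shape shown above; the function name `classify_503` and the label "other" for any 503 without that shape are illustrative choices, not documented behavior.

```python
import json


def classify_503(body_text):
    """Return ("provider", name) for provider throttling, else ("other", None)."""
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        return ("other", None)  # empty or non-JSON body

    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("type") != "upstream_rate_limited":
        return ("other", None)

    # scope is documented as "provider:<provider>"; extract the part after the colon.
    scope = error.get("scope")
    if isinstance(scope, str) and scope.startswith("provider:"):
        return ("provider", scope.split(":", 1)[1])
    return ("provider", None)
```

Either way the retry behavior is the same: wait for Retry-After, then back off. The classification is mainly useful for metrics or for switching to a different model provider.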
Overload: HTTP 503
If the platform is temporarily overloaded (e.g., too many concurrent users or requests), you may receive:
- HTTP status: 503 Service Unavailable
- Header: Retry-After: <seconds> (how long to wait before retrying)
Recommended handling:
- Read the Retry-After header.
- Wait the indicated number of seconds.
- Retry with exponential backoff if needed.
Best Practices
- Prefer exponential backoff with jitter for retries.
- Always honor Retry-After when present.
- Keep sustained request volume below the general limit (roughly 600 RPM) to avoid throttling.
- Consider queuing or batching non-urgent work.
Some SDK methods include built-in retry logic. If you implement your own, cap both the maximum number of retries and the total wait time so that callers get a timely result or a timely failure.
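A hand-rolled retry loop with those caps might look like the sketch below. It is an illustration, not the SDK's implementation: `send` is a hypothetical callable returning `(status, headers, body)`, the default budgets are arbitrary, and only the delay-seconds form of Retry-After is handled. It uses exponential backoff with full jitter and treats Retry-After as a lower bound on the wait.

```python
import random
import time

RETRYABLE = {429, 503}


def call_with_retries(send, max_retries=5, max_total_wait=60.0,
                      base=0.5, cap=20.0, sleep=time.sleep, rng=random.random):
    """Call `send()` until it returns a non-retryable status or a budget runs out."""
    waited = 0.0
    attempt = 0
    while True:
        status, headers, body = send()
        if status not in RETRYABLE or attempt >= max_retries:
            return status, headers, body

        # Full jitter: a random point in [0, min(cap, base * 2**attempt)).
        delay = rng() * min(cap, base * (2 ** attempt))

        # A numeric Retry-After acts as a floor on the wait.
        retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        try:
            delay = max(delay, float(retry_after))
        except (TypeError, ValueError):
            pass  # header absent or not in delay-seconds form

        if waited + delay > max_total_wait:
            return status, headers, body  # give up rather than exceed the wait budget

        sleep(delay)
        waited += delay
        attempt += 1
```

The `sleep` and `rng` parameters exist so the loop can be tested without real delays. When a budget is exhausted, the function returns the last 429 or 503 response so the caller can surface the failure.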