Rate Limits & Overload Handling

Sudo enforces request rate limits to ensure fair usage and platform stability. This page describes the limits, response headers, and error codes you may encounter so you can implement appropriate backoff and retry logic.

Rate Limits

Sudo applies a general request rate limit of approximately 600 requests per minute (RPM), with short burst capacity above that rate. Requests that exceed the allowed rate may be rejected with HTTP 429.

What you’ll see when limited

HTTP status: 429 Too Many Requests
Headers: Standard rate-limiting headers are returned, including RateLimit-* headers when applicable, and a Retry-After header in some cases. Use these to determine when to retry.
Body: A brief message indicating the request was rate limited.

Example handling pattern:

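# Pseudocode for client retries (helper names are illustrative)
if status == 429:
  # Prefer Retry-After; otherwise derive the wait from the RateLimit-* headers
  wait_seconds = response.headers.get('Retry-After') or derive_from_rate_limit_headers()
  sleep(float(wait_seconds))
  retry()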
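
The pseudocode above treats Retry-After as a plain number. As a minimal sketch, the helper below converts a Retry-After value into seconds. This page documents the value as a number of seconds; the HTTP specification also allows an HTTP-date, so the sketch accepts both. The function name `parse_retry_after` and its `now` parameter are illustrative assumptions, not part of any Sudo SDK.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the wait in seconds encoded by a Retry-After value, or None.

    Hypothetical helper: accepts the delay-seconds form documented on this
    page and the HTTP-date form that HTTP permits in general.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # delay-seconds form, e.g. "30"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Unparseable value: let the caller fall back to its own backoff
        return None
    if when.tzinfo is None:
        # Treat a date without an offset as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further wait is needed
    return max(0.0, (when - now).total_seconds())
```

Returning None for a missing or malformed header lets the caller fall back to the RateLimit-* headers or to its own backoff schedule.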

Provider-level throttling

In addition to general request limits, upstream model providers (OpenAI, Anthropic, Google, xAI, etc.) may enforce their own quotas. When Sudo temporarily exhausts a provider’s budget, you may receive a response instructing you to retry later.

HTTP status: 503 Service Unavailable
Header: Retry-After: <seconds> — indicates when it is safe to retry
Body (JSON):

{
  "error": {
    "type": "upstream_rate_limited",
    "scope": "provider:<provider>",
    "detail": "Provider budget temporarily exhausted"
  }
}

Implement exponential backoff and respect Retry-After before retrying.
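
Provider throttling and platform overload both return HTTP 503, so a client that wants to tell them apart has to inspect the body. The sketch below does that using the error type shown above. The function name `classify_503`, the returned labels, and the provider value in the usage note are illustrative assumptions. It also assumes that any 503 body without the `upstream_rate_limited` type can be treated as overload, which this page does not state explicitly.

```python
import json
from typing import Optional, Tuple


def classify_503(body_text: Optional[str]) -> Tuple[str, Optional[str]]:
    """Classify a 503 body as ("provider_throttled", provider) or ("overloaded", None).

    Hypothetical helper based on the JSON error shape documented above.
    """
    try:
        payload = json.loads(body_text)
    except (TypeError, ValueError):
        # Empty, missing, or non-JSON body: assume platform overload
        return ("overloaded", None)
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and error.get("type") == "upstream_rate_limited":
        scope = error.get("scope")
        provider = None
        # scope is documented as "provider:<provider>"
        if isinstance(scope, str) and scope.startswith("provider:"):
            provider = scope[len("provider:"):] or None
        return ("provider_throttled", provider)
    return ("overloaded", None)
```

For a body whose scope is `"provider:openai"`, the function returns `("provider_throttled", "openai")`. Knowing which provider is throttled lets a client route non-urgent work elsewhere while it waits out Retry-After.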

Overload: HTTP 503

If the platform is temporarily overloaded (e.g., too many concurrent users or requests), you may receive:

HTTP status: 503 Service Unavailable
Header: Retry-After: <seconds> — how long to wait before retrying

Recommended handling:

Read the Retry-After header.
Wait the indicated number of seconds.
Retry with exponential backoff if needed.

Best Practices

Prefer exponential backoff with jitter for retries.
Always honor Retry-After when present.
Keep requests within reasonable volumes to avoid throttling.
Consider queuing or batching non-urgent work.

Some SDK methods include built-in retry logic. If you implement your own, cap both the maximum number of retries and the total wait time so that callers are not left waiting indefinitely.
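
The practices above can be combined into one retry loop. The following is a minimal sketch, not the behavior of any Sudo SDK: it honors a numeric Retry-After when present, otherwise uses exponential backoff with full jitter, and stops once either a retry cap or a total-wait cap is reached. The names `call_with_retries`, `backoff_delay`, and `send`, and all default values, are illustrative assumptions. `send` stands for any function that performs one request and returns the status code and headers.

```python
import random
import time
from typing import Callable, Mapping, Tuple

RETRYABLE = {429, 503}  # statuses this page describes as retryable

Response = Tuple[int, Mapping[str, str]]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retries(
    send: Callable[[], Response],
    max_retries: int = 5,
    max_total_wait: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it succeeds or a retry or wait cap is reached."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    waited = 0.0
    attempt = 0
    while True:
        status, headers = send()
        if status not in RETRYABLE or attempt >= max_retries:
            return status, headers
        # Assumes header names are spelled exactly "Retry-After" in the mapping
        retry_after = headers.get("Retry-After", "").strip()
        if retry_after.isdigit():
            delay = float(retry_after)  # always honor the server's hint
        else:
            delay = backoff_delay(attempt)
        if waited + delay > max_total_wait:
            # Give up rather than exceed the total wait budget
            return status, headers
        sleep(delay)
        waited += delay
        attempt += 1
```

When a cap is hit, the function returns the last 429 or 503 response instead of raising, so the caller decides whether to surface the error or queue the work for later. The `sleep` parameter exists so the loop can be exercised in tests without real waiting.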
