
Rate Limiting & Quotas

Protect your API with per-minute rate limits and monthly quotas.

Holdify provides two layers of usage control:

  • Rate limits — Prevent burst abuse with per-minute request limits
  • Quotas — Enforce plan limits with monthly request allowances

Token bucket algorithm

Holdify uses the token bucket algorithm for rate limiting. This provides smooth rate limiting while allowing short bursts of traffic.

How it works
text
┌─────────────────────────────────────────────────────────────┐
│                    Token Bucket Algorithm                    │
└─────────────────────────────────────────────────────────────┘

  Bucket Capacity: 60 tokens (rate limit per minute)

  ┌──────────────────────────────────────────────────────────┐
  │ ████████████████████████████████████░░░░░░░░░░░░░░░░░░░░ │
  │ ◄──────── 40 tokens remaining ────────►                  │
  └──────────────────────────────────────────────────────────┘

  • Tokens refill at a steady rate (1 per second for 60/min)
  • Each request consumes 1 token (or more with 'units' param)
  • When bucket is empty → 429 Rate Limit Exceeded
  • Bucket never exceeds capacity (burst is limited)

Why token bucket? It's more forgiving than fixed windows. On a 60/min plan, a user who was idle for 30 seconds has regained up to 30 tokens and can spend them in a quick burst, rather than being strictly limited to 1 request per second.
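The refill-and-consume cycle above can be sketched in a few lines of TypeScript. This is a minimal in-memory illustration, not Holdify's implementation. The `TokenBucket` class and its `tryConsume`/`remaining` methods are hypothetical, refill is assumed to be continuous at capacity/60 tokens per second, and the current time is passed in explicitly so the behavior is easy to trace.

```typescript
// Hypothetical token bucket sketch (not Holdify's internal implementation).
// Assumes continuous refill at capacity / 60 tokens per second.
class TokenBucket {
  private readonly capacity: number;    // max tokens = requests per minute
  private readonly refillPerMs: number; // tokens credited per millisecond
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerMs = capacity / 60_000; // e.g. 60/min -> 1 token per second
    this.tokens = capacity;               // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Credit tokens earned since the last refill, never exceeding capacity.
  private refill(nowMs: number): void {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsedMs * this.refillPerMs);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
  }

  // Spend `units` tokens if available; false means the caller should return 429.
  tryConsume(units = 1, nowMs: number = Date.now()): boolean {
    this.refill(nowMs);
    if (this.tokens < units) return false;
    this.tokens -= units;
    return true;
  }

  // Whole tokens currently available.
  remaining(nowMs: number = Date.now()): number {
    this.refill(nowMs);
    return Math.floor(this.tokens);
  }
}
```

With a capacity of 60, a client that drains the bucket and then idles for 30 seconds has 30 tokens again and can spend them at once. After a longer idle period, the bucket tops out at 60 rather than growing further.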

Rate limit vs quota

These serve different purposes and are enforced separately:

Comparison
text
┌─────────────────────────────────────────────────────────────┐
│           Rate Limit vs Quota: Key Differences               │
└─────────────────────────────────────────────────────────────┘

                    Rate Limit              Quota
                    ──────────              ─────
  Purpose           Burst protection        Usage metering
  Window            Per minute              Per month
  Resets            Every 60 seconds        Start of billing period
  HTTP Status       429 Too Many Requests   402 Payment Required
  Response Field    rateLimit.remaining     quota.remaining
  Use Case          Prevent abuse           Enforce plan limits
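To contrast the two layers, here is a hypothetical monthly quota counter. Unlike the token bucket, nothing refills during the period: usage only returns to zero when a new billing period starts. The `MonthlyQuota` class is an illustration only. It assumes billing periods begin on the 1st of each month, in UTC.

```typescript
// Hypothetical monthly quota counter (illustration only, not Holdify's code).
// Assumes billing periods start on the 1st of each month, in UTC.
class MonthlyQuota {
  private readonly limit: number; // e.g. requestsPerMonth from the plan
  private used = 0;
  private periodKey = "";

  constructor(limit: number) {
    this.limit = limit;
  }

  // Reset usage to zero whenever `now` falls in a new billing period.
  private roll(now: Date): void {
    const key = `${now.getUTCFullYear()}-${now.getUTCMonth() + 1}`;
    if (key !== this.periodKey) {
      this.periodKey = key;
      this.used = 0;
    }
  }

  // Record `units` of usage; false means the allowance is spent (-> 402).
  tryConsume(units: number, now: Date): boolean {
    this.roll(now);
    if (this.used + units > this.limit) return false;
    this.used += units;
    return true;
  }

  remaining(now: Date): number {
    this.roll(now);
    return this.limit - this.used;
  }
}
```

Once this counter reaches zero, waiting seconds or minutes does not help; only the start of the next period does. That is why the two limits map to different status codes.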

Configuring limits

Configure rate limits and quotas in your plan settings:

Plan configuration
json
// Plan configuration in Holdify Dashboard
// Settings → Plans → Edit Plan

{
  "name": "pro",
  "displayName": "Pro Plan",

  // Rate limiting (per minute)
  "rateLimit": {
    "requestsPerMinute": 60,    // Max 60 requests per minute
    "burstLimit": 10            // Allow burst of 10 requests
  },

  // Quota (per billing period)
  "quota": {
    "requestsPerMonth": 5000,   // 5000 requests per month
    "resetDay": 1               // Resets on 1st of each month
  },

  // Features for gating
  "features": [
    "model:gpt-4o",
    "model:claude-sonnet",
    "priority-queue"
  ]
}
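The `resetDay` field implies a calculation for the next reset timestamp, such as the value reported in `X-Quota-Reset`. The sketch below is a hypothetical helper, `nextQuotaReset`. It rests on two assumptions the plan settings do not state: resets happen at 00:00 UTC, and a `resetDay` past the end of a short month falls back to that month's last day.

```typescript
// Hypothetical helper: next quota reset for a plan's `resetDay`.
// Assumptions: resets occur at 00:00 UTC, and a resetDay beyond the end of a
// short month (e.g. 31 in February) falls back to that month's last day.
function nextQuotaReset(now: Date, resetDay: number): string {
  const resetIn = (year: number, month: number): Date => {
    // Day 0 of the following month is the last day of `month`.
    const lastDay = new Date(Date.UTC(year, month + 1, 0)).getUTCDate();
    return new Date(Date.UTC(year, month, Math.min(resetDay, lastDay)));
  };

  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  const thisMonth = resetIn(year, month);
  // If this month's reset has already passed, use next month's.
  // Date.UTC normalizes month 12 to January of the following year.
  const next = thisMonth.getTime() > now.getTime() ? thisMonth : resetIn(year, month + 1);
  return next.toISOString(); // ISO 8601, the format used by X-Quota-Reset
}
```

For the Pro plan above (`resetDay: 1`), any moment in January yields `2025-02-01T00:00:00.000Z`.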

Per-key overrides

You can override plan limits for specific API keys, which is useful for enterprise customers:

Per-key limits
json
// You can also set per-key limits that override plan defaults
// Useful for enterprise customers or special cases

POST /v1/api-keys
{
  "tenantId": "enterprise_customer",
  "name": "Enterprise Key",
  "overrides": {
    "rateLimit": {
      "requestsPerMinute": 300  // Higher than plan default
    },
    "quota": {
      "requestsPerMonth": 100000
    }
  }
}
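One way to read "override" is a field-by-field merge: values set on the key win, and everything else falls back to the plan. The sketch below assumes those semantics, which Holdify may not use. The `Limits` and `LimitOverrides` types and the `effectiveLimits` function are hypothetical and only mirror the JSON shapes shown above.

```typescript
// Hypothetical shapes mirroring the plan and override JSON above.
interface Limits {
  rateLimit: { requestsPerMinute: number; burstLimit?: number };
  quota: { requestsPerMonth: number; resetDay?: number };
}

// A key's overrides may set any subset of fields.
interface LimitOverrides {
  rateLimit?: Partial<Limits["rateLimit"]>;
  quota?: Partial<Limits["quota"]>;
}

// Assumed semantics: fields present on the key override the plan default;
// missing fields fall back to the plan. The plan object is not mutated.
function effectiveLimits(plan: Limits, overrides: LimitOverrides = {}): Limits {
  return {
    rateLimit: { ...plan.rateLimit, ...overrides.rateLimit },
    quota: { ...plan.quota, ...overrides.quota },
  };
}
```

Under this reading, the enterprise key gets 300 requests per minute and 100,000 per month. It still inherits the plan's `burstLimit` and `resetDay`.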

Response headers

Holdify includes rate limit and quota information in response headers:

Header                  Description
X-RateLimit-Limit       Maximum requests allowed per rate limit window
X-RateLimit-Remaining   Requests remaining in the current window
X-RateLimit-Reset       Unix timestamp (in seconds) when the window resets
X-Quota-Limit           Total quota for the billing period
X-Quota-Remaining       Quota remaining for the billing period
X-Quota-Reset           ISO 8601 timestamp when the quota resets
Retry-After             Seconds until the rate limit resets (sent with 429 responses)
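Clients can turn these headers into typed values for their own throttling logic. This is a minimal sketch. `parseLimitHeaders` and `HeaderSource` are hypothetical names, and `HeaderSource` is any object with a `get(name)` method, such as the Fetch API `Headers` class. Only the delay-in-seconds form of `Retry-After` listed in the table is handled; HTTP also allows an HTTP-date there.

```typescript
// Any object with get(name): the Fetch API `Headers` (case-insensitive) or a Map.
interface HeaderSource {
  get(name: string): string | null | undefined;
}

interface LimitInfo {
  rateLimitRemaining?: number;
  rateLimitReset?: Date;     // from X-RateLimit-Reset (Unix seconds)
  quotaRemaining?: number;
  quotaReset?: Date;         // from X-Quota-Reset (ISO 8601)
  retryAfterSeconds?: number; // delay-seconds form only; HTTP-dates are ignored
}

// Parse a numeric header value; missing, empty, or non-numeric -> undefined.
function toNumber(value: string | null | undefined): number | undefined {
  if (value == null || value.trim() === "") return undefined;
  const n = Number(value);
  return Number.isFinite(n) ? n : undefined;
}

// Hypothetical helper: read Holdify-style limit headers into typed fields.
function parseLimitHeaders(h: HeaderSource): LimitInfo {
  const rateReset = toNumber(h.get("X-RateLimit-Reset"));
  const quotaReset = h.get("X-Quota-Reset");
  return {
    rateLimitRemaining: toNumber(h.get("X-RateLimit-Remaining")),
    rateLimitReset: rateReset === undefined ? undefined : new Date(rateReset * 1000),
    quotaRemaining: toNumber(h.get("X-Quota-Remaining")),
    quotaReset: quotaReset ? new Date(quotaReset) : undefined,
    retryAfterSeconds: toNumber(h.get("Retry-After")),
  };
}
```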

Handling in code

Here's how to properly handle rate limits and quotas in your API:

Handling limits
typescript
// Handling rate limits and quota in your API.
// Assumes `holdify` is an initialized Holdify client and `response(status, body, headers?)`
// is your framework's helper for building an HTTP response.
async function handleRequest(apiKey: string) {
  const result = await holdify.verify(apiKey, {
    resource: 'chat_requests',
    units: 1,
  });

  if (!result.valid) {
    return response(401, { error: 'Invalid API key' });
  }

  // Check rate limit (per-minute)
  if (result.rateLimit.remaining <= 0) {
    // rateLimit.reset is a Unix timestamp in seconds. Clamp to at least 1 so
    // clients never get a zero or negative Retry-After if the reset just passed.
    const retryAfter = Math.max(1, result.rateLimit.reset - Math.floor(Date.now() / 1000));
    return response(429, {
      error: 'Rate limit exceeded',
      retryAfter,
      message: `Try again in ${retryAfter} seconds`,
    }, {
      'Retry-After': String(retryAfter),
      'X-RateLimit-Limit': String(result.rateLimit.limit),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': String(result.rateLimit.reset),
    });
  }

  // Check quota (per-month)
  if (result.quota.remaining <= 0) {
    // quota.resetAt is an ISO 8601 timestamp, forwarded as X-Quota-Reset
    return response(402, {
      error: 'Monthly quota exceeded',
      resetAt: result.quota.resetAt,
      message: `Quota resets on ${result.quota.resetAt}`,
    }, {
      'X-Quota-Limit': String(result.quota.limit),
      'X-Quota-Remaining': '0',
      'X-Quota-Reset': result.quota.resetAt,
    });
  }

  // Both checks passed: process the request...
  return response(200, { success: true });
}

HTTP status codes

Status  Condition                 User action
429     Rate limit exceeded       Wait and retry after Retry-After seconds
402     Monthly quota exhausted   Wait for the next billing period or upgrade the plan

Best practices

  • Forward headers to clients. Pass X-RateLimit-* and X-Quota-* headers to your API consumers so they can implement client-side throttling.
  • Implement exponential backoff. When you receive a 429, wait for the Retry-After duration before retrying; if no Retry-After value is available, back off exponentially between attempts.
  • Set appropriate limits per plan. Free tiers should have lower limits to prevent abuse. Pro/Business tiers can have higher limits.
  • Use units for variable costs. If some operations cost more (e.g., GPT-4 vs GPT-3.5), use the units parameter to consume more quota.
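The retry advice above can be sketched as a small client-side helper. It is hypothetical and not part of any Holdify SDK. `retryDelayMs` honors a numeric `Retry-After` value when one is present and otherwise falls back to capped exponential backoff with full jitter. `withRetry` retries only 429 responses, since a 402 will not clear until the quota resets. The `baseMs`, `maxMs`, and `maxRetries` defaults are arbitrary illustrations.

```typescript
// Hypothetical retry policy (not a Holdify SDK API).
// Honors a numeric Retry-After; otherwise capped exponential backoff with full jitter.
function retryDelayMs(
  attempt: number,                 // 0 for the first retry
  retryAfterHeader: string | null, // raw Retry-After value, if any
  baseMs = 500,
  maxMs = 30_000,
  random: () => number = Math.random,
): number {
  const trimmed = retryAfterHeader?.trim() ?? "";
  const seconds = trimmed === "" ? NaN : Number(trimmed);
  if (Number.isFinite(seconds) && seconds >= 0) {
    return seconds * 1000; // the server said exactly how long to wait
  }
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling); // full jitter in [0, ceiling)
}

interface SendResult {
  status: number;
  retryAfter: string | null; // raw Retry-After header value
}

// Re-send on 429 up to maxRetries times; any other status is returned as-is.
async function withRetry<R extends SendResult>(
  send: () => Promise<R>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<R> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    // 402 (quota exhausted) is not retried: waiting seconds won't help.
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await sleep(retryDelayMs(attempt, res.retryAfter));
  }
}
```

Injecting `random` and `sleep` keeps the policy deterministic in tests. In production, the defaults use `Math.random` and a real timer.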