Rate Limiting & Quotas
Protect your API with per-minute rate limits and monthly quotas.
Holdify provides two layers of usage control:
- Rate limits — Prevent burst abuse with per-minute request limits
- Quotas — Enforce plan limits with monthly request allowances
Token bucket algorithm
Holdify uses the token bucket algorithm for rate limiting. It enforces a steady average request rate while still allowing short bursts of traffic.
┌─────────────────────────────────────────────────────────────┐
│ Token Bucket Algorithm │
└─────────────────────────────────────────────────────────────┘
Bucket Capacity: 60 tokens (rate limit per minute)
┌──────────────────────────────────────────────────────────┐
│ ████████████████████████████████████░░░░░░░░░░░░░░░░░░░░ │
│ ◄──────── 40 tokens remaining ────────► │
└──────────────────────────────────────────────────────────┘
• Tokens refill at a steady rate (1 per second for 60/min)
• Each request consumes 1 token (or more with 'units' param)
• When bucket is empty → 429 Too Many Requests
• Bucket never exceeds capacity (burst is limited)
Why token bucket? It's more forgiving than fixed windows. A user who was idle for 30 seconds can make a small burst of requests, rather than being strictly limited to 1 request per second.
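The refill and consume rules above can be sketched in a few lines. This is a minimal illustration, not Holdify's implementation: the `TokenBucket` class, its method names, and the explicit `nowMs` clock parameter are all assumptions made for the example.

```typescript
// Minimal token bucket sketch (hypothetical names, not Holdify's internals).
class TokenBucket {
  private readonly capacity: number;        // e.g. 60 tokens
  private readonly refillPerSecond: number; // e.g. 1 token/second for 60/min
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // the bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Add tokens for the elapsed time, never exceeding capacity.
  private refill(nowMs: number): void {
    const elapsedSeconds = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
  }

  // Try to consume `units` tokens. A false result corresponds to a 429.
  tryConsume(units: number = 1, nowMs: number = Date.now()): boolean {
    this.refill(nowMs);
    if (this.tokens < units) return false;
    this.tokens -= units;
    return true;
  }

  // Whole tokens currently available.
  remaining(nowMs: number = Date.now()): number {
    this.refill(nowMs);
    return Math.floor(this.tokens);
  }
}
```

With a capacity of 60 and a refill rate of 1 token per second, a caller that drains the bucket and then stays idle for 30 seconds gets 30 tokens back, which it can spend in a single burst.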
Rate limit vs quota
These serve different purposes and are enforced separately:
Rate Limit vs Quota: Key Differences
| | Rate Limit | Quota |
|---|---|---|
| Purpose | Burst protection | Usage metering |
| Window | Per minute | Per month |
| Resets | Every 60 seconds | Start of billing period |
| HTTP Status | 429 Too Many Requests | 402 Payment Required |
| Response Field | rateLimit.remaining | quota.remaining |
| Use Case | Prevent abuse | Enforce plan limits |
Configuring limits
Configure rate limits and quotas in your plan settings:
// Plan configuration in Holdify Dashboard
// Settings → Plans → Edit Plan
{
  "name": "pro",
  "displayName": "Pro Plan",
  // Rate limiting (per minute)
  "rateLimit": {
    "requestsPerMinute": 60, // Max 60 requests per minute
    "burstLimit": 10 // Allow burst of 10 requests
  },
  // Quota (per billing period)
  "quota": {
    "requestsPerMonth": 5000, // 5000 requests per month
    "resetDay": 1 // Resets on 1st of each month
  },
  // Features for gating
  "features": [
    "model:gpt-4o",
    "model:claude-sonnet",
    "priority-queue"
  ]
}
Per-key overrides
You can override plan limits for specific API keys, useful for enterprise customers:
// You can also set per-key limits that override plan defaults
// Useful for enterprise customers or special cases
POST /v1/api-keys
{
  "tenantId": "enterprise_customer",
  "name": "Enterprise Key",
  "overrides": {
    "rateLimit": {
      "requestsPerMinute": 300 // Higher than plan default
    },
    "quota": {
      "requestsPerMonth": 100000
    }
  }
}
Response headers
Holdify includes rate limit and quota information in response headers:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed per rate limit window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when window resets |
| X-Quota-Limit | Total quota for billing period |
| X-Quota-Remaining | Quota remaining for billing period |
| X-Quota-Reset | ISO 8601 timestamp when quota resets |
| Retry-After | Seconds until rate limit resets (on 429) |
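A client can turn these headers into a typed object before deciding how to throttle. The sketch below assumes the header names and formats listed in the table; the `UsageInfo` type and the `parseUsageHeaders` function are hypothetical names, and the `headers` argument is any object with a Fetch-style `get(name)` method.

```typescript
// Usage information carried by the response headers (hypothetical type).
interface UsageInfo {
  rateLimit: { limit: number; remaining: number; resetAt: Date };
  quota: { limit: number; remaining: number; resetAt: Date };
  retryAfterSeconds: number | null; // only sent on 429 responses
}

// Parse the usage headers from any object exposing `get(name)`.
function parseUsageHeaders(headers: { get(name: string): string | null }): UsageInfo {
  // Read a numeric header, rejecting missing, empty, or non-numeric values.
  const num = (name: string): number => {
    const raw = headers.get(name);
    const value = raw === null || raw.trim() === "" ? NaN : Number(raw);
    if (!Number.isFinite(value)) {
      throw new Error(`Missing or invalid header: ${name}`);
    }
    return value;
  };

  const quotaReset = headers.get("X-Quota-Reset");
  if (quotaReset === null || Number.isNaN(Date.parse(quotaReset))) {
    throw new Error("Missing or invalid header: X-Quota-Reset");
  }

  const retryAfter = headers.get("Retry-After");
  const retryAfterValue =
    retryAfter === null || retryAfter.trim() === "" ? NaN : Number(retryAfter);

  return {
    rateLimit: {
      limit: num("X-RateLimit-Limit"),
      remaining: num("X-RateLimit-Remaining"),
      // Unix timestamp in seconds, converted to milliseconds for Date.
      resetAt: new Date(num("X-RateLimit-Reset") * 1000),
    },
    quota: {
      limit: num("X-Quota-Limit"),
      remaining: num("X-Quota-Remaining"),
      resetAt: new Date(quotaReset), // ISO 8601 timestamp
    },
    retryAfterSeconds: Number.isFinite(retryAfterValue) ? retryAfterValue : null,
  };
}
```

With the Fetch API, `parseUsageHeaders(res.headers)` works directly, since `Headers.get` returns `string | null` and matches names case-insensitively.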
Handling in code
Here's how to properly handle rate limits and quotas in your API:
// Handling rate limits and quota in your API
// Assumes `holdify` (the SDK client) and `response` (your framework's
// response helper) are defined elsewhere.
async function handleRequest(apiKey: string) {
  const result = await holdify.verify(apiKey, {
    resource: 'chat_requests',
    units: 1,
  });
  if (!result.valid) {
    return response(401, { error: 'Invalid API key' });
  }
  // Check rate limit (per-minute)
  if (result.rateLimit.remaining <= 0) {
    // Clamp to at least 1 second so clock skew never yields a zero or negative wait
    const retryAfter = Math.max(1, result.rateLimit.reset - Math.floor(Date.now() / 1000));
    return response(429, {
      error: 'Rate limit exceeded',
      retryAfter,
      message: `Try again in ${retryAfter} seconds`,
    }, {
      'Retry-After': String(retryAfter),
      'X-RateLimit-Limit': String(result.rateLimit.limit),
      'X-RateLimit-Remaining': '0',
      'X-RateLimit-Reset': String(result.rateLimit.reset),
    });
  }
  // Check quota (per-month)
  if (result.quota.remaining <= 0) {
    return response(402, {
      error: 'Monthly quota exceeded',
      resetAt: result.quota.resetAt,
      message: `Quota resets on ${result.quota.resetAt}`,
    }, {
      'X-Quota-Limit': String(result.quota.limit),
      'X-Quota-Remaining': '0',
      'X-Quota-Reset': result.quota.resetAt,
    });
  }
  // Process the request...
  return response(200, { success: true });
}
HTTP status codes
| Status | Condition | User action |
|---|---|---|
| 429 | Rate limit exceeded | Wait and retry after Retry-After seconds |
| 402 | Monthly quota exhausted | Wait for next billing period or upgrade plan |
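On the consumer side, the two statuses in the table call for different reactions: a 429 is worth retrying after a delay, while a 402 is not. A minimal sketch of that decision follows; the `ClientAction` type, the `actionForStatus` function, and the 1-second fallback are illustrative assumptions, not part of Holdify's SDK.

```typescript
// What a client should do with a response (hypothetical type).
type ClientAction =
  | { kind: "retry"; afterSeconds: number }
  | { kind: "stop"; reason: string }
  | { kind: "passthrough" }; // any other status: handle as usual

function actionForStatus(status: number, retryAfterHeader: string | null): ClientAction {
  if (status === 429) {
    const seconds = Number(retryAfterHeader);
    // Assumed default: wait 1 second if Retry-After is missing or malformed.
    const afterSeconds = Number.isFinite(seconds) && seconds > 0 ? Math.ceil(seconds) : 1;
    return { kind: "retry", afterSeconds };
  }
  if (status === 402) {
    // Retrying cannot help until the quota resets or the plan changes.
    return {
      kind: "stop",
      reason: "Monthly quota exhausted: wait for the next billing period or upgrade the plan",
    };
  }
  return { kind: "passthrough" };
}
```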
Best practices
- Forward headers to clients. Pass X-RateLimit-* and X-Quota-* headers to your API consumers so they can implement client-side throttling.
- Implement exponential backoff. When receiving 429, wait for the Retry-After duration before retrying, and increase the delay exponentially if retries keep failing.
- Set appropriate limits per plan. Free tiers should have lower limits to prevent abuse. Pro/Business tiers can have higher limits.
- Use units for variable costs. If some operations cost more (e.g., GPT-4 vs GPT-3.5), use the units parameter to consume more quota.
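The backoff advice above can be sketched as a delay function that never waits less than the server's Retry-After hint. The names `backoffDelayMs` and `retryOn429`, the 500 ms base, the 60-second cap, and the shape of the `send` callback are all assumptions for illustration. Jitter is omitted to keep the sketch deterministic.

```typescript
// Delay before retry number `attempt` (0-based): exponential growth,
// capped at maxMs, and never shorter than the Retry-After hint.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds: number | null,
  baseMs: number = 500,
  maxMs: number = 60_000,
): number {
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);
  const serverHint = retryAfterSeconds === null ? 0 : retryAfterSeconds * 1000;
  return Math.max(exponential, serverHint);
}

// Retry a request while it returns 429, up to maxAttempts sends in total.
// `send` is a hypothetical callback that performs one request.
async function retryOn429<T>(
  send: () => Promise<{ status: number; retryAfterSeconds: number | null; body: T }>,
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ status: number; retryAfterSeconds: number | null; body: T }> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    // Return on success, on any non-429 error, or when attempts run out.
    if (res.status !== 429 || attempt + 1 >= maxAttempts) return res;
    await sleep(backoffDelayMs(attempt, res.retryAfterSeconds));
  }
}
```

A 402 response is returned to the caller immediately, since retrying cannot succeed until the quota resets.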