Leashly sits between your app and any LLM. Enforce spend caps and rate limits, block prompt injection, and actively cut your bill with smart routing and semantic caching.
No credit card · 5-minute setup · OpenAI · Anthropic · Gemini
// Before
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

// After: one line change. That's it.
const openai = new OpenAI({
  apiKey: "lsh_xxxxxxxxxxxx",
  baseURL: "https://leashly.dev/api/proxy"
})
The problem
There are no guardrails between your app and the LLM API. One misconfigured feature, one abusive user, or one runaway script — and your next invoice is unrecognizable.
No rate limits means no friction for abuse. No spend caps means no ceiling on damage. No attribution means no idea who caused it.
How it works
The same interface your SDK already uses. Zero refactoring.
Spend caps per user, per day, per model. Rate limits that actually work. Injection filter that stops attacks before they reach the model.
Every token, dollar, and request attributed to the exact user, feature, and model. No more mystery invoices.
Smart routing, semantic caching, and prompt compression cut your AI bill without any code changes. Average savings: 60%.
Features
Spend caps: Daily, weekly, and monthly limits. Block or alert when thresholds are hit.
Rate limits: Per-minute and per-hour throttling, per account, key, or IP.
Injection filter: Blocks 50+ jailbreak patterns. Three sensitivity levels.
Smart routing: Auto-routes simple requests to cheaper models. Average 40% savings.
Semantic caching: Similar prompts return cached responses at $0 cost. Powered by pgvector (a simplified lookup is sketched after this list).
Prompt compression: Shrinks bloated system prompts before they reach the model.
Cost attribution: See which users and features are burning money. Full per-model breakdown.
Alerts: Email and in-app notifications when spend or rate limits are hit.
Audit log: Every request logged with tokens, cost, duration, and flag reason.
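Under the hood, a semantic cache is essentially a nearest-neighbor lookup over prompt embeddings. Here is a minimal sketch of the idea, not Leashly's actual internals; the prompt_cache table, the embedding model, and the 0.05 distance threshold are all illustrative:

import OpenAI from 'openai';
import { Pool } from 'pg';

const openai = new OpenAI();
const pool = new Pool();

// Return a cached response if a previously seen prompt is close enough in embedding space.
async function cachedCompletion(prompt: string): Promise<string | null> {
  const emb = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: prompt,
  });
  // pgvector accepts a '[x,y,...]' literal; JSON.stringify produces exactly that.
  const vector = JSON.stringify(emb.data[0].embedding);

  // `<=>` is pgvector's cosine-distance operator.
  const { rows } = await pool.query(
    `SELECT response, embedding <=> $1::vector AS distance
       FROM prompt_cache
      ORDER BY embedding <=> $1::vector
      LIMIT 1`,
    [vector]
  );
  return rows.length > 0 && rows[0].distance < 0.05 ? rows[0].response : null;
}

A cache hit costs one embedding call instead of a full completion, which is where the "$0" responses come from.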
Integration
One-line change. Drop-in compatible.
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.LEASHLY_KEY,
  baseURL: 'https://leashly.dev/api/proxy',
});

const response = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
});
Pricing
Saves itself in the first week.
Cancel anytime
Need more? Contact us
FAQ

Does the proxy add latency?
No. The proxy runs in the same region as your LLM provider. Typical overhead is under 5ms.

Are my provider API keys safe?
Yes. Keys are encrypted at rest with AES-256-CBC and are never logged or exposed in any response.

Does it support streaming?
Yes. Leashly fully supports server-sent events (SSE), passing streamed responses through transparently.
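With the OpenAI SDK, the standard stream flag works unchanged. A quick sketch, reusing the client from the integration snippet above:

const stream = await client.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
});

// Chunks arrive over SSE and are relayed by the proxy as-is.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}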
Which providers are supported?
OpenAI, Anthropic, Google Gemini, and any OpenAI-compatible endpoint. Add custom endpoints in the dashboard.

What happens when a request hits a limit?
Leashly returns a 429 with a clean JSON error, so your app gets a structured error it can handle gracefully.
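With the OpenAI SDK, that surfaces as a regular APIError, so a blocked request can be handled like any other rate-limit response. The fallback below is just an example:

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.LEASHLY_KEY,
  baseURL: 'https://leashly.dev/api/proxy',
});

try {
  await client.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello!' }],
  });
} catch (err) {
  // A 429 from the proxy means a Leashly spend cap or rate limit was hit.
  if (err instanceof OpenAI.APIError && err.status === 429) {
    console.warn('LLM budget exceeded:', err.message);
  } else {
    throw err;
  }
}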
Can I self-host?
Yes. Leashly is open source. Deploy it on Vercel, Railway, or any Node.js host in minutes.
Free forever for indie devs. No credit card required.
Create your free account →