What is an AI cost control proxy?

An AI cost control proxy sits between your application and LLM providers like OpenAI or Anthropic. It enforces spend caps, rate limits, and security filters on every request, and can actively reduce costs through smart routing and semantic caching.

How do I reduce my OpenAI API costs?

Leashly reduces OpenAI costs through smart routing (automatically using cheaper models for simple tasks), semantic caching (returning cached responses for similar prompts at zero cost), and prompt compression (shrinking system prompts before they reach the model). Average savings are 60%.

How do I add rate limiting to my OpenAI API calls?

With Leashly, add one line of code — change your OpenAI base URL to https://www.leashly.dev/api/proxy and your API key to your Leashly proxy key. Then configure rate limiting rules in the dashboard with no additional code changes.

What is prompt injection and how do I prevent it?

Prompt injection is an attack where malicious users insert instructions into prompts to hijack AI behavior. Leashly scans every request against 50+ known attack patterns and blocks them before they reach the model.

Does Leashly work with Anthropic Claude?

Yes. Leashly supports OpenAI, Anthropic Claude, Google Gemini, and any OpenAI-compatible endpoint.

How much latency does an LLM proxy add?

Leashly adds less than 5ms of overhead in typical operation. Rule evaluation happens in-memory with no additional database round-trips on the hot path.

docsIntroduction

Introduction

Leashly is an AI cost control proxy that sits between your application and any LLM provider. It enforces spend caps, rate limits, and prompt injection protection without requiring any changes to your application code beyond a single environment variable.

What does Leashly do?

When your app makes a request to an LLM like GPT-4 or Claude, that request goes through Leashly first. Leashly checks your configured rules — spend caps, rate limits, injection filters — and either forwards the request to the provider or blocks it with a clean error response.

Cost control

Daily, weekly, monthly spend caps per key or account

Rate limiting

Per-minute, per-hour request throttling

Injection protection

Blocks 50+ known jailbreak and extraction patterns

Key concepts

Proxy key — A Leashly-issued key (prefixed lsh_) that your app uses instead of your real provider API key. Leashly maps this to your real key server-side.

Rules — Configurable policies applied to every proxied request. Three types: spend caps, rate limits, and injection filters.

Alerts — Email notifications triggered when a rule threshold is reached.

Request log — A record of every proxied request including tokens, cost, duration, model, and provider.

✦

Your real API keys are never exposed to your frontend or logged anywhere. They are stored encrypted with AES-256 and only decrypted inside the proxy at request time.

Home Get started