Multi-Tier Caching Architecture

Stop Paying Twice for the Same Response

Cache Master is a semantic caching proxy for LLM APIs. Its five-tier architecture (L1 memory, local disk, Postgres, Redis, semantic matching) intercepts identical prompts before they cost you money.

Tier         Latency   Description
L1 Memory    ~0.01ms   In-memory LRU cache
L2 Postgres  ~1-5ms    Persistent cache entries
L3 Redis     ~0.1ms    Distributed cache layer
Semantic     ~5ms      pgvector similarity

Built for production workloads at scale

01 · Multi-Tier Hierarchy

Five-layer caching cascade with automatic promotion. L1 memory hits return in microseconds. Lower tiers backfill upper layers on access.
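A minimal sketch of that cascade in plain JavaScript, using toy Map-backed adapters in place of the real tier backends (local disk omitted for brevity; makeTier and cachedLookup are illustrative names, not Cache Master's internal API):

// Toy tier adapter: the real backends expose an equivalent get/set surface.
const makeTier = (name) => {
  const store = new Map();
  return {
    name,
    get: async (key) => store.get(key),
    set: async (key, value) => { store.set(key, value); },
  };
};

// Ordered fastest-first, mirroring the documented hierarchy.
const tiers = [makeTier('memory'), makeTier('postgres'), makeTier('redis'), makeTier('semantic')];

async function cachedLookup(key, fetchUpstream) {
  for (let i = 0; i < tiers.length; i++) {
    const hit = await tiers[i].get(key);
    if (hit !== undefined) {
      // Promotion: backfill every faster tier above the one that answered.
      await Promise.all(tiers.slice(0, i).map((t) => t.set(key, hit)));
      return hit;
    }
  }
  // Full miss: pay for exactly one upstream call, then populate all tiers.
  const fresh = await fetchUpstream(key);
  await Promise.all(tiers.map((t) => t.set(key, fresh)));
  return fresh;
}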

02 · Semantic Matching

Text embeddings + pgvector cosine similarity catch semantically equivalent prompts. Different wording, same response, zero API cost.
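A sketch of the similarity check itself; the 0.95 threshold and function names here are illustrative, not the shipped defaults:

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const SIMILARITY_THRESHOLD = 0.95; // illustrative cutoff

// Return the closest cached entry above the threshold, or null (a true miss).
function findSemanticHit(promptEmbedding, cachedEntries) {
  let best = null;
  for (const entry of cachedEntries) {
    const score = cosineSimilarity(promptEmbedding, entry.embedding);
    if (score >= SIMILARITY_THRESHOLD && (!best || score > best.score)) {
      best = { entry, score };
    }
  }
  return best;
}

In Postgres the linear scan above collapses into a single pgvector query, e.g. ORDER BY embedding <=> $1 LIMIT 1, where <=> is pgvector's cosine-distance operator.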

03 · Request Deduplication

Leader/follower pattern collapses concurrent identical requests. 100 simultaneous calls become 1 upstream request.
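The leader/follower collapse maps naturally onto shared promises; a generic sketch (the map key and function names are illustrative):

// In-flight requests keyed by prompt hash. The first caller (leader)
// performs the upstream fetch; concurrent callers (followers) await the
// same promise, so N identical requests cost one upstream call.
const inFlight = new Map();

async function dedupedFetch(key, fetchUpstream) {
  if (inFlight.has(key)) return inFlight.get(key); // follower path
  const promise = fetchUpstream(key).finally(() => inFlight.delete(key));
  inFlight.set(key, promise); // leader path
  return promise;
}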

04 · Multi-Provider Support

OpenAI, Anthropic, Google Gemini. Single proxy URL. Automatic provider detection from request paths. SSE streaming support.
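Detection can key off the inbound path; a sketch using each provider's public endpoint shape (the exact routing rules Cache Master ships are internal):

// Map inbound request paths to upstream providers.
function detectProvider(path) {
  if (path.startsWith('/v1/chat/completions')) return 'openai';
  if (path.startsWith('/v1/messages')) return 'anthropic';
  if (path.startsWith('/v1beta/models/')) return 'gemini';
  return null; // unknown path: reject or pass through
}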

05 · Analytics Dashboard

Real-time metrics on cache hit rates, cost savings, token usage. Per-provider breakdowns, tenant management, API key rotation.
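The headline numbers reduce to counter arithmetic; a sketch of the calculation (field names like tokensServedFromCache are illustrative, not the dashboard's schema):

// Derive hit rate and estimated savings from raw counters.
function summarize({ hits, misses, tokensServedFromCache, costPerMillionTokens }) {
  const total = hits + misses;
  return {
    hitRate: total === 0 ? 0 : hits / total,
    // Every token served from cache is a token not billed upstream.
    estimatedSavings: (tokensServedFromCache / 1_000_000) * costPerMillionTokens,
  };
}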

06 · Production Infrastructure

Rate limiting, JWT auth, multi-tenant quotas, quality scoring. Structured logging, Prometheus metrics, Docker Compose orchestration.
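A token bucket is one standard way to enforce per-tenant rate limits like these; this is a generic sketch, not Cache Master's implementation:

// Per-tenant token bucket: `rate` tokens refill per second, up to `burst`.
function makeBucket(rate, burst) {
  let tokens = burst;
  let last = Date.now();
  return function allow() {
    const now = Date.now();
    tokens = Math.min(burst, tokens + ((now - last) / 1000) * rate);
    last = now;
    if (tokens >= 1) { tokens -= 1; return true; }
    return false; // caller should respond 429 Too Many Requests
  };
}

const tenantBuckets = new Map();
function allowRequest(tenantId, rate = 10, burst = 20) {
  if (!tenantBuckets.has(tenantId)) tenantBuckets.set(tenantId, makeBucket(rate, burst));
  return tenantBuckets.get(tenantId)();
}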

Your App → Cache Hierarchy → LLM API

Origin     Your App    Request
Tier 1     L1 Memory   ~0.01ms
Tier 2     Postgres    ~1-5ms
Tier 3     Redis       ~0.1ms
Tier 4     Semantic    ~5ms
Upstream   LLM API     ~1-5s

Running in 60 seconds

# Clone repository
git clone https://github.com/JustABard/cashe-master.git
cd cashe-master

# Configure environment
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

# Start all services
docker compose up --build

# Services available at:
#   Proxy:     http://localhost:8080
#   Dashboard: http://localhost:3000

// Point your API calls to Cache Master
const response = await fetch(
  'http://localhost:8080/v1/chat/completions',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer cm_your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [
        { role: 'user', content: 'Hello!' }
      ]
    })
  }
);
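
If you use the official openai npm package instead of raw fetch, the same proxying likely works by pointing the client's baseURL at Cache Master (the key format follows the example above):

import OpenAI from 'openai';

// Route SDK traffic through Cache Master instead of api.openai.com.
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'cm_your_api_key',
});

const completion = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});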

Open source. Self-hosted. Cost-effective.

MIT licensed. Deploy on your infrastructure. Full control over your caching layer. No vendor lock-in.