Multi-Tier Caching Architecture

Stop Paying Twice for the Same Response

Cache Master is a semantic caching proxy for LLM APIs. Its five-tier architecture (L1 memory, local disk, Postgres, Redis, semantic matching) intercepts identical prompts before they cost you money.

Tier         Latency   Description
L1 Memory    ~0.01ms   In-memory LRU cache
L2 Postgres  ~1-5ms    Persistent cache entries
L3 Redis     ~0.1ms    Distributed cache layer
Semantic     ~5ms      pgvector similarity

Built for production workloads at scale

01 · Multi-Tier Hierarchy

Five-layer caching cascade with automatic promotion. L1 memory hits return in microseconds. Lower tiers backfill upper layers on access.
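A minimal sketch of that cascade in plain JavaScript, using toy Map-backed adapters in place of the real tier backends (local disk omitted for brevity; makeTier and cachedLookup are illustrative names, not Cache Master's internal API):

// Toy tier adapter: the real backends expose an equivalent get/set surface.
const makeTier = (name) => {
  const store = new Map();
  return {
    name,
    get: async (key) => store.get(key),
    set: async (key, value) => { store.set(key, value); },
  };
};

// Ordered fastest-first, mirroring the documented hierarchy.
const tiers = [makeTier('memory'), makeTier('postgres'), makeTier('redis'), makeTier('semantic')];

async function cachedLookup(key, fetchUpstream) {
  for (let i = 0; i < tiers.length; i++) {
    const hit = await tiers[i].get(key);
    if (hit !== undefined) {
      // Promotion: backfill every faster tier above the one that answered.
      await Promise.all(tiers.slice(0, i).map((t) => t.set(key, hit)));
      return hit;
    }
  }
  // Full miss: pay for exactly one upstream call, then populate all tiers.
  const fresh = await fetchUpstream(key);
  await Promise.all(tiers.map((t) => t.set(key, fresh)));
  return fresh;
}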

02 · Semantic Matching

Text embeddings + pgvector cosine similarity catch semantically equivalent prompts. Different wording, same response, zero API cost.
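A sketch of the similarity check itself; the 0.95 threshold and function names here are illustrative, not the shipped defaults:

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const SIMILARITY_THRESHOLD = 0.95; // illustrative cutoff

// Return the closest cached entry above the threshold, or null (a true miss).
function findSemanticHit(promptEmbedding, cachedEntries) {
  let best = null;
  for (const entry of cachedEntries) {
    const score = cosineSimilarity(promptEmbedding, entry.embedding);
    if (score >= SIMILARITY_THRESHOLD && (!best || score > best.score)) {
      best = { entry, score };
    }
  }
  return best;
}

In Postgres the linear scan above collapses into a single pgvector query, e.g. ORDER BY embedding <=> $1 LIMIT 1, where <=> is pgvector's cosine-distance operator.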

03 · Request Deduplication

Leader/follower pattern collapses concurrent identical requests. 100 simultaneous calls become 1 upstream request.
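The leader/follower collapse maps naturally onto shared promises; a generic sketch (the map key and function names are illustrative):

// In-flight requests keyed by prompt hash. The first caller (leader)
// performs the upstream fetch; concurrent callers (followers) await the
// same promise, so N identical requests cost one upstream call.
const inFlight = new Map();

async function dedupedFetch(key, fetchUpstream) {
  if (inFlight.has(key)) return inFlight.get(key); // follower path
  const promise = fetchUpstream(key).finally(() => inFlight.delete(key));
  inFlight.set(key, promise); // leader path
  return promise;
}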

04 · Multi-Provider Support

OpenAI, Anthropic, Google Gemini. Single proxy URL. Automatic provider detection from request paths. SSE streaming support.
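Detection can key off the inbound path; a sketch using each provider's public endpoint shape (the exact routing rules Cache Master ships are internal):

// Map inbound request paths to upstream providers.
function detectProvider(path) {
  if (path.startsWith('/v1/chat/completions')) return 'openai';
  if (path.startsWith('/v1/messages')) return 'anthropic';
  if (path.startsWith('/v1beta/models/')) return 'gemini';
  return null; // unknown path: reject or pass through
}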

05 · Analytics Dashboard

Real-time metrics on cache hit rates, cost savings, token usage. Per-provider breakdowns, tenant management, API key rotation.
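The headline numbers reduce to counter arithmetic; a sketch of the calculation (field names like tokensServedFromCache are illustrative, not the dashboard's schema):

// Derive hit rate and estimated savings from raw counters.
function summarize({ hits, misses, tokensServedFromCache, costPerMillionTokens }) {
  const total = hits + misses;
  return {
    hitRate: total === 0 ? 0 : hits / total,
    // Every token served from cache is a token not billed upstream.
    estimatedSavings: (tokensServedFromCache / 1_000_000) * costPerMillionTokens,
  };
}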

06 · Production Infrastructure

Rate limiting, JWT auth, multi-tenant quotas, quality scoring. Structured logging, Prometheus metrics, Docker Compose orchestration.
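A token bucket is one standard way to enforce per-tenant rate limits like these; this is a generic sketch, not Cache Master's implementation:

// Per-tenant token bucket: `rate` tokens refill per second, up to `burst`.
function makeBucket(rate, burst) {
  let tokens = burst;
  let last = Date.now();
  return function allow() {
    const now = Date.now();
    tokens = Math.min(burst, tokens + ((now - last) / 1000) * rate);
    last = now;
    if (tokens >= 1) { tokens -= 1; return true; }
    return false; // caller should respond 429 Too Many Requests
  };
}

const tenantBuckets = new Map();
function allowRequest(tenantId, rate = 10, burst = 20) {
  if (!tenantBuckets.has(tenantId)) tenantBuckets.set(tenantId, makeBucket(rate, burst));
  return tenantBuckets.get(tenantId)();
}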

Your App → Cache Hierarchy → LLM API

Origin     Your App    Request
Tier 1     L1 Memory   ~0.01ms
Tier 2     Postgres    ~1-5ms
Tier 3     Redis       ~0.1ms
Tier 4     Semantic    ~5ms
Upstream   LLM API     ~1-5s

Running in 60 seconds

# Clone repository
git clone https://github.com/JustABard/cashe-master.git
cd cashe-master

# Configure environment
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

# Start all services
docker compose up --build

# Services available at:
#   Proxy:     http://localhost:8080
#   Dashboard: http://localhost:3000

// Point your API calls to Cache Master
const response = await fetch(
  'http://localhost:8080/v1/chat/completions',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer cm_your_api_key',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'gpt-4',
      messages: [
        { role: 'user', content: 'Hello!' }
      ]
    })
  }
);
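
If you use the official openai npm package instead of raw fetch, the same proxying likely works by pointing the client's baseURL at Cache Master (the key format follows the example above):

import OpenAI from 'openai';

// Route SDK traffic through Cache Master instead of api.openai.com.
const client = new OpenAI({
  baseURL: 'http://localhost:8080/v1',
  apiKey: 'cm_your_api_key',
});

const completion = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello!' }],
});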

Open source. Self-hosted. Cost-effective.

MIT licensed. Deploy on your infrastructure. Full control over your caching layer. No vendor lock-in.