Cache Master is a semantic caching proxy for LLM APIs. A five-tier architecture (L1 memory, local disk, Postgres, Redis, semantic matching) intercepts repeated and semantically equivalent prompts before they cost you money.
Five-layer caching cascade with automatic promotion. L1 memory hits return in microseconds. Lower tiers backfill upper layers on access.
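The read-then-promote flow can be pictured as a loop over the tiers, fastest first, that backfills every faster tier on a hit. This is an illustrative sketch only; the `CacheTier` shape and tier names are assumptions, not Cache Master's actual internals.

```javascript
// Illustrative sketch of a tiered read with promotion (backfill).
// Tiers are ordered fastest first, e.g. [memory, disk, redis, postgres] (assumed).
class TieredCache {
  constructor(tiers) {
    this.tiers = tiers;
  }

  async get(key) {
    for (let i = 0; i < this.tiers.length; i++) {
      const value = await this.tiers[i].get(key); // assumed: returns undefined on miss
      if (value !== undefined) {
        // Promote: write the value back into every faster tier so the next hit is cheaper.
        await Promise.all(this.tiers.slice(0, i).map((tier) => tier.set(key, value)));
        return value;
      }
    }
    return undefined; // full miss: caller forwards the request upstream
  }
}
```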
Text embeddings + pgvector cosine similarity catch semantically equivalent prompts. Different wording, same response, zero API cost.
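A semantic lookup of this kind typically embeds the incoming prompt and asks pgvector for the nearest stored prompt, accepting it only above a similarity threshold. Below is a minimal sketch using node-postgres; the table name, column names, and the 0.95 threshold are assumptions for illustration.

```javascript
import pg from "pg";

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

// Hypothetical schema: prompt_cache(prompt text, response jsonb, embedding vector(1536)).
async function semanticLookup(promptEmbedding, threshold = 0.95) {
  // pgvector's <=> operator is cosine distance, so similarity = 1 - distance.
  const { rows } = await pool.query(
    `SELECT response, 1 - (embedding <=> $1::vector) AS similarity
       FROM prompt_cache
      ORDER BY embedding <=> $1::vector
      LIMIT 1`,
    [JSON.stringify(promptEmbedding)]
  );
  if (rows.length && rows[0].similarity >= threshold) {
    return rows[0].response; // close enough: serve the cached response
  }
  return null; // below threshold: fall through to the upstream API
}
```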
Leader/follower pattern collapses concurrent identical requests. 100 simultaneous calls become 1 upstream request.
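Request coalescing of this kind boils down to a map of in-flight promises keyed by a request fingerprint: the first caller (the leader) makes the upstream call, every concurrent duplicate awaits the same promise. A minimal sketch, with names chosen for illustration:

```javascript
// Single-flight sketch: concurrent identical requests share one upstream call.
const inFlight = new Map();

async function coalesce(key, fetchUpstream) {
  if (inFlight.has(key)) {
    return inFlight.get(key); // follower: reuse the leader's pending result
  }
  const promise = fetchUpstream().finally(() => inFlight.delete(key));
  inFlight.set(key, promise); // leader: owns the single upstream request
  return promise;
}
```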
OpenAI, Anthropic, Google Gemini. Single proxy URL. Automatic provider detection from request paths. SSE streaming support.
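Because routing is inferred from the request path, the same proxy host can serve different provider SDKs. As a sketch, an Anthropic-style request would hit `/v1/messages` on the proxy; the path and headers follow Anthropic's public API, while the exact routing and auth handling here are assumptions to verify against the project docs.

```javascript
// Same proxy host, Anthropic-style path; the proxy detects the provider from the path.
const response = await fetch("http://localhost:8080/v1/messages", {
  method: "POST",
  headers: {
    "x-api-key": "cm_your_api_key", // assumed: proxy key accepted in place of the provider key
    "anthropic-version": "2023-06-01",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 256,
    messages: [{ role: "user", content: "Hello!" }],
  }),
});
```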
Real-time metrics on cache hit rates, cost savings, token usage. Per-provider breakdowns, tenant management, API key rotation.
Rate limiting, JWT auth, multi-tenant quotas, quality scoring. Structured logging, Prometheus metrics, Docker Compose orchestration.
```bash
# Clone repository
git clone https://github.com/JustABard/cashe-master.git
cd cashe-master

# Configure environment
cp backend/.env.example backend/.env
cp frontend/.env.example frontend/.env

# Start all services
docker compose up --build

# Services available at:
#   Proxy:     http://localhost:8080
#   Dashboard: http://localhost:3000
```
```javascript
// Point your API calls to Cache Master
const response = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer cm_your_api_key',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4',
    messages: [
      { role: 'user', content: 'Hello!' }
    ]
  })
});
```
MIT licensed. Deploy on your infrastructure. Full control over your caching layer. No vendor lock-in.