Project
Tenant Lens
Multi-Tenant RAG · Hybrid Retrieval · Bedrock
A multi-tenant support system for DevOps and SRE teams. Hybrid full-text + vector retrieval with strict tenant isolation, JWT-bound query boundaries via Cognito, and idempotent ingestion with replacement semantics for living documents.
Outcomes
At a glance
3
fictional tenants
JWT
tenant isolation
Hybrid
retrieval (BM25 + vector)
Stack
Built with
- AWS
- S3
- SQS
- Lambda
- Step Functions
- OpenSearch
- Bedrock
- Cognito
- Claude Sonnet 4.6
- Titan Embeddings V2
- Terraform
Detail
Case study
The problem
Most tenant-aware retrieval demos handwave the boundary. The ones that don’t tend to bury the design in framework cruft. I wanted to build a multi-tenant RAG MVP that could demonstrate, on AWS, what tenant isolation actually costs you in architecture — and what it buys you in defensibility.
What I built
A multi-tenant support assistant for DevOps and SRE teams, deployed in a single AWS account as a working MVP.
Pipeline
S3 (drop) → SQS → starter Lambda → Step Functions → OpenSearch (per-tenant index)
↓
UI / Cognito-protected query
↓
Hybrid retrieval (BM25 + vector) → Bedrock answer synthesis
Tenant boundary
- One index per tenant in OpenSearch
- JWT-bound query path — Cognito Hosted UI issues tokens, the query runtime extracts tenant claims and routes only to the matching index
- Three fictional tenants in the corpus today:
forgecraft-supply,roadkeep-fleet,northbeam-creative - Idempotent ingestion with replacement semantics for living documents (updates don’t fork history)
Models
- Answer generation: Anthropic Claude Sonnet 4.6 via Bedrock inference profile
global.anthropic.claude-sonnet-4-6 - Embeddings: Amazon Titan Text Embeddings V2
Design decisions worth calling out
- Tenant isolation as a first-class invariant, not a downstream filter. The cheapest way to prove tenant isolation is to make it impossible to query across tenants in the first place — index-per-tenant + JWT-bound routing does that.
- Hybrid retrieval over pure vector, because operational support content is full of exact identifiers (incident IDs, error codes, ticket numbers) that vector search degrades on. BM25 carries the exact-match weight, vector carries the paraphrase weight.
- Small provisioned OpenSearch domain for MVP cost discipline — not the target shape. Production direction is OpenSearch Serverless, or a larger deployment in a VPC on private subnets.
- Markdown-only corpus, organized by document category — keeps ingestion simple while the retrieval and tenant-boundary work is the focus.
What I’d change for a production deployment
- Move OpenSearch into a VPC on private subnets, fronted by a service-linked role
- Move from inference profile to a tenant-scoped model gating layer (per-tenant model routing for cost/quality tiers)
- Add per-tenant retrieval evaluation harness with ground-truth Q/A sets
- Wire query traces into a tenant-aware observability layer so debugging stays tenant-respectful