Project

Tenant Lens

Multi-Tenant RAG · Hybrid Retrieval · Bedrock

A multi-tenant support system for DevOps and SRE teams. Hybrid full-text + vector retrieval with strict tenant isolation, JWT-bound query boundaries via Cognito, and idempotent ingestion with replacement semantics for living documents.


Outcomes

At a glance

3 fictional tenants

JWT tenant isolation

Hybrid retrieval (BM25 + vector)

Stack

Built with

AWS
S3
SQS
Lambda
Step Functions
OpenSearch
Bedrock
Cognito
Claude Sonnet 4.6
Titan Embeddings V2
Terraform

Detail

Case study

The problem

Most tenant-aware retrieval demos hand-wave the boundary. The ones that don’t tend to bury the design under framework code. I wanted to build a multi-tenant RAG MVP that demonstrates, on AWS, what tenant isolation actually costs in architecture and what it buys in defensibility.

What I built

A multi-tenant support assistant for DevOps and SRE teams, deployed in a single AWS account as a working MVP.

Pipeline

S3 (drop) → SQS → starter Lambda → Step Functions → OpenSearch (per-tenant index)
                                                  ↓
                                       UI / Cognito-protected query
                                                  ↓
                            Hybrid retrieval (BM25 + vector) → Bedrock answer synthesis
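
The front of this pipeline (S3 drop, SQS, starter Lambda) can be sketched as a small parsing step. This is an illustration only, not the project's actual handler: the key layout `<tenant>/<category>/<file>.md`, the function name `parse_s3_records`, and the hash-based `doc_id` are all assumptions made for the example. The stable ID shows one way a re-uploaded document could replace its earlier version rather than add a second copy.

```python
import hashlib
import json
from urllib.parse import unquote_plus


def parse_s3_records(sqs_event: dict) -> list[dict]:
    """Flatten an SQS batch of S3 notifications into ingestion work items."""
    items = []
    for message in sqs_event.get("Records", []):
        body = json.loads(message["body"])
        # Bodies that are not object notifications (e.g. s3:TestEvent)
        # carry no "Records" list and are skipped.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # S3 URL-encodes object keys in notifications.
            key = unquote_plus(record["s3"]["object"]["key"])
            # Assumed layout (hypothetical): <tenant>/<category>/<file>.md
            parts = key.split("/")
            if len(parts) < 3 or not key.endswith(".md"):
                continue  # ignore anything outside the assumed layout
            # Stable ID: the same key always maps to the same document ID,
            # so reprocessing or re-uploading overwrites instead of duplicating.
            doc_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:32]
            items.append(
                {
                    "bucket": bucket,
                    "key": key,
                    "tenant": parts[0],
                    "category": parts[1],
                    "doc_id": doc_id,
                }
            )
    return items
```

Each returned item could then become the input of one Step Functions execution; how the real starter Lambda batches or deduplicates work is not specified here.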

Tenant boundary

One index per tenant in OpenSearch
JWT-bound query path: the Cognito Hosted UI issues tokens, and the query runtime extracts the tenant claim and routes only to the matching index
Three fictional tenants in the corpus today: forgecraft-supply, roadkeep-fleet, northbeam-creative
Idempotent ingestion with replacement semantics for living documents (updates don’t fork history)
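
A minimal sketch of the JWT-bound routing step, assuming the token signature has already been verified upstream and the claims arrive as a plain dict. The claim name `custom:tenant_id`, the `tenant-<id>` index naming, and the `TenantBoundaryError` type are hypothetical; only the three tenant names come from the corpus described above.

```python
KNOWN_TENANTS = frozenset(
    {"forgecraft-supply", "roadkeep-fleet", "northbeam-creative"}
)
TENANT_CLAIM = "custom:tenant_id"  # assumed claim name, not confirmed


class TenantBoundaryError(Exception):
    """Raised when a request cannot be bound to exactly one tenant index."""


def index_for_claims(claims: dict) -> str:
    """Map verified JWT claims to the single index the caller may query."""
    tenant = claims.get(TENANT_CLAIM)
    # Exact allow-list match: this also rejects wildcard or comma-separated
    # values, which OpenSearch would otherwise treat as multi-index patterns.
    if not isinstance(tenant, str) or tenant not in KNOWN_TENANTS:
        raise TenantBoundaryError("token does not map to a known tenant")
    return f"tenant-{tenant}"  # hypothetical index naming scheme
```

The point of the design is that the index name is derived only from the token, never from request parameters, so a caller has no input that can widen the search scope.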

Models

Answer generation: Anthropic Claude Sonnet 4.6 via Bedrock inference profile global.anthropic.claude-sonnet-4-6
Embeddings: Amazon Titan Text Embeddings V2

Design decisions worth calling out

Tenant isolation as a first-class invariant, not a downstream filter. The cheapest way to prove tenant isolation is to make it impossible to query across tenants in the first place — index-per-tenant + JWT-bound routing does that.
Hybrid retrieval over pure vector, because operational support content is full of exact identifiers (incident IDs, error codes, ticket numbers) that vector search handles poorly. BM25 carries the exact-match weight; vector search carries the paraphrase weight.
Small provisioned OpenSearch domain for MVP cost discipline — not the target shape. Production direction is OpenSearch Serverless, or a larger deployment in a VPC on private subnets.
Markdown-only corpus, organized by document category — keeps ingestion simple while the retrieval and tenant-boundary work is the focus.
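
To make the hybrid retrieval decision concrete, here is a sketch of one common way to merge a BM25 ranking with a vector ranking: reciprocal rank fusion (RRF). The source does not say which fusion method the project uses, so treat this as a stand-in technique; the document IDs and the constant `k = 60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked ID lists; each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a query that mentions an incident ID.
bm25_hits = ["inc-4821", "runbook-disk", "faq-login"]
vector_hits = ["runbook-disk", "postmortem-7", "inc-4821"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that appear in both lists rise to the top, which is how an exact identifier match from BM25 and a paraphrase match from the vector side can reinforce each other.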

What I’d change for a production deployment

Move OpenSearch into a VPC on private subnets, using the OpenSearch Service service-linked role for VPC access
Move from a single inference profile to a tenant-scoped model gating layer (per-tenant model routing for cost/quality tiers)
Add a per-tenant retrieval evaluation harness with ground-truth Q/A sets
Wire query traces into a tenant-aware observability layer so debugging stays tenant-respectful
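
As a rough idea of what the per-tenant evaluation harness listed above could measure, the sketch below computes mean recall@k over a tenant's ground-truth set. The case format, the function names, and the `search` callable are assumptions for illustration; no such harness exists in the project yet.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top k results."""
    if not relevant:
        raise ValueError("each case needs at least one relevant document")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate_tenant(
    cases: list[dict], search: Callable[[str], list[str]], k: int = 5
) -> float:
    """Mean recall@k over cases shaped like {"question": str, "relevant": set}."""
    if not cases:
        raise ValueError("no evaluation cases supplied")
    scores = [recall_at_k(search(c["question"]), c["relevant"], k) for c in cases]
    return sum(scores) / len(scores)
```

Running this once per tenant, with `search` bound to that tenant's index, keeps the evaluation inside the same boundary the query path enforces.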