sema.cloud
Project

Tenant Lens

Multi-Tenant RAG · Hybrid Retrieval · Bedrock

A multi-tenant support system for DevOps and SRE teams. Hybrid full-text + vector retrieval with strict tenant isolation, JWT-bound query boundaries via Cognito, and idempotent ingestion with replacement semantics for living documents.

Outcomes

At a glance

3
fictional tenants
JWT
tenant isolation
Hybrid
retrieval (BM25 + vector)
Stack

Built with

  • AWS
  • S3
  • SQS
  • Lambda
  • Step Functions
  • OpenSearch
  • Bedrock
  • Cognito
  • Claude Sonnet 4.6
  • Titan Embeddings V2
  • Terraform
Detail

Case study

The problem

Most tenant-aware retrieval demos handwave the boundary. The ones that don’t tend to bury the design in framework cruft. I wanted to build a multi-tenant RAG MVP that could demonstrate, on AWS, what tenant isolation actually costs you in architecture — and what it buys you in defensibility.

What I built

A multi-tenant support assistant for DevOps and SRE teams, deployed in a single AWS account as a working MVP.

Pipeline

S3 (drop) → SQS → starter Lambda → Step Functions → OpenSearch (per-tenant index)

                                       UI / Cognito-protected query

                            Hybrid retrieval (BM25 + vector) → Bedrock answer synthesis

Tenant boundary

  • One index per tenant in OpenSearch
  • JWT-bound query path — Cognito Hosted UI issues tokens, the query runtime extracts tenant claims and routes only to the matching index
  • Three fictional tenants in the corpus today: forgecraft-supply, roadkeep-fleet, northbeam-creative
  • Idempotent ingestion with replacement semantics for living documents (updates don’t fork history)

Models

  • Answer generation: Anthropic Claude Sonnet 4.6 via Bedrock inference profile global.anthropic.claude-sonnet-4-6
  • Embeddings: Amazon Titan Text Embeddings V2

Design decisions worth calling out

  • Tenant isolation as a first-class invariant, not a downstream filter. The cheapest way to prove tenant isolation is to make it impossible to query across tenants in the first place — index-per-tenant + JWT-bound routing does that.
  • Hybrid retrieval over pure vector, because operational support content is full of exact identifiers (incident IDs, error codes, ticket numbers) that vector search degrades on. BM25 carries the exact-match weight, vector carries the paraphrase weight.
  • Small provisioned OpenSearch domain for MVP cost discipline — not the target shape. Production direction is OpenSearch Serverless, or a larger deployment in a VPC on private subnets.
  • Markdown-only corpus, organized by document category — keeps ingestion simple while the retrieval and tenant-boundary work is the focus.

What I’d change for a production deployment

  • Move OpenSearch into a VPC on private subnets, fronted by a service-linked role
  • Move from inference profile to a tenant-scoped model gating layer (per-tenant model routing for cost/quality tiers)
  • Add per-tenant retrieval evaluation harness with ground-truth Q/A sets
  • Wire query traces into a tenant-aware observability layer so debugging stays tenant-respectful