
May 21, 2026 AI Security

The Security Boundary Is Not The Prompt

Multi-tenant systems usually fail in subtle ways before they fail in dramatic ones.

The application does not need to dump a database or expose an admin console to create a serious problem. One request touches data from the wrong tenant. One report includes the wrong customer. One search query crosses a boundary it should not cross. The damage is real even when the headline never gets written.

That is why the most important multi-tenant security question is not:

Did the response expose the wrong data?

It is:

Could the system access the wrong tenant’s data in the first place?

That distinction shaped the security model for Tenant Lens, a multi-tenant AWS platform that uses retrieval-augmented generation to answer operational questions from tenant-specific runbooks, postmortems, design documents, and incident records.

RAG is the working example here, but the pattern is broader. Any multi-tenant platform — search, reporting, analytics, workflow automation, support tooling, customer portals, admin systems — has a point where authenticated identity must become authorized data scope.

That conversion is a security boundary. It should be treated like one.

The Risk Is The Data Plane, Not The Model

In AI systems, it is tempting to make the security conversation about the model: hallucinations, prompt leaks, jailbreaks, sensitive output. Those are real concerns, but they are not the whole problem.

For multi-tenant systems, the more basic question is whether the data plane can ever provide the wrong evidence to the model. Once unauthorized evidence reaches the prompt, the system is already relying on downstream behavior to avoid disclosure.

That is not where the strongest control should live.

The model should not be asked to protect data it should never have received. Citations should not need to be scrubbed of unauthorized sources after retrieval. Post-processing should not be the primary isolation mechanism.

The stronger pattern is to enforce tenant scope before the query touches the retrieval layer.

Segregation Starts At Ingestion

Query-time authorization matters, but it is only half the story. The other half is what happens when data enters the platform.

If ingestion dumps every tenant’s documents into one undifferentiated pool, every downstream control has to work harder. Retrieval has to filter correctly. Evaluation has to detect cross-tenant evidence. Audit trails have to reconstruct ownership after the fact. A single missing tenant condition becomes a cross-tenant read.

Tenant Lens avoids that by treating tenant identity as part of the ingestion path, not just as a query-time filter. Documents arrive under tenant-scoped S3 keys, ingestion records carry tenant identity, chunks preserve tenant provenance, and the indexing step writes into the target OpenSearch index for that tenant.

In other words, tenant segregation is established before retrieval exists.

Figure: Tenant Lens ingestion authorization flow. Tenant-scoped documents are validated, processed, embedded, and written into per-tenant OpenSearch indexes.

The query path then selects from tenant-specific lanes that were already separated during ingestion, rather than trying to separate a mixed pile of evidence at the last second. Isolation has to be present when data is written and when data is read — if either side is weak, the system can still drift into cross-tenant behavior.
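The write side of that boundary can be sketched in a few lines. This is a minimal illustration, not the Tenant Lens ingestion code: the `tenants/<tenant_id>/...` key layout, `IngestionRecord`, `IngestionError`, and `build_ingestion_record` are all assumptions made for the example. Only the `tenant-<id>-docs` index naming follows the scope-resolution code shown later in this post.

```python
import re
from dataclasses import dataclass

# Assumed key layout: tenants/<tenant_id>/<path>. The real layout may differ.
KEY_PATTERN = re.compile(r"^tenants/(?P<tenant_id>[a-z0-9-]+)/(?P<path>.+)$")


class IngestionError(Exception):
    """Raised when an object cannot be tied to a known tenant."""


@dataclass(frozen=True)
class IngestionRecord:
    tenant_id: str
    source_key: str
    target_index: str


def build_ingestion_record(s3_key: str, known_tenants: set) -> IngestionRecord:
    """Derive tenant identity and target index from the object key alone."""
    match = KEY_PATTERN.match(s3_key)
    if match is None:
        # Reject anything that does not arrive under a tenant prefix.
        raise IngestionError(f"Key is not tenant-scoped: {s3_key!r}")
    tenant_id = match.group("tenant_id")
    if tenant_id not in known_tenants:
        raise IngestionError(f"Unknown tenant: {tenant_id!r}")
    # The index name is computed from the validated tenant, never passed in.
    return IngestionRecord(tenant_id, s3_key, f"tenant-{tenant_id}-docs")
```

The design point is that the target index is never a caller-supplied value. It is computed from the validated tenant ID, so a document cannot be written into another tenant's lane by passing a different argument.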

Structural Isolation Over Conditional Isolation

A common early architecture for multi-tenant search or RAG looks like this:

Store all tenant data in one shared index.
Attach tenant_id as metadata.
Run retrieval across the shared index.
Filter results by tenant.
Send the remaining evidence to the model.

That can be made to work, especially for low-risk systems with mature query builders. But it creates a fragile dependency: every query path must apply the tenant filter correctly forever. Picture a sprint where someone ships a new /admin/search endpoint, copies a retrieval helper that takes an optional filter argument, and forgets to pass it. The endpoint works. Tests pass. The boundary is gone for a week before anyone notices.

That is the failure mode metadata filters invite. They are useful as defense in depth, but they should not be the only tenant boundary.
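Here is a small sketch of that failure mode, using an in-memory list in place of a shared index. The names `SHARED_INDEX`, `search_shared`, `tenant_search`, and `admin_search` are hypothetical and exist only to show how an optional filter fails silently.

```python
from typing import Optional

# Hypothetical shared index: every tenant's chunks live in one list.
SHARED_INDEX = [
    {"tenant_id": "a", "text": "tenant-a failover runbook"},
    {"tenant_id": "b", "text": "tenant-b failover postmortem"},
]


def search_shared(term: str, tenant_filter: Optional[str] = None) -> list:
    """Conditional isolation: the tenant boundary is an optional argument."""
    hits = [doc for doc in SHARED_INDEX if term in doc["text"]]
    if tenant_filter is not None:
        hits = [doc for doc in hits if doc["tenant_id"] == tenant_filter]
    return hits


def tenant_search(term: str, tenant_id: str) -> list:
    # Correct caller: passes the filter.
    return search_shared(term, tenant_filter=tenant_id)


def admin_search(term: str, tenant_id: str) -> list:
    # Copied call with the filter forgotten. Nothing raises; the result
    # simply includes other tenants' documents.
    return search_shared(term)
```

Both functions return plausible results for a tenant-a caller. Only the second one also returns tenant-b's postmortem, and no error or log line marks the difference.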

Conditional isolation says:

The data is together, but every query promises to filter correctly.

Structural isolation says:

The data path itself is scoped before the query runs.

For Tenant Lens, I chose structural isolation: one OpenSearch index per tenant, and an authorization step that decides which indexes the query is even allowed to touch. A shared index would have been simpler to operate and cheaper at scale. The tradeoff was deliberate — I wanted the security property visible in the architecture, not buried in a query builder.

Tenant Lens: Authorization Before Retrieval

Tenant Lens uses Cognito for authentication and tenant-to-group mapping. The query API is protected by JWT authorization, but the important part happens inside the query path.

Before retrieval runs, the query Lambda:

Extracts Cognito groups from the user’s token.
Maps those groups to allowed tenant IDs.
Validates that the requested tenant scope is allowed.
Builds the OpenSearch target index list from that validated scope.
Runs retrieval only against authorized tenant indexes.

Figure: Tenant Lens query authorization flow. JWT claims are resolved into tenant scope before retrieval runs against authorized indexes only.

In code, the scope-resolution step looks roughly like this:

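class AuthorizationError(Exception):
    """Raised when the requested tenant scope is not permitted."""


def resolve_authorized_indexes(jwt_claims, requested_tenants):
    # TENANT_GROUP_MAP maps each Cognito group to the tenant IDs it grants.
    groups = jwt_claims.get("cognito:groups", [])
    allowed_tenants = {
        tenant_id
        for group in groups
        for tenant_id in TENANT_GROUP_MAP.get(group, [])
    }

    # No explicit scope requested: default to everything the caller may see.
    requested = set(requested_tenants) if requested_tenants else allowed_tenants

    # Fail closed on an empty scope so retrieval never runs without a target.
    if not requested:
        raise AuthorizationError("No authorized tenant scope")

    unauthorized = requested - allowed_tenants
    if unauthorized:
        raise AuthorizationError(f"Not authorized for: {sorted(unauthorized)}")

    # Sorted for a deterministic index list.
    return [f"tenant-{t}-docs" for t in sorted(requested)]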

The query does not retrieve broadly and filter later. It retrieves only from indexes the caller is allowed to query. If resolve_authorized_indexes raises, retrieval never runs.

The integration test for this is simple and worth writing: a token scoped to tenant-a issues a query referencing tenant-b, and the request must fail closed before any OpenSearch call is made. That test pins the boundary in CI, so a future refactor cannot quietly remove it.
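A sketch of that test, written to be self-contained. The resolver here is a condensed restatement of the scope-resolution step, and `RecordingSearchClient`, `handle_query`, and the group names in `TENANT_GROUP_MAP` are hypothetical stand-ins, not the real OpenSearch client or the Tenant Lens handler.

```python
class AuthorizationError(Exception):
    pass


# Hypothetical group-to-tenant mapping, for the test only.
TENANT_GROUP_MAP = {"tenant-a-users": ["a"], "tenant-b-users": ["b"]}


def resolve_authorized_indexes(jwt_claims, requested_tenants):
    allowed = {
        t
        for g in jwt_claims.get("cognito:groups", [])
        for t in TENANT_GROUP_MAP.get(g, [])
    }
    requested = set(requested_tenants) if requested_tenants else allowed
    if not requested or requested - allowed:
        raise AuthorizationError("requested scope is not authorized")
    return sorted(f"tenant-{t}-docs" for t in requested)


class RecordingSearchClient:
    """Stand-in for a search client that records every call it receives."""

    def __init__(self):
        self.calls = []

    def search(self, index, body):
        self.calls.append((index, body))
        return {"hits": {"hits": []}}


def handle_query(jwt_claims, requested_tenants, question, client):
    # Scope resolution happens first; a raise here skips retrieval entirely.
    indexes = resolve_authorized_indexes(jwt_claims, requested_tenants)
    body = {"query": {"match": {"text": question}}}
    return client.search(index=",".join(indexes), body=body)


def test_cross_tenant_request_fails_closed():
    client = RecordingSearchClient()
    claims = {"cognito:groups": ["tenant-a-users"]}
    try:
        handle_query(claims, ["b"], "failover steps", client)
    except AuthorizationError:
        pass
    else:
        raise AssertionError("expected AuthorizationError")
    # The boundary held only if the search client was never touched.
    assert client.calls == []
```

The assertion on `client.calls` is the part that matters. Checking only for the exception would still pass if a refactor moved retrieval ahead of authorization.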

The goal is one line:

You cannot retrieve what you are not authorized to see.

Cross-Tenant Access Still Needs A Model

Some users legitimately need cross-tenant access. Platform teams, support engineers, security teams, and administrators may need to inspect multiple tenants.

The answer is not to bypass the tenant model. The answer is to model elevated access explicitly.

In Tenant Lens, cross-tenant access is represented as authorization to multiple tenant scopes. A privileged user can fan out retrieval across multiple authorized indexes and merge the results, but that is still based on validated scope. It is not an escape hatch around tenant isolation.

Administrative access should be broader, not undefined. It should be auditable, explainable, and represented in the same authorization model as normal access.
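A rough sketch of fan-out over validated scope follows. It assumes the list of authorized indexes has already been produced by the authorization step, and it takes a hypothetical `search_one` callable in place of a real per-index search. Scores from separate indexes are not necessarily comparable, so a production merge might need normalization or reranking; this example simply sorts by raw score.

```python
import heapq
from typing import Callable, Dict, List

Hit = Dict[str, object]


def fan_out_search(
    authorized_indexes: List[str],
    search_one: Callable[[str], List[Hit]],
    top_k: int = 5,
) -> List[Hit]:
    """Query each authorized index separately and merge hits by score."""
    merged: List[Hit] = []
    for index in authorized_indexes:
        for hit in search_one(index):
            # Stamp each hit with its source index so provenance
            # survives the merge.
            merged.append({**hit, "index": index})
    # nlargest returns the top_k hits in descending score order.
    return heapq.nlargest(top_k, merged, key=lambda h: h["score"])
```

The privileged path and the single-tenant path differ only in the length of `authorized_indexes`. There is no separate code path that skips scope validation.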

RAG Makes The Boundary More Important

RAG does not create the multi-tenant security problem. It makes the consequences easier to trigger.

Traditional search returns documents or snippets. A RAG system retrieves evidence, places it into model context, and asks the model to synthesize an answer. That synthesis step can blur source boundaries if retrieval gives it mixed or unauthorized evidence.

That is why prompt instructions are not enough. You can tell the model to only answer from authorized data. You can tell it to cite sources carefully. Those are useful guardrails, but they are not substitutes for data-plane authorization.

The model should receive only evidence the caller is allowed to use. Everything else is defense in depth.

Citations And Provenance Are Part Of The Control Surface

Tenant Lens returns grounded answers with citations. Every retrieved chunk carries tenant ID, document ID, chunk ID, source key, filename, category, and section context.

That provenance is useful for trust, but it is also part of the audit story. If a customer asks why an answer was generated, the system can show which tenant-scoped sources were retrieved. If an engineer is debugging a suspicious response, they can trace retrieval back to source documents and authorization scope. And during evaluation, the platform can test whether expected sources appeared and unauthorized sources did not.

For multi-tenant systems, provenance is not just a user experience feature. It is part of the control surface.
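As an illustration of how provenance supports evaluation, the sketch below checks retrieved chunks against the caller's scope and against an expected source list. `RetrievedChunk` mirrors the fields listed above, but the class itself and both helper functions are hypothetical names, not the Tenant Lens evaluation code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class RetrievedChunk:
    tenant_id: str
    document_id: str
    chunk_id: str
    source_key: str
    filename: str
    category: str
    section: str


def find_scope_violations(
    chunks: Iterable[RetrievedChunk], allowed_tenants: Set[str]
) -> List[RetrievedChunk]:
    """Return every chunk whose tenant is outside the caller's scope."""
    return [c for c in chunks if c.tenant_id not in allowed_tenants]


def missing_expected_sources(
    chunks: Iterable[RetrievedChunk], expected_document_ids: Iterable[str]
) -> Set[str]:
    """Return expected document IDs that retrieval did not surface."""
    seen = {c.document_id for c in chunks}
    return set(expected_document_ids) - seen
```

An evaluation run would expect `find_scope_violations` to return an empty list for every test query. A non-empty result means the boundary failed upstream, and the chunk's own fields say where to look.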

The General Pattern

The Tenant Lens implementation is specific: Cognito, API Gateway, Lambda, OpenSearch, Bedrock, per-tenant indexes. The security pattern is more general:

Authenticate the caller.
Resolve tenant membership and role claims.
Convert identity into allowed data scope.
Validate requested scope against allowed scope.
Build the data-plane query from validated scope only.
Retrieve or operate only within that scope.
Preserve provenance and audit context.
Treat elevated access as explicit scoped access, not a bypass.

The downstream system can be a vector index, SQL database, object store, analytics engine, or workflow tool. The implementation details change. The principle does not.
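To show the same steps against a different backend, here is a minimal sketch using SQLite from the Python standard library. The `incidents` table, its columns, and `fetch_incidents` are assumptions made for the example. Because the rows share one table, this is closer to conditional isolation at the storage layer, so the boundary rests on scope being a required, validated argument rather than an optional filter.

```python
import sqlite3
from typing import Iterable, List, Set, Tuple


def fetch_incidents(
    conn: sqlite3.Connection,
    allowed_tenants: Set[str],
    requested_tenants: Iterable[str],
) -> List[Tuple[str, str]]:
    """Validate requested scope, then build the query from that scope only."""
    requested = set(requested_tenants)
    # Fail closed on an empty or partly unauthorized request.
    if not requested or not requested <= allowed_tenants:
        raise PermissionError("requested scope is not authorized")
    scope = sorted(requested)
    # One bound parameter per validated tenant; no caller text enters the SQL.
    placeholders = ", ".join("?" for _ in scope)
    sql = (
        "SELECT tenant_id, title FROM incidents "
        f"WHERE tenant_id IN ({placeholders}) "
        "ORDER BY tenant_id, title"
    )
    return conn.execute(sql, scope).fetchall()
```

There is no way to call `fetch_incidents` without supplying both the allowed and the requested scope, which removes the forgotten-filter case by construction.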

Tenant isolation should happen before data access, not after.

For Tenant Lens, the most important security decision was not which language model generated the answer. It was where tenant scope was enforced.

The security boundary is not the prompt.

The security boundary is the path that decides what data the system is allowed to touch before the prompt ever exists.