Documentation · v0.1.4

Everything you need to run ragcompliance in production.

ragcompliance is drop-in middleware for LangChain and LlamaIndex that logs, signs, and stores every retrieval-augmented generation chain. This documentation covers installation, framework integration, the audit record schema, the dashboard, SSO, billing, SOC 2 evidence export, operational tuning, and a full API reference. Every code snippet on this page is copy-pasteable against ragcompliance >= 0.1.4.

Overview

ragcompliance sits between your chain and your observability stack. You keep your retriever, your vector store, your prompt, and your LLM exactly as they are. You pass one callback through the standard callback channel and, per invocation, one signed row lands in your Supabase.

Every row contains: the query verbatim, every retrieved chunk with source URL, chunk ID and similarity score, the LLM answer verbatim, the model name, end-to-end latency, the workspace ID, a session ID you control, a timestamp, and a SHA-256 chain signature computed over query + chunks + answer. If any of those three signed fields (query, chunks, answer) is mutated after the fact, the signature no longer validates at verification time.

That single row is what compliance teams need to sign off on a RAG system: "which document was cited, what did the model say, and prove it hasn't been modified since." Row-level security in Supabase means the same physical table can hold audit logs for many tenants without cross-contamination.

Note: This is middleware, not a vector database. Bring your own retriever (FAISS, Chroma, Pinecone, pgvector, Weaviate, any BaseRetriever). ragcompliance only cares about what your retriever returned and what the LLM answered.

Installation

The base install logs to stdout, which is good for local dev but not good for production. For anything real, install with the Supabase extra so the handler writes to a real audit table.

$ pip install "ragcompliance[supabase]"

Additional extras:

# + FastAPI dashboard
$ pip install "ragcompliance[supabase,dashboard]"

# + LlamaIndex handler
$ pip install "ragcompliance[supabase,llamaindex]"

# + OIDC single sign-on
$ pip install "ragcompliance[dashboard,sso]"

Requirements. Python 3.11 or newer. langchain-core >= 0.2 (covers LangChain 0.2+ and all LCEL chains). llama-index-core >= 0.10.

Supabase setup

Create a free Supabase project at supabase.com. You need two SQL scripts applied once in the SQL editor. One creates the audit log table with row-level security, the other adds the billing + usage tables used by the dashboard.

  1. Open the SQL editor in your Supabase project.
  2. Paste and run supabase_schema.sql from the repo (audit log table, indexes, RLS policies).
  3. Paste and run supabase_migration_billing.sql (subscriptions, query counters, period-end RPC).
  4. Copy the service_role key from your project's API settings. Do not use the anon key; RLS will block the handler from writing.
Security: The service role key bypasses RLS on your Supabase project. Store it in a secret manager, never commit it, never expose it to browser code. The handler only needs it server-side.

Environment variables

Copy .env.example from the repo and fill in your values. The handler reads these via RAGComplianceConfig.from_env().

RAGCOMPLIANCE_SUPABASE_URL=https://your-project.supabase.co
RAGCOMPLIANCE_SUPABASE_KEY=your-service-role-key
RAGCOMPLIANCE_WORKSPACE_ID=your-workspace-id   # one per tenant
RAGCOMPLIANCE_DEV_MODE=false                    # true = stdout, false = Supabase
RAGCOMPLIANCE_ENFORCE_QUOTA=false               # true = RuntimeError on overage
RAGCOMPLIANCE_ASYNC_WRITES=true                 # fire-and-forget (default)
RAGCOMPLIANCE_ASYNC_MAX_QUEUE=1000              # bounded buffer

workspace_id is how ragcompliance isolates audit logs across tenants. One workspace per customer in a multi-tenant SaaS. One per app for internal use. Row-level security keeps rows from leaking across workspaces even if an application bug asks for the wrong one.

Your first audited chain

Once Supabase is set up and environment variables are loaded, any chain you already have will audit itself the moment you attach the handler. Here's a minimal standalone example:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from ragcompliance import RAGComplianceHandler, RAGComplianceConfig

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # vectorstore: any VectorStore you already have
prompt    = ChatPromptTemplate.from_template("Context:\n{context}\n\nQ: {query}")
llm       = ChatOpenAI(model="gpt-4o-mini")

chain = (
    {"context": retriever | RunnableLambda(lambda d: "\n\n".join(c.page_content for c in d)),
     "query": RunnablePassthrough()}
    | prompt | llm
)

handler = RAGComplianceHandler(config=RAGComplianceConfig.from_env(), session_id="user-abc")
answer  = chain.invoke("Does section 4.2 cover indemnification?", config={"callbacks": [handler]})

After one invocation you'll see one row in rag_audit_logs. Query it from the SQL editor, the dashboard UI, or the /api/logs endpoint.

LangChain handler

The LangChain handler is LCEL-safe and latches onto the outermost chain by default, which is what you want for 95% of setups. It captures:

  • the first user-facing query that entered the chain
  • every document the retriever yielded (source_url from metadata["source"], chunk_id from metadata["chunk_id"], and similarity_score if the retriever exposes it)
  • the LLM's final string output
  • the model name reported by the LLM integration
  • end-to-end latency, rounded to the millisecond

Session IDs

The session_id argument is free-form. Use it to correlate a single user's conversation across many chain invocations. A common pattern is one session per chat thread, keyed by an ID your application already tracks.

handler = RAGComplianceHandler(
    config=RAGComplianceConfig.from_env(),
    session_id=request.session["chat_id"],
)

Passing source metadata

For source URLs to land in the audit record, your Document objects need metadata["source"] set (and ideally metadata["chunk_id"]). Most loaders do this already. A manual retriever looks like:

from langchain_core.documents import Document

docs = [
    Document(
        page_content="Section 4.2 obligates Party A to indemnify…",
        metadata={"source": "s3://contracts/acme-msa-v3.pdf", "chunk_id": "chunk-042"},
    ),
    # … more Documents
]

LlamaIndex handler

The LlamaIndex handler latches onto the SYNTHESIZE event so it captures the final synthesized answer alongside the retrieved nodes. It's set globally via CallbackManager on Settings, so every query engine you subsequently construct inherits it.

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from ragcompliance import RAGComplianceConfig
from ragcompliance.llamaindex_handler import LlamaIndexRAGComplianceHandler

handler = LlamaIndexRAGComplianceHandler(
    config=RAGComplianceConfig.from_env(),
    session_id="user-abc",
)
Settings.callback_manager = CallbackManager([handler])

response = query_engine.query("Does section 4.2 cover indemnification?")

Audit record schema

Every invocation writes one row into rag_audit_logs:

{
  "id": "c4e91…",                          // uuid
  "session_id": "user-abc",
  "workspace_id": "acme-prod",
  "query": "Does section 4.2 cover indemnification?",
  "retrieved_chunks": [
    {
      "content": "Section 4.2 obligates Party A…",
      "source_url": "s3://contracts/acme-msa-v3.pdf",
      "chunk_id": "chunk-042",
      "similarity_score": 0.94
    }
  ],
  "llm_answer": "Section 4.2 covers indemnification obligations…",
  "model_name": "gpt-4o-mini",
  "chain_signature": "a3f8c2d1…",
  "timestamp": "2026-04-10T06:00:00Z",
  "latency_ms": 1240
}

Every field is nullable except workspace_id, query, timestamp, and chain_signature. A chain that never produced an answer (retriever failed, LLM errored) still writes a row; the error itself surfaces through the chain_errored Slack alert.

Signature verification

The signature is computed deterministically over the normalized JSON of the three fields that matter for accountability:

import hashlib, json

payload = {
    "query": record["query"],
    "chunks": [
        {"content": c["content"], "source_url": c["source_url"], "chunk_id": c["chunk_id"]}
        for c in record["retrieved_chunks"] or []
    ],
    "answer": record["llm_answer"],
}
expected = hashlib.sha256(
    json.dumps(payload, sort_keys=True, default=str).encode()
).hexdigest()

assert expected == record["chain_signature"], "record was tampered with after write"

This algorithm is stable across language runtimes because sort_keys=True canonicalizes the JSON. An auditor with SQL access can re-run the check without a Python dependency using their own SHA-256 implementation over the same canonical payload.
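For exhaustive verification rather than spot-checks, the same algorithm extends to a short loop. The helper names below are illustrative, and the records are assumed to have been fetched from rag_audit_logs already:

```python
import hashlib
import json

def chain_signature(record: dict) -> str:
    """Recompute the canonical SHA-256 over query + chunks + answer."""
    payload = {
        "query": record["query"],
        "chunks": [
            {"content": c["content"], "source_url": c["source_url"], "chunk_id": c["chunk_id"]}
            for c in record["retrieved_chunks"] or []
        ],
        "answer": record["llm_answer"],
    }
    return hashlib.sha256(
        json.dumps(payload, sort_keys=True, default=str).encode()
    ).hexdigest()

def find_tampered(records: list[dict]) -> list[str]:
    """Return the IDs of records whose stored signature no longer matches."""
    return [r["id"] for r in records if chain_signature(r) != r["chain_signature"]]
```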

Running the dashboard

The dashboard is a single-file FastAPI app. Run it locally with:

$ pip install "ragcompliance[dashboard]"
$ uvicorn ragcompliance.app:app --reload

It ships with an HTML dashboard at / showing stats cards, recent logs, and export buttons, plus a JSON + CSV API under /api/….

HTTP endpoints

| Method | Path | Purpose |
| --- | --- | --- |
| GET | / | HTML dashboard: stat cards, recent logs, export buttons. |
| GET | /health | Liveness probe. Always 200 OK. |
| GET | /health/billing | Stripe live-mode readiness probe. 200 when configured, 503 + issues list otherwise. |
| GET | /api/logs | Paginated audit records. Supports workspace_id, session_id, start, end, limit, offset. |
| GET | /api/logs/detail/{id} | Single audit record. |
| GET | /api/logs/export.csv | CSV export with filter query params. |
| GET | /api/logs/export.json | JSON file export with filter query params. |
| GET | /api/summary | Aggregate stats: total queries, unique sessions, avg latency, models. |
| GET | /api/plans | Configured billing plans + their Stripe price IDs. |
| POST | /billing/checkout | Start a Stripe Checkout session. Body: {workspace_id, tier}. |
| POST | /stripe/webhook | Stripe event receiver. Verifies signature. |
| GET | /billing/subscription/{workspace_id} | Current subscription + usage for one workspace. |
| GET | /login | Redirects to the configured OIDC provider (when SSO is enabled). |
| GET | /auth/callback | OIDC callback. Validates email domain, seeds session. |
| GET | /logout | Clears the session cookie. |

SSO (OIDC)

The dashboard ships open by default so local dev stays frictionless. Set the environment variables below and SSO turns on via standard OIDC discovery; it works with Google Workspace, Okta, Auth0, Microsoft Entra, Authentik, Keycloak, and any IdP that exposes a .well-known/openid-configuration document.

$ pip install "ragcompliance[dashboard,sso]"
RAGCOMPLIANCE_OIDC_ISSUER=https://accounts.google.com
RAGCOMPLIANCE_OIDC_CLIENT_ID=your-client-id
RAGCOMPLIANCE_OIDC_CLIENT_SECRET=your-client-secret
RAGCOMPLIANCE_OIDC_REDIRECT_URI=https://dash.example.com/auth/callback
RAGCOMPLIANCE_OIDC_ALLOWED_DOMAINS=acme.com,acme.co.uk  # optional allowlist
RAGCOMPLIANCE_SESSION_SECRET=$(python -c "import secrets; print(secrets.token_urlsafe(48))")

With SSO enabled, every route except /health, /login, /auth/callback, /logout, and /stripe/webhook requires a signed-in session. Browser requests get a 302 redirect to /login. API clients (anything sending Accept: application/json) get a 401 so scripted access surfaces cleanly instead of following redirects into HTML.
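That browser-vs-API split is easy to encode in a monitoring smoke test. This tiny helper is illustrative, not part of the library; it just restates the documented rule so a probe can assert against it:

```python
def expected_auth_failure_status(accept_header: str) -> int:
    """Status an unauthenticated request should get with SSO enabled:
    API clients sending Accept: application/json get a 401; browser
    requests get a 302 redirect to /login (per the docs above)."""
    return 401 if "application/json" in accept_header else 302

# A scripted probe should therefore send Accept: application/json
# and treat 401 as "SSO is correctly enforced".
```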

Domain allowlist: The RAGCOMPLIANCE_OIDC_ALLOWED_DOMAINS allowlist is optional. Leave it unset to permit any email the IdP authenticates. Set it to one or more comma-separated domains to lock down to corporate accounts only.

Plans & checkout

Reference implementation: These plans and prices are a ready-to-run reference implementation for operators who want to run ragcompliance as an internal product and meter downstream users. They are not a paid tier of this project. ragcompliance itself is MIT licensed and free to self-host forever: rewrite PLANS in ragcompliance.billing to anything you want, or remove the billing router entirely if you don't need it.

Two plans ship out of the box as a reference. Both are configured via Stripe products + recurring prices and wired into the dashboard at boot.

| Tier | Example price | Queries / month | Extras |
| --- | --- | --- | --- |
| Team (reference) | $49 / mo | 10,000 | CSV/JSON export, email support |
| Enterprise (reference) | $199 / mo | Unlimited | SSO, custom retention, priority review |

Start a checkout from your app:

import requests

r = requests.post(
    "https://your-dashboard.example.com/billing/checkout",
    json={"workspace_id": "my-workspace", "tier": "team"},
)
checkout_url = r.json()["checkout_url"]
# redirect user to checkout_url

Quota enforcement

Quota enforcement is soft by default: the chain logs a warning if the workspace is over its limit but the invocation still runs. Set RAGCOMPLIANCE_ENFORCE_QUOTA=true to hard-block. In fail-closed mode the handler raises RuntimeError before the LLM runs, and no audit row is written for the blocked invocation.

Why soft by default: The first time you wire billing in, you don't want an expired invoice to suddenly break your product. Keep it soft while you verify that Stripe webhooks are delivering, then flip it to hard.
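When you do flip to hard enforcement, the request path should be ready to catch the hard-block. A minimal sketch, assuming only that the handler raises RuntimeError in fail-closed mode as documented (the wrapper name and fallback message are yours to choose):

```python
from typing import Callable

def invoke_with_quota_guard(invoke: Callable[[str], str], query: str) -> str:
    """Run a chain invocation, converting a quota hard-block into a
    graceful reply instead of a 500 surfaced to the end user.

    `invoke` is any callable wrapping chain.invoke; in fail-closed mode
    the handler raises RuntimeError before the LLM runs."""
    try:
        return invoke(query)
    except RuntimeError:
        # Workspace is over quota: degrade gracefully.
        return "This workspace is over its monthly query quota. Please contact your admin."
```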

Period rollover

Query counters reset automatically at each billing period rollover. The reset is driven by Stripe's customer.subscription.updated webhook. If the webhook is ever missed (network blip, endpoint downtime, misconfigured secret), check_query_quota has a self-healing fallback: it compares the stored period_end to now() and forces a reset if the period has lapsed, so a dropped webhook can never permanently lock a workspace out.

Going live (Stripe)

Flipping the dashboard from test mode to live mode is a four-step runbook. The readiness probe catches paste errors before a customer does.

  1. In the Stripe dashboard, switch to Live mode. Create the Team and Enterprise products + recurring prices. Live mode is a separate universe from test mode; the price IDs do not carry over. Copy the two price_live_… IDs.
  2. Update your deployment environment:
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...                   # from the live webhook endpoint
STRIPE_PRICE_ID_TEAM=price_live_...
STRIPE_PRICE_ID_ENTERPRISE=price_live_...
APP_BASE_URL=https://dash.example.com             # must NOT be localhost in live mode
  3. In Stripe → Developers → Webhooks, create a new live-mode endpoint at https://<your-dash>/stripe/webhook. Subscribe to checkout.session.completed, customer.subscription.updated, customer.subscription.deleted, and invoice.paid. Paste the signing secret into STRIPE_WEBHOOK_SECRET.
  4. Hit the readiness probe:
$ curl https://<your-dash>/health/billing

A fully-configured live deployment returns {"ok": true, "mode": "live", …} with a 200. Any misconfiguration comes back as 503 with an issues list:

{
  "ok": false,
  "mode": "live",
  "issues": [
    "STRIPE_WEBHOOK_SECRET is not set",
    "STRIPE_PRICE_ID_TEAM looks wrong, expected 'price_' prefix",
    "APP_BASE_URL must not be localhost in live mode"
  ],
  "summary": {
    "secret_key_prefix": "sk_live…",
    "webhook_secret_set": false,
    "supabase_configured": true
  }
}

The response sanitises every secret (only prefixes like sk_live… ever appear), so it's safe for an uptime monitor or status page to poll.

Programmatic callers get the same structure via BillingManager.readiness() returning a BillingReadiness dataclass.

from ragcompliance import BillingManager, BillingReadiness

rd: BillingReadiness = BillingManager().readiness()
if not rd.ok:
    raise SystemExit("refusing to start: " + ", ".join(rd.issues))

SOC 2 evidence

Most compliance teams can't sign off on a RAG pipeline without a written trail of what was retrieved, what was answered, and proof that the trail hasn't been tampered with. The built-in evidence generator produces a Markdown report mapped to the Trust Services Criteria controls ragcompliance actually has data for: CC6.1 (logical access), CC7.2 (system operation monitoring), CC8.1 (change management), A1.1 (availability), C1.1 (confidentiality).

$ python -m ragcompliance.soc2 \
    --workspace acme-prod \
    --start 2026-01-01 \
    --end 2026-03-31 \
    --sample 25 --seed 42 \
    --out acme-q1-2026-evidence.md

The report pulls records straight from rag_audit_logs, computes integrity stats (signed vs unsigned, unique sessions, avg latency, models observed), recomputes the SHA-256 signature on a random sample so an auditor can spot-check independently, and renders the control matrix and methodology section. It is not itself a SOC 2 attestation (only a licensed auditor can issue one), but it cuts audit-prep back-and-forth from weeks to minutes.

Sample size and confidence

The evidence report recomputes SHA-256 signatures on a random sample of records from the period. The default is 25 records, suitable for a quarterly compliance spot-check. For deeper due-diligence runs, raise it by passing --sample 100 or higher. Sampling is random but seeded via --seed for reproducibility, so an auditor re-running with the same inputs gets the same sample. A given run may not surface a specific tampered record if the tamper rate is low and the sample size is small; the relationship is the standard hypergeometric one. For exhaustive verification across the full period, pass a sample_size equal to the total record count or loop _verify_signature over every record programmatically.
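To size --sample for a target detection probability, the hypergeometric arithmetic is short enough to inline. A stdlib-only sketch (the scenario numbers at the bottom are illustrative):

```python
from math import comb

def detection_probability(total: int, tampered: int, sample: int) -> float:
    """P(a uniform random sample of `sample` records catches at least one
    of `tampered` bad records among `total`):
        1 - C(total - tampered, sample) / C(total, sample)
    math.comb(n, k) returns 0 when k > n, so an over-large sample
    correctly yields probability 1.0."""
    if tampered == 0:
        return 0.0
    return 1 - comb(total - tampered, sample) / comb(total, sample)

# e.g. 1,000 records in the period, 1% tampered, default sample of 25:
p = detection_probability(1000, 10, 25)
```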

Programmatic access is the same pipeline without argparse:

from ragcompliance.soc2 import generate_report

md = generate_report(
    workspace_id="acme-prod",
    start="2026-01-01",
    end="2026-03-31",
    sample_size=25,
    seed=42,
)

Async audit writes

Handler overhead is under 1ms at p50 (~38µs measured in isolation on a clean hot path). End-to-end chain latency depends on your retriever, LLM, and prompt; the handler's contribution is a small constant added on top of whatever your chain does.

Audit writes are fire-and-forget by default. save() enqueues the record onto a bounded in-memory queue and a single daemon worker drains it into Supabase, so the chain's hot path never blocks on audit I/O.

In benchmarks, per-chain audit-write overhead drops from roughly 1.2s (sync Supabase RTT) to well under 1ms (enqueue only), an improvement of roughly three orders of magnitude.

If Supabase is unreachable, records buffer in memory up to RAGCOMPLIANCE_ASYNC_MAX_QUEUE (default 1000) and then drop with a log warning rather than leak memory. On normal process exit an atexit hook drains pending records within RAGCOMPLIANCE_ASYNC_SHUTDOWN_TIMEOUT seconds (default 5). You can also call handler.storage.flush() explicitly in tests or your own shutdown path.

When to disable async: Set RAGCOMPLIANCE_ASYNC_WRITES=false for tests that inspect storage mid-chain, or for workloads where you'd rather pay the latency than risk any in-flight data loss on a crash.

Slack anomaly alerts

Set RAGCOMPLIANCE_SLACK_WEBHOOK_URL to a Slack incoming-webhook URL (or any compatible receiver: Discord, Teams via shim, your own HTTP endpoint) and the handler fires async alerts when a chain looks unhealthy. Four rules, all env-tunable:

| Rule | Fires when | Tuned by |
| --- | --- | --- |
| retrieval_returned_zero_chunks | Retriever returned no documents. | n/a |
| low_similarity | Best matching chunk scored below threshold. | RAGCOMPLIANCE_SLACK_MIN_SIMILARITY (default 0.3) |
| chain_slow | End-to-end latency exceeded threshold. | RAGCOMPLIANCE_SLACK_SLOW_CHAIN_MS (default 10000) |
| chain_errored | LangChain or LlamaIndex raised before the chain completed. | n/a |

Alerts post on a separate daemon worker with a bounded queue, so Slack outages can't back-pressure your chain. When the queue fills, alerts drop with a log warning. Set RAGCOMPLIANCE_SLACK_DASHBOARD_URL to include a View in dashboard link in each payload.
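A typical tuning for a latency-sensitive deployment might look like the following (all values illustrative):

```shell
RAGCOMPLIANCE_SLACK_WEBHOOK_URL=https://hooks.slack.com/services/T000/B000/XXXX
RAGCOMPLIANCE_SLACK_MIN_SIMILARITY=0.5     # alert earlier on weak retrieval
RAGCOMPLIANCE_SLACK_SLOW_CHAIN_MS=5000     # tighter than the 10s default
RAGCOMPLIANCE_SLACK_DASHBOARD_URL=https://dash.example.com
```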

Deployment

The dashboard is a single FastAPI app. It's stateless (state lives in Supabase), so any container platform works identically.

Render (fastest)

  1. Create a new Web Service on render.com, pointing at your repo.
  2. Build command: pip install -e ".[supabase,dashboard,llamaindex,sso]"
  3. Start command: uvicorn ragcompliance.app:app --host 0.0.0.0 --port $PORT
  4. Copy every variable from .env.example into Render's environment settings.
  5. After the service is live, update the Stripe webhook endpoint to https://<your-render-url>/stripe/webhook.

Fly.io, Railway, Cloud Run

All three work identically. The app is a stateless container, no volumes, no sidecars. A minimal Dockerfile:

FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e ".[supabase,dashboard,llamaindex,sso]"
CMD ["uvicorn", "ragcompliance.app:app", "--host", "0.0.0.0", "--port", "8000"]

API reference

RAGComplianceHandler

LangChain callback handler. Instantiate once per chat session or once per request, pass via config={"callbacks": [handler]}.

RAGComplianceHandler(
    config: RAGComplianceConfig,
    session_id: str | None = None,
)

LlamaIndexRAGComplianceHandler

LlamaIndex callback handler. Import from ragcompliance.llamaindex_handler. Attach via Settings.callback_manager = CallbackManager([handler]).

RAGComplianceConfig

Configuration dataclass. The common constructor is RAGComplianceConfig.from_env(), which reads every RAGCOMPLIANCE_* variable.

RAGComplianceConfig(
    supabase_url: str | None,
    supabase_key: str | None,
    workspace_id: str,
    dev_mode: bool = False,
    enforce_quota: bool = False,
    async_writes: bool = True,
    async_max_queue: int = 1000,
    async_shutdown_timeout: float = 5.0,
)

BillingManager / BillingReadiness

BillingManager(
    stripe_secret_key: str | None = None,
    stripe_webhook_secret: str | None = None,
    supabase_url: str | None = None,
    supabase_key: str | None = None,
    app_base_url: str | None = None,
)

BillingManager.readiness() -> BillingReadiness

@dataclass
class BillingReadiness:
    mode: str            # "live" | "test" | "unconfigured"
    ok: bool
    issues: list[str]
    summary: dict[str, Any]
    def to_dict(self) -> dict: ...

ragcompliance.soc2.generate_report

generate_report(
    workspace_id: str,
    start: str,                     # ISO date
    end: str,                       # ISO date
    sample_size: int = 25,
    seed: int | None = None,
    storage: AuditStorage | None = None,
) -> str

Environment variable reference

| Name | Default | Purpose |
| --- | --- | --- |
| RAGCOMPLIANCE_SUPABASE_URL | — | Supabase project URL. |
| RAGCOMPLIANCE_SUPABASE_KEY | — | Supabase service role key. |
| RAGCOMPLIANCE_WORKSPACE_ID | — | Tenant isolation key. |
| RAGCOMPLIANCE_DEV_MODE | false | true = log to stdout; false = persist to Supabase. |
| RAGCOMPLIANCE_ENFORCE_QUOTA | false | true raises RuntimeError when over limit. |
| RAGCOMPLIANCE_ASYNC_WRITES | true | Fire-and-forget audit writes. |
| RAGCOMPLIANCE_ASYNC_MAX_QUEUE | 1000 | Bounded in-memory buffer size. |
| RAGCOMPLIANCE_ASYNC_SHUTDOWN_TIMEOUT | 5.0 | Seconds to wait on atexit drain. |
| RAGCOMPLIANCE_SLACK_WEBHOOK_URL | — | Enables Slack alerts when set. |
| RAGCOMPLIANCE_SLACK_MIN_SIMILARITY | 0.3 | Threshold for low_similarity. |
| RAGCOMPLIANCE_SLACK_SLOW_CHAIN_MS | 10000 | Threshold for chain_slow. |
| RAGCOMPLIANCE_SLACK_DASHBOARD_URL | — | Appends a "View in dashboard" link to alert payloads. |
| RAGCOMPLIANCE_OIDC_ISSUER | — | OIDC provider URL (discovery endpoint). |
| RAGCOMPLIANCE_OIDC_CLIENT_ID | — | OIDC client ID. |
| RAGCOMPLIANCE_OIDC_CLIENT_SECRET | — | OIDC client secret. |
| RAGCOMPLIANCE_OIDC_REDIRECT_URI | — | Full callback URL (.../auth/callback). |
| RAGCOMPLIANCE_OIDC_ALLOWED_DOMAINS | — | Optional comma-separated email domain allowlist. |
| RAGCOMPLIANCE_SESSION_SECRET | — | Session cookie signing key. |
| STRIPE_SECRET_KEY | — | sk_live_… or sk_test_…. |
| STRIPE_WEBHOOK_SECRET | — | Endpoint signing secret from the Stripe webhook config. |
| STRIPE_PRICE_ID_TEAM | — | Stripe price ID (must start with price_). |
| STRIPE_PRICE_ID_ENTERPRISE | — | Stripe price ID (must start with price_). |
| APP_BASE_URL | — | Dashboard base URL. Must not be localhost in live mode. |

License & contact

MIT licensed. Source at github.com/dakshtrehan/ragcompliance. Issues, PRs, and questions welcome.

For enterprise support, private-deployment guidance, or SOC 2 prep help, email daksh.trehan@hotmail.com.

ragcompliance · v0.1.4 · 145 tests green · MIT