ragcompliance is drop-in middleware for LangChain and LlamaIndex that logs, signs, and stores every retrieval-augmented generation chain. This documentation covers installation, framework integration, the audit record schema, the dashboard, SSO, billing, SOC 2 evidence export, operational tuning, and a full API reference. Every code snippet on this page is copy-pasteable against ragcompliance >= 0.1.4.
Overview
ragcompliance sits between your chain and your observability stack. You keep your retriever, your vector store, your prompt, and your LLM exactly as they are. You pass one callback through the standard callback channel and, per invocation, one signed row lands in your Supabase.
Every row contains: the query verbatim, every retrieved chunk with source URL, chunk ID and similarity score, the LLM answer verbatim, the model name, end-to-end latency, the workspace ID, a session ID you control, a timestamp, and a SHA-256 chain signature computed over query + chunks + answer. If any of those three fields is mutated after-the-fact, the signature no longer validates at verification time.
That single row is what compliance teams need to sign off on a RAG system: "which document was cited, what did the model say, and prove it hasn't been modified since." Row-level security in Supabase means the same physical table can hold audit logs for many tenants without cross-contamination.
It works with any retriever, standard or custom (anything that subclasses BaseRetriever); ragcompliance only cares about what your retriever returned and what the LLM answered.
Installation
The base install logs to stdout, which is good for local dev but not good for production. For anything real, install with the Supabase extra so the handler writes to a real audit table.
$ pip install "ragcompliance[supabase]"
Additional extras:
# + FastAPI dashboard
$ pip install "ragcompliance[supabase,dashboard]"
# + LlamaIndex handler
$ pip install "ragcompliance[supabase,llamaindex]"
# + OIDC single sign-on
$ pip install "ragcompliance[dashboard,sso]"
Requirements. Python 3.11 or newer. langchain-core >= 0.2 (covers LangChain 0.2+ and all LCEL chains). llama-index-core >= 0.10.
Supabase setup
Create a free Supabase project at supabase.com. You need two SQL scripts, applied once in the SQL editor: one creates the audit log table with row-level security, the other adds the billing and usage tables used by the dashboard.
- Open the SQL editor in your Supabase project.
- Paste and run supabase_schema.sql from the repo (audit log table, indexes, RLS policies).
- Paste and run supabase_migration_billing.sql (subscriptions, query counters, period-end RPC).
- Copy the service_role key from your project's API settings. Do not use the anon key; RLS will block the handler from writing.
Environment variables
Copy .env.example from the repo and fill in your values. The handler reads these via RAGComplianceConfig.from_env().
RAGCOMPLIANCE_SUPABASE_URL=https://your-project.supabase.co
RAGCOMPLIANCE_SUPABASE_KEY=your-service-role-key
RAGCOMPLIANCE_WORKSPACE_ID=your-workspace-id # one per tenant
RAGCOMPLIANCE_DEV_MODE=false # true = stdout, false = Supabase
RAGCOMPLIANCE_ENFORCE_QUOTA=false # true = RuntimeError on overage
RAGCOMPLIANCE_ASYNC_WRITES=true # fire-and-forget (default)
RAGCOMPLIANCE_ASYNC_MAX_QUEUE=1000 # bounded buffer
workspace_id is how ragcompliance isolates audit logs across tenants. One workspace per customer in a multi-tenant SaaS. One per app for internal use. Row-level security keeps rows from leaking across workspaces even if an application bug asks for the wrong one.
Your first audited chain
Once Supabase is set up and environment variables are loaded, any chain you already have will audit itself the moment you attach the handler. Here's a minimal example, assuming you already have a vectorstore:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from ragcompliance import RAGComplianceHandler, RAGComplianceConfig
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
prompt = ChatPromptTemplate.from_template("Context:\n{context}\n\nQ: {query}")
llm = ChatOpenAI(model="gpt-4o-mini")
chain = (
{"context": retriever | RunnableLambda(lambda d: "\n\n".join(c.page_content for c in d)),
"query": RunnablePassthrough()}
| prompt | llm
)
handler = RAGComplianceHandler(config=RAGComplianceConfig.from_env(), session_id="user-abc")
answer = chain.invoke("Does section 4.2 cover indemnification?", config={"callbacks": [handler]})
After one invocation you'll see one row in rag_audit_logs. Query it from the SQL editor, the dashboard UI, or the /api/logs endpoint.
LangChain handler
The LangChain handler is LCEL-safe and latches onto the outermost chain by default, which is what you want for 95% of setups. It captures:
- the first user-facing query that entered the chain
- every document the retriever yielded (source_url from metadata["source"], chunk_id from metadata["chunk_id"], and similarity_score if the retriever exposes it)
- the LLM's final string output
- the model name reported by the LLM integration
- end-to-end latency, rounded to the millisecond
Session IDs
The session_id argument is free-form. Use it to correlate a single user's conversation across many chain invocations. A common pattern is one session per chat thread, issued by your application.
handler = RAGComplianceHandler(
config=RAGComplianceConfig.from_env(),
session_id=request.session["chat_id"],
)
Passing source metadata
For source URLs to land in the audit record, your Document objects need metadata["source"] set (and ideally metadata["chunk_id"]). Most loaders do this already. A manual retriever looks like:
from langchain_core.documents import Document
docs = [
Document(
page_content="Section 4.2 obligates Party A to indemnify…",
metadata={"source": "s3://contracts/acme-msa-v3.pdf", "chunk_id": "chunk-042"},
),
…
]
LlamaIndex handler
The LlamaIndex handler latches onto the SYNTHESIZE event so it captures the final synthesized answer alongside the retrieved nodes. It's set globally via CallbackManager on Settings, so every query engine you subsequently construct inherits it.
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from ragcompliance import RAGComplianceConfig
from ragcompliance.llamaindex_handler import LlamaIndexRAGComplianceHandler
handler = LlamaIndexRAGComplianceHandler(
config=RAGComplianceConfig.from_env(),
session_id="user-abc",
)
Settings.callback_manager = CallbackManager([handler])
response = query_engine.query("Does section 4.2 cover indemnification?")
Audit record schema
Every invocation writes one row into rag_audit_logs:
{
"id": "c4e91…", // uuid
"session_id": "user-abc",
"workspace_id": "acme-prod",
"query": "Does section 4.2 cover indemnification?",
"retrieved_chunks": [
{
"content": "Section 4.2 obligates Party A…",
"source_url": "s3://contracts/acme-msa-v3.pdf",
"chunk_id": "chunk-042",
"similarity_score": 0.94
}
],
"llm_answer": "Section 4.2 covers indemnification obligations…",
"model_name": "gpt-4o-mini",
"chain_signature": "a3f8c2d1…",
"timestamp": "2026-04-10T06:00:00Z",
"latency_ms": 1240
}
Every field is nullable except workspace_id, query, timestamp, and chain_signature. A chain that never returned a result (retriever failed, LLM errored) still writes a row; the failure itself is surfaced downstream via Slack alerts.
Signature verification
The signature is computed deterministically over the normalized JSON of the three fields that matter for accountability:
import hashlib, json
payload = {
"query": record["query"],
"chunks": [
{"content": c["content"], "source_url": c["source_url"], "chunk_id": c["chunk_id"]}
for c in record["retrieved_chunks"] or []
],
"answer": record["llm_answer"],
}
expected = hashlib.sha256(
json.dumps(payload, sort_keys=True, default=str).encode()
).hexdigest()
assert expected == record["chain_signature"], "record was tampered with after write"
This check is reproducible in any language runtime as long as the payload is serialized identically: keys sorted alphabetically (sort_keys=True) with json.dumps's default separators. An auditor with SQL access can re-run it without a Python dependency using any SHA-256 implementation over that same canonical payload.
Running the dashboard
The dashboard is a single-file FastAPI app. Run it locally with:
$ pip install "ragcompliance[dashboard]"
$ uvicorn ragcompliance.app:app --reload
It ships with an HTML dashboard at / showing stats cards, recent logs, and export buttons, plus a JSON + CSV API under /api/….
HTTP endpoints
| Method | Path | Purpose |
|---|---|---|
| GET | / | HTML dashboard: stat cards, recent logs, export buttons |
| GET | /health | Liveness probe. Always 200 OK. |
| GET | /health/billing | Stripe live-mode readiness probe. 200 when configured, 503 + issues list otherwise. |
| GET | /api/logs | Paginated audit records. Supports workspace_id, session_id, start, end, limit, offset. |
| GET | /api/logs/detail/{id} | Single audit record. |
| GET | /api/logs/export.csv | CSV export with filter query params. |
| GET | /api/logs/export.json | JSON file export with filter query params. |
| GET | /api/summary | Aggregate stats: total queries, unique sessions, avg latency, models. |
| GET | /api/plans | Configured billing plans + their Stripe price IDs. |
| POST | /billing/checkout | Start a Stripe Checkout session. Body: {workspace_id, tier}. |
| POST | /stripe/webhook | Stripe event receiver. Verifies signature. |
| GET | /billing/subscription/{workspace_id} | Current subscription + usage for one workspace. |
| GET | /login | Redirects to the configured OIDC provider (when SSO is enabled). |
| GET | /auth/callback | OIDC callback. Validates email domain, seeds session. |
| GET | /logout | Clears the session cookie. |
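The logs API composes naturally with a few lines of client code. A minimal sketch (the helper names are ours, and we assume the endpoint returns a JSON body; only the path and query parameters come from the table above):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def logs_url(base_url: str, **filters) -> str:
    """Build a /api/logs URL from the filter params documented above."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base_url.rstrip('/')}/api/logs?{query}"

def fetch_logs(base_url: str, **filters):
    """Fetch one page of audit records (requires a reachable dashboard)."""
    with urlopen(logs_url(base_url, **filters)) as resp:
        return json.load(resp)

# Builds the URL only; no network call is made here.
url = logs_url("https://dash.example.com",
               workspace_id="acme-prod", limit=50, offset=0)
```

Paginate by advancing offset by limit until a page comes back short.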
SSO (OIDC)
The dashboard ships open by default so local dev stays frictionless. Set four environment variables and SSO turns on via standard OIDC discovery: Google Workspace, Okta, Auth0, Microsoft Entra, Authentik, Keycloak, and any IdP that exposes a .well-known/openid-configuration document.
$ pip install "ragcompliance[dashboard,sso]"
RAGCOMPLIANCE_OIDC_ISSUER=https://accounts.google.com
RAGCOMPLIANCE_OIDC_CLIENT_ID=your-client-id
RAGCOMPLIANCE_OIDC_CLIENT_SECRET=your-client-secret
RAGCOMPLIANCE_OIDC_REDIRECT_URI=https://dash.example.com/auth/callback
RAGCOMPLIANCE_OIDC_ALLOWED_DOMAINS=acme.com,acme.co.uk # optional allowlist
RAGCOMPLIANCE_SESSION_SECRET=$(python -c "import secrets; print(secrets.token_urlsafe(48))")
With SSO enabled, every route except /health, /login, /auth/callback, /logout, and /stripe/webhook requires a signed-in session. Browser requests get a 302 redirect to /login. API clients (anything sending Accept: application/json) get a 401 so scripted access surfaces cleanly instead of following redirects into HTML.
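The 302-versus-401 split above is plain content negotiation over the Accept header. A sketch of the dispatch logic (ours, not the shipped implementation) makes the rule explicit:

```python
# Routes that stay reachable without a session, per the paragraph above.
OPEN_PATHS = {"/health", "/login", "/auth/callback", "/logout", "/stripe/webhook"}

def auth_status(path: str, accept: str, signed_in: bool) -> int:
    """Status code an unauthenticated-aware dashboard would send."""
    if path in OPEN_PATHS or signed_in:
        return 200
    # API clients announce themselves via Accept and get a clean 401;
    # browsers get redirected to /login instead.
    return 401 if "application/json" in accept else 302
```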
The RAGCOMPLIANCE_OIDC_ALLOWED_DOMAINS allowlist is optional. Leave it unset to permit any email the IdP authenticates. Set it to one or more comma-separated domains to lock access down to corporate accounts only.
Plans & checkout
Two plans ship out of the box as a reference. Both are configured via Stripe products + recurring prices and wired into the dashboard at boot. Edit PLANS in ragcompliance.billing to anything you want, or remove the billing router entirely if you don't need it.
| Tier | Example price | Queries / month | Extras |
|---|---|---|---|
| Team (reference) | $49 / mo | 10,000 | CSV/JSON export, email support |
| Enterprise (reference) | $199 / mo | Unlimited | SSO, custom retention, priority review |
Start a checkout from your app:
import requests
r = requests.post(
"https://your-dashboard.example.com/billing/checkout",
json={"workspace_id": "my-workspace", "tier": "team"},
)
checkout_url = r.json()["checkout_url"]
# redirect user to checkout_url
Quota enforcement
Quota enforcement is soft by default: the chain logs a warning if the workspace is over its limit but the invocation still runs. Set RAGCOMPLIANCE_ENFORCE_QUOTA=true to hard-block. In fail-closed mode the handler raises RuntimeError before the LLM runs, and no audit row is written for the blocked invocation.
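With fail-closed mode on, your application should expect that RuntimeError. A hedged wrapper (the helper name and fallback message are ours; note a bare except RuntimeError will also catch unrelated runtime errors):

```python
def guarded_invoke(chain, query: str, handler) -> str:
    """Invoke a chain, degrading gracefully when the quota hard-block fires."""
    try:
        return chain.invoke(query, config={"callbacks": [handler]})
    except RuntimeError:
        # Quota exceeded: the LLM never ran and no audit row was written.
        return "This workspace is over its monthly query limit."
```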
Period rollover
Query counters reset automatically at each billing period rollover. The reset is driven by Stripe's customer.subscription.updated webhook. If the webhook is ever missed (network blip, endpoint downtime, misconfigured secret), check_query_quota has a self-healing fallback: it compares the stored period_end to now() and forces a reset if the period has lapsed, so a dropped webhook can never permanently lock a workspace out.
Going live (Stripe)
Flipping the dashboard from test mode to live mode is a four-step runbook. The readiness probe catches the paste errors before a customer does.
- In the Stripe dashboard, switch to Live mode. Create the Team and Enterprise products + recurring prices. Live mode is a separate universe from test mode; the price IDs do not carry over. Copy the two price_live_… IDs.
- Update your deployment environment:
STRIPE_SECRET_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_... # from the live webhook endpoint
STRIPE_PRICE_ID_TEAM=price_live_...
STRIPE_PRICE_ID_ENTERPRISE=price_live_...
APP_BASE_URL=https://dash.example.com # must NOT be localhost in live mode
- In Stripe → Developers → Webhooks, create a new live-mode endpoint at https://<your-dash>/stripe/webhook. Subscribe to checkout.session.completed, customer.subscription.updated, customer.subscription.deleted, and invoice.paid. Paste the signing secret into STRIPE_WEBHOOK_SECRET.
- Hit the readiness probe:
$ curl https://<your-dash>/health/billing
A fully-configured live deployment returns {"ok": true, "mode": "live", …} with a 200. Any misconfiguration comes back as 503 with an issues list:
{
"ok": false,
"mode": "live",
"issues": [
"STRIPE_WEBHOOK_SECRET is not set",
"STRIPE_PRICE_ID_TEAM looks wrong, expected 'price_' prefix",
"APP_BASE_URL must not be localhost in live mode"
],
"summary": {
"secret_key_prefix": "sk_live…",
"webhook_secret_set": false,
"supabase_configured": true
}
}
The response sanitises every secret (only prefixes like sk_live… are ever exposed), so it's safe for an uptime monitor or status page to poll.
Programmatic callers get the same structure via BillingManager.readiness() returning a BillingReadiness dataclass.
from ragcompliance import BillingManager, BillingReadiness
rd: BillingReadiness = BillingManager().readiness()
if not rd.ok:
raise SystemExit("refusing to start: " + ", ".join(rd.issues))
SOC 2 evidence
Most compliance teams can't sign off on a RAG pipeline without a written trail of what was retrieved, what was answered, and proof that the trail hasn't been tampered with. The built-in evidence generator produces a Markdown report mapped to the Trust Services Criteria controls ragcompliance actually has data for: CC6.1 (logical access), CC7.2 (system operation monitoring), CC8.1 (change management), A1.1 (availability), C1.1 (confidentiality).
$ python -m ragcompliance.soc2 \
--workspace acme-prod \
--start 2026-01-01 \
--end 2026-03-31 \
--sample 25 --seed 42 \
--out acme-q1-2026-evidence.md
The report pulls records straight from rag_audit_logs, computes integrity stats (signed vs unsigned, unique sessions, avg latency, models observed), recomputes the SHA-256 signature on a random sample so an auditor can spot-check independently, and renders the control matrix and methodology section. It is not itself a SOC 2 attestation (only a licensed auditor can issue one), but it cuts audit-prep back-and-forth from weeks to minutes.
Sample size and confidence
The evidence report recomputes SHA-256 signatures on a random sample of records from the period. The default is 25 records, suitable for a quarterly compliance spot-check. For deeper due-diligence runs, raise it by passing --sample 100 or higher. Sampling is random but seeded via --seed for reproducibility, so an auditor re-running with the same inputs gets the same sample. A given run may not surface a specific tampered record if the tamper rate is low and the sample size is small; the relationship is the standard hypergeometric one. For exhaustive verification across the full period, pass a sample_size equal to the total record count or loop _verify_signature over every record programmatically.
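That hypergeometric relationship is easy to compute directly: with N records in the period, k of them tampered, and a sample of n, the chance the sample catches at least one tampered record is one minus the miss probability. A small sketch (the function name is ours):

```python
from math import comb

def p_detect(total: int, tampered: int, sample: int) -> float:
    """P(a uniform random sample of `sample` records contains at least one
    of the `tampered` records out of `total`): 1 - C(N-k, n) / C(N, n)."""
    if tampered == 0:
        return 0.0
    if sample >= total - tampered + 1:
        return 1.0  # sample too large to miss every tampered record
    return 1 - comb(total - tampered, sample) / comb(total, sample)

# e.g. 1000 records with 10 tampered: the default sample of 25 detects
# tampering only about a fifth of the time, which is why deeper
# due-diligence runs should raise --sample.
p = p_detect(1000, 10, 25)
```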
Programmatic access is the same pipeline without argparse:
from ragcompliance.soc2 import generate_report
md = generate_report(
workspace_id="acme-prod",
start="2026-01-01",
end="2026-03-31",
sample_size=25,
seed=42,
)
Async audit writes
Handler overhead is under 1ms at p50 (~38µs measured in isolation on a clean hot path). End-to-end chain latency depends on your retriever, LLM, and prompt; the handler's contribution is a small constant added on top of whatever your chain does.
Audit writes are fire-and-forget by default. save() enqueues the record onto a bounded in-memory queue and a single daemon worker drains it into Supabase, so the chain's hot path never blocks on audit I/O.
In benchmarks, per-chain audit-write overhead drops from roughly 1.2s (sync Supabase RTT) to well under 1ms (enqueue only), an improvement of roughly three orders of magnitude.
If Supabase is unreachable, records buffer in memory up to RAGCOMPLIANCE_ASYNC_MAX_QUEUE (default 1000) and then drop with a log warning rather than leak memory. On normal process exit an atexit hook drains pending records within RAGCOMPLIANCE_ASYNC_SHUTDOWN_TIMEOUT seconds (default 5). You can also call handler.storage.flush() explicitly in tests or your own shutdown path.
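The write path described above is the classic bounded-queue producer/consumer pattern. A stripped-down sketch (ours, not the shipped code) of enqueue-or-drop plus an explicit drain:

```python
import queue
import threading

class AsyncWriter:
    """Fire-and-forget writer: enqueue on the hot path, drain on a daemon thread."""

    def __init__(self, write, max_queue: int = 1000):
        self._write = write                       # e.g. the Supabase insert
        self._q = queue.Queue(maxsize=max_queue)  # bounded: a full queue drops
        self.dropped = 0
        threading.Thread(target=self._worker, daemon=True).start()

    def save(self, record: dict) -> None:
        try:
            self._q.put_nowait(record)            # never blocks the chain
        except queue.Full:
            self.dropped += 1                     # real code also logs a warning

    def _worker(self) -> None:
        while True:
            self._write(self._q.get())
            self._q.task_done()

    def flush(self) -> None:
        # Simplified: Queue.join() has no timeout; real shutdown code bounds
        # the wait with RAGCOMPLIANCE_ASYNC_SHUTDOWN_TIMEOUT.
        self._q.join()
```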
Set RAGCOMPLIANCE_ASYNC_WRITES=false for tests that inspect storage mid-chain, or for workloads where you'd rather pay the latency than risk any in-flight data loss on a crash.
Slack anomaly alerts
Set RAGCOMPLIANCE_SLACK_WEBHOOK_URL to a Slack incoming-webhook URL (or any compatible receiver: Discord, Teams via shim, your own HTTP endpoint) and the handler fires async alerts when a chain looks unhealthy. Four rules, all env-tunable:
| Rule | Fires when | Tuned by |
|---|---|---|
| retrieval_returned_zero_chunks | Retriever returned no documents. | n/a |
| low_similarity | Best matching chunk scored below threshold. | RAGCOMPLIANCE_SLACK_MIN_SIMILARITY (default 0.3) |
| chain_slow | End-to-end latency exceeded threshold. | RAGCOMPLIANCE_SLACK_SLOW_CHAIN_MS (default 10000) |
| chain_errored | LangChain or LlamaIndex raised before the chain completed. | n/a |
Alerts post on a separate daemon worker with a bounded queue, so Slack outages can't back-pressure your chain. When the queue fills, alerts drop with a log warning. Set RAGCOMPLIANCE_SLACK_DASHBOARD_URL to include a View in dashboard link in each payload.
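The four rules reduce to simple predicates over the finished audit record. A sketch using the default thresholds from the table (the function name is ours, and the `error` key is an assumed marker for a failed chain, not a documented schema field):

```python
def fired_rules(record: dict,
                min_similarity: float = 0.3,
                slow_ms: int = 10000) -> list[str]:
    """Evaluate the four anomaly rules against one audit record. The real
    handler reads the thresholds from the RAGCOMPLIANCE_SLACK_* env vars."""
    rules = []
    chunks = record.get("retrieved_chunks") or []
    if not chunks:
        rules.append("retrieval_returned_zero_chunks")
    else:
        scores = [c["similarity_score"] for c in chunks
                  if c.get("similarity_score") is not None]
        if scores and max(scores) < min_similarity:
            rules.append("low_similarity")
    if record.get("latency_ms", 0) > slow_ms:
        rules.append("chain_slow")
    if record.get("error"):  # hypothetical failure marker
        rules.append("chain_errored")
    return rules
```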
Deployment
The dashboard is a single FastAPI app. It's stateless (state lives in Supabase), so any container platform works identically.
Render (fastest)
- Create a new Web Service on render.com, pointing at your repo.
- Build command: pip install -e ".[supabase,dashboard,llamaindex,sso]"
- Start command: uvicorn ragcompliance.app:app --host 0.0.0.0 --port $PORT
- Copy every variable from .env.example into Render's environment settings.
- After the service is live, update the Stripe webhook endpoint to https://<your-render-url>/stripe/webhook.
Fly.io, Railway, Cloud Run
All three work identically. The app is a stateless container, no volumes, no sidecars. A minimal Dockerfile:
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -e ".[supabase,dashboard,llamaindex,sso]"
CMD ["uvicorn", "ragcompliance.app:app", "--host", "0.0.0.0", "--port", "8000"]
API reference
RAGComplianceHandler
LangChain callback handler. Instantiate once per chat session or once per request, pass via config={"callbacks": [handler]}.
RAGComplianceHandler(
config: RAGComplianceConfig,
session_id: str | None = None,
)
LlamaIndexRAGComplianceHandler
LlamaIndex callback handler. Import from ragcompliance.llamaindex_handler. Attach via Settings.callback_manager = CallbackManager([handler]).
RAGComplianceConfig
Configuration dataclass. The common constructor is RAGComplianceConfig.from_env(), which reads every RAGCOMPLIANCE_* variable.
RAGComplianceConfig(
supabase_url: str | None,
supabase_key: str | None,
workspace_id: str,
dev_mode: bool = False,
enforce_quota: bool = False,
async_writes: bool = True,
async_max_queue: int = 1000,
async_shutdown_timeout: float = 5.0,
)
BillingManager / BillingReadiness
BillingManager(
stripe_secret_key: str | None = None,
stripe_webhook_secret: str | None = None,
supabase_url: str | None = None,
supabase_key: str | None = None,
app_base_url: str | None = None,
)
BillingManager.readiness() -> BillingReadiness
@dataclass
class BillingReadiness:
mode: str # "live" | "test" | "unconfigured"
ok: bool
issues: list[str]
summary: dict[str, Any]
def to_dict(self) -> dict: ...
ragcompliance.soc2.generate_report
generate_report(
workspace_id: str,
start: str, # ISO date
end: str, # ISO date
sample_size: int = 25,
seed: int | None = None,
storage: AuditStorage | None = None,
) -> str
Environment variable reference
| Name | Default | Purpose |
|---|---|---|
| RAGCOMPLIANCE_SUPABASE_URL | | Supabase project URL. |
| RAGCOMPLIANCE_SUPABASE_KEY | | Supabase service role key. |
| RAGCOMPLIANCE_WORKSPACE_ID | | Tenant isolation key. |
| RAGCOMPLIANCE_DEV_MODE | false | true = log to stdout; false = persist to Supabase. |
| RAGCOMPLIANCE_ENFORCE_QUOTA | false | true raises RuntimeError when over limit. |
| RAGCOMPLIANCE_ASYNC_WRITES | true | Fire-and-forget audit writes. |
| RAGCOMPLIANCE_ASYNC_MAX_QUEUE | 1000 | Bounded in-memory buffer size. |
| RAGCOMPLIANCE_ASYNC_SHUTDOWN_TIMEOUT | 5.0 | Seconds to wait on atexit drain. |
| RAGCOMPLIANCE_SLACK_WEBHOOK_URL | | Enables Slack alerts when set. |
| RAGCOMPLIANCE_SLACK_MIN_SIMILARITY | 0.3 | Threshold for low_similarity. |
| RAGCOMPLIANCE_SLACK_SLOW_CHAIN_MS | 10000 | Threshold for chain_slow. |
| RAGCOMPLIANCE_SLACK_DASHBOARD_URL | | Appends a "View in dashboard" link to alert payloads. |
| RAGCOMPLIANCE_OIDC_ISSUER | | OIDC provider URL (discovery endpoint). |
| RAGCOMPLIANCE_OIDC_CLIENT_ID | | OIDC client ID. |
| RAGCOMPLIANCE_OIDC_CLIENT_SECRET | | OIDC client secret. |
| RAGCOMPLIANCE_OIDC_REDIRECT_URI | | Full callback URL (.../auth/callback). |
| RAGCOMPLIANCE_OIDC_ALLOWED_DOMAINS | | Optional comma-separated email domain allowlist. |
| RAGCOMPLIANCE_SESSION_SECRET | | Session cookie signing key. |
| STRIPE_SECRET_KEY | | sk_live_… or sk_test_…. |
| STRIPE_WEBHOOK_SECRET | | Endpoint signing secret from the Stripe webhook config. |
| STRIPE_PRICE_ID_TEAM | | Stripe price ID (must start with price_). |
| STRIPE_PRICE_ID_ENTERPRISE | | Stripe price ID (must start with price_). |
| APP_BASE_URL | | Dashboard base URL. Must not be localhost in live mode. |
License & contact
MIT licensed. Source at github.com/dakshtrehan/ragcompliance. Issues, PRs, and questions welcome.
For enterprise support, private-deployment guidance, or SOC 2 prep help, email daksh.trehan@hotmail.com.