Audit trail for retrieval-augmented generation

Proof for every answer
your RAG ever gave.

RAG compliance middleware for LangChain and LlamaIndex. One callback, one signed audit row per chain.

ragcompliance is drop-in middleware for LangChain and LlamaIndex that logs the full chain (query, retrieved chunks with source URLs and similarity scores, the LLM answer, model name, latency), signs it with SHA-256, and writes one row per invocation to your own Supabase, behind row-level security per workspace. No chain rewrites. No black box.

$ pip install "ragcompliance[supabase]"
Read the docs
152 tests green
<1 ms handler overhead
MIT licensed

Handler overhead measured in isolation (p50 ≈ 38µs on a clean hot path). Full-chain latency is dominated by your retriever and LLM; the handler adds a small constant on top.

rag_audit_logs · example mock data
#0041 · Does policy §4.2 cover indemnification? → contract-v3.pdf · chunk-042 · 0.94 · sha256:a3f8c2d1…
#0042 · Summarise PSA termination rules. → psa-master.pdf · chunk-014 · 0.88 · sha256:9b4e1d77…
#0043 · Tampered post-write → signature mismatch on recompute · sha256:invalid ✗
#0044 · What's the liability cap clause? → msa-2025.pdf · chunk-031 · 0.91 · sha256:7e2af055…
#0045 · Explain the notice window for breach. → msa-2025.pdf · chunk-058 · 0.89 · sha256:44d7be12…
4 of 5 verified · POST /api/logs
Works with: LangChain, LlamaIndex
Complements: LangSmith, Langfuse (observability stays separate from audit)
Storage: Supabase (BYO object storage planned)

RAG doesn't fail in the model.
It fails in the audit.

RAG compliance is not a solved problem. Retrieval works, generation works, but the moment a compliance team asks for proof of what the model saw and proof that no one tampered with it since, most RAG stacks have nothing to hand over.

In regulated industries (finance, healthcare, legal, insurance) the question is never "is your retrieval good?" It's "can you prove, for any given answer on any given day, which document was cited, what the model saw, and that no one tampered with it since?" Most RAG stacks can't.

one row
per chain invocation. Signed. Queryable. That's what an auditor actually wants.

The kinds of questions compliance, internal audit, and GRC teams actually bring to a RAG system in regulated industries:

"Show me the retrieval-to-answer chain for every invocation in Q1, and prove it hasn't been modified." (the audit-reconstruction ask)
"One row per query. Source documents, chunk IDs, the answer verbatim. And row-level security across tenants." (the evidence-shape ask)
"If your answer changes tomorrow, I want to know exactly what changed. Model, prompt, source, all of it." (the answer-drift ask)

One row per chain.
Signed, stored, surfaced.

Left: what your chain normally leaves behind. Right: what ragcompliance writes to your warehouse. No wrappers, no forks, just a callback handler passed through LangChain's config={"callbacks":[handler]} or LlamaIndex's CallbackManager.

Without

Your chain runs. A string comes back. Good luck.

› chain.invoke("Does §4.2 cover indemnification?")
› "Yes, section 4.2 obligates Party A to indemnify…"
› ... no source, no chunk, no signature, no row ...
› ... compliance team: no.

With ragcompliance

One audited row, chained by signature. (example, mock data)

{
  "id": "c4e91…",
  "session_id": "user-abc",
  "workspace_id": "acme-prod",
  "query": "Does §4.2 cover indemnification?",
  "retrieved_chunks": [{
    "source_url": "contract-v3.pdf",
    "chunk_id": "chunk-042",
    "similarity_score": 0.94
  }],
  "llm_answer": "Yes, section 4.2…",
  "model_name": "gpt-4o-mini",
  "latency_ms": 1240,
  "chain_signature": "a3f8c2d1…" // sha256(q + chunks + answer)
}

Three lines. No chain rewrites.

The handler attaches via the standard callback channel on both frameworks. Your retriever, vector store, prompt, and LLM don't change. One callback, one row.

LangChain

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from ragcompliance import RAGComplianceHandler, RAGComplianceConfig

# assumes `vectorstore` is a LangChain vector store you have already built
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
prompt    = ChatPromptTemplate.from_template("Context:\n{context}\n\nQ: {query}")
llm       = ChatOpenAI(model="gpt-4o-mini")

chain = (
    {"context": retriever | RunnableLambda(lambda d: "\n\n".join(c.page_content for c in d)),
     "query": RunnablePassthrough()}
    | prompt | llm
)

# the only two new lines (three with the ragcompliance import above)
handler = RAGComplianceHandler(config=RAGComplianceConfig.from_env(), session_id="user-abc")
answer  = chain.invoke("Does §4.2 cover indemnification?", config={"callbacks": [handler]})

LlamaIndex

from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
from ragcompliance import RAGComplianceConfig
from ragcompliance.llamaindex_handler import LlamaIndexRAGComplianceHandler

handler = LlamaIndexRAGComplianceHandler(
    config=RAGComplianceConfig.from_env(),
    session_id="user-abc",
)
Settings.callback_manager = CallbackManager([handler])

# any query engine now runs under the audit handler
# (assumes `query_engine` was built from your existing index)
response = query_engine.query("Does §4.2 cover indemnification?")

What ships today.
All production-hardened. All tested.

Every feature below exists because a compliance team, an SRE on-call, or a power user asked for it. No vapor. 152 tests green, including a dedicated batch and concurrency suite and a retriever-chunk regression suite for langchain-core >= 1.3.0.

01

Drop-in callback handlers

LangChain LCEL-safe. Batch- and concurrency-safe as of v0.1.4: share one handler across chain.batch([...]) and concurrent chain.invoke() calls. LlamaIndex via CallbackManager on SYNTHESIZE events.

02

SHA-256 chain signature

Deterministic hash over query + chunks + answer. Any post-hoc tamper on any field is detectable at verification.
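
To make the idea concrete, here is a minimal sketch of a deterministic hash over query + chunks + answer and the matching recompute check. The canonical JSON encoding (sorted keys, fixed separators) is an assumption chosen for illustration; ragcompliance's actual serialization is not described on this page, and `chain_signature` and `verify` are hypothetical helper names.

```python
import hashlib
import json


def chain_signature(query: str, chunks: list[dict], answer: str) -> str:
    """Hash a canonical JSON encoding of the three signed fields (assumed encoding)."""
    payload = json.dumps(
        {"query": query, "chunks": chunks, "answer": answer},
        sort_keys=True,         # stable key order, so equal data hashes equally
        separators=(",", ":"),  # no whitespace variation between runs
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify(row: dict) -> bool:
    """Recompute the hash from the stored fields and compare it with the stored value."""
    expected = chain_signature(row["query"], row["retrieved_chunks"], row["llm_answer"])
    return expected == row["chain_signature"]


# Mock row in the shape shown earlier on this page.
row = {
    "query": "Does §4.2 cover indemnification?",
    "retrieved_chunks": [
        {"source_url": "contract-v3.pdf", "chunk_id": "chunk-042", "similarity_score": 0.94}
    ],
    "llm_answer": "Yes, section 4.2…",
}
row["chain_signature"] = chain_signature(
    row["query"], row["retrieved_chunks"], row["llm_answer"]
)

untouched_ok = verify(row)                          # stored fields still match
tampered_ok = verify({**row, "llm_answer": "No."})  # an edited answer no longer matches
```

The point of the canonical encoding is that verification depends only on the stored fields, so an auditor can recompute the hash without rerunning the model.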

03

Supabase storage with RLS

Bring your own Supabase. Row-level security per workspace_id. One row per chain, one workspace per tenant.

04

SOC 2 evidence generator

One CLI produces a Markdown report mapped to CC6.1 / CC7.2 / CC8.1 / A1.1 / C1.1, with a signature-verified random sample an auditor can spot-check.

05

OIDC SSO for the dashboard

Four env vars and SSO turns on. Google Workspace, Okta, Auth0, Entra, Authentik, or any standards-compliant OIDC provider. Domain allowlist optional.

06

Stripe billing + quota (reference implementation)

Self-hostable paid-tier UI for operators who want to run RAGCompliance as an internal product and meter downstream users. Period-rollover via Stripe webhook with a self-healing fallback so a dropped webhook can't lock a workspace out. Not a paid tier of this project.

07

Async audit writes

Fire-and-forget enqueue, bounded in-memory queue, daemon drainer. Per-chain overhead drops from ~1.2s to sub-millisecond.
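
The pattern described here can be sketched with the standard library alone. This is an illustration of a bounded queue drained by a daemon thread, not the project's implementation; `AsyncAuditWriter`, its `dropped` counter, and the drop-when-full policy are assumptions.

```python
import queue
import threading


class AsyncAuditWriter:
    """Hypothetical fire-and-forget writer: enqueue returns immediately."""

    def __init__(self, save, maxsize: int = 1000):
        self._save = save                           # any callable that persists one record
        self._queue = queue.Queue(maxsize=maxsize)  # bounded, so memory use stays capped
        self.dropped = 0
        threading.Thread(target=self._drain, daemon=True).start()

    def enqueue(self, record: dict) -> bool:
        """Never blocks the chain; a full queue drops the record and counts it."""
        try:
            self._queue.put_nowait(record)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            try:
                self._save(record)
            except Exception:
                pass  # a failing backend must not kill the drainer thread
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued record has been handed to save()."""
        self._queue.join()


saved = []
writer = AsyncAuditWriter(saved.append, maxsize=100)
for i in range(5):
    writer.enqueue({"id": i})
writer.flush()
```

Registering something like `flush` with `atexit` is one way to drain pending records at interpreter shutdown, since a daemon thread alone would be stopped with work still queued.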

08

Slack anomaly alerts

Four rules: zero chunks, low similarity, slow chain, chain errored. Env-tunable thresholds. Bounded alert queue so Slack outages can't back-pressure you.
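
As a rough sketch of how four such rules could be evaluated against one audit record: the environment variable names, default thresholds, and the reading of "low similarity" as "best chunk below the threshold" are all assumptions made for illustration, not the project's documented settings.

```python
import os


def detect_anomalies(record: dict, env=os.environ) -> list[str]:
    """Return the names of the rules this record trips (hypothetical rule names)."""
    # Hypothetical env var names and defaults; check the project docs for the real ones.
    min_similarity = float(env.get("RAG_ALERT_MIN_SIMILARITY", "0.5"))
    max_latency_ms = float(env.get("RAG_ALERT_MAX_LATENCY_MS", "10000"))

    alerts = []
    chunks = record.get("retrieved_chunks") or []
    if record.get("error"):
        alerts.append("chain_errored")
    if not chunks:
        alerts.append("zero_chunks")
    elif max(c["similarity_score"] for c in chunks) < min_similarity:
        alerts.append("low_similarity")
    if record.get("latency_ms", 0) > max_latency_ms:
        alerts.append("slow_chain")
    return alerts


healthy = detect_anomalies(
    {"retrieved_chunks": [{"similarity_score": 0.94}], "latency_ms": 1240}, env={}
)
weak = detect_anomalies(
    {"retrieved_chunks": [{"similarity_score": 0.2}], "latency_ms": 100}, env={}
)
broken = detect_anomalies(
    {"retrieved_chunks": [], "latency_ms": 20000, "error": "timeout"}, env={}
)
```

Returned alert names would then be pushed onto a bounded queue of their own, so a slow or unavailable Slack endpoint cannot hold up the chain.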

09

Live-mode readiness probe

/health/billing flags the classic paste errors (prod_… where price_… belongs, missing webhook secret, localhost base URL in live mode) before a customer finds them on Saturday night.
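
The three checks named above can be sketched as a plain function over a settings dict. The setting names (`STRIPE_PRICE_ID`, `STRIPE_WEBHOOK_SECRET`, `APP_BASE_URL`) and the function itself are hypothetical; only the Stripe prefixes (`sk_live_`, `price_`, `prod_`) are standard Stripe conventions.

```python
from urllib.parse import urlparse


def billing_readiness(settings: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # Stripe live-mode secret keys start with sk_live_ (test keys use sk_test_).
    live = settings.get("STRIPE_SECRET_KEY", "").startswith("sk_live_")

    price_id = settings.get("STRIPE_PRICE_ID", "")
    if price_id.startswith("prod_"):
        problems.append("STRIPE_PRICE_ID holds a product ID (prod_…); expected price_…")
    elif not price_id.startswith("price_"):
        problems.append("STRIPE_PRICE_ID is missing or not a price ID")

    if not settings.get("STRIPE_WEBHOOK_SECRET"):
        problems.append("STRIPE_WEBHOOK_SECRET is not set")

    host = urlparse(settings.get("APP_BASE_URL", "")).hostname or ""
    if live and host in {"localhost", "127.0.0.1", ""}:
        problems.append("APP_BASE_URL is unset or points at localhost in live mode")

    return problems


misconfigured = billing_readiness({
    "STRIPE_SECRET_KEY": "sk_live_example",
    "STRIPE_PRICE_ID": "prod_123",            # product ID pasted where a price ID belongs
    "APP_BASE_URL": "http://localhost:8000",  # local URL left over from development
})
ready = billing_readiness({
    "STRIPE_SECRET_KEY": "sk_live_example",
    "STRIPE_PRICE_ID": "price_123",
    "STRIPE_WEBHOOK_SECRET": "whsec_example",
    "APP_BASE_URL": "https://audit.example.com",
})
```

A health endpoint could return this list as JSON and a non-200 status when it is non-empty, which makes the probe easy to wire into a deploy check.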

Self-host. Free forever.

ragcompliance is MIT-licensed middleware. Clone it, pip install it, point it at your Supabase, and every feature on this page is yours. No paid tier, no call-home, no telemetry, no per-seat fees.

If you'd rather not run it yourself, I offer a few kinds of paid help around the project: integration reviews, SOC 2 evidence prep, custom features on contract, and an operated dashboard under your own domain. These are engagements with the author, not a locked tier of the codebase.

Email the author

A tight scope, shipped in public.

RAGCompliance is a focused audit trail library for retrieval-augmented generation. Not a tracing platform, not an eval suite, not a vector database. One job: turn every RAG chain invocation into one signed, row-level-secured audit record your compliance team can actually verify.

Every checkmark below has a commit, a test suite, and a Render deploy behind it. Anything planned is marked as such, with no pretending.

LangChain callback handler (LCEL-safe, outermost-chain latching) · v0.1.0
LlamaIndex callback handler (SYNTHESIZE-based) · v0.1.0
Supabase persistence with row-level security · v0.1.0
Dashboard exports: CSV / JSON with filters · v0.1.0
Stripe billing + quota metering (reference implementation, optional) · v0.1.1
Period-rollover with self-healing fallback · v0.1.1
Fail-closed quota enforcement · v0.1.2
Async audit writes (bounded queue, atexit drain) · v0.1.3
Slack alerts for anomalous chains · v0.1.3
SOC 2 evidence report generator · v0.1.3
OIDC SSO on the dashboard · v0.1.3
Stripe live-mode readiness probe · v0.1.3
Batch- and concurrency-safe shared handler (chain.batch, threads, asyncio) · v0.1.4
Defensive storage.save() wrapping: a custom backend that raises can't take the chain down · v0.1.4
BYO object storage for source documents (S3, GCS, R2) · planned
PII redaction policies (email, phone, custom regex) per workspace · planned

Questions you'd ask before adopting this.

The first things compliance, security, and platform teams ask when they look at ragcompliance for real. Plain answers, no hedging.

Isn't this just LangSmith?

No. LangSmith (and Langfuse, Helicone) are observability tools. They help you debug why a chain behaved a certain way. ragcompliance is an audit tool. It produces one signed row per invocation that a compliance reviewer can spot-check a year later and verify the answer hasn't been modified. Different audience (compliance instead of the engineering team), different artefact (immutable signed record instead of traces), different storage model (your Supabase under RLS instead of a SaaS trace store). Most teams in regulated industries need both.

Does it handle PHI / PII?

Today the handler writes the query verbatim, chunks verbatim, and the answer verbatim. If any of those can contain PHI or PII and your Supabase project isn't configured for that, do not enable ragcompliance against production traffic yet. PII redaction pre-audit is on the near-term roadmap (opt-in regex + NER pass before the signature is computed). For PHI specifically, pair with a HIPAA-eligible Supabase deployment and your own BAA. Full BYO object storage with customer-held KMS keys is the stricter model and is planned.

Is the SHA-256 chain signature a legal signature?

It is not a digital signature in the eIDAS / DSA / advanced electronic signature sense. It is a cryptographic integrity tag over query + chunks + answer that lets an auditor detect post-hoc tampering on any of those three fields without needing access to the original model run. Courts and regulators treat it as evidence of integrity, not identity. If you need non-repudiation on top (who wrote the record), pair with signed inserts at the database layer or S3 Object Lock on the raw payload store.

Is the audit log encrypted at rest?

Yes, because Supabase (the default storage) encrypts all rows at rest with AES-256 and all traffic in transit with TLS 1.2+. Row-level security isolates rows per workspace_id so a multi-tenant install cannot cross-read. If your threat model needs customer-held keys, the planned BYO object storage path lets you write the raw payload to S3 / GCS / Azure Blob under your own KMS key and keep only metadata + signature in Supabase.

Supabase only?

Supabase is the default because it gives you Postgres, row-level security, and a managed API out of the box, so a self-host path is a single SQL migration plus env vars. The AuditStorage interface is swappable, though. Anything that can persist a dict of the record shape and query it back works; storage.save() is wrapped defensively so a custom backend that raises cannot take the chain down. A Postgres-direct and a BYO object storage backend are on the roadmap.
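
A minimal sketch of what "persist a dict and query it back" plus defensive wrapping could look like. The real AuditStorage interface is not shown on this page, so `InMemoryStorage`, `BrokenStorage`, `safe_save`, and the `query(workspace_id)` signature are hypothetical stand-ins rather than the library's API.

```python
import logging

logger = logging.getLogger("audit_sketch")


class InMemoryStorage:
    """Hypothetical backend: keeps record dicts in a list and filters by workspace."""

    def __init__(self):
        self.rows = []

    def save(self, record: dict) -> None:
        self.rows.append(dict(record))  # copy, so later mutation can't alter the stored row

    def query(self, workspace_id: str) -> list[dict]:
        return [r for r in self.rows if r.get("workspace_id") == workspace_id]


class BrokenStorage:
    """Hypothetical backend that always fails, to exercise the wrapper."""

    def save(self, record: dict) -> None:
        raise ConnectionError("backend unavailable")


def safe_save(storage, record: dict) -> bool:
    """Call storage.save() but never let a backend error propagate to the chain."""
    try:
        storage.save(record)
        return True
    except Exception:
        logger.exception("audit write failed; chain continues")
        return False


record = {"workspace_id": "acme-prod", "query": "Does §4.2 cover indemnification?"}
memory = InMemoryStorage()
saved_ok = safe_save(memory, record)
broken_ok = safe_save(BrokenStorage(), record)
```

The trade-off in swallowing the exception is that a failed audit write is only visible in logs, so a production wrapper would likely also count or alert on failures.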

What about Haystack, DSPy, or a raw OpenAI call?

LangChain and LlamaIndex are the two shipped handlers because they cover most of what teams put in front of auditors today. For anything else, the capture shape is small: build the same AuditRecord at your own boundary (query, retrieved chunks, answer, model, latency), hand it to storage.save(...), and you get the same signed row and the same SOC 2 evidence export. Haystack and DSPy first-class handlers are on the roadmap if there is pull.

Ship a RAG system your compliance team will actually sign off on.

$ pip install "ragcompliance[supabase]"
Full documentation · Email the author