Drop-in callback handlers
LangChain LCEL-safe. Batch- and concurrency-safe as of v0.1.4: share one handler across chain.batch([...]) and concurrent chain.invoke() calls. LlamaIndex via CallbackManager on SYNTHESIZE events.
RAG compliance middleware for LangChain and LlamaIndex. One callback, one signed audit row per chain.
ragcompliance is drop-in middleware for LangChain and LlamaIndex that logs the full chain (query, retrieved chunks with source URLs and similarity scores, the LLM answer, model name, latency), signs it with SHA-256, and writes one row per invocation to your own Supabase, behind row-level security per workspace. No chain rewrites. No black box.
Handler overhead measured in isolation (p50 ≈ 38µs on a clean hot path). Full-chain latency is dominated by your retriever and LLM; the handler adds a small constant on top.
RAG compliance is not a solved problem. Retrieval works, generation works, but the moment a compliance team asks for proof of what the model saw and proof that no one tampered with it since, most RAG stacks have nothing to hand over.
In regulated industries (finance, healthcare, legal, insurance) the question is never "is your retrieval good?" It's "can you prove, for any given answer on any given day, which document was cited, what the model saw, and that no one tampered with it since?" Most RAG stacks can't.
The kinds of questions compliance, internal audit, and GRC teams actually bring to a RAG system in regulated industries:
Left: what your chain normally leaves behind. Right: what ragcompliance writes to your warehouse. No wrappers, no forks, just a callback handler passed through LangChain's config={"callbacks":[handler]} or LlamaIndex's CallbackManager.
Your chain runs. A string comes back. Good luck.
One audited row, chained by signature. (example, mock data)
The handler attaches via the standard callback channel on both frameworks. Your retriever, vector store, prompt, and LLM don't change. One callback, one row.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda

from ragcompliance import RAGComplianceHandler, RAGComplianceConfig

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
prompt = ChatPromptTemplate.from_template("Context:\n{context}\n\nQ: {query}")
llm = ChatOpenAI(model="gpt-4o-mini")

chain = (
    {
        "context": retriever | RunnableLambda(lambda docs: "\n\n".join(c.page_content for c in docs)),
        "query": RunnablePassthrough(),
    }
    | prompt
    | llm
)

# the only two new lines
handler = RAGComplianceHandler(config=RAGComplianceConfig.from_env(), session_id="user-abc")
answer = chain.invoke("Does §4.2 cover indemnification?", config={"callbacks": [handler]})
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager

from ragcompliance import RAGComplianceConfig
from ragcompliance.llamaindex_handler import LlamaIndexRAGComplianceHandler

handler = LlamaIndexRAGComplianceHandler(
    config=RAGComplianceConfig.from_env(),
    session_id="user-abc",
)
Settings.callback_manager = CallbackManager([handler])

# any query engine now runs under the audit handler
response = query_engine.query("Does §4.2 cover indemnification?")
Every feature below exists because a compliance team, an SRE on-call, or a power user asked for it. No vapor. 152 tests green, including a dedicated batch and concurrency suite and a retriever-chunk regression suite for langchain-core >= 1.3.0.
Drop-in callback handlers. LangChain LCEL-safe, batch- and concurrency-safe as of v0.1.4. LlamaIndex via CallbackManager on SYNTHESIZE events.
Deterministic hash over query + chunks + answer. Any post-hoc tampering with any of those fields is detectable at verification.
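The scheme reduces to a canonical serialization plus SHA-256, which is easy to sketch in plain Python. The function names and field layout below are illustrative, not the library's actual API:

```python
import hashlib
import json


def sign_record(query: str, chunks: list[str], answer: str) -> str:
    """Deterministic SHA-256 over the three audited fields.

    sort_keys plus fixed separators makes the byte stream canonical,
    so identical inputs always produce an identical hash.
    """
    payload = json.dumps(
        {"query": query, "chunks": chunks, "answer": answer},
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_record(record: dict) -> bool:
    """Recompute the hash; any edit to any field flips it."""
    expected = sign_record(record["query"], record["chunks"], record["answer"])
    return expected == record["signature"]


record = {
    "query": "Does §4.2 cover indemnification?",
    "chunks": ["§4.2 Indemnification. The Vendor shall..."],
    "answer": "Yes, §4.2 covers indemnification.",
}
record["signature"] = sign_record(record["query"], record["chunks"], record["answer"])

assert verify_record(record)
record["answer"] = "No."          # post-hoc tamper
assert not verify_record(record)  # detected at verification
```

Because the hash is deterministic, an auditor only needs the row itself to re-verify it; no access to the original model run is required.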
Bring your own Supabase. Row-level security per workspace_id. One row per chain, one workspace per tenant.
One CLI produces a Markdown report mapped to CC6.1 / CC7.2 / CC8.1 / A1.1 / C1.1, with a signature-verified random sample an auditor can spot-check.
Four env vars and SSO turns on. Google Workspace, Okta, Auth0, Entra, Authentik, or any standards-compliant OIDC provider. Domain allowlist optional.
Self-hostable billing UI for operators who want to run RAGCompliance as an internal product and meter their downstream users. Period rollover via Stripe webhook, with a self-healing fallback so a dropped webhook can't lock a workspace out. This is tooling for your own tiers, not a paid tier of this project.
Fire-and-forget enqueue, bounded in-memory queue, daemon drainer. Per-chain overhead drops from ~1.2s to sub-millisecond.
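The fire-and-forget pattern can be sketched in a few lines of standard-library Python. The class and parameter names here are illustrative stand-ins, not ragcompliance internals: the hot path only enqueues, a daemon thread drains to storage, and a full queue drops rather than blocks the chain:

```python
import queue
import threading


class AsyncAuditWriter:
    def __init__(self, storage_save, maxsize: int = 1000):
        # Bounded queue: a slow backend can never grow memory unbounded.
        self._q = queue.Queue(maxsize=maxsize)
        self._save = storage_save
        # Daemon drainer: dies with the process, never blocks shutdown.
        threading.Thread(target=self._drain, daemon=True).start()

    def enqueue(self, record: dict) -> bool:
        """Fire-and-forget: microseconds on the hot path; drops on overflow."""
        try:
            self._q.put_nowait(record)
            return True
        except queue.Full:
            return False  # back-pressure never reaches the caller

    def _drain(self):
        while True:
            record = self._q.get()
            try:
                self._save(record)  # slow network write, off the hot path
            except Exception:
                pass  # a failing backend cannot take the chain down
            finally:
                self._q.task_done()


saved = []
writer = AsyncAuditWriter(saved.append, maxsize=10)
writer.enqueue({"query": "q", "answer": "a"})
writer._q.join()  # wait for the drainer in this demo only
assert saved == [{"query": "q", "answer": "a"}]
```

The trade-off is deliberate: under sustained overflow you lose audit rows rather than latency, which is why the queue bound is sized well above expected burst traffic.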
Four rules: zero chunks, low similarity, slow chain, chain errored. Env-tunable thresholds. Bounded alert queue so Slack outages can't back-pressure you.
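The four rules reduce to simple predicates over a finished chain record. A minimal sketch, with hypothetical env var names and thresholds standing in for the real tunables:

```python
import os

# Env-tunable thresholds (names here are hypothetical)
MIN_SIMILARITY = float(os.getenv("RAGC_MIN_SIMILARITY", "0.25"))
MAX_LATENCY_MS = float(os.getenv("RAGC_MAX_LATENCY_MS", "8000"))


def alerts_for(record: dict) -> list[str]:
    """Evaluate the four alert rules against one finished chain record."""
    fired = []
    if not record["chunks"]:
        fired.append("zero_chunks")        # retriever returned nothing
    elif max(c["score"] for c in record["chunks"]) < MIN_SIMILARITY:
        fired.append("low_similarity")     # even the best chunk is a weak match
    if record["latency_ms"] > MAX_LATENCY_MS:
        fired.append("slow_chain")
    if record.get("error"):
        fired.append("chain_errored")
    return fired


record = {"chunks": [{"score": 0.12}], "latency_ms": 9500, "error": None}
assert alerts_for(record) == ["low_similarity", "slow_chain"]
```

Fired alerts would then go onto the bounded alert queue, so a Slack outage delays or drops notifications instead of back-pressuring the chain.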
/health/billing flags the classic paste errors (prod_… where price_… belongs, missing webhook secret, localhost base URL in live mode) before a customer finds them on Saturday night.
ragcompliance is MIT-licensed middleware. Clone it, pip install it, point it at your Supabase, and every feature on this page is yours. No paid tier, no call-home, no telemetry, no per-seat fees.
If you'd rather not run it yourself, I offer a few kinds of paid help around the project: integration reviews, SOC 2 evidence prep, custom features on contract, and an operated dashboard under your own domain. These are engagements with the author, not a locked tier of the codebase.
RAGCompliance is a focused audit trail library for retrieval-augmented generation. Not a tracing platform, not an eval suite, not a vector database. One job: turn every RAG chain invocation into one signed, row-level-secured audit record your compliance team can actually audit.
Every checkmark below has a commit, a test suite, and a Render deploy behind it. Anything planned is marked as such, with no pretending.
The first things compliance, security, and platform teams ask when they look at ragcompliance for real. Plain answers, no hedging.
No. LangSmith (and Langfuse, Helicone) are observability tools. They help you debug why a chain behaved a certain way. ragcompliance is an audit tool. It produces one signed row per invocation that a compliance reviewer can spot-check a year later and verify the answer hasn't been modified. Different audience (compliance instead of the engineering team), different artefact (immutable signed record instead of traces), different storage model (your Supabase under RLS instead of a SaaS trace store). Most teams in regulated industries need both.
Today the handler writes the query verbatim, chunks verbatim, and the answer verbatim. If any of those can contain PHI or PII and your Supabase project isn't configured for that, do not enable ragcompliance against production traffic yet. PII redaction pre-audit is on the near-term roadmap (opt-in regex + NER pass before the signature is computed). For PHI specifically, pair with a HIPAA-eligible Supabase deployment and your own BAA. Full BYO object storage with customer-held KMS keys is the stricter model and is planned.
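Until the built-in redaction lands, a pre-audit pass of the planned shape is easy to run yourself: redact before the signature is computed, so the signed record never contains the raw value. The patterns below are illustrative only; real PII/PHI coverage needs an NER pass on top:

```python
import re

# Illustrative patterns only; not an exhaustive PII ruleset.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> str:
    """Replace matches with a stable token BEFORE hashing/signing."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


query = "Email jane.doe@example.com about SSN 123-45-6789"
assert redact(query) == "Email [EMAIL] about SSN [SSN]"
```

Note the ordering matters: redacting after the signature is computed would make every redacted row fail verification.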
It is not a digital signature in the eIDAS / DSA / advanced electronic signature sense. It is a cryptographic integrity tag over query + chunks + answer that lets an auditor detect post-hoc tampering on any of those three fields without needing access to the original model run. Courts and regulators treat it as evidence of integrity, not identity. If you need non-repudiation on top (who wrote the record), pair with signed inserts at the database layer or S3 Object Lock on the raw payload store.
Yes, because Supabase (the default storage) encrypts all rows at rest with AES-256 and all traffic in transit with TLS 1.2+. Row-level security isolates rows per workspace_id so a multi-tenant install cannot cross-read. If your threat model needs customer-held keys, the planned BYO object storage path lets you write the raw payload to S3 / GCS / Azure Blob under your own KMS key and keep only metadata + signature in Supabase.
Supabase is the default because it gives you Postgres, row-level security, and a managed API out of the box, so a self-host path is a single SQL migration plus env vars. The AuditStorage interface is swappable, though. Anything that can persist a dict of the record shape and query it back works; storage.save() is wrapped defensively so a custom backend that raises cannot take the chain down. A Postgres-direct and a BYO object storage backend are on the roadmap.
LangChain and LlamaIndex are the two shipped handlers because they cover most of what teams put in front of auditors today. For anything else, the capture shape is small: build the same AuditRecord at your own boundary (query, retrieved chunks, answer, model, latency), hand it to storage.save(...), and you get the same signed row and the same SOC 2 evidence export. Haystack and DSPy first-class handlers are on the roadmap if there is pull.
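For a framework without a shipped handler, the capture described above can be sketched at your own boundary. The `AuditRecord` fields and the `storage.save()` contract below are inferred from this page, not the library's verified API:

```python
import time
from dataclasses import dataclass, asdict


@dataclass
class AuditRecord:
    """Record shape inferred from this page; field names are illustrative."""
    query: str
    chunks: list      # e.g. [{"text": ..., "source_url": ..., "score": ...}]
    answer: str
    model: str
    latency_ms: float


class InMemoryStorage:
    """Stand-in for the Supabase backend, same save() contract."""
    def __init__(self):
        self.rows = []

    def save(self, record: dict) -> None:
        self.rows.append(record)


def run_audited(storage, rag_fn, query: str, model: str) -> str:
    """Wrap any RAG call at your own boundary: time it, capture, save one row."""
    start = time.perf_counter()
    chunks, answer = rag_fn(query)
    latency_ms = (time.perf_counter() - start) * 1000
    storage.save(asdict(AuditRecord(query, chunks, answer, model, latency_ms)))
    return answer


def fake_rag(query):
    # Placeholder for your retriever + LLM call
    return [{"text": "§4.2 ...", "source_url": "s3://contracts/msa.pdf", "score": 0.91}], "Yes."


storage = InMemoryStorage()
answer = run_audited(storage, fake_rag, "Does §4.2 cover indemnification?", "gpt-4o-mini")
assert answer == "Yes."
assert len(storage.rows) == 1
```

Because the record shape is the same, the signing, RLS storage, and SOC 2 export paths downstream of `storage.save()` work unchanged.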