Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.
"Trust is easier to build than to repair."
That is why OpenDQV exists. A 422 at the point of write is cheaper than a data incident three weeks later.
Beta (v2.x). The public API surface (REST, contract YAML, MCP tools, Python SDK) is stable. Breaking changes follow a one-release deprecation cycle. Security fixes are backported to the latest 2.x line. See API Stability for commitments.
OpenDQV is a write-time data validation service. Source systems call it before writing data. Bad records return a 422 with per-field errors. Good records pass through. No payload is stored.

```mermaid
flowchart LR
    subgraph Callers
        direction TB
        SF[Salesforce]
        SAP[SAP]
        DYN[Dynamics]
        ORA[Oracle]
        WEB[Web forms]
        ETL1[ETL pipelines]
        DJ[Django clean]
        PY[Python scripts]
        PD[Pandas / ETL]
        CD[Claude Desktop]
        CUR[Cursor]
        LLM[LLM agents]
    end
    subgraph OpenDQV
        direction TB
        API[Validation API\nREST / batch]
        SDK[LocalValidator\nin-process SDK]
        MCP[MCP Server\nAI-native]
        API & SDK & MCP --> CON[Contracts · YAML\nGovernance · RBAC\nAudit trail]
        API & SDK & MCP --> GEN[Code Generator\nApex · JS · SQL]
    end
    subgraph Results
        direction TB
        R1[valid: true / false]
        R2[per-field errors]
        R3[severity levels]
        R4[webhooks on events]
    end
    SF & SAP & DYN & ORA & WEB & ETL1 --> API
    DJ & PY & PD --> SDK
    CD & CUR & LLM --> MCP
    API & SDK & MCP --> R1
    subgraph Importers
        IMP[dbt schema · GX suites\nSoda checks · ODCS · CSV]
    end
    IMP --> CON
    style API fill:#0d3b5e,stroke:#092a44,color:#fff
    style SDK fill:#0d3b5e,stroke:#092a44,color:#fff
    style MCP fill:#0d3b5e,stroke:#092a44,color:#fff
    style CON fill:#1a8aad,stroke:#14708d,color:#fff
    style GEN fill:#1a8aad,stroke:#14708d,color:#fff
    style R1 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R2 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R3 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R4 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style IMP fill:#1a8aad,stroke:#14708d,color:#fff
```

A 422 at the point of write closes the feedback loop: producers see failures immediately and fix them at source. Rejection rates drop over time because the tool changes the incentive, not just the outcome.
For post-landing monitoring use Great Expectations, Soda, or dbt tests — they're complementary, not competing. OpenDQV owns layer one (write-time enforcement); those tools own layer three (post-ingestion observability).
OpenDQV ships a built-in Model Context Protocol server, so Claude Desktop, Cursor, and any other MCP-compatible agent can discover contracts, validate records, and explain failures through tool calls the agent explicitly declares — no hallucinated compliance, no invented rules.
4-minute demo: Claude Desktop uses two MCP servers (OpenDQV for validation, Marmot for catalog lineage) to check a menu item against ppds_menu_item for Natasha's Law allergen compliance, stating which tool calls it makes and why. (Backup: download the MP4 from the repo.)
For tool reference, write guardrails, remote/enterprise mode, and the Marmot composition pattern, see docs/mcp.md.
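Under the hood, MCP clients such as Claude Desktop and Cursor talk to the server in JSON-RPC 2.0: they discover tools with `tools/list` and invoke one with `tools/call`, whose `params` carry the tool `name` and its `arguments`. The minimal sketch below builds such a message. The tool name `validate_record` and its argument keys are hypothetical placeholders reusing the quickstart's `order` contract; OpenDQV's actual tool names and parameters are documented in docs/mcp.md and may differ.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and argument shape, for illustration only.
msg = make_tool_call(
    1,
    "validate_record",
    {"contract": "order", "record": {"email": "not-an-email", "amount": -5}},
)
print(msg)
```

Because every call is an explicit, named request like this, the agent's transcript shows exactly which contract was checked and what came back.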
| I have... | Command |
|---|---|
| Python 3.11+ | `git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && bash install.sh` |
| Docker | `git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && cp .env.example .env && docker compose up -d` |
| Just the SDK/CLI | `pip install opendqv`, then `opendqv init` to bootstrap contracts |
| None of the above | Beginner setup guide → |
install.sh creates a virtual environment, installs dependencies, and launches the onboarding wizard. Docker pulls ghcr.io/opendqv/opendqv:latest — no build step required.
⚠️ `AUTH_MODE=open` (the default) has no authentication. Set `AUTH_MODE=token` and a strong `SECRET_KEY` in `.env` before any non-local deployment. See SECURITY.md.
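One way to produce a strong `SECRET_KEY` is Python's standard `secrets` module, which draws from the operating system's cryptographically secure random source. The 48-byte length below is an assumption, not a documented OpenDQV minimum. Paste the printed lines into `.env` and keep that file out of version control.

```python
import secrets


def make_secret_key(num_bytes: int = 48) -> str:
    """Return a URL-safe random string for use as SECRET_KEY."""
    # 48 random bytes encode to 64 URL-safe Base64 characters (no padding).
    return secrets.token_urlsafe(num_bytes)


env_lines = [
    "AUTH_MODE=token",
    f"SECRET_KEY={make_secret_key()}",
]
print("\n".join(env_lines))
```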
1. Write a contract by dropping a YAML file in `contracts/`:

```yaml
contract:
  name: order
  version: "1.0"
  owner: "Data Governance"
  status: active
  rules:
    - name: valid_email
      type: regex
      field: email
      pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
      severity: error
      error_message: "Invalid email format"
    - name: amount_positive
      type: min
      field: amount
      min: 0.01
      severity: error
      error_message: "Order amount must be positive"
    - name: status_valid
      type: allowed_values
      field: status
      allowed_values: [pending, confirmed, shipped, cancelled]
      severity: error
      error_message: "Invalid order status"
```

2. Reload contracts:
```bash
curl -X POST http://localhost:8000/api/v1/contracts/reload
```

3. Send a bad record. OpenDQV rejects it:
```bash
curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "not-an-email", "amount": -5, "status": "unknown"}}'
```

```json
{
  "valid": false,
  "errors": [
    {"field": "email", "rule": "valid_email", "message": "Invalid email format", "severity": "error"},
    {"field": "amount", "rule": "amount_positive", "message": "Order amount must be positive", "severity": "error"},
    {"field": "status", "rule": "status_valid", "message": "Invalid order status", "severity": "error"}
  ],
  "contract": "order",
  "version": "1.0"
}
```

4. Fix the record and it passes:
```bash
curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "user@example.com", "amount": 49.99, "status": "pending"}}'
```

```json
{"valid": true, "errors": [], "warnings": [], "contract": "order", "version": "1.0"}
```

The `customer` contract ships pre-seeded if you want to skip step 1. The quickstart guide walks through authoring, lifecycle, and batch validation.
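A source system can put the same call in front of its own write path. The sketch below uses only the Python standard library; the URL is the quickstart default, and the helper names (`build_request`, `errors_by_field`, `validate_remote`, `write_if_valid`) are hypothetical, not part of the OpenDQV SDK. Because rejected records are described as returning a 422, the sketch reads the JSON result from either a 2xx or a 422 response. Deployments running with `AUTH_MODE=token` will also need an auth header, which is not shown here.

```python
import json
import urllib.error
import urllib.request
from collections.abc import Callable

# Quickstart default; adjust for your deployment.
VALIDATE_URL = "http://localhost:8000/api/v1/validate"


def build_request(contract: str, record: dict) -> urllib.request.Request:
    """Build the same POST request as the curl examples."""
    body = json.dumps({"contract": contract, "record": record}).encode("utf-8")
    return urllib.request.Request(
        VALIDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def errors_by_field(result: dict) -> dict[str, list[str]]:
    """Group error messages by field, e.g. for inline form feedback."""
    grouped: dict[str, list[str]] = {}
    for err in result.get("errors", []):
        grouped.setdefault(err["field"], []).append(err["message"])
    return grouped


def validate_remote(contract: str, record: dict, timeout: float = 2.0) -> dict:
    """POST one record and return the decoded JSON result."""
    try:
        with urllib.request.urlopen(build_request(contract, record), timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx; a 422 is assumed to carry
        # the same JSON result body as a successful response.
        if exc.code == 422:
            return json.loads(exc.read())
        raise


def write_if_valid(
    contract: str, record: dict, write: Callable[[dict], None]
) -> dict[str, list[str]]:
    """Call `write` only if OpenDQV accepts the record; otherwise return errors."""
    result = validate_remote(contract, record)
    if result.get("valid"):
        write(record)
        return {}
    return errors_by_field(result)
```

Here `write` stands in for whatever persistence call the source system already makes; timeout and retry policy are left to each caller.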
| Type | What it checks |
|---|---|
not_empty | Field is present and non-empty |
regex | Field matches (or does not match) a pattern. Built-ins: builtin:email, builtin:uuid, builtin:ipv4, builtin:url |
min / max / range | Numeric bounds |
min_length / max_length | String length |
date_format | Parseable date/datetime. Falls back through common formats if no explicit format is set |
allowed_values | Value must be in a fixed list |
lookup | Value must appear in a local file or HTTP endpoint (with TTL cache) |
compare | Cross-field: field op compare_to — supports gt, lt, gte, lte, eq, neq, and today/now sentinels |
required_if / forbidden_if | Conditional: required or forbidden when another field equals a value |
checksum | Check-digit integrity: IBAN, GTIN/GS1, NHS, ISIN, LEI, VIN, CPF, ISRC |
unique | No duplicates within a batch (batch mode only) |
cross_field_range | Value must be between two other fields in the same record |
field_sum | Sum of named fields must equal a target (within optional tolerance) |
geospatial_bounds | Lat/lon pair within a bounding box |
date_diff | Difference between two date fields within a range |
age_match | Declared age consistent with date-of-birth field |
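The check-digit schemes behind the `checksum` rule are public standards. As an illustration only (not OpenDQV's own implementation), here are two of them: the ISO 13616 MOD-97 check used by IBANs and the GS1 mod-10 check digit used by GTINs. The function names are hypothetical.

```python
def iban_is_valid(iban: str) -> bool:
    """ISO 13616: move the first four characters to the end, map letters to
    numbers (A=10 ... Z=35), and require the resulting integer mod 97 == 1."""
    s = iban.replace(" ", "").upper()
    if not (15 <= len(s) <= 34 and s.isascii() and s.isalnum()):
        return False
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def gtin_is_valid(code: str) -> bool:
    """GS1 mod-10: weight the digits 3, 1, 3, 1, ... starting from the one
    next to the check digit; the check digit tops the sum up to a multiple of 10."""
    if not (code.isascii() and code.isdigit() and len(code) in (8, 12, 13, 14)):
        return False
    body, check = code[:-1], int(code[-1])
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(gtin_is_valid("4006381333931"))  # True
```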
Rules have severity: error (blocks the record) or severity: warning (flags but allows).
Any rule can include a condition block to apply it only when another field equals a given value.
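To make the severity and condition semantics concrete, here is a hypothetical in-process evaluator for the quickstart's `order` contract, plus one extra warning-level rule that only applies to shipped orders. It is not OpenDQV's engine: the `condition` keys (`field`, `equals`), the `tracking_id` field, and the handling of missing fields are assumptions, so check docs/rules/ for the real YAML shape. The result mirrors the quickstart responses: errors make `valid` false, while warnings are reported but let the record through.

```python
import re
from typing import Any

RULES: list[dict[str, Any]] = [
    {"name": "valid_email", "type": "regex", "field": "email",
     "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$", "severity": "error",
     "error_message": "Invalid email format"},
    {"name": "amount_positive", "type": "min", "field": "amount", "min": 0.01,
     "severity": "error", "error_message": "Order amount must be positive"},
    {"name": "status_valid", "type": "allowed_values", "field": "status",
     "allowed_values": ["pending", "confirmed", "shipped", "cancelled"],
     "severity": "error", "error_message": "Invalid order status"},
    # Hypothetical conditional warning: only checked when status == "shipped".
    {"name": "tracking_present", "type": "not_empty", "field": "tracking_id",
     "severity": "warning", "error_message": "Shipped orders should have a tracking ID",
     "condition": {"field": "status", "equals": "shipped"}},
]


def passes(rule: dict[str, Any], value: Any) -> bool:
    """Check one value against one rule (only four rule types sketched)."""
    kind = rule["type"]
    if kind == "not_empty":
        return value not in (None, "")
    if kind == "regex":
        return isinstance(value, str) and re.search(rule["pattern"], value) is not None
    if kind == "min":
        return isinstance(value, (int, float)) and value >= rule["min"]
    if kind == "allowed_values":
        return value in rule["allowed_values"]
    raise ValueError(f"unsupported rule type: {kind}")


def evaluate_record(record: dict[str, Any]) -> dict[str, Any]:
    errors: list[dict[str, str]] = []
    warnings: list[dict[str, str]] = []
    for rule in RULES:
        cond = rule.get("condition")
        if cond and record.get(cond["field"]) != cond["equals"]:
            continue  # condition not met, so the rule does not apply
        if not passes(rule, record.get(rule["field"])):
            issue = {"field": rule["field"], "rule": rule["name"],
                     "message": rule["error_message"], "severity": rule["severity"]}
            (errors if rule["severity"] == "error" else warnings).append(issue)
    # Only error-severity failures block the record.
    return {"valid": not errors, "errors": errors, "warnings": warnings}
```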
Full reference: docs/rules/
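The `lookup` rule's TTL cache means the reference list is fetched once and reused until it expires, so a hot validation path does not hit the file or endpoint on every record. A minimal sketch of the idea, using a hypothetical `TTLLookup` class with an injectable clock so expiry is easy to test; OpenDQV's actual cache keys, TTL defaults, and failure handling are not covered here.

```python
import time
from collections.abc import Callable
from typing import Optional


class TTLLookup:
    """Membership check against reference values that are reloaded after a TTL."""

    def __init__(
        self,
        loader: Callable[[], set[str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # e.g. reads a local file or calls an HTTP endpoint
        self._ttl = ttl_seconds
        self._clock = clock
        self._values: set[str] = set()
        self._loaded_at: Optional[float] = None

    def contains(self, value: str) -> bool:
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            # Load on first use and whenever the cached copy is older than the TTL.
            self._values = self._loader()
            self._loaded_at = now
        return value in self._values


countries = TTLLookup(lambda: {"GB", "FR", "DE"}, ttl_seconds=300)
print(countries.contains("GB"))  # True
```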
A mature data governance programme operates across three layers, each with a distinct job:
| Layer | Purpose | Tools |
|---|---|---|
| 1. Write-time enforcement | Prevent bad data from entering any system | OpenDQV |
| 2. Catalog / governance / stewardship | Ownership, glossary, lineage, policy, stewardship workflows | Alation, Atlan, Collibra, Purview, DataHub, Marmot |
| 3. Pipeline testing / observability | Detect drift, freshness issues, residual quality after ingestion | Great Expectations, Soda Core, dbt tests, Monte Carlo |
OpenDQV Core owns layer one. Your catalog handles layer two, your pipeline tools handle layer three.
| | Great Expectations / Soda / dbt | OpenDQV |
|---|---|---|
| When | After data lands (in warehouse/lake) | Before data is written (at the door) |
| Where | Data pipelines, batch jobs | Source system integration points |
| Model | Scan data at rest | Validate data in flight |
| Latency | Minutes to hours (batch) | Milliseconds (API call) |
| Who calls it | Data engineers | Application developers, CRM admins |
They're complementary. Use Great Expectations to monitor your warehouse. Use OpenDQV to stop bad data from getting there in the first place.
44 production-ready contracts ship in contracts/ covering GDPR, HIPAA, SOX, MiFID II,
UK Building Safety Act, Martyn's Law, Natasha's Law, Ofcom Online Safety Act, EU DORA,
and 20+ other regulatory frameworks across the UK, EU, and US.
See docs/compliance-contracts.md for the full list with
regulatory context, or browse contracts/ directly.
17 minimal starter templates are in examples/starter_contracts/.
EC2 c6i.large, 2 workers, 12-rule contract, mixed 50/50 workload:
~482 req/s, p99 ~182 ms. Sizing rule: WEB_CONCURRENCY = number of vCPUs.
See docs/benchmark_throughput.md for full platform comparison,
methodology, and monthly volume extrapolation.
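As a back-of-the-envelope check of the figures above, the sketch below applies the one-worker-per-vCPU sizing rule and extrapolates the benchmark rate to a month. It assumes the rate is sustained around the clock for 30 days, which real traffic rarely is, so read the result as a ceiling for that instance size rather than a forecast. Note that `os.cpu_count()` reports the host's logical CPUs, which can exceed a container's CPU quota.

```python
import os


def web_concurrency() -> int:
    """Sizing rule from the benchmark notes: one worker per vCPU."""
    return os.cpu_count() or 1


def monthly_requests(req_per_sec: float, days: int = 30) -> int:
    """Naive ceiling: sustained rate x seconds in the period."""
    return int(req_per_sec * 86_400 * days)


print(web_concurrency())
print(f"{monthly_requests(482):,}")  # 1,249,344,000
```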
| Guide | What it covers |
|---|---|
| Quickstart | Build your first contract in 15 minutes |
| Rules Reference | All rule types with parameters and examples |
| Compliance Contracts | 44 contracts with regulatory context |
| API Reference | REST endpoints, SDK, GraphQL, webhooks |
| Security | Deployment checklist, threat model, RBAC |
| Production Deployment | Token auth, TLS, Docker Compose, hardening |
| Integrations | Salesforce, Kafka, Snowflake, dbt, Databricks, MCP, and more |
| All docs → | 76 documentation files |
OpenDQV is in Beta as of 2.0.0. The following stability commitments apply to the v2.x series:
- REST API: stable within v2.x. Backwards-incompatible changes require a major version bump and follow a deprecation cycle (one minor release of warnings before removal).
- Contract YAML: stable within v2.x. New rule types may be added; existing rules will not change semantics without a deprecation cycle.
- Python SDK: `OpenDQVClient`, `AsyncOpenDQVClient`, and `LocalValidator` public method signatures are stable within v2.x. Internal helpers (prefixed `_`) are not covered.
- MCP tools: stable within v2.x.

See CONTRIBUTING.md for setup instructions, coding guidelines, and how to submit changes.
MIT — see LICENSE.
Led by Sunny Sharma, BGMS Consultants Ltd. The vision, the architecture, every contract, and every design decision in this repository are directed by a human who believes data quality is a write-time responsibility.
OpenDQV is built with a hybrid team. Sunny leads — carbon and silicon. Three AI collaborators execute: Claude Sonnet 4.6 (primary developer), Claude Opus 4.6 (strategic auditor), and Grok (market intelligence). All answer to the same ethos: trust is easier to build than to repair.
OpenDQV/OpenDQV
March 13, 2026
April 13, 2026
Python