OpenDQV

Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.


"Trust is easier to build than to repair."
That is why OpenDQV exists. A 422 at the point of write is cheaper than a data incident three weeks later.

Beta (v2.x). The public API surface (REST, contract YAML, MCP tools, Python SDK) is stable. Breaking changes follow a deprecation cycle of one minor release. Security fixes are backported to the latest 2.x line on a best-effort basis. See API Stability for commitments.

OpenDQV is a write-time data validation service. Source systems call it before writing data. Bad records return a 422 with per-field errors. Good records pass through. No payload is stored.

[Demo: define a contract, send a bad record (get a 422), fix it (get a 200)]

flowchart LR
    subgraph Callers
        direction TB
        SF[Salesforce]
        SAP[SAP]
        DYN[Dynamics]
        ORA[Oracle]
        WEB[Web forms]
        ETL1[ETL pipelines]

        DJ[Django clean]
        PY[Python scripts]
        PD[Pandas / ETL]

        CD[Claude Desktop]
        CUR[Cursor]
        LLM[LLM agents]
    end

    subgraph OpenDQV
        direction TB
        API[Validation API\nREST / batch]
        SDK[LocalValidator\nin-process SDK]
        MCP[MCP Server\nAI-native]
        API & SDK & MCP --> CON[Contracts · YAML\nGovernance · RBAC\nAudit trail]
        API & SDK & MCP --> GEN[Code Generator\nApex · JS · SQL]
    end

    subgraph Results
        direction TB
        R1[valid: true / false]
        R2[per-field errors]
        R3[severity levels]
        R4[webhooks on events]
    end

    SF & SAP & DYN & ORA & WEB & ETL1 --> API
    DJ & PY & PD --> SDK
    CD & CUR & LLM --> MCP

    API & SDK & MCP --> R1

    subgraph Importers
        IMP[dbt schema · GX suites\nSoda checks · ODCS · CSV]
    end
    IMP --> CON

    style API fill:#0d3b5e,stroke:#092a44,color:#fff
    style SDK fill:#0d3b5e,stroke:#092a44,color:#fff
    style MCP fill:#0d3b5e,stroke:#092a44,color:#fff
    style CON fill:#1a8aad,stroke:#14708d,color:#fff
    style GEN fill:#1a8aad,stroke:#14708d,color:#fff
    style R1 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R2 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R3 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R4 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style IMP fill:#1a8aad,stroke:#14708d,color:#fff

A 422 at the point of write closes the feedback loop — producers see failures immediately and fix them at source. Rejection rates drop over time because the tool changes the incentive, not just the outcome.

For post-landing monitoring use Great Expectations, Soda, or dbt tests — they're complementary, not competing. OpenDQV owns layer one (write-time enforcement); those tools own layer three (post-ingestion observability).

AI Agents — first-class via MCP

OpenDQV ships a built-in Model Context Protocol server, so Claude Desktop, Cursor, and any other MCP-compatible agent can discover contracts, validate records, and explain failures through tool calls the agent explicitly declares — no hallucinated compliance, no invented rules.

4-minute demo: Claude Desktop uses two MCP servers — OpenDQV for validation, Marmot for catalog lineage — to check a menu item against ppds_menu_item for Natasha's Law allergen compliance, stating which tool calls it makes and why. (Backup: download the MP4 from the repo)

For tool reference, write guardrails, remote/enterprise mode, and the Marmot composition pattern, see docs/mcp.md.

Install

I have...	Command
Python 3.11+	`git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && bash install.sh`
Docker	`git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && cp .env.example .env && docker compose up -d`
Just the SDK/CLI	`pip install opendqv` then `opendqv init` to bootstrap contracts
None of the above	Beginner setup guide →

install.sh creates a virtual environment, installs dependencies, and launches the onboarding wizard. Docker pulls ghcr.io/opendqv/opendqv:latest — no build step required.

⚠️ AUTH_MODE=open (the default) has no authentication. Set AUTH_MODE=token and a strong SECRET_KEY in .env before any non-local deployment. See SECURITY.md.
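
A small pre-deployment check can catch the open default before it ships. The sketch below reads `.env`-style text and flags the two settings named above. The helper names and the 32-character minimum are assumptions made for illustration, not OpenDQV requirements; SECURITY.md remains the authoritative checklist.

```python
import secrets


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def deployment_problems(env: dict[str, str], min_key_length: int = 32) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # "open" is the documented default, so a missing AUTH_MODE counts as open.
    if env.get("AUTH_MODE", "open") != "token":
        problems.append("AUTH_MODE is not 'token'")
    # The length threshold is an assumed heuristic, not an OpenDQV rule.
    if len(env.get("SECRET_KEY", "")) < min_key_length:
        problems.append(f"SECRET_KEY is shorter than {min_key_length} characters")
    return problems


def new_secret_key() -> str:
    """Generate a random URL-safe key from 48 bytes of entropy."""
    return secrets.token_urlsafe(48)


print(deployment_problems(parse_env("AUTH_MODE=open\nSECRET_KEY=changeme")))
```

Running a check like this in CI before a non-local deploy is one way to make the warning above hard to miss.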

Your First Validation

1. Write a contract — drop a YAML file in contracts/:

contract:
  name: order
  version: "1.0"
  owner: "Data Governance"
  status: active
  rules:
    - name: valid_email
      type: regex
      field: email
      pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
      severity: error
      error_message: "Invalid email format"
    - name: amount_positive
      type: min
      field: amount
      min: 0.01
      severity: error
      error_message: "Order amount must be positive"
    - name: status_valid
      type: allowed_values
      field: status
      allowed_values: [pending, confirmed, shipped, cancelled]
      severity: error
      error_message: "Invalid order status"

2. Reload contracts:

curl -X POST http://localhost:8000/api/v1/contracts/reload

3. Send a bad record — OpenDQV rejects it:

curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "not-an-email", "amount": -5, "status": "unknown"}}'

{
  "valid": false,
  "errors": [
    {"field": "email",  "rule": "valid_email",    "message": "Invalid email format",        "severity": "error"},
    {"field": "amount", "rule": "amount_positive", "message": "Order amount must be positive", "severity": "error"},
    {"field": "status", "rule": "status_valid",    "message": "Invalid order status",        "severity": "error"}
  ],
  "contract": "order",
  "version": "1.0"
}

4. Fix the record — it passes:

curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "jane@example.com", "amount": 49.99, "status": "pending"}}'

{"valid": true, "errors": [], "warnings": [], "contract": "order", "version": "1.0"}

The customer contract ships pre-seeded if you want to skip step 1. The quickstart guide walks through authoring, lifecycle, and batch validation.
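
The same calls can be made from Python before a write. This is a minimal sketch that uses only the standard library and the `/api/v1/validate` endpoint and payload shape from the curl examples; it does not use the OpenDQV SDK, whose method signatures are not shown in this README. The function names `validate`, `blocking_errors`, and `write_if_valid` are hypothetical.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def validate(contract: str, record: dict, base_url: str = BASE_URL) -> tuple[int, dict]:
    """POST one record to /api/v1/validate and return (HTTP status, parsed body)."""
    payload = json.dumps({"contract": contract, "record": record}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/v1/validate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.status, json.load(response)
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; a 422 still carries the JSON error body.
        if exc.code != 422:
            raise
        return exc.code, json.load(exc)


def blocking_errors(body: dict) -> list[str]:
    """Format the severity=error entries of a response as 'field: message'."""
    return [
        f"{e['field']}: {e['message']}"
        for e in body.get("errors", [])
        if e.get("severity") == "error"
    ]


def write_if_valid(contract: str, record: dict, write, validator=validate) -> bool:
    """Call `write(record)` only when the validator accepts the record."""
    status, body = validator(contract, record)
    if status == 200 and body.get("valid"):
        write(record)
        return True
    for line in blocking_errors(body):
        print("rejected:", line)
    return False
```

With the server from the steps above running, `write_if_valid("order", record, save_to_db)` would call the hypothetical `save_to_db` only for records that pass the `order` contract.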

Rules

Type	What it checks
`not_empty`	Field is present and non-empty
`regex`	Field matches (or does not match) a pattern. Built-ins: `builtin:email`, `builtin:uuid`, `builtin:ipv4`, `builtin:url`
`min` / `max` / `range`	Numeric bounds
`min_length` / `max_length`	String length
`date_format`	Parseable date/datetime. Falls back through common formats if no explicit format is set
`allowed_values`	Value must be in a fixed list
`lookup`	Value must appear in a local file or HTTP endpoint (with TTL cache)
`compare`	Cross-field: `field` op `compare_to` — supports `gt`, `lt`, `gte`, `lte`, `eq`, `neq`, and `today`/`now` sentinels
`required_if` / `forbidden_if`	Conditional: required or forbidden when another field equals a value
`checksum`	Check-digit integrity: IBAN, GTIN/GS1, NHS, ISIN, LEI, VIN, CPF, ISRC
`unique`	No duplicates within a batch (batch mode only)
`cross_field_range`	Value must be between two other fields in the same record
`field_sum`	Sum of named fields must equal a target (within optional tolerance)
`geospatial_bounds`	Lat/lon pair within a bounding box
`date_diff`	Difference between two date fields within a range
`age_match`	Declared age consistent with date-of-birth field
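
To make the `checksum` row concrete, here is a sketch of the standard mod-97 check used for IBANs. It illustrates the general technique only and is not taken from OpenDQV's source; the function name is hypothetical, and country-specific length rules are deliberately left out.

```python
def iban_is_valid(iban: str) -> bool:
    """Check an IBAN's check digits with the mod-97 rule (no per-country length check)."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and the two check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Replace letters with numbers: A=10, B=11, ..., Z=35 (digits stay as they are).
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A well-formed IBAN leaves a remainder of exactly 1.
    return int(digits) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))
```

The first value is the widely used textbook example IBAN; the second differs only in its check digits.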

Rules have severity: error (blocks the record) or severity: warning (flags but allows).
Any rule can include a condition block to apply it only when another field equals a given value.
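
The interaction between severity and conditions can be sketched in a few lines of Python. This is an illustration of the semantics described above, not OpenDQV's implementation: the `check` callables and the `condition` keys (`field`, `value`) are hypothetical stand-ins, and the real contract syntax is in docs/rules/.

```python
def evaluate(record: dict, rules: list[dict]) -> dict:
    """Apply simplified rules and split failures by severity."""
    errors, warnings = [], []
    for rule in rules:
        condition = rule.get("condition")
        # Skip the rule unless the condition's field equals the given value.
        if condition and record.get(condition["field"]) != condition["value"]:
            continue
        if rule["check"](record.get(rule["field"])):
            continue
        failure = {"field": rule["field"], "rule": rule["name"], "severity": rule["severity"]}
        (errors if rule["severity"] == "error" else warnings).append(failure)
    # Only severity=error failures block the record; warnings are reported alongside.
    return {"valid": not errors, "errors": errors, "warnings": warnings}


rules = [
    {"name": "amount_positive", "field": "amount", "severity": "error",
     "check": lambda v: isinstance(v, (int, float)) and v >= 0.01},
    # Hypothetical rule: only applies when status equals "shipped".
    {"name": "tracking_present", "field": "tracking_id", "severity": "warning",
     "condition": {"field": "status", "value": "shipped"},
     "check": lambda v: bool(v)},
]

print(evaluate({"amount": 49.99, "status": "shipped"}, rules))
print(evaluate({"amount": 49.99, "status": "pending"}, rules))
print(evaluate({"amount": -5, "status": "pending"}, rules))
```

In this sketch the shipped order without a tracking ID is still valid but carries a warning, the pending order skips the conditional rule entirely, and the negative amount is rejected.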

Full reference: docs/rules/

How it compares

A mature data governance programme operates across three layers, each with a distinct job:

Layer	Purpose	Tools
1. Write-time enforcement	Prevent bad data from entering any system	OpenDQV
2. Catalog / governance / stewardship	Ownership, glossary, lineage, policy, stewardship workflows	Alation, Atlan, Collibra, Purview, DataHub, Marmot
3. Pipeline testing / observability	Detect drift, freshness issues, residual quality after ingestion	Great Expectations, Soda Core, dbt tests, Monte Carlo

OpenDQV Core owns layer one. Your catalog handles layer two, your pipeline tools handle layer three.

Aspect	Great Expectations / Soda / dbt	OpenDQV
When	After data lands (in warehouse/lake)	Before data is written (at the door)
Where	Data pipelines, batch jobs	Source system integration points
Model	Scan data at rest	Validate data in flight
Latency	Minutes to hours (batch)	Milliseconds (API call)
Who calls it	Data engineers	Application developers, CRM admins

They're complementary. Use Great Expectations to monitor your warehouse. Use OpenDQV to stop bad data from getting there in the first place.

Contracts

44 production-ready contracts ship in contracts/ covering GDPR, HIPAA, SOX, MiFID II,
UK Building Safety Act, Martyn's Law, Natasha's Law, Ofcom Online Safety Act, EU DORA,
and 20+ other regulatory frameworks across UK, EU, and US.

See docs/compliance-contracts.md for the full list with
regulatory context, or browse contracts/ directly.
17 minimal starter templates are in examples/starter_contracts/.

Performance

Measured on an EC2 c6i.large (2 vCPUs) with 2 workers, a 12-rule contract, and a mixed 50/50 workload:
~482 req/s, p99 ~182 ms. Sizing rule: set WEB_CONCURRENCY to the number of vCPUs.

See docs/benchmark_throughput.md for full platform comparison,
methodology, and monthly volume extrapolation.
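
As a rough back-of-envelope aid, the quoted throughput can be turned into a monthly ceiling. The sketch below is plain arithmetic on the ~482 req/s figure and assumes sustained load; the `utilisation` parameter and both function names are hypothetical, and the method in docs/benchmark_throughput.md may differ.

```python
import os

REQ_PER_SEC = 482                  # approximate throughput quoted above
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_ceiling(req_per_sec: float, days: int = 30, utilisation: float = 1.0) -> int:
    """Upper bound on validations per month at a given average utilisation."""
    return int(req_per_sec * SECONDS_PER_DAY * days * utilisation)


def suggested_web_concurrency() -> int:
    """Apply the sizing rule: one worker per vCPU visible to this process."""
    return os.cpu_count() or 1


print(monthly_ceiling(REQ_PER_SEC))                    # 1249344000
print(monthly_ceiling(REQ_PER_SEC, utilisation=0.25))  # 312336000
print(suggested_web_concurrency())
```

Real traffic is rarely flat, so the lower-utilisation figure is usually the more useful planning number.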

Documentation


Guide	What it covers
Quickstart	Build your first contract in 15 minutes
Rules Reference	All rule types with parameters and examples
Compliance Contracts	44 contracts with regulatory context
API Reference	REST endpoints, SDK, GraphQL, webhooks
Security	Deployment checklist, threat model, RBAC
Production Deployment	Token auth, TLS, Docker Compose, hardening
Integrations	Salesforce, Kafka, Snowflake, dbt, Databricks, MCP, and more
All docs →	76 documentation files

API Stability

OpenDQV is in Beta as of 2.0.0. The following stability commitments apply to the v2.x series:

REST API endpoints — paths, request bodies, and response shapes are stable within v2.x. Backwards-incompatible changes require a major version bump and follow a deprecation cycle (one minor release of warnings before removal).
YAML contract format — the contract schema (rules, fields, types) is stable within v2.x. New rule types may be added; existing rules will not change semantics without a deprecation cycle.
Python SDK — OpenDQVClient, AsyncOpenDQVClient, and LocalValidator public method signatures are stable within v2.x. Internal helpers (prefixed _) are not covered.
MCP tools — tool names and parameters are stable within v2.x.
Security fixes — backported to the latest 2.x line on a best-effort basis.

Contributing

See CONTRIBUTING.md for setup instructions, coding guidelines, and how to submit changes.

License

MIT — see LICENSE.

Acknowledgements

Led by Sunny Sharma, BGMS Consultants Ltd. The vision, the architecture, every contract, and every design decision in this repository are directed by a human who believes data quality is a write-time responsibility.

OpenDQV is built with a hybrid team. Sunny leads — carbon and silicon. Three AI collaborators execute: Claude Sonnet 4.6 (primary developer), Claude Opus 4.6 (strategic auditor), and Grok (market intelligence). All answer to the same ethos: trust is easier to build than to repair.

Repository

OpenDQV/OpenDQV
Created: March 13, 2026
Updated: April 13, 2026
Language: Python
