
OpenDQV

Open-source, contract-driven data quality validation. Shift-left enforcement at the point of write — before data enters your pipeline.

"Trust is easier to build than to repair."
That is why OpenDQV exists. A 422 at the point of write is cheaper than a data incident three weeks later.

Beta (v2.x). Public API surface (REST, contract YAML, MCP tools, Python SDK) is stable. Breaking changes follow a one-release deprecation cycle. Security fixes backported to the latest 2.x line. See API Stability for commitments.

OpenDQV is a write-time data validation service. Source systems call it before writing data. Bad records return a 422 with per-field errors. Good records pass through. No payload is stored.
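
The call-before-write pattern described above can be sketched as a small wrapper. This is an illustrative sketch, not OpenDQV's SDK: `guarded_write`, `RejectedRecord`, and `fake_validate` are hypothetical names, and the `valid` / `errors` keys are assumed to mirror the response shape shown in the first-validation walkthrough.

```python
from typing import Any, Callable

Record = dict[str, Any]


class RejectedRecord(Exception):
    """Raised when validation reports the record as invalid."""

    def __init__(self, errors: list[dict[str, Any]]) -> None:
        super().__init__(f"{len(errors)} validation error(s)")
        self.errors = errors


def guarded_write(
    record: Record,
    validate: Callable[[Record], dict[str, Any]],
    write: Callable[[Record], None],
) -> None:
    """Validate first; call write() only when the record is valid."""
    result = validate(record)
    if not result.get("valid", False):
        raise RejectedRecord(result.get("errors", []))
    write(record)


# Stand-in for illustration only: a fake validator instead of a real service call.
def fake_validate(record: Record) -> dict[str, Any]:
    if record.get("amount", 0) <= 0:
        return {"valid": False, "errors": [
            {"field": "amount", "rule": "amount_positive",
             "message": "Order amount must be positive", "severity": "error"}]}
    return {"valid": True, "errors": []}


stored: list[Record] = []  # in-memory stand-in for the real data store
guarded_write({"amount": 49.99}, fake_validate, stored.append)

try:
    guarded_write({"amount": -5}, fake_validate, stored.append)
except RejectedRecord as exc:
    rejected_fields = [e["field"] for e in exc.errors]
```

In a real integration the `validate` callable would wrap a request to the validation API or an in-process validator. The point of the sketch is only that the write path is unreachable for a rejected record.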

[Demo animation: define a contract, send a bad record (get a 422), fix it (get a 200)]

flowchart LR
    subgraph Callers
        direction TB
        SF[Salesforce]
        SAP[SAP]
        DYN[Dynamics]
        ORA[Oracle]
        WEB[Web forms]
        ETL1[ETL pipelines]

        DJ[Django clean]
        PY[Python scripts]
        PD[Pandas / ETL]

        CD[Claude Desktop]
        CUR[Cursor]
        LLM[LLM agents]
    end

    subgraph OpenDQV
        direction TB
        API[Validation API\nREST / batch]
        SDK[LocalValidator\nin-process SDK]
        MCP[MCP Server\nAI-native]
        API & SDK & MCP --> CON[Contracts · YAML\nGovernance · RBAC\nAudit trail]
        API & SDK & MCP --> GEN[Code Generator\nApex · JS · SQL]
    end

    subgraph Results
        direction TB
        R1[valid: true / false]
        R2[per-field errors]
        R3[severity levels]
        R4[webhooks on events]
    end

    SF & SAP & DYN & ORA & WEB & ETL1 --> API
    DJ & PY & PD --> SDK
    CD & CUR & LLM --> MCP

    API & SDK & MCP --> R1

    subgraph Importers
        IMP[dbt schema · GX suites\nSoda checks · ODCS · CSV]
    end
    IMP --> CON

    style API fill:#0d3b5e,stroke:#092a44,color:#fff
    style SDK fill:#0d3b5e,stroke:#092a44,color:#fff
    style MCP fill:#0d3b5e,stroke:#092a44,color:#fff
    style CON fill:#1a8aad,stroke:#14708d,color:#fff
    style GEN fill:#1a8aad,stroke:#14708d,color:#fff
    style R1 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R2 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R3 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style R4 fill:#2ec4e6,stroke:#1a8aad,color:#0d3b5e
    style IMP fill:#1a8aad,stroke:#14708d,color:#fff

A 422 at the point of write closes the feedback loop — producers see failures immediately and fix them at source. Rejection rates drop over time because the tool changes the incentive, not just the outcome.

For post-landing monitoring use Great Expectations, Soda, or dbt tests — they're complementary, not competing. OpenDQV owns layer one (write-time enforcement); those tools own layer three (post-ingestion observability).


AI Agents — first-class via MCP

OpenDQV ships a built-in Model Context Protocol server, so Claude Desktop, Cursor, and any other MCP-compatible agent can discover contracts, validate records, and explain failures through tool calls the agent explicitly declares — no hallucinated compliance, no invented rules.

4-minute demo: Claude Desktop uses two MCP servers — OpenDQV for validation, Marmot for catalog lineage — to check a menu item against ppds_menu_item for Natasha's Law allergen compliance, stating which tool calls it makes and why. (Backup: download the MP4 from the repo)

For tool reference, write guardrails, remote/enterprise mode, and the Marmot composition pattern, see docs/mcp.md.


Install

| I have... | Command |
|---|---|
| Python 3.11+ | git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && bash install.sh |
| Docker | git clone https://github.com/OpenDQV/OpenDQV.git && cd OpenDQV && cp .env.example .env && docker compose up -d |
| Just the SDK/CLI | pip install opendqv, then opendqv init to bootstrap contracts |
| None of the above | See the beginner setup guide |

install.sh creates a virtual environment, installs dependencies, and launches the onboarding wizard. Docker pulls ghcr.io/opendqv/opendqv:latest — no build step required.

⚠️ AUTH_MODE=open (the default) has no authentication. Set AUTH_MODE=token and a strong SECRET_KEY in .env before any non-local deployment. See SECURITY.md.
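
A pre-deployment check for the warning above might look like the following sketch. `AUTH_MODE`, `SECRET_KEY`, and the `open` / `token` values come from the text. The function name `deployment_warnings` and the 32-character minimum are assumptions made for illustration, not rules taken from SECURITY.md.

```python
def deployment_warnings(env: dict[str, str], min_secret_length: int = 32) -> list[str]:
    """Return human-readable problems with the auth settings in `env`."""
    problems: list[str] = []

    # Per the text above, AUTH_MODE falls back to "open" when unset.
    mode = env.get("AUTH_MODE", "open")
    if mode == "open":
        problems.append("AUTH_MODE=open has no authentication; set AUTH_MODE=token")

    # The length threshold is an assumed stand-in for "strong".
    secret = env.get("SECRET_KEY", "")
    if len(secret) < min_secret_length:
        problems.append(
            f"SECRET_KEY is shorter than {min_secret_length} characters"
        )
    return problems


default_problems = deployment_warnings({})
hardened_problems = deployment_warnings(
    {"AUTH_MODE": "token", "SECRET_KEY": "x" * 40}
)
```

Passing `dict(os.environ)` would run the same check against the live process environment.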


Your First Validation

1. Write a contract — drop a YAML file in contracts/:

contract:
  name: order
  version: "1.0"
  owner: "Data Governance"
  status: active
  rules:
    - name: valid_email
      type: regex
      field: email
      pattern: "^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$"
      severity: error
      error_message: "Invalid email format"
    - name: amount_positive
      type: min
      field: amount
      min: 0.01
      severity: error
      error_message: "Order amount must be positive"
    - name: status_valid
      type: allowed_values
      field: status
      allowed_values: [pending, confirmed, shipped, cancelled]
      severity: error
      error_message: "Invalid order status"

2. Reload contracts:

curl -X POST http://localhost:8000/api/v1/contracts/reload

3. Send a bad record — OpenDQV rejects it:

curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "not-an-email", "amount": -5, "status": "unknown"}}'
{
  "valid": false,
  "errors": [
    {"field": "email",  "rule": "valid_email",    "message": "Invalid email format",        "severity": "error"},
    {"field": "amount", "rule": "amount_positive", "message": "Order amount must be positive", "severity": "error"},
    {"field": "status", "rule": "status_valid",    "message": "Invalid order status",        "severity": "error"}
  ],
  "contract": "order",
  "version": "1.0"
}

4. Fix the record — it passes:

curl -s -X POST http://localhost:8000/api/v1/validate \
  -H "Content-Type: application/json" \
  -d '{"contract": "order", "record": {"email": "alice@example.com", "amount": 49.99, "status": "pending"}}'
{"valid": true, "errors": [], "warnings": [], "contract": "order", "version": "1.0"}

The customer contract ships pre-seeded if you want to skip step 1. The quickstart guide walks through authoring, lifecycle, and batch validation.
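
The curl calls in steps 3 and 4 can also be issued from Python using only the standard library. This is a sketch under stated assumptions: the helper names are hypothetical (this is not the OpenDQV SDK), the endpoint and payload are copied from the curl examples, and the 422 body is assumed to be JSON in the shape shown in step 3.

```python
import json
import urllib.error
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8000"  # same host and port as the curl examples


def build_validate_request(
    contract: str, record: dict[str, Any], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"contract": contract, "record": record}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/v1/validate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def validate_record(
    contract: str, record: dict[str, Any],
    base_url: str = BASE_URL, timeout: float = 5.0,
) -> dict[str, Any]:
    """Send the request; treat a 422 as a normal 'invalid record' answer."""
    request = build_validate_request(contract, record, base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx; the per-field errors are in the error body.
        if err.code == 422:
            return json.loads(err.read())
        raise


def format_errors(result: dict[str, Any]) -> list[str]:
    """Turn the per-field errors into one line each."""
    return [
        f'{e["field"]}: {e["message"]} ({e["rule"]})'
        for e in result.get("errors", [])
    ]
```

With a server running locally, `format_errors(validate_record("order", {"email": "not-an-email", "amount": -5, "status": "unknown"}))` would be expected to yield one line per failed rule.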


Rules

| Type | What it checks |
|---|---|
| not_empty | Field is present and non-empty |
| regex | Field matches (or does not match) a pattern. Built-ins: builtin:email, builtin:uuid, builtin:ipv4, builtin:url |
| min / max / range | Numeric bounds |
| min_length / max_length | String length |
| date_format | Parseable date/datetime. Falls back through common formats if no explicit format is set |
| allowed_values | Value must be in a fixed list |
| lookup | Value must appear in a local file or HTTP endpoint (with TTL cache) |
| compare | Cross-field: field op compare_to. Supports gt, lt, gte, lte, eq, neq, and today/now sentinels |
| required_if / forbidden_if | Conditional: required or forbidden when another field equals a value |
| checksum | Check-digit integrity: IBAN, GTIN/GS1, NHS, ISIN, LEI, VIN, CPF, ISRC |
| unique | No duplicates within a batch (batch mode only) |
| cross_field_range | Value must be between two other fields in the same record |
| field_sum | Sum of named fields must equal a target (within optional tolerance) |
| geospatial_bounds | Lat/lon pair within a bounding box |
| date_diff | Difference between two date fields within a range |
| age_match | Declared age consistent with date-of-birth field |
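
To make the checksum row concrete, here is the standard IBAN mod-97 check as a standalone sketch. It illustrates what a check-digit test does and is not OpenDQV's implementation. The function name is hypothetical, and per-country length checks are deliberately left out.

```python
def iban_is_valid(iban: str) -> bool:
    """Check an IBAN's check digits with the mod-97 procedure."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False

    # Move the country code and check digits (first four characters) to the end.
    rearranged = compact[4:] + compact[:4]

    # Replace letters with numbers: A=10 ... Z=35 (base-36 digit values).
    digits = "".join(str(int(ch, 36)) for ch in rearranged)

    # The resulting integer must leave remainder 1 when divided by 97.
    return int(digits) % 97 == 1


ok = iban_is_valid("GB82 WEST 1234 5698 7654 32")
bad = iban_is_valid("GB82 WEST 1234 5698 7654 33")
```

The first value is a widely used example IBAN and passes. Changing a single digit, as in the second value, breaks the remainder and fails.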

Rules have severity: error (blocks the record) or severity: warning (flags but allows).
Any rule can include a condition block to apply it only when another field equals a given value.
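
The difference between the two severities can be sketched with a toy evaluator. It covers only the min and allowed_values rule types and reuses the field names from the order contract. The code is an illustration of the described behaviour, not OpenDQV's engine, and `status_valid` is downgraded to a warning here purely for the example.

```python
from typing import Any


def rule_passes(rule: dict[str, Any], value: Any) -> bool:
    """Evaluate one rule; only two rule types are handled in this sketch."""
    kind = rule["type"]
    if kind == "min":
        return isinstance(value, (int, float)) and value >= rule["min"]
    if kind == "allowed_values":
        return value in rule["allowed_values"]
    raise ValueError(f"unsupported rule type in this sketch: {kind}")


def evaluate(rules: list[dict[str, Any]], record: dict[str, Any]) -> dict[str, Any]:
    """Split failures by severity; only errors make the record invalid."""
    errors: list[dict[str, Any]] = []
    warnings: list[dict[str, Any]] = []
    for rule in rules:
        if rule_passes(rule, record.get(rule["field"])):
            continue
        finding = {"field": rule["field"], "rule": rule["name"],
                   "message": rule["error_message"], "severity": rule["severity"]}
        (warnings if rule["severity"] == "warning" else errors).append(finding)
    return {"valid": not errors, "errors": errors, "warnings": warnings}


rules = [
    {"name": "amount_positive", "type": "min", "field": "amount", "min": 0.01,
     "severity": "error", "error_message": "Order amount must be positive"},
    {"name": "status_valid", "type": "allowed_values", "field": "status",
     "allowed_values": ["pending", "confirmed", "shipped", "cancelled"],
     "severity": "warning", "error_message": "Invalid order status"},
]

flagged = evaluate(rules, {"amount": 49.99, "status": "unknown"})
blocked = evaluate(rules, {"amount": -5, "status": "pending"})
```

Here `flagged` stays valid but carries one warning, while `blocked` is invalid because an error-severity rule failed.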

Full reference: docs/rules/


How it compares

A mature data governance programme operates across three layers, each with a distinct job:

| Layer | Purpose | Tools |
|---|---|---|
| 1. Write-time enforcement | Prevent bad data from entering any system | OpenDQV |
| 2. Catalog / governance / stewardship | Ownership, glossary, lineage, policy, stewardship workflows | Alation, Atlan, Collibra, Purview, DataHub, Marmot |
| 3. Pipeline testing / observability | Detect drift, freshness issues, residual quality after ingestion | Great Expectations, Soda Core, dbt tests, Monte Carlo |

OpenDQV Core owns layer one. Your catalog handles layer two, your pipeline tools handle layer three.

| | Great Expectations / Soda / dbt | OpenDQV |
|---|---|---|
| When | After data lands (in warehouse/lake) | Before data is written (at the door) |
| Where | Data pipelines, batch jobs | Source system integration points |
| Model | Scan data at rest | Validate data in flight |
| Latency | Minutes to hours (batch) | Milliseconds (API call) |
| Who calls it | Data engineers | Application developers, CRM admins |

They're complementary. Use Great Expectations to monitor your warehouse. Use OpenDQV to stop bad data from getting there in the first place.


Contracts

44 production-ready contracts ship in contracts/ covering GDPR, HIPAA, SOX, MiFID II,
UK Building Safety Act, Martyn's Law, Natasha's Law, Ofcom Online Safety Act, EU DORA,
and 20+ other regulatory frameworks across UK, EU, and US.

See docs/compliance-contracts.md for the full list with
regulatory context, or browse contracts/ directly.
17 minimal starter templates are in examples/starter_contracts/.


Performance

EC2 c6i.large, 2 workers, 12-rule contract, mixed 50/50 workload:
~482 req/s, p99 ~182 ms. Sizing rule: WEB_CONCURRENCY = number of vCPUs.

See docs/benchmark_throughput.md for full platform comparison,
methodology, and monthly volume extrapolation.
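
As a rough back-of-the-envelope companion to the figures above, the sketch below turns the quoted throughput into a monthly ceiling and derives a worker count from the sizing rule. The 30-day month and the utilisation factor are assumptions made here. The benchmark document may extrapolate differently.

```python
import os

MEASURED_RPS = 482  # approximate throughput quoted above (c6i.large, 2 workers)
SECONDS_PER_DAY = 24 * 60 * 60


def monthly_capacity(
    requests_per_second: float, days: int = 30, utilisation: float = 1.0
) -> int:
    """Requests per month at a given sustained fraction of peak throughput."""
    return int(requests_per_second * SECONDS_PER_DAY * days * utilisation)


def suggested_web_concurrency() -> int:
    """Sizing rule from the text: one worker per vCPU visible to the process."""
    return os.cpu_count() or 1


ceiling = monthly_capacity(MEASURED_RPS)                      # 1,249,344,000
at_quarter = monthly_capacity(MEASURED_RPS, utilisation=0.25)  # 312,336,000
```

The ceiling assumes the service runs flat out for the whole month, so a utilisation well below 1.0 is usually the more realistic planning input. Inside a container with CPU limits, `os.cpu_count()` may report more CPUs than the container can actually use.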


Documentation

  • Quickstart: Build your first contract in 15 minutes
  • Rules Reference: All rule types with parameters and examples
  • Compliance Contracts: 44 contracts with regulatory context
  • API Reference: REST endpoints, SDK, GraphQL, webhooks
  • Security: Deployment checklist, threat model, RBAC
  • Production Deployment: Token auth, TLS, Docker Compose, hardening
  • Integrations: Salesforce, Kafka, Snowflake, dbt, Databricks, MCP, and more
  • All docs: 76 documentation files

API Stability

OpenDQV is in Beta as of 2.0.0. The following stability commitments apply to the v2.x series:

  • REST API endpoints — paths, request bodies, and response shapes are stable within v2.x. Backwards-incompatible changes require a major version bump and follow a deprecation cycle (one minor release of warnings before removal).
  • YAML contract format — the contract schema (rules, fields, types) is stable within v2.x. New rule types may be added; existing rules will not change semantics without a deprecation cycle.
  • Python SDK — OpenDQVClient, AsyncOpenDQVClient, and LocalValidator public method signatures are stable within v2.x. Internal helpers (prefixed _) are not covered.
  • MCP tools — tool names and parameters are stable within v2.x.
  • Security fixes — backported to the latest 2.x line on a best-effort basis.

Contributing

See CONTRIBUTING.md for setup instructions, coding guidelines, and how to submit changes.

License

MIT — see LICENSE.

Acknowledgements

Led by Sunny Sharma, BGMS Consultants Ltd. The vision, the architecture, every contract, and every design decision in this repository are directed by a human who believes data quality is a write-time responsibility.

OpenDQV is built with a hybrid team. Sunny leads — carbon and silicon. Three AI collaborators execute: Claude Sonnet 4.6 (primary developer), Claude Opus 4.6 (strategic auditor), and Grok (market intelligence). All answer to the same ethos: trust is easier to build than to repair.

Repository

OpenDQV/OpenDQV

Created: March 13, 2026
Updated: April 13, 2026
Language: Python
Category: Communication