The Problem
CRM data flows from many sources — product apps, the warehouse, manual entry, partner integrations, the marketing automation tool. Changes upstream break downstream consumers without warning. Firefighting dominates the data team’s calendar in the absence of contracts. The pattern is recognizable: a product engineer renames signup_source to signup_channel, the reverse-ETL job into Salesforce silently writes nulls, lead scoring breaks, the SDR team complains that inbound leads stopped arriving, and three days disappear into root-cause analysis. Data contracts are the lightweight commitment that prevents this — a producer agrees to maintain a stable shape, a consumer agrees to read only what the contract promises.
What Data Contracts Specify
A working contract specifies schema (fields, types, nullability, constraints), quality thresholds (completeness percentage, distinct value bounds, freshness), SLA (how quickly data must land after the source event, expected uptime), versioning scheme (semver works, with major versions breaking and minors additive only), the breaking-change notification process (channel, lead time), and the enforcement mechanism (CI tests, a data quality monitor, a manual review).
contract: salesforce_lead_v2
producer: product-signup-service
consumer: rev-ops-reverse-etl
fields:
  - email: string, not_null, unique
  - signup_source: enum(web, mobile, api, partner)
  - created_at: timestamp_utc
sla:
  freshness: "< 5 minutes"
  completeness: ">= 99.5% on email"
versioning: semver; breaking changes require 30-day notice
owner: [email protected]
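To make the enforcement idea concrete, here is a minimal sketch of a batch check against the salesforce_lead_v2 contract. The function names (check_batch, is_utc) and the dict-per-record input shape are illustrative assumptions, not part of any tool named in this article. Freshness is read here as "the newest record is under five minutes old at check time", which is one possible interpretation of the SLA. The not_null rule on email is folded into the batch-level completeness threshold rather than checked per row.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the salesforce_lead_v2 contract above.
ALLOWED_SOURCES = {"web", "mobile", "api", "partner"}
MIN_EMAIL_COMPLETENESS = 0.995
MAX_STALENESS = timedelta(minutes=5)


def is_utc(value):
    """True for a timezone-aware datetime whose UTC offset is zero."""
    return isinstance(value, datetime) and value.utcoffset() == timedelta(0)


def check_batch(rows, now):
    """Return a list of contract violations for a batch of dict records."""
    violations = []
    emails = [r.get("email") for r in rows if r.get("email")]

    # Completeness: share of rows carrying a non-empty email.
    if rows and len(emails) / len(rows) < MIN_EMAIL_COMPLETENESS:
        violations.append("email completeness below 99.5%")
    # Uniqueness: no email may appear twice in the batch.
    if len(set(emails)) != len(emails):
        violations.append("duplicate email values")

    for i, row in enumerate(rows):
        source = row.get("signup_source")
        if source not in ALLOWED_SOURCES:
            violations.append(f"row {i}: signup_source {source!r} not in enum")
        if not is_utc(row.get("created_at")):
            violations.append(f"row {i}: created_at is not a UTC timestamp")

    # Freshness (assumed reading): the newest row is under five minutes old.
    stamps = [r["created_at"] for r in rows if is_utc(r.get("created_at"))]
    if stamps and now - max(stamps) >= MAX_STALENESS:
        violations.append("freshness: newest row is 5 minutes old or more")
    return violations


# The rename from the opening scenario: signup_source became signup_channel.
now = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
renamed = [{"email": "a@example.com", "signup_channel": "web",
            "created_at": now - timedelta(minutes=1)}]
print(check_batch(renamed, now))
```

Run against the renamed record, the check reports a single violation naming signup_source, which is the signal that would have surfaced before the reverse-ETL job started writing nulls.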
Implementation
Declare contracts in code (YAML, JSON Schema, dbt model contracts, or Protobuf). CI enforces them on the producer side: if a schema change breaks a contract, the PR fails. Consumers subscribe to a specific version, not “latest”. Breaking changes trigger a notification on a dedicated channel with a migration window (30 days is a humane default), not a silent failure on Friday at 5pm. Tools that support this in 2026: dbt model contracts (available in dbt Core as well as dbt Cloud), Datafold’s data diff, Atlan’s data contracts module, and SodaCL for the quality side. Pick one; consistency beats brand.
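The producer-side CI gate can be sketched as a schema diff plus a version check. The following is a simplified illustration, assuming each schema is a plain dict of field name to type string; classify_change and bump_is_sufficient are hypothetical helper names, and real tools compare far more (nullability, constraints, enum members).

```python
def classify_change(old_fields, new_fields):
    """Classify a schema diff as 'major', 'minor', or 'none'.

    Removing or retyping a field breaks consumers (major);
    adding a field is additive (minor).
    """
    removed = old_fields.keys() - new_fields.keys()
    shared = old_fields.keys() & new_fields.keys()
    retyped = {name for name in shared if old_fields[name] != new_fields[name]}
    added = new_fields.keys() - old_fields.keys()
    if removed or retyped:
        return "major"
    if added:
        return "minor"
    return "none"


def bump_is_sufficient(old_version, new_version, change):
    """Check that the declared semver bump is at least as large as the change."""
    old = tuple(int(part) for part in old_version.split("."))
    new = tuple(int(part) for part in new_version.split("."))
    if change == "major":
        return new[0] > old[0]
    if change == "minor":
        return new[:2] > old[:2]
    return new >= old


old = {"email": "string", "signup_source": "enum", "created_at": "timestamp_utc"}
new = {"email": "string", "signup_channel": "enum", "created_at": "timestamp_utc"}

change = classify_change(old, new)
if not bump_is_sufficient("2.0.0", "2.1.0", change):
    print(f"CI failure: {change} change needs a larger version bump")
```

A rename counts as a removal plus an addition, so it is classified as major. A PR that ships it as 2.1.0 fails, which forces the notification and migration window before consumers see the change.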
Cultural Shift
Data engineers stop firefighting. Producers accept responsibility for downstream impact rather than treating their schema as private. Consumers stop hoarding workaround views and column-renaming Looker calculations. Contracts normalize the conversation: instead of a Slack DM at 7pm, the conversation is a pull request review three weeks earlier. Tools help (Monte Carlo, Soda, Anomalo, Bigeye for monitoring; Atlan, Collibra for the catalog), but the cultural shift is the bigger win. Adoption is fragile — appoint a data product owner per domain, attach the contract to their OKRs, and run a monthly cross-team review.
Common Failure Modes
The dominant failures: writing the contract once and never updating it, declaring contracts for the easy schemas and ignoring the chaotic ones, treating the contract as documentation rather than enforced code, and skipping the consumer-side test (the contract passes producer CI but no consumer ever proved it could read v2 successfully). Another is defining quality thresholds nobody can hit because the source data is inherently messy. It is better to write a realistic contract that holds than an aspirational one that gets ignored.
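A consumer-side test can be as small as running the consumer's own mapping code against a sample record that holds only the fields the contract promises. The sketch below assumes a hypothetical mapper, map_lead, standing in for the reverse-ETL transformation; the output keys it writes are illustrative, not a statement about any real Salesforce configuration.

```python
# A sample record containing exactly the fields salesforce_lead_v2 promises.
SAMPLE_V2 = {
    "email": "lead@example.com",
    "signup_source": "web",
    "created_at": "2026-01-01T12:00:00Z",
}


def map_lead(record):
    """Hypothetical consumer mapping; reads only contract fields."""
    return {
        "Email": record["email"],
        "LeadSource": record["signup_source"],
        "SignupAt": record["created_at"],
    }


def stale_map_lead(record):
    """A consumer that quietly depends on a field the contract never promised."""
    return {"Email": record["email"], "LeadSource": record["signup_channel"]}


def can_read_contract(mapper, sample):
    """True if the mapper runs on a record holding only contract fields."""
    try:
        mapper(sample)
    except KeyError:
        return False
    return True


print(can_read_contract(map_lead, SAMPLE_V2))
print(can_read_contract(stale_map_lead, SAMPLE_V2))
```

The first mapper passes and the second fails with a KeyError on the unpromised field. Running a check like this in the consumer's CI against each pinned contract version is what turns "the producer says v2 is fine" into "this consumer has proven it can read v2".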
Implementation Sequence
A defensible 90-day rollout: weeks 1-2, identify the top three downstream pain points and trace them to source domains; weeks 3-6, write contracts for those three domains with the producer team in the room; weeks 7-9, wire CI enforcement and a monitoring dashboard; weeks 10-12, run a postmortem on a deliberately staged breaking change to prove the notification flow works.
What to Do This Week
Pick the single most painful upstream-broke-downstream incident from the last quarter. Write the one-page contract that would have caught it. Walk it to the producer team and ask them to sign. The conversation, more than the document, is the work.