Data Contracts Normalized
Data contracts between producers and consumers are now the baseline rather than the ambition. Schema versioning, quality SLAs, and breaking-change processes have replaced the ad-hoc schema-drift chaos of 2022-2024. dbt’s Model Contracts, Atlan’s Data Contracts module, and PayPal’s open-sourced data-contract spec all provide working templates. The cultural shift is the harder part: a product engineer who wants to rename a column now needs 30 days’ notice and a passing CI run, not a Friday-afternoon merge. Teams that institutionalized this in 2024-2025 report a 60-80% drop in firefighting hours, according to dbt Labs and Acryl Data customer surveys.
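The CI half of a contract can be sketched as a schema diff. This is a minimal sketch under stated assumptions, not how dbt, Atlan, or the PayPal spec implement it: the contract is reduced to a hypothetical column-to-type mapping, `diff_schema` and `ci_gate` are illustrative names, and the 30-day notice is modeled as a single `notice_approved` flag. A rename surfaces as a removal plus an addition, which is exactly why it trips the gate.

```python
from dataclasses import dataclass

# ASSUMPTION: a contract reduced to column name -> declared type. Real specs
# also carry constraints, SLAs, and owners; this models only the schema part.
Schema = dict[str, str]


@dataclass(frozen=True)
class Change:
    column: str
    kind: str  # "removed", "type_changed", or "added"
    breaking: bool


def diff_schema(contract: Schema, proposed: Schema) -> list[Change]:
    """Compare a proposed schema against the published contract."""
    changes: list[Change] = []
    for col, col_type in contract.items():
        if col not in proposed:
            # A rename shows up here as a removal: breaking for consumers.
            changes.append(Change(col, "removed", breaking=True))
        elif proposed[col] != col_type:
            changes.append(Change(col, "type_changed", breaking=True))
    for col in sorted(proposed.keys() - contract.keys()):
        # Purely additive columns are treated as non-breaking.
        changes.append(Change(col, "added", breaking=False))
    return changes


def ci_gate(contract: Schema, proposed: Schema, notice_approved: bool) -> bool:
    """Pass if nothing breaks, or if the notice period has been served."""
    breaking = [c for c in diff_schema(contract, proposed) if c.breaking]
    return not breaking or notice_approved


contract = {"user_id": "bigint", "email": "varchar"}
proposed = {"user_id": "bigint", "email_address": "varchar"}  # a rename
print(diff_schema(contract, proposed))
print(ci_gate(contract, proposed, notice_approved=False))  # False
```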
Lineage Across the Stack
Lineage tools — Atlan, Alation, Collibra, Acryl DataHub, OpenLineage — track data from source through transformations to activation. Impact analysis for schema changes finally answers “if I rename this Snowflake column, what breaks downstream?” before the merge, not after. Compliance queries become tractable: a GDPR Article 15 subject access request can be answered with a query against the lineage graph rather than a week of manual tracing. Lineage is hard to retrofit — most enterprises that started in 2026 wish they had started in 2023 — but doing it now beats doing it next year.
Lineage trace: customer_email
source.product.users.email
-> warehouse.dbt.dim_customer.email_normalized
-> warehouse.dbt.fct_lead_score.email
-> hightouch.salesforce_lead_sync.Email
-> salesforce.Lead.Email
Consumers: forecast_dashboard, marketing_campaign_v3, agentforce_triage_bot
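Impact analysis over a trace like the one above is a graph-reachability question. This sketch assumes the lineage has already been exported as a plain adjacency dict; `LINEAGE` and `downstream` are hypothetical names, not any vendor’s API. Because the trace does not say which hop each consumer reads from, all three consumers are attached to `salesforce.Lead.Email` purely for illustration.

```python
from collections import deque

# Edges transcribed from the customer_email trace. ASSUMPTION: the three
# consumers hang off salesforce.Lead.Email for illustration only.
LINEAGE: dict[str, list[str]] = {
    "source.product.users.email": ["warehouse.dbt.dim_customer.email_normalized"],
    "warehouse.dbt.dim_customer.email_normalized": ["warehouse.dbt.fct_lead_score.email"],
    "warehouse.dbt.fct_lead_score.email": ["hightouch.salesforce_lead_sync.Email"],
    "hightouch.salesforce_lead_sync.Email": ["salesforce.Lead.Email"],
    "salesforce.Lead.Email": [
        "forecast_dashboard",
        "marketing_campaign_v3",
        "agentforce_triage_bot",
    ],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk: every node reachable from `start`, nearest first."""
    seen = {start}
    order: list[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # skip nodes already reached via another path
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# "If I change dim_customer.email_normalized, what breaks downstream?"
for node in downstream(LINEAGE, "warehouse.dbt.dim_customer.email_normalized"):
    print(node)
```

Starting the same walk from `source.product.users.email` lists every downstream location the column propagates to, which is the shape of answer a subject access request needs.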
Classification
Every field gets classified as PII, PHI, financial, confidential, internal, or public. Classification drives access policy, masking rules in non-production environments, retention schedules, and the tool definitions of any AI agent that reads the data. Microsoft Purview, Salesforce Data Detect, BigID, and Immuta lead the data-side classification market in 2026. Classification is foundational for compliance posture: the EU AI Act’s Article 10 data-governance requirements for high-risk systems explicitly reference data quality and the handling of special categories of personal data, which in practice translates to “you need classification”. The work splits 40/60 between automated detection and manual confirmation; budget for both.
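The link from classification to non-production masking can be sketched as a lookup from field to class to policy. This assumes a hypothetical policy set `MASK_IN_NON_PROD` and a placeholder HMAC key; in practice the field-to-class map would come from a classification tool rather than being written by hand. Unclassified fields are masked by default, so a gap in classification fails closed.

```python
import hashlib
import hmac
from enum import Enum


class Classification(Enum):
    PII = "pii"
    PHI = "phi"
    FINANCIAL = "financial"
    CONFIDENTIAL = "confidential"
    INTERNAL = "internal"
    PUBLIC = "public"


# ASSUMPTION: hypothetical policy listing the classes masked outside prod.
MASK_IN_NON_PROD = {
    Classification.PII,
    Classification.PHI,
    Classification.FINANCIAL,
    Classification.CONFIDENTIAL,
}

# Placeholder only; a real key belongs in a secrets manager.
MASKING_KEY = b"replace-me"


def mask_value(value: str) -> str:
    """Keyed, deterministic pseudonym: the same input always yields the same
    token, so joins on masked columns still line up across tables."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked_" + digest[:12]


def mask_row(
    row: dict[str, str],
    classes: dict[str, Classification],
    env: str,
) -> dict[str, str]:
    """Return a copy of `row` with sensitive fields masked unless env is prod."""
    if env == "prod":
        return dict(row)
    masked: dict[str, str] = {}
    for field, value in row.items():
        cls = classes.get(field)
        # Unclassified fields are masked too: a classification gap fails closed.
        if cls is None or cls in MASK_IN_NON_PROD:
            masked[field] = mask_value(value)
        else:
            masked[field] = value
    return masked


classes = {"email": Classification.PII, "plan": Classification.PUBLIC}
print(mask_row({"email": "a@example.com", "plan": "pro"}, classes, env="staging"))
```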
Quality
Monitoring frameworks (Monte Carlo, Soda, Anomalo, Bigeye, Great Expectations, and the dbt tests ecosystem) enforce quality at the pipeline level so bad data does not reach the CRM. Breakage gets caught at the source rather than discovered downstream when the forecast looks wrong. Common quality checks: row-count drift, null-rate change, distinct-value bounds, freshness, and referential integrity. The 2026 trend is moving these from “alert when broken” to “block the pipeline run” for tier-1 contracts.
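The five checks named above, plus the alert-versus-block distinction, fit in a short sketch. Every threshold here (20% row drift, a 2-point null-rate rise, 2 to 5 distinct `status` values, 6-hour freshness), the row shape (`email`, `status`, `account_id`), and the `Baseline` record are illustrative assumptions, not defaults from any of the tools listed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


class QualityGateError(RuntimeError):
    """Raised to stop the pipeline run when a blocking (tier-1) check fails."""


@dataclass
class Baseline:
    # Hypothetical baseline captured from the previous successful run.
    row_count: int
    email_null_rate: float


def run_checks(
    rows: list[dict],
    baseline: Baseline,
    parent_account_ids: set,
    loaded_at: datetime,
    now: Optional[datetime] = None,
) -> dict[str, bool]:
    """Evaluate the five common checks; True means the check passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    n = len(rows)
    null_rate = sum(r.get("email") is None for r in rows) / n if n else 1.0
    distinct_status = {r.get("status") for r in rows}
    return {
        # Row count within 20% of the previous run.
        "row_count_drift": baseline.row_count > 0
        and abs(n - baseline.row_count) / baseline.row_count <= 0.20,
        # Email null rate may rise by at most 2 percentage points.
        "null_rate_change": null_rate - baseline.email_null_rate <= 0.02,
        # `status` should take between 2 and 5 distinct values.
        "distinct_value_bounds": 2 <= len(distinct_status) <= 5,
        # Loaded within the last 6 hours (loaded_at must be timezone-aware).
        "freshness": now - loaded_at <= timedelta(hours=6),
        # Every row references a known parent account.
        "referential_integrity": all(
            r.get("account_id") in parent_account_ids for r in rows
        ),
    }


def enforce(results: dict[str, bool], tier: int) -> list[str]:
    """Tier-1 contracts block the run; other tiers return failures for alerting."""
    failed = sorted(name for name, ok in results.items() if not ok)
    if failed and tier == 1:
        raise QualityGateError(f"blocking checks failed: {failed}")
    return failed
```

Calling `enforce(run_checks(...), tier=1)` as the step before the CRM sync means a failure stops the run instead of paging someone after bad rows have already landed.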
What Changed in 2026
Three shifts: AI agents made data quality a customer-facing problem (a hallucination grounded in bad data now hits CSAT, not just internal analytics); the EU AI Act’s Article 10 anchored data governance as a regulatory obligation for high-risk systems; and observability vendors merged data and AI lineage into a single graph (Monte Carlo + Pendo, Datadog + LLM Observability, Splunk + the agent telemetry suite). Governance leaders who treated data and AI as separate programs in 2024 are consolidating in 2026.
Common Failure Modes
The recurring failures: classification done once and never refreshed, lineage covering 80% of the warehouse but missing the spreadsheet feed that holds the riskiest data, quality alerts treated as informational rather than blocking, and contracts that exist on paper but have no enforcement teeth. The most expensive failure is the one nobody documents: agents grounding on uncatalogued data sources because the team did not know they existed.
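Two of these failures are cheap to detect mechanically. The sketch below assumes you can export each agent’s tool data sources and the catalog’s last classification-review date per source (both hypothetical inputs, as is the `gsheets.partner_leads` feed in the usage line), and flags sources the catalog has never seen plus reviews older than an assumed 90-day refresh window.

```python
from datetime import date, timedelta
from typing import Optional


def audit(
    agent_sources: dict[str, set[str]],
    catalog: dict[str, date],
    max_age: timedelta = timedelta(days=90),
    today: Optional[date] = None,
) -> tuple[list[tuple[str, str]], list[str]]:
    """Flag (agent, source) pairs missing from the catalog, and catalogued
    sources whose classification review is older than `max_age`.

    agent_sources: agent name -> data sources its tools read (hypothetical export)
    catalog: source -> date its classification was last reviewed
    """
    if today is None:
        today = date.today()
    uncatalogued = sorted(
        (agent, src)
        for agent, srcs in agent_sources.items()
        for src in srcs
        if src not in catalog
    )
    stale = sorted(
        src for src, reviewed in catalog.items() if today - reviewed > max_age
    )
    return uncatalogued, stale


catalog = {"warehouse.dbt.dim_customer": date(2026, 1, 10), "salesforce.Lead": date(2025, 6, 1)}
agents = {"agentforce_triage_bot": {"salesforce.Lead", "gsheets.partner_leads"}}
print(audit(agents, catalog, today=date(2026, 2, 1)))
# ([('agentforce_triage_bot', 'gsheets.partner_leads')], ['salesforce.Lead'])
```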
What to do this week
Pick one tier-1 dataset feeding your CRM. Confirm it has a contract, classification on every field, lineage to its downstream consumers, and a quality monitor with a paged owner. If even one of those four is missing, that gap is your most important governance backlog item.
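If the four answers already live in a catalog or contract registry, the check can run on a schedule instead of by hand. A minimal sketch, with a hypothetical `DatasetGovernance` record standing in for whatever your tooling exposes:

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetGovernance:
    # Hypothetical record; populate it from your catalog and contract registry.
    has_contract: bool
    all_fields_classified: bool
    lineage_to_consumers: bool
    monitor_with_paged_owner: bool


def gaps(ds: DatasetGovernance) -> list[str]:
    """Names of the checklist items that are missing (empty list means ready)."""
    return [f.name for f in fields(ds) if not getattr(ds, f.name)]


lead_scores = DatasetGovernance(True, True, False, True)
print(gaps(lead_scores))  # ['lineage_to_consumers']
```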