Article 11 of the EU AI Act requires providers of high-risk AI systems to draw up technical documentation, with the required contents listed in Annex IV, before the system is placed on the market and to keep it up to date. In practice that means tracing inputs, transformations, and model decisions back to source. Auditors will not accept screenshots. They will not accept “the data team knows.” They will ask for lineage, and lineage either exists in your CRM stack or it doesn’t. The cost of building it now is a fraction of the cost of building it after a conformity assessment fails.
What Article 11 actually requires
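For high-risk AI systems, the technical documentation must include the items below. Annex III lists employment decisions, creditworthiness assessment, and access to essential services as high-risk areas; CRM lead scoring, credit-adjacent profiling, and customer segmentation fall in scope when they drive those outcomes.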
- A description of the system’s intended purpose, foreseeable misuse, and integrated components.
- Datasets used in training, validation, testing — origin, scope, characteristics, labeling procedures, data cleaning, biases identified.
- Computational resources used to train, validate, test.
- Training methodologies and techniques.
- Risk management measures.
- Pre-determined changes that don’t require re-assessment.
- Performance metrics, accuracy, robustness, cybersecurity.
The dataset section is where most CRM operators will fail. Reproducing “what data flowed into the model and how” three years after the fact requires lineage that was captured at the time, not reconstructed from memory.
Lineage primitives
Three things must be traceable:
- Data lineage — source system → transformation → model input. Field-level, not just table-level.
- Model lineage — code version, training data snapshot, hyperparameters, evaluation results, deployment.
- Decision lineage — for any individual decision, the model version, input features, output, downstream action.
These map roughly to OpenLineage (data), MLflow / Weights & Biases (model), and your agent trace store (decisions).
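A minimal sketch of how the three layers join up, assuming each layer stores the identifier of the layer beneath it. The class names, the `git:abc1234` code version, and the snapshot URI are hypothetical; they are not taken from OpenLineage or MLflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLineage:
    model_version: str
    code_version: str           # hypothetical commit reference
    training_snapshot_uri: str  # key into the data lineage layer


@dataclass(frozen=True)
class DecisionLineage:
    decision_id: str
    model_version: str          # key into the model lineage layer
    output: float


def trace(decision, models, snapshot_sources):
    """Walk decision -> model -> training snapshot -> source datasets."""
    model = models[decision.model_version]  # KeyError means a broken link
    return {
        "decision": decision.decision_id,
        "model": model.model_version,
        "code": model.code_version,
        "training_snapshot": model.training_snapshot_uri,
        "sources": snapshot_sources[model.training_snapshot_uri],
    }


# Hypothetical registry contents for illustration.
models = {
    "model_v3.4.1": ModelLineage(
        "model_v3.4.1", "git:abc1234", "s3://lineage/snapshots/train/model_v3.4.1"
    )
}
snapshot_sources = {
    "s3://lineage/snapshots/train/model_v3.4.1": ["salesforce.prod/Lead"]
}
decision = DecisionLineage("dec_2026_05_14_8c2f", "model_v3.4.1", 0.83)

chain = trace(decision, models, snapshot_sources)
print(chain["sources"])
```

A lookup that raises `KeyError` is the useful failure here: it marks the exact place where one layer stopped pointing at the next.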
OpenLineage as the spine
OpenLineage is the de facto open standard for data lineage events. It is hosted by the LF AI & Data Foundation and supported by Airflow, dbt, Spark, Flink, Snowflake, BigQuery, Databricks, and increasingly by CRM connectors (Salesforce Data Cloud emits OpenLineage events for ingestion jobs as of 2025).
```json
{
  "eventType": "COMPLETE",
  "eventTime": "2026-05-14T09:23:11Z",
  "run": { "runId": "9b1e..." },
  "job": {
    "namespace": "crm.lead_scoring",
    "name": "feature_build.weekly"
  },
  "inputs": [
    {
      "namespace": "salesforce.prod",
      "name": "Lead",
      "facets": {
        "schema": {
          "fields": [
            {"name": "Industry", "type": "STRING"},
            {"name": "AnnualRevenue", "type": "DOUBLE"}
          ]
        }
      }
    }
  ],
  "outputs": [
    {
      "namespace": "feature_store.prod",
      "name": "lead_features_v3",
      "facets": {
        "columnLineage": {
          "fields": {
            "industry_normalized": {
              "inputFields": [
                {
                  "namespace": "salesforce.prod",
                  "name": "Lead",
                  "field": "Industry",
                  "transformations": [
                    {"type": "DIRECT", "subtype": "TRANSFORMATION", "description": "naics_lookup"}
                  ]
                }
              ]
            }
          }
        }
      }
    }
  ]
}
```
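That JSON is what an auditor wants. Generated automatically by the pipeline, stored in Marquez or DataHub or Collibra, queryable backwards from “lead_features_v3.industry_normalized” to “which Salesforce fields contributed.” OpenLineage events describe datasets and columns, not rows, so tracing an individual row back to its source records also requires carrying record keys through the pipeline.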
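As a rough illustration of that backward query, the sketch below reads the `columnLineage` facet of a single event and returns the source fields behind one output column. The function name `upstream_fields` is hypothetical, and a real catalog would answer this across many events rather than one in-memory dict.

```python
def upstream_fields(event: dict, output_name: str, column: str) -> list:
    """Return (namespace, dataset, field) triples feeding one output column."""
    for dataset in event.get("outputs", []):
        if dataset.get("name") != output_name:
            continue
        fields = (
            dataset.get("facets", {})
            .get("columnLineage", {})
            .get("fields", {})
        )
        entry = fields.get(column, {})
        return [
            (f["namespace"], f["name"], f["field"])
            for f in entry.get("inputFields", [])
        ]
    return []  # output dataset not present in this event


# Trimmed copy of the event above: only the parts the query reads.
event = {
    "outputs": [
        {
            "namespace": "feature_store.prod",
            "name": "lead_features_v3",
            "facets": {
                "columnLineage": {
                    "fields": {
                        "industry_normalized": {
                            "inputFields": [
                                {
                                    "namespace": "salesforce.prod",
                                    "name": "Lead",
                                    "field": "Industry",
                                }
                            ]
                        }
                    }
                }
            },
        }
    ]
}

sources = upstream_fields(event, "lead_features_v3", "industry_normalized")
print(sources)  # [('salesforce.prod', 'Lead', 'Industry')]
```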
Field-level vs table-level
Article 11 doesn’t say “field-level” explicitly. It asks for enough detail on data provenance and preparation to describe what flowed where. In practice, table-level lineage is insufficient when an auditor asks “did this protected-class proxy feature enter the model.” You need column-level.
Most modern lineage tools (DataHub, Collibra, Atlan, Unity Catalog) capture column lineage when the underlying transformation is SQL or dbt. They lose it when the transformation is opaque (Python UDFs, stored procs, no-code pipelines). That’s where you have to instrument manually.
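One way to instrument an opaque Python transformation by hand is to make the function declare its own column lineage. The sketch below is an assumption about how that could look: `declares_lineage` and `COLUMN_LINEAGE` are hypothetical names, and the registry is an in-memory dict standing in for whatever emitter you use to publish lineage events.

```python
# In-memory stand-in for a lineage emitter; keyed by output column name.
COLUMN_LINEAGE = {}


def declares_lineage(output, inputs, description):
    """Register which source fields feed `output` when the function is defined."""
    def wrap(fn):
        COLUMN_LINEAGE[output] = {
            "function": fn.__name__,
            "description": description,
            "inputFields": [
                {"namespace": ns, "name": dataset, "field": field}
                for ns, dataset, field in inputs
            ],
        }
        return fn  # the function itself is unchanged
    return wrap


@declares_lineage(
    output="industry_normalized",
    inputs=[("salesforce.prod", "Lead", "Industry")],
    description="naics_lookup",
)
def normalize_industry(raw: str) -> str:
    # Placeholder logic; the real transformation is whatever the UDF does.
    return raw.strip().upper()


print(normalize_industry(" software "))
print(COLUMN_LINEAGE["industry_normalized"]["inputFields"])
```

Registering at definition time rather than per call keeps the declaration next to the code it describes, so a reviewer can check the two against each other in one diff. It does not prove the declaration is accurate; that still needs review or tests.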
Mapping lineage to Article 11 sections
A practical crosswalk:
| Article 11 requirement | Lineage artifact |
|---|---|
| Datasets used (origin, scope) | OpenLineage input datasets + source-system metadata |
| Data preparation (cleaning, labeling) | Transformation runs in lineage graph + dbt tests / Great Expectations results |
| Biases identified | Bias-eval reports attached to dataset facets |
| Model architecture, training methodology | MLflow / W&B run metadata, linked from lineage |
| Training compute resources | Pipeline orchestrator run logs |
| Validation / test procedures and metrics | Eval suite results (LangSmith, Promptfoo) linked to model run id |
| Risk management measures | Policy attestations stored alongside dataset versions |
| Logging and post-market monitoring | Agent trace store (OTel GenAI); logging per Article 12, retention per Article 19 |
If you can answer every row by clicking a link, you can answer the auditor. If any row is “I’ll have to ask the team,” fix it now.
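The "click a link" test can be run mechanically. The sketch below assumes the crosswalk is kept as a mapping from requirement to evidence link; every URL is a hypothetical placeholder, and two rows are deliberately left empty to show what a gap looks like.

```python
# Rows mirror the crosswalk table; values are hypothetical evidence links.
crosswalk = {
    "Datasets used (origin, scope)": "https://catalog.example/datasets/lead_features_v3",
    "Data preparation (cleaning, labeling)": "https://catalog.example/runs/feature_build.weekly",
    "Biases identified": None,
    "Model architecture, training methodology": "https://mlflow.example/runs/model_v3.4.1",
    "Training compute resources": "https://orchestrator.example/logs/feature_build.weekly",
    "Validation / test procedures and metrics": "https://evals.example/model_v3.4.1",
    "Risk management measures": "",
    "Logging and post-market monitoring": "https://traces.example/lead_scoring_v3",
}


def unanswered(rows: dict) -> list:
    """Return the requirements that have no evidence link yet."""
    return [requirement for requirement, link in rows.items() if not link]


gaps = unanswered(crosswalk)
print(gaps)  # ['Biases identified', 'Risk management measures']
```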
Snapshot semantics matter
Documenting training data to the standard Article 11 sets means being able to reproduce it. That means dataset snapshots: not “the table as of today” but “the table as it existed when this model version was trained.” Three implementations:
- Time-travel queries (Snowflake, Iceberg, Delta Lake) — cheap, retention-bounded.
- Explicit snapshots to a versioned artifact store — expensive, durable.
- Append-only event log + replay — accurate but slow to reconstruct.
For high-risk CRM AI, explicit snapshots win on legal defensibility even if they cost more storage. You need the bytes the auditor will inspect, not a query that depends on a system that has since changed.
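A minimal sketch of an explicit snapshot, assuming a content-addressed layout where the blob is named by its SHA-256 digest and a small manifest ties it to a model version. A local directory stands in for object storage here; the file naming scheme and the `snapshot` and `verify` helpers are hypothetical, and immutability and retention would come from the storage layer, not from this code.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def snapshot(data: bytes, store: Path, model_version: str) -> dict:
    """Write the training bytes once, named by digest, plus a manifest."""
    digest = hashlib.sha256(data).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    blob = store / f"{digest}.bin"
    if not blob.exists():  # identical content is stored only once
        blob.write_bytes(data)
    manifest = {"model_version": model_version, "sha256": digest, "bytes": len(data)}
    (store / f"{model_version}.manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def verify(store: Path, model_version: str) -> bool:
    """Re-hash the stored bytes and compare with the manifest."""
    manifest = json.loads((store / f"{model_version}.manifest.json").read_text())
    data = (store / f"{manifest['sha256']}.bin").read_bytes()
    return hashlib.sha256(data).hexdigest() == manifest["sha256"]


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    manifest = snapshot(b"lead_id,industry\nL-00921,SOFTWARE\n", store, "model_v3.4.1")
    print(manifest["sha256"][:12], verify(store, "model_v3.4.1"))
```

The digest gives the auditor something to check independently: if the bytes they receive hash to the value recorded at training time, nobody has to trust the system that served them.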
Decision lineage: the per-prediction record
For any individual prediction the system made — a lead score, a churn risk, a credit-adjacent ranking — you must be able to surface:
- Input feature values at decision time.
- Model version and ID.
- Output value and confidence.
- Downstream action taken (or not).
- User affected, timestamp, channel.
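This is not the same as pipeline lineage. It is per-decision telemetry. Article 12 requires high-risk systems to log events automatically, and Article 19 requires providers to keep those logs for at least six months unless other Union or national law sets a longer period. The 10-year figure comes from Article 18 and applies to the technical documentation itself; retaining decision records for the same 10 years keeps them aligned with the documentation they support.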
```yaml
# decision_record.yaml
decision_id: dec_2026_05_14_8c2f
timestamp: 2026-05-14T11:42:09Z
system: lead_scoring_v3
model_version: model_v3.4.1
model_run_id: mlflow:runs:9b1e34...
feature_snapshot_uri: s3://lineage/snapshots/lead/2026-05-14T11:42:09Z/L-00921.json
input:
  lead_id: L-00921
  industry_normalized: SOFTWARE
  annual_revenue: 12000000
output:
  score: 0.83
  band: A
downstream_action: routed_to_ae_pool_us_west
human_review: false
explanation_uri: s3://lineage/explain/dec_2026_05_14_8c2f.json
```
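A record like this is only useful if it is complete at write time. The sketch below checks a decision record before it is stored, assuming the record has already been parsed into a dict with string timestamps. The `REQUIRED` list and the `validate` helper are hypothetical choices, not a schema defined by the AI Act.

```python
from datetime import datetime

# Hypothetical minimum set of keys; adjust to your own record schema.
REQUIRED = (
    "decision_id", "timestamp", "model_version",
    "input", "output", "downstream_action",
)


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record can be stored."""
    problems = [f"missing: {key}" for key in REQUIRED if record.get(key) in (None, "", {})]
    ts = record.get("timestamp")
    if isinstance(ts, str) and ts:
        try:
            # Map a trailing "Z" to an explicit offset so older Pythons parse it.
            parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                problems.append("timestamp has no timezone")
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    return problems


record = {
    "decision_id": "dec_2026_05_14_8c2f",
    "timestamp": "2026-05-14T11:42:09Z",
    "system": "lead_scoring_v3",
    "model_version": "model_v3.4.1",
    "input": {"lead_id": "L-00921", "industry_normalized": "SOFTWARE", "annual_revenue": 12000000},
    "output": {"score": 0.83, "band": "A"},
    "downstream_action": "routed_to_ae_pool_us_west",
    "human_review": False,
}

print(validate(record))                            # []
print(validate({**record, "model_version": ""}))   # ['missing: model_version']
```

Rejecting incomplete records at write time is cheaper than discovering years later that a model version was never recorded.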
Tooling: the realistic stack
What works in 2026:
- Catalog + lineage UI: DataHub (open source), Collibra (enterprise), Atlan (modern enterprise), Unity Catalog (Databricks-centric).
- Lineage event source: OpenLineage from Airflow / dbt / Spark / your CRM connectors.
- Model lineage: MLflow or Weights & Biases, linked into the catalog.
- Agent / decision lineage: OpenTelemetry GenAI spans + a long-retention store (S3 + Athena, or a managed observability platform).
- Snapshot store: object storage with versioning + immutability + retention policy.
Avoid: rolling your own lineage in a spreadsheet, treating the data dictionary as documentation, and assuming “we have Collibra” means “we have lineage” — Collibra without OpenLineage emitters is just a catalog.
What auditors actually check
Based on public conformity assessment notes and the AI Act notified body guidance:
- Pick a random recent decision. Show the model version, the input features, the source records, and the bias evaluation that was current at training.
- Trace a feature back to source. Show that protected-class proxies were assessed.
- Show the data quality monitoring that runs continuously, not just at training.
- Show how a complaint or correction would flow back through the system.
- Show the change log: what changed, when, who approved, what re-evaluation occurred.
If any step requires a meeting to assemble, you fail.
CRM-specific lineage challenges
The general lineage tooling assumes a data warehouse / lake shape. CRM data has quirks:
- Custom fields. Created at runtime by admins, not by data engineers. Lineage must be schema-aware enough to follow them, or you miss half the model inputs.
- Validation rules and Apex. Salesforce Apex triggers transform data on write. Same with Power Automate flows in Dataverse. These transformations rarely emit lineage events unless explicitly instrumented.
- Sharing rules. The same record looks different to different users. Lineage of “what data was visible to whom at decision time” is rarely captured by default.
- External objects / virtual tables. Data lives outside the CRM but appears as records. Lineage must traverse the boundary.
- Manual data entry. A salesperson types in revenue. There is no upstream “source.” Treat as a first-class lineage origin — capture user, timestamp, UI surface.
Each of these needs explicit handling in the lineage emitter. Don’t assume the catalog vendor’s auto-discovery covers them.
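For the manual-entry case, explicit handling can be as small as writing an origin record whenever a user edits a field that feeds a model. The sketch below is one possible shape under that assumption; the `manual_entry_origin` helper, the `origin_type` key, and the example user and UI surface values are all hypothetical.

```python
from datetime import datetime, timezone


def manual_entry_origin(namespace, object_name, record_id, field,
                        user_id, ui_surface, entered_at=None):
    """Describe a hand-typed value as a lineage origin with no upstream dataset."""
    entered_at = entered_at or datetime.now(timezone.utc)
    return {
        "origin_type": "manual_entry",
        "dataset": {"namespace": namespace, "name": object_name},
        "record_id": record_id,
        "field": field,
        "entered_by": user_id,
        "ui_surface": ui_surface,
        "entered_at": entered_at.isoformat(),
    }


origin = manual_entry_origin(
    namespace="salesforce.prod",
    object_name="Lead",
    record_id="L-00921",
    field="AnnualRevenue",
    user_id="user_0042",              # hypothetical user id
    ui_surface="lead_detail_page",    # hypothetical UI surface name
    entered_at=datetime(2026, 5, 14, 11, 0, tzinfo=timezone.utc),
)
print(origin["entered_at"])  # 2026-05-14T11:00:00+00:00
```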
The cost of doing this right
For a mid-size CRM operation:
- Catalog + lineage platform: $80k–$300k / year for enterprise tools; under $50k for self-hosted DataHub.
- Implementation: 6–12 months to instrument the full pipeline; longer if Apex / custom code is opaque.
- Ongoing: 0.5–1 FTE for lineage governance.
- Snapshot storage: usually under $20k / year unless data volume is exceptional.
Compare to the cost of a failed conformity assessment, regulatory fines under Article 99’s penalty tiers, or the reputational cost of an audit finding made public. The lineage investment pays for itself the first time you survive an audit cleanly.
Common gaps
- No column-level lineage through Python or stored proc transformations.
- No snapshot of training data — only “the table” which has since changed.
- No link between model version and the lineage graph.
- No per-decision record retention plan past 90 days.
- No bias eval tied to the dataset version that produced the model.
Each of these is a one-line audit finding and a multi-week remediation.
Bottom line
- Article 11 documentation is a lineage problem, not a policy problem.
- OpenLineage + MLflow + OTel covers the three lineage layers. Pick a catalog to render them.
- Field-level lineage is the audit-grade answer. Table-level fails on protected-class proxy questions.
- Snapshot training datasets explicitly. Time-travel queries are retention-bounded and are unlikely to still resolve at year 7.
- Per-decision records are mandatory, retained for the full retention window, queryable.
- Build this before conformity assessment. Building it during is twice as expensive and four times as slow.