A new VP of Marketing pulls a list of “active customers” and the count is 12 percent higher than finance reported. The discrepancy traces to duplicates, lifecycle stage drift, and contacts marked active that have not opened an email in two years. None of it is malicious; all of it is a data quality problem nobody owned. HubSpot makes maintenance easier than it used to be, but the cleanup discipline still has to come from a human.
Duplicates and the merge cadence
HubSpot surfaces likely duplicates in Settings > Data quality > Duplicate management. Property matches on email and phone catch most cases. Review at least weekly. Ops Hub adds automated dedup rules for high-confidence matches:
Rule: Merge contacts where
    email_lowercase matches AND
    created_within 30 days
Action: Auto-merge, keep oldest record
Resist the temptation to auto-merge on phone alone: partner reps often share phone numbers, and you will collapse distinct people into one record.
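Before trusting an auto-merge rule, it can help to dry-run the same logic on an export. The following is a minimal sketch, not HubSpot's own matching logic. It assumes contacts are plain objects with hypothetical `id`, `email`, and `createdAt` (ISO date string) fields. It also reads the "created within 30 days" condition as "within 30 days of the oldest record in the group", which is one interpretation of the rule above.

```javascript
// Hypothetical record shape: { id, email, createdAt } with createdAt as an ISO date string.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function planMerges(contacts) {
  // Group contacts by trimmed, lowercased email; records without an email are skipped.
  const groups = new Map();
  for (const c of contacts) {
    if (!c.email) continue;
    const key = c.email.trim().toLowerCase();
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(c);
  }

  const plans = [];
  for (const [email, members] of groups) {
    if (members.length < 2) continue;
    // Oldest record first: it becomes the surviving primary.
    members.sort((a, b) => new Date(a.createdAt) - new Date(b.createdAt));
    const [primary, ...rest] = members;
    const cutoff = new Date(primary.createdAt).getTime() + THIRTY_DAYS_MS;
    // Only records created within 30 days of the primary are auto-merge candidates;
    // later matches are left out so a human can review them.
    const mergeIds = rest
      .filter(c => new Date(c.createdAt).getTime() <= cutoff)
      .map(c => c.id);
    if (mergeIds.length > 0) plans.push({ email, keepId: primary.id, mergeIds });
  }
  return plans;
}
```

Phone is deliberately not a grouping key here, for the shared-number reason given above. Reviewing the returned plan before merging anything keeps the dry run non-destructive.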
Property standardization
Inconsistent state names (“CA,” “California,” “Calif.”) break grouped reports and routing rules. Phone formats (“+1 415-555-0100” vs “415.555.0100”) break dialer integrations. Company name variants (“Acme Inc,” “Acme, Inc.,” “ACME”) prevent rollups. Ops Hub format automations standardize these on write:
Workflow: Standardize on contact create or update
- Lowercase email
- Format phone E.164
- Title-case first/last name
- Look up state from postal code
- Trim whitespace on company name
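The same normalization steps can be sketched in plain JavaScript, for example to clean an import file before it reaches HubSpot. This is an illustrative sketch under stated assumptions: the property names follow HubSpot's common defaults (`email`, `phone`, `firstname`, `lastname`, `company`), `phone_needs_review` is a hypothetical flag, and `toE164` is a naive formatter that assumes +1 when no country code is present and uses only a rough length check. The postal-code-to-state lookup is omitted because it needs a reference table.

```javascript
// Title-case each part of a name separated by whitespace or a hyphen.
function titleCase(name) {
  return name
    .trim()
    .toLowerCase()
    .replace(/(^|[\s-])(\p{L})/gu, (_, sep, ch) => sep + ch.toUpperCase());
}

// Naive E.164 formatter. Assumes country code +1 when none is given.
// Returns null when the digits do not look like a full number.
function toE164(phone, defaultCountryCode = "1") {
  const trimmed = phone.trim();
  const digits = trimmed.replace(/\D/g, "");
  if (trimmed.startsWith("+")) {
    // E.164 allows at most 15 digits; the lower bound is a rough heuristic.
    return digits.length >= 8 && digits.length <= 15 ? "+" + digits : null;
  }
  if (digits.length === 10) return "+" + defaultCountryCode + digits;
  if (digits.length === 11 && digits.startsWith(defaultCountryCode)) return "+" + digits;
  return null;
}

function standardizeContact(c) {
  const e164 = c.phone ? toE164(c.phone) : null;
  return {
    ...c,
    email: c.email ? c.email.trim().toLowerCase() : c.email,
    // Keep the original phone when it cannot be formatted, and flag it for review.
    phone: e164 !== null ? e164 : c.phone,
    phone_needs_review: Boolean(c.phone) && e164 === null,
    firstname: c.firstname ? titleCase(c.firstname) : c.firstname,
    lastname: c.lastname ? titleCase(c.lastname) : c.lastname,
    // Trim and collapse internal runs of whitespace in the company name.
    company: c.company ? c.company.trim().replace(/\s+/g, " ") : c.company,
  };
}
```

Simple title-casing mishandles names such as "McDonald" or "de la Cruz", and real phone parsing across countries usually calls for a dedicated library, so treat the output as a first pass rather than a final answer.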
Dead contact suppression
Contacts that have not opened or clicked in 18 months cost you deliverability when you keep emailing them. Tag them with a suppression flag, exclude from active campaigns, and set a hard-delete cadence aligned with your retention policy and any GDPR/CCPA obligations:
Active list: ENG_dead_18mo
    Filter: Last engagement more than 18 months ago
    Filter: Not opted out (already excluded)

Workflow: Tag suppression
    Trigger: Member of ENG_dead_18mo
    Action: Set marketing_suppression = true

Quarterly job: Hard-delete contacts where
    marketing_suppression = true AND
    has_open_deal = false AND
    retention_window_passed = true
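The two decisions above, when to suppress and when a hard delete is allowed, can be expressed as small predicates that are easy to unit-test before anything destructive runs. This is a sketch only. `lastEngagementAt` and `optedOut` are hypothetical field names, 18 months is approximated as 548 days, and contacts with no recorded engagement are left to a separate rule, which is an assumption rather than something the list filter above specifies.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;
const DEAD_AFTER_DAYS = 548; // roughly 18 months

// True when the contact last engaged more than ~18 months ago and has not opted out.
function shouldSuppress(contact, now = new Date()) {
  if (contact.optedOut) return false; // already excluded from sends
  if (!contact.lastEngagementAt) return false; // never engaged: assumed to be handled elsewhere
  const idleDays = (now - new Date(contact.lastEngagementAt)) / DAY_MS;
  return idleDays > DEAD_AFTER_DAYS;
}

// Hard delete is allowed only when all three quarterly-job conditions hold.
function eligibleForHardDelete(contact) {
  return (
    contact.marketing_suppression === true &&
    contact.has_open_deal === false &&
    contact.retention_window_passed === true
  );
}
```

The strict `=== true` and `=== false` comparisons mean a contact with a missing or unknown value is never deleted, which is the safer default for an irreversible action.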
Email validation upstream and downstream
Catch bad addresses at form submit with real-time validation: HubSpot forms support a setting for this, and integrating a validator such as Kickbox gives higher accuracy. For existing data, run quarterly sweeps to flag invalid addresses and remove them from sends:
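// Sweep existing contacts with a validator API.
// Assumes `contacts` is an array of { id, email } and that `validator.bulk`
// (a stand-in for your validator client) returns one result per email, in input order.
const BATCH_SIZE = 100; // HubSpot CRM batch endpoints are documented as accepting up to 100 inputs per call
for (let i = 0; i < contacts.length; i += BATCH_SIZE) {
  const batch = contacts.slice(i, i + BATCH_SIZE);
  const results = await validator.bulk(batch.map(c => c.email));
  // Pair each result with its contact by position; the validator does not know HubSpot IDs.
  const inputs = results
    .map((r, j) => ({ id: batch[j].id, status: r.status }))
    .filter(r => r.status === "undeliverable")
    .map(r => ({ id: r.id, properties: { email_status: "undeliverable" } }));
  if (inputs.length === 0) continue; // nothing to flag in this batch
  await hsClient.crm.contacts.batchApi.update({ inputs });
}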
Required field enforcement
A contact missing lifecycle stage breaks lifecycle reporting. Use validation on critical properties at create time and a workflow that flags violations for human review rather than silently dropping records.
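A flag-for-review check might look like the sketch below. It assumes contacts are plain objects keyed by HubSpot internal property names, and the list of required properties is hypothetical; substitute whatever your reports depend on. The point it illustrates is that records with violations are routed to a review queue and nothing is discarded.

```javascript
// Hypothetical list of properties treated as required; adjust to your own portal.
const REQUIRED_PROPERTIES = ["email", "lifecyclestage"];

// Return the names of required properties that are missing or blank on a contact.
function findViolations(contact, required = REQUIRED_PROPERTIES) {
  return required.filter(name => {
    const value = contact[name];
    return value === undefined || value === null || String(value).trim() === "";
  });
}

// Split contacts into clean records and records that need human review.
// Every input contact ends up in exactly one of the two lists.
function triage(contacts) {
  const clean = [];
  const needsReview = [];
  for (const c of contacts) {
    const missing = findViolations(c);
    if (missing.length === 0) clean.push(c);
    else needsReview.push({ id: c.id, missing });
  }
  return { clean, needsReview };
}
```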
Data quality dashboard
What is not measured does not improve. Build one dashboard with:
- Duplicate candidates open
- Contacts missing lifecycle stage
- Contacts missing original source
- Companies with no associated contacts
- Deals with no associated contacts
- Contacts with malformed phone
- Marketing-eligible contacts without consent record
Publish weekly. Assign owners per metric. Improvement appears within a quarter when ownership is clear.
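If you want to prototype these numbers on exported data before building the HubSpot dashboard, most of the metrics reduce to simple counts. The sketch below assumes a hypothetical export shape: contacts with `lifecyclestage`, `originalSource`, `phone`, `marketingEligible`, and `hasConsentRecord` fields, and companies and deals that each carry a `contactIds` array. The open duplicate candidates count is left out because it comes from HubSpot's duplicate tool, not from record fields.

```javascript
// Count data quality issues across exported records (hypothetical field names).
function dataQualityMetrics({ contacts, companies, deals }) {
  const isBlank = v => v === undefined || v === null || String(v).trim() === "";
  const e164 = /^\+[1-9]\d{1,14}$/; // "+" followed by at most 15 digits, no leading zero
  const hasNoContacts = record => (record.contactIds || []).length === 0;
  return {
    contactsMissingLifecycleStage: contacts.filter(c => isBlank(c.lifecyclestage)).length,
    contactsMissingOriginalSource: contacts.filter(c => isBlank(c.originalSource)).length,
    companiesWithNoContacts: companies.filter(hasNoContacts).length,
    dealsWithNoContacts: deals.filter(hasNoContacts).length,
    // A blank phone is counted as missing, not malformed.
    contactsWithMalformedPhone: contacts.filter(c => !isBlank(c.phone) && !e164.test(c.phone)).length,
    marketingEligibleWithoutConsent: contacts.filter(c => c.marketingEligible && !c.hasConsentRecord).length,
  };
}
```

Storing the output of each weekly run gives you the trend line that shows whether named ownership is actually moving the numbers.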
What to do this week
Run the duplicate manager, build the data quality dashboard above, and assign one named owner per metric before the end of the week.