Every CRM disaster has the same origin story: somebody imported a CSV at 4pm on a Friday because the campaign was launching Monday. The CSV had three contacts named “John Smith” with three different emails, and somebody mapped phone to mobile phone, lifecycle stage to lead status, and email to “primary email address” (a custom property nobody owns). By Monday morning, sales is calling the wrong numbers, marketing is sending to bounced addresses, and lifecycle reports are off by 12,000 records.
Imports are a deployment. Treat them like one.
The pre-flight checklist
Five steps. Skip none.
- CSV schema audit: every column, every type, every constraint
- Mapping plan: every column to a HubSpot property with explicit decisions on collisions
- Dedup strategy: which property is the key, in what order
- Dry-run on a 100-row sample, manually verify
- Roll-back plan documented before you click Import
Step 5 is the one that always gets skipped. There is no native HubSpot “undo import.” You either delete what you imported or live with it.
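A sketch of step 1, assuming the CSV has already been parsed into an array of plain objects keyed by header. The auditColumns and guessType names and the regex-based type guesses are hypothetical and deliberately crude: the point is to see, per column, how many rows are blank, how many distinct values appear, and which value shapes are mixed together before anything gets mapped.

```javascript
// Hypothetical, crude shape detection: flags mixed formats, does not validate.
function guessType(value) {
  if (/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(value)) return "email";
  if (/^-?\d+(\.\d+)?$/.test(value)) return "number";
  if (/^\d{4}-\d{2}-\d{2}/.test(value)) return "date";
  if (/^[+()\d\s.-]{7,}$/.test(value)) return "phone";
  return "text";
}

// rows: array of objects, one per CSV row, keyed by column header.
function auditColumns(rows) {
  const stats = {};
  for (const row of rows) {
    for (const [column, raw] of Object.entries(row)) {
      if (!stats[column]) {
        stats[column] = { filled: 0, blank: 0, distinct: new Set(), types: new Set() };
      }
      const col = stats[column];
      const value = raw == null ? "" : String(raw).trim();
      if (value === "") {
        col.blank += 1;
        continue;
      }
      col.filled += 1;
      col.distinct.add(value);
      col.types.add(guessType(value));
    }
  }
  // Flatten the Sets into counts and sorted lists for printing.
  const report = {};
  for (const [column, col] of Object.entries(stats)) {
    report[column] = {
      filled: col.filled,
      blank: col.blank,
      distinct: col.distinct.size,
      types: [...col.types].sort(),
    };
  }
  return report;
}
```

A column whose types list has more than one entry, or a column you treat as required with a non-zero blank count, is a decision to make in the mapping plan, not in the import UI.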
The mapping plan, not the mapping UI
The HubSpot import UI lets you map column by column at upload time. It is the worst place to think. Plan the mapping in a spreadsheet first.
| csv_column | target_property | transform | collision_rule |
| --- | --- | --- | --- |
| email | email | lowercase, trim | skip if conflict |
| first_name | firstname | title case | overwrite if blank |
| last_name | lastname | title case | overwrite if blank |
| company | company | trim | overwrite if blank |
| phone | phone | E.164 normalize | overwrite if blank |
| job_title | jobtitle | trim | overwrite if blank |
| lead_source | original_lead_source | map to enum | never overwrite |
| last_engaged | n/a (dropped) | -- | -- |
| notes | n/a (dropped) | -- | -- |
The collision_rule column is the conversation that prevents the disaster. “Skip if conflict” means do not overwrite existing values. “Overwrite if blank” means write only if the target is empty. “Never overwrite” means existing values are protected.
HubSpot’s import UI has a “do not overwrite” toggle that is global. The spreadsheet is per-column and reflects reality.
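The spreadsheet can be encoded directly so the pre-processing step enforces it instead of a person re-clicking it in the UI. A minimal sketch, assuming rows are parsed into objects keyed by the CSV headers above and that `existing` holds the matched contact's current HubSpot values (null for a new contact). The PLAN object, the helper names and the lead-source enum values are hypothetical, the phone transform is a placeholder for a real E.164 normalizer, and the rule semantics are one reading of the three rules: skip_if_conflict fills blanks and reports disagreements, overwrite_if_blank only fills blanks, never_overwrite only writes on create.

```javascript
// Transforms from the spreadsheet's "transform" column.
const trim = (v) => v.trim();
const lower = (v) => v.trim().toLowerCase();
// Capitalize the first letter of each word, including after hyphens and apostrophes.
const titleCase = (v) =>
  v.trim().toLowerCase().replace(/(^|[\s'-])(\p{L})/gu, (_, sep, c) => sep + c.toUpperCase());

// Hypothetical enum mapping for lead_source; unknown values return null and are not written.
const LEAD_SOURCE_ENUM = { webinar: "webinar", "trade show": "event", referral: "referral" };

// Columns not listed here (last_engaged, notes) are dropped.
const PLAN = {
  email: { target: "email", transform: lower, rule: "skip_if_conflict" },
  first_name: { target: "firstname", transform: titleCase, rule: "overwrite_if_blank" },
  last_name: { target: "lastname", transform: titleCase, rule: "overwrite_if_blank" },
  company: { target: "company", transform: trim, rule: "overwrite_if_blank" },
  phone: { target: "phone", transform: trim, rule: "overwrite_if_blank" }, // placeholder: use an E.164 normalizer
  job_title: { target: "jobtitle", transform: trim, rule: "overwrite_if_blank" },
  lead_source: {
    target: "original_lead_source",
    transform: (v) => LEAD_SOURCE_ENUM[v.trim().toLowerCase()] ?? null,
    rule: "never_overwrite",
  },
};

const isBlank = (v) => v == null || String(v).trim() === "";

// Returns the properties to write for one row plus any conflicts for human review.
function resolveProperties(row, existing) {
  const write = {};
  const conflicts = [];
  for (const [column, spec] of Object.entries(PLAN)) {
    if (isBlank(row[column])) continue; // never write blanks over anything
    const value = spec.transform(String(row[column]));
    if (value == null) continue; // transform refused the value
    if (existing == null) {
      write[spec.target] = value; // new contact: every mapped value is written
      continue;
    }
    if (spec.rule === "never_overwrite") continue; // protected on existing contacts
    const current = existing[spec.target];
    if (isBlank(current)) {
      write[spec.target] = value;
    } else if (
      spec.rule === "skip_if_conflict" &&
      String(current).trim().toLowerCase() !== value.toLowerCase()
    ) {
      conflicts.push({ property: spec.target, current, incoming: value });
    }
  }
  return { write, conflicts };
}
```

Conflicts go to the review pile rather than into the import, which is the per-column behavior a single global toggle cannot express.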
The dedup key hierarchy
Email is the default dedup key in HubSpot. It is the right default but the wrong sole key.
Dedup hierarchy for inbound contact imports:
- HubSpot record ID (vid) if present in the CSV
- Email exact match (case-insensitive)
- Email domain + last name (for B2B with shared emails like info@)
- Phone (E.164 normalized) + first name
- No match: create new
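Implement the hierarchy in pre-processing, not at import. The HubSpot importer only matches contacts on email (or on record ID when the file includes one), so every other rung of the hierarchy has to be resolved before upload.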
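// Assumes helpers searchByProperty, searchByDomainAndLastName, isGenericEmail
// and normalizePhone exist; the search helpers are assumed to return HubSpot
// search results shaped like { id, properties: { ... } }.
async function classifyRow(row) {
  const norm = (s) => (s == null ? "" : String(s).trim().toLowerCase());
  // 1. A HubSpot record ID in the CSV is an explicit match.
  if (row.vid) return { action: "update", matchId: row.vid };
  // 2. Exact email match, case-insensitive (guards against a missing email).
  const email = norm(row.email);
  if (email) {
    const byEmail = await searchByProperty("email", email);
    if (byEmail.length === 1) {
      return { action: "update", matchId: byEmail[0].id };
    }
    if (byEmail.length > 1) {
      return { action: "review", reason: "multiple_email_matches" };
    }
  }
  // 3. Shared inboxes (info@ and the like): fall back to domain + last name.
  if (email && row.last_name && isGenericEmail(email)) {
    const candidates = await searchByDomainAndLastName(
      email.split("@")[1],
      row.last_name,
    );
    if (candidates.length === 1) {
      return { action: "update", matchId: candidates[0].id };
    }
  }
  // 4. E.164 phone, confirmed by a matching (case-insensitive) first name.
  const phone = normalizePhone(row.phone);
  const firstName = norm(row.first_name);
  if (phone && firstName) {
    const byPhone = await searchByProperty("phone", phone);
    if (byPhone.length === 1 && norm(byPhone[0].properties?.firstname) === firstName) {
      return { action: "update", matchId: byPhone[0].id };
    }
  }
  // 5. No match: create new.
  return { action: "create" };
}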
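classifyRow relies on an isGenericEmail helper that the section does not define. A minimal sketch, assuming a hand-maintained list of role-account local parts; the list is illustrative, not exhaustive, and should grow with what you find during review.

```javascript
// Hypothetical list of shared/role inbox local parts; extend it for your data.
const GENERIC_LOCAL_PARTS = new Set([
  "info", "sales", "contact", "hello", "admin", "support", "office", "team", "marketing",
]);

function isGenericEmail(email) {
  if (typeof email !== "string") return false;
  const at = email.lastIndexOf("@");
  if (at <= 0) return false; // no local part: not a usable address
  const local = email.slice(0, at).trim().toLowerCase();
  // Treat plus-addressed variants ("sales+eu@") as the same role inbox.
  return GENERIC_LOCAL_PARTS.has(local.split("+")[0]);
}
```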
Output three files:
- create.csv: clean creates
- update.csv: matched updates with HubSpot vid populated
- review.csv: ambiguous, human eyes only
Import create.csv and update.csv separately. Process review.csv row by row.
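Routing the classifier's verdicts into the three files is mechanical. A sketch, assuming each classified row is paired with the `{ action, matchId, reason }` object classifyRow returns; the csvField, toCsv and splitByAction names are hypothetical. Fields are quoted RFC 4180 style so commas and quotes in company names survive, and the update file carries the match in a vid column, which you would map to Record ID at upload.

```javascript
// Quote a field if it contains a comma, quote, or line break; double inner quotes.
function csvField(value) {
  const s = value == null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) lines.push(columns.map((c) => csvField(row[c])).join(","));
  return lines.join("\r\n") + "\r\n";
}

// classified: [{ row, result }], where result is classifyRow's return value.
function splitByAction(classified, columns) {
  const create = [];
  const update = [];
  const review = [];
  for (const { row, result } of classified) {
    if (result.action === "create") create.push(row);
    else if (result.action === "update") update.push({ ...row, vid: result.matchId });
    else review.push({ ...row, review_reason: result.reason ?? "" });
  }
  return {
    "create.csv": toCsv(create, columns),
    "update.csv": toCsv(update, ["vid", ...columns.filter((c) => c !== "vid")]),
    "review.csv": toCsv(review, [...columns, "review_reason"]),
  };
}
```

Writing the result to disk is one writeFile call per entry, for example with Node's fs/promises.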
Phone normalization is non-negotiable
Phones come in as (415) 555-1212, 415.555.1212, +14155551212, 415-555-1212 ext 23. Without normalization, none of them dedupe against each other and your contact records sprout duplicates with different phone formats.
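function normalizePhone(raw) {
  if (!raw) return null;
  // Drop a trailing extension ("ext 23", "x23") so its digits are not read as part of the number.
  const base = String(raw).replace(/\s*(?:ext\.?|extension|x)\s*\d+\s*$/i, "").trim();
  const digits = base.replace(/\D/g, "");
  // Leading "+": trust the supplied country code. E.164 allows at most 15 digits; 8 is a loose sanity floor.
  if (base.startsWith("+")) return digits.length >= 8 && digits.length <= 15 ? `+${digits}` : null;
  if (digits.length === 10) return `+1${digits}`; // assume US/Canada
  if (digits.length === 11 && digits[0] === "1") return `+${digits}`;
  return null; // refuse rather than guess
}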
For non-US numbers, the simple normalizer fails. Use libphonenumber if you import multi-region data. Do not pretend.
The 100-row sample
Take the first 100 rows of the CSV. Run them through the full pipeline. Inspect every output record in HubSpot manually.
Look for:
- Properties that did not populate (mapping miss)
- Properties that overwrote good data (collision rule wrong)
- Records that created when they should have updated (dedup miss)
- Records that updated when they should have created (false positive dedup)
- Lifecycle stage regressions (very common: import sets to “lead” on customers)
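Part of this inspection can be pre-screened by diffing each sample record before and after the import. A minimal sketch, assuming `before` and `after` map contact ID to the property values you fetched around the sample run, and `expected` maps ID to the values the pipeline intended to write; all three names and screenSample itself are hypothetical. It flags mapping misses and overwritten non-blank values (which includes lifecycle regressions if lifecyclestage is among the fetched properties). Dedup misses and false positives still need human eyes.

```javascript
const isBlankValue = (v) => v == null || String(v).trim() === "";

// before/after/expected: { [contactId]: { [property]: value } }.
// A contact missing from `before` was created by the import.
function screenSample(before, after, expected) {
  const findings = [];
  for (const [id, intended] of Object.entries(expected)) {
    const prior = before[id] ?? {};
    const now = after[id];
    if (!now) {
      findings.push({ id, issue: "missing_after_import" });
      continue;
    }
    // Mapping miss: the pipeline meant to write a value that is not there.
    for (const [prop, value] of Object.entries(intended)) {
      if (String(now[prop] ?? "") !== String(value)) {
        findings.push({ id, prop, issue: "did_not_populate", intended: value, actual: now[prop] ?? null });
      }
    }
    // Collision-rule miss: a previously non-blank value changed.
    for (const [prop, old] of Object.entries(prior)) {
      if (!isBlankValue(old) && String(now[prop] ?? "") !== String(old)) {
        findings.push({ id, prop, issue: "overwrote_existing", before: old, after: now[prop] ?? null });
      }
    }
  }
  return findings;
}
```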
If even one of those happens in 100 rows, fix the pipeline before you run 50,000.
The lifecycle stage trap
HubSpot lifecycle stage is meant to move forward only, but do not count on the import to enforce that. If you import “lead” against an existing customer and HubSpot accepts the value, the contact regresses to lead. This is the most common silent corruption.
Mitigation: never map lifecycle stage in an inbound list import. Set it explicitly only for net-new contacts via a creation workflow. Existing contacts retain whatever stage they had.
If you must change lifecycle stages via import, do it in a separate, deliberate run with explicit per-record stages, not bulk-set.
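For that deliberate run, a pre-flight check can refuse any per-record change that moves a contact backward. A sketch, assuming HubSpot's default lifecycle stages in their default order, using their internal values; portals with customized stages need their own list, and `other` is deliberately left out because it has no place in the ordering. planLifecycleChanges is a hypothetical name.

```javascript
// Assumed default HubSpot lifecycle stage order (internal values); customize for your portal.
const STAGE_ORDER = [
  "subscriber", "lead", "marketingqualifiedlead", "salesqualifiedlead",
  "opportunity", "customer", "evangelist",
];

// changes: [{ id, current, proposed }] with internal stage values; current may be blank.
function planLifecycleChanges(changes) {
  const apply = [];
  const refuse = [];
  for (const change of changes) {
    const from = STAGE_ORDER.indexOf(change.current);
    const to = STAGE_ORDER.indexOf(change.proposed);
    if (to === -1) {
      refuse.push({ ...change, reason: "unknown_or_unordered_stage" });
    } else if (from === -1 && change.current) {
      refuse.push({ ...change, reason: "current_stage_not_in_order" });
    } else if (to < from) {
      refuse.push({ ...change, reason: "regression" });
    } else if (to > from) {
      apply.push(change);
    }
    // to === from: nothing to write.
  }
  return { apply, refuse };
}
```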
Rollback that actually works
Before the import, capture a list of all contact IDs that exist. After the import, the diff is the new records. If something goes wrong:
- New records: delete via the diff list
- Updated records: restore from the pre-import export of those records, property by property
Yes, you have to do a property-by-property restore. There is no transactional rollback. This is why dry-running matters.
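The diff of new records is a set difference over contact IDs. A minimal sketch, assuming you have the full list of contact IDs from before and after the import (for example by paging through the CRM contacts API); createdByImport is a hypothetical name.

```javascript
// IDs present after the import but not before are the records the import created.
function createdByImport(idsBefore, idsAfter) {
  const existing = new Set(idsBefore.map(String));
  return idsAfter.map(String).filter((id) => !existing.has(id));
}
```

Anything else that creates contacts during the import window (forms, integrations) also lands in this diff, so check it before deleting, then archive the remaining IDs in batches of 100 through the contacts batch archive endpoint.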
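const fs = require("fs/promises");
const hubspot = require("@hubspot/api-client");
// Token source is an assumption; use however you store your private app token.
const hsClient = new hubspot.Client({ accessToken: process.env.HUBSPOT_ACCESS_TOKEN });
// HubSpot batch endpoints accept at most 100 inputs per call.
function chunks(arr, size) {
  const out = [];
  for (let i = 0; i < arr.length; i += size) out.push(arr.slice(i, i + size));
  return out;
}
async function preImportSnapshot(idList, properties) {
  const snapshot = [];
  for (const batch of chunks(idList, 100)) {
    const records = await hsClient.crm.contacts.batchApi.read({
      inputs: batch.map((id) => ({ id: String(id) })),
      properties,
      propertiesWithHistory: [],
    });
    snapshot.push(...records.results);
  }
  const file = `snapshot-${Date.now()}.json`;
  await fs.writeFile(file, JSON.stringify(snapshot, null, 2));
  return file; // keep this path with the import record
}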
Snapshot only the properties you are about to write. The rollback is mechanical.
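Mechanically, the restore writes the snapshotted values back with a batch update. A minimal sketch that only builds the payloads, assuming the snapshot has the shape preImportSnapshot writes (batch read results with `id` and `properties`) and that `properties` is the same list you snapshotted; buildRestoreBatches is a hypothetical name. Read results may include system properties such as hs_object_id that you did not ask for, so only the listed properties are copied. A value that was empty before the import is sent as an empty string, which the HubSpot CRM API treats as clearing the property.

```javascript
// snapshot: [{ id, properties: { ... } }] as written by preImportSnapshot.
// Returns batch-update payloads of at most `batchSize` inputs each.
function buildRestoreBatches(snapshot, properties, batchSize = 100) {
  const inputs = snapshot.map((record) => {
    const restored = {};
    for (const prop of properties) {
      const value = record.properties?.[prop];
      // null/undefined: the property was empty before the import, so clear it.
      restored[prop] = value == null ? "" : String(value);
    }
    return { id: String(record.id), properties: restored };
  });
  const batches = [];
  for (let i = 0; i < inputs.length; i += batchSize) {
    batches.push({ inputs: inputs.slice(i, i + batchSize) });
  }
  return batches;
}
```

Each payload goes to `hsClient.crm.contacts.batchApi.update(batch)`. This restores every snapshotted property, including any that someone edited legitimately after the import, so run it soon after the problem is found.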
Bottom line
- Plan mapping in a spreadsheet with explicit collision rules per column; the import UI is for execution, not thinking.
- Dedup hierarchy is record ID, then email, then domain plus last name, then phone plus first name; email-only is not enough.
- Normalize phones to E.164 in pre-processing or accept silent duplication.
- Run a 100-row sample end to end and inspect manually before the full run.
- Snapshot existing records for the properties you will write; that is the rollback.