HubSpot Import Mapping and Dedup: Pre-Flight Discipline


Every CRM disaster has the same origin story: somebody imported a CSV at 4pm on a Friday because the campaign was launching Monday. The CSV had three contacts named “John Smith” with three different emails, and somebody mapped phone to mobile phone, lifecycle stage to lead status, and email to “primary email address” (a custom property nobody owns). By Monday morning, sales is calling the wrong numbers, marketing is sending to bounced addresses, and lifecycle reports are off by 12,000 records.

Imports are a deployment. Treat them like one.

The pre-flight checklist

Five steps. Skip none.

1. CSV schema audit: every column, every type, every constraint
2. Mapping plan: every column to a HubSpot property with explicit decisions on collisions
3. Dedup strategy: which property is the key, in what order
4. Dry-run on a 100-row sample, manually verify
5. Roll-back plan documented before you click Import

Step 5 is the one that always gets skipped. There is no native HubSpot “undo import.” You either delete what you imported or live with it.

The mapping plan, not the mapping UI

The HubSpot import UI lets you map column by column at upload time. It is the worst place to think. Plan the mapping in a spreadsheet first.

csv_column          target_property         transform              collision_rule
email               email                   lowercase, trim        skip if conflict
first_name          firstname               title case             overwrite if blank
last_name           lastname                title case             overwrite if blank
company             company                 trim                   overwrite if blank
phone               phone                   E.164 normalize        overwrite if blank
job_title           jobtitle                trim                   overwrite if blank
lead_source         original_lead_source    map to enum            never overwrite
last_engaged        n/a (dropped)           --                     --
notes               n/a (dropped)           --                     --

The collision_rule column is the conversation that prevents the disaster. “Skip if conflict” means do not overwrite existing values. “Overwrite if blank” means write only if the target is empty. “Never overwrite” means existing values are protected.

HubSpot’s import UI has a “do not overwrite” toggle that is global. The spreadsheet is per-column and reflects reality.
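To make the per-column rules concrete, here is a minimal sketch of the plan encoded as data and applied in pre-processing. The target property names come from the table above; the rule identifiers, the `applyPlan` helper, and the omission of phone and lead_source are illustrative choices, not a HubSpot API. It also takes one reading of “skip if conflict” as an assumption: keep the existing value, and route the row to review when the incoming value differs.

```javascript
// Hypothetical encoding of the mapping plan. Targets mirror the table above;
// phone and lead_source are left out here for brevity.
const titleCase = (s) =>
  s.trim().toLowerCase()
    .split(/(\s+|-)/) // keep separators so "mary-jane" becomes "Mary-Jane"
    .map((part) => part.charAt(0).toUpperCase() + part.slice(1))
    .join("");
const trim = (s) => s.trim();

const PLAN = {
  email:      { target: "email",     transform: (s) => s.trim().toLowerCase(), rule: "skip_if_conflict" },
  first_name: { target: "firstname", transform: titleCase, rule: "overwrite_if_blank" },
  last_name:  { target: "lastname",  transform: titleCase, rule: "overwrite_if_blank" },
  company:    { target: "company",   transform: trim,      rule: "overwrite_if_blank" },
  job_title:  { target: "jobtitle",  transform: trim,      rule: "overwrite_if_blank" },
  // last_engaged and notes are deliberately absent: dropped columns never reach HubSpot.
};

const isBlank = (v) => v === undefined || v === null || String(v).trim() === "";

// row: one CSV row; existing: the matched HubSpot record's properties ({} for creates).
// Returns the properties to write plus any conflicts that should send the row to review.
function applyPlan(row, existing = {}) {
  const write = {};
  const conflicts = [];
  for (const [column, { target, transform, rule }] of Object.entries(PLAN)) {
    if (isBlank(row[column])) continue; // nothing incoming for this column
    const incoming = transform(String(row[column]));
    const current = existing[target];
    if (isBlank(current)) {
      write[target] = incoming; // every rule allows filling an empty property
    } else if (rule === "skip_if_conflict" && transform(String(current)) !== incoming) {
      conflicts.push(target); // differing value on a key column: a human decides
    }
    // Any other non-empty existing value is left untouched.
  }
  return { write, conflicts };
}
```

Comparing after running both values through the same transform means “John@Example.com” versus “john@example.com” is not reported as a conflict.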

The dedup key hierarchy

Email is the default dedup key in HubSpot. It is the right default, but the wrong sole key.

Dedup hierarchy for inbound contact imports:

1. HubSpot record ID (vid) if present in CSV
2. Email exact match (case-insensitive)
3. Email domain + last name (for B2B with shared emails like info@)
4. Phone (E.164 normalized) + first name
5. No match: create new

Implement the hierarchy in pre-processing, not at import. The HubSpot importer matches contacts only on record ID or email; it cannot apply the domain and phone fallbacks.

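// Helpers assumed to exist: searchByProperty, searchByDomainAndLastName,
// isGenericEmail, normalizePhone. Searches return HubSpot search results
// shaped like [{ id, properties: { ... } }].
async function classifyRow(row) {
  // 1. Record ID already in the CSV: trust it.
  if (row.vid) return { action: "update", matchId: String(row.vid) };

  // Same normalization as the mapping plan: lowercase, trim.
  const email = (row.email || "").trim().toLowerCase();

  // 2. Exact email match.
  if (email) {
    const byEmail = await searchByProperty("email", email);
    if (byEmail.length === 1) return { action: "update", matchId: byEmail[0].id };
    if (byEmail.length > 1) return { action: "review", reason: "multiple_email_matches" };
  }

  // 3. Shared inbox (info@, sales@): fall back to domain + last name.
  if (email && isGenericEmail(email) && row.last_name) {
    const candidates = await searchByDomainAndLastName(email.split("@")[1], row.last_name);
    if (candidates.length === 1) return { action: "update", matchId: candidates[0].id };
    if (candidates.length > 1) {
      return { action: "review", reason: "multiple_domain_lastname_matches" };
    }
  }

  // 4. Normalized phone, confirmed by a case-insensitive first-name match.
  const phone = normalizePhone(row.phone); // null when the number cannot be trusted
  const firstName = (row.first_name || "").trim().toLowerCase();
  if (phone && firstName) {
    const byPhone = await searchByProperty("phone", phone);
    if (
      byPhone.length === 1 &&
      (byPhone[0].properties.firstname || "").trim().toLowerCase() === firstName
    ) {
      return { action: "update", matchId: byPhone[0].id };
    }
  }

  // 5. No match: create new.
  return { action: "create" };
}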
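classifyRow leans on helpers it does not define. A minimal sketch of three of them follows, assuming the official `@hubspot/api-client` Node package and its CRM search endpoint (`crm.contacts.searchApi.doSearch`). The generic-inbox list, the `make…` factory functions, and the use of the `hs_email_domain` contact property for the domain match are illustrative assumptions, not the author's code. Results come back as `{ id, properties }`, so name comparisons read `properties.firstname`.

```javascript
// Role/shared inbox prefixes. Hypothetical list: extend it from your own data.
const GENERIC_LOCAL_PARTS = new Set([
  "info", "sales", "admin", "office", "contact", "hello", "support",
]);

function isGenericEmail(email) {
  const local = String(email || "").trim().toLowerCase().split("@")[0];
  return GENERIC_LOCAL_PARTS.has(local);
}

const SEARCH_PROPERTIES = ["email", "firstname", "lastname", "phone"];

// hsClient: e.g. new Client({ accessToken }) from @hubspot/api-client.
// Each helper returns search results shaped like [{ id, properties: {...} }].
function makeSearchByProperty(hsClient) {
  return async function searchByProperty(propertyName, value) {
    const res = await hsClient.crm.contacts.searchApi.doSearch({
      filterGroups: [{ filters: [{ propertyName, operator: "EQ", value }] }],
      properties: SEARCH_PROPERTIES,
      limit: 10, // more than one hit already means "review"
    });
    return res.results;
  };
}

// Filters inside one filter group are ANDed: same email domain AND same last name.
function makeSearchByDomainAndLastName(hsClient) {
  return async function searchByDomainAndLastName(domain, lastName) {
    const res = await hsClient.crm.contacts.searchApi.doSearch({
      filterGroups: [{
        filters: [
          { propertyName: "hs_email_domain", operator: "EQ", value: domain.trim().toLowerCase() },
          { propertyName: "lastname", operator: "EQ", value: lastName.trim() },
        ],
      }],
      properties: SEARCH_PROPERTIES,
      limit: 10,
    });
    return res.results;
  };
}
```

Wire them up once with `const searchByProperty = makeSearchByProperty(hsClient);` and likewise for the domain search. Taking the client as a parameter keeps the helpers testable against a fake client, and one search call per row adds up on a large file, so throttle the loop.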

Output three files:

create.csv: clean creates
update.csv: matched updates with HubSpot vid populated
review.csv: ambiguous, human eyes only

Import create.csv and update.csv separately. Process review.csv row by row.
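Writing the three files can be sketched as follows, assuming each input row has already been run through classifyRow and paired with its decision as `{ row, decision }`. The `splitByAction`, `toCsv`, and `writeSplit` names are hypothetical; quoting follows RFC 4180 so company names with commas survive. The `vid` column in update.csv is the one you map to HubSpot's Record ID during the import.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Quote a field when it contains a comma, double quote, or line break (RFC 4180).
function csvField(value) {
  const s = value === undefined || value === null ? "" : String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toCsv(rows, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) lines.push(columns.map((col) => csvField(row[col])).join(","));
  return lines.join("\r\n") + "\r\n";
}

// results: [{ row, decision }], where decision is what classifyRow returned.
// columns: the CSV columns to carry through (dropped columns already removed).
function splitByAction(results, columns) {
  const base = columns.filter((col) => col !== "vid");
  const buckets = { create: [], update: [], review: [] };
  for (const { row, decision } of results) {
    if (decision.action === "update") {
      buckets.update.push({ ...row, vid: decision.matchId });
    } else if (decision.action === "create") {
      buckets.create.push(row);
    } else {
      buckets.review.push({ ...row, review_reason: decision.reason || "" });
    }
  }
  return {
    "create.csv": toCsv(buckets.create, base),
    "update.csv": toCsv(buckets.update, ["vid", ...base]),
    "review.csv": toCsv(buckets.review, [...base, "review_reason"]),
  };
}

// Write the three files side by side in outDir.
function writeSplit(files, outDir = ".") {
  for (const [name, body] of Object.entries(files)) {
    fs.writeFileSync(path.join(outDir, name), body);
  }
}
```

Keeping the split pure (strings in memory) and the file write separate makes it easy to eyeball the counts per bucket before anything touches disk.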

Phone normalization is non-negotiable

Phones come in as (415) 555-1212, 415.555.1212, +14155551212, 415-555-1212 ext 23. Without normalization, none of them dedupe against each other and your contact records sprout duplicates with different phone formats.

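function normalizePhone(raw) {
  if (!raw) return null;
  const text = String(raw).trim();
  // Drop a trailing extension ("ext 23", "x23") so its digits are not glued onto the number.
  const main = text.replace(/\s*(?:ext\.?|extension|x)\s*\d+$/i, "");
  const digits = main.replace(/\D/g, "");
  if (main.startsWith("+")) {
    // Source already carries a country code; E.164 allows at most 15 digits.
    return digits.length >= 7 && digits.length <= 15 ? `+${digits}` : null;
  }
  if (digits.length === 10) return `+1${digits}`; // US/Canada without country code
  if (digits.length === 11 && digits[0] === "1") return `+${digits}`; // US/Canada with 1
  return null; // refuse rather than guess
}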

For non-US numbers, the simple normalizer fails. Use libphonenumber if you import multi-region data. Do not pretend.

The 100-row sample

Take the first 100 rows of the CSV. Run them through the full pipeline. Inspect every output record in HubSpot manually.

Look for:

Properties that did not populate (mapping miss)
Properties that overwrote good data (collision rule wrong)
Records that created when they should have updated (dedup miss)
Records that updated when they should have created (false positive dedup)
Lifecycle stage regressions (very common: import sets to “lead” on customers)

If even one of those happens in 100 rows, fix the pipeline before you run 50,000.
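Three of the five checks can be automated by diffing the sample's pre-import snapshot against the same records re-read after the import. A minimal sketch, assuming HubSpot's default lifecycle stage internal values (portals with custom stages need their own order) and a hypothetical `expected` map listing which properties each CSV row actually had values for. Dedup misses and false-positive merges still need human eyes, because deciding whether two records are the same person is the judgment call.

```javascript
// HubSpot's default lifecycle stage internal values in funnel order.
// Portals with custom stages need their own list; "other" is left out on purpose.
const LIFECYCLE_ORDER = [
  "subscriber", "lead", "marketingqualifiedlead", "salesqualifiedlead",
  "opportunity", "customer", "evangelist",
];

const blank = (v) => v === undefined || v === null || String(v).trim() === "";

// before:   { [id]: properties } from the pre-import snapshot (absent for new records)
// after:    { [id]: properties } re-read after the sample import
// expected: { [id]: [property names the CSV row had values for] }
function auditSample(before, after, expected) {
  const findings = [];
  for (const [id, post] of Object.entries(after)) {
    const pre = before[id] || {};
    for (const prop of expected[id] || []) {
      if (blank(post[prop])) {
        findings.push({ id, prop, issue: "not_populated" }); // mapping miss
      } else if (!blank(pre[prop]) && pre[prop] !== post[prop]) {
        findings.push({ id, prop, issue: "overwrote_existing" }); // collision rule wrong
      }
    }
    const from = LIFECYCLE_ORDER.indexOf(pre.lifecyclestage);
    const to = LIFECYCLE_ORDER.indexOf(post.lifecyclestage);
    if (from !== -1 && to !== -1 && to < from) {
      findings.push({ id, prop: "lifecyclestage", issue: "regressed" });
    }
  }
  return findings;
}
```

An empty result is the bar for moving on to the full file; any finding means fixing the pipeline and re-running the sample.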

The lifecycle stage trap

HubSpot lifecycle stage is designed to move forward only, but do not count on that to protect you during an import. If an import writes “lead” against an existing customer and HubSpot accepts it, the contact regresses to lead. This is the most common silent corruption.

Mitigation: never map lifecycle stage in an inbound list import. Set it explicitly only for net-new contacts via a creation workflow. Existing contacts retain whatever stage they had.

If you must change lifecycle stages via import, do it in a separate, deliberate run with explicit per-record stages, not bulk-set.
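The “never map lifecycle stage” rule is easy to enforce mechanically in pre-processing. A minimal sketch: strip any lifecycle column from every outgoing row, then assert nothing slipped through before the CSVs are written. The column names in `LIFECYCLE_COLUMNS` are hypothetical; use whatever headers your source files actually carry.

```javascript
// Hypothetical column names a CSV might use for lifecycle stage; adjust to your files.
const LIFECYCLE_COLUMNS = ["lifecyclestage", "lifecycle_stage", "Lifecycle Stage"];

// Return a copy of the row with every lifecycle column removed.
function withoutLifecycle(row) {
  const copy = { ...row };
  for (const col of LIFECYCLE_COLUMNS) delete copy[col];
  return copy;
}

// Last check before writing create.csv / update.csv: refuse to continue if
// any row still carries a lifecycle value.
function assertNoLifecycle(rows) {
  const offenders = rows.filter((row) =>
    LIFECYCLE_COLUMNS.some((col) => col in row && String(row[col] ?? "").trim() !== ""),
  );
  if (offenders.length > 0) {
    throw new Error(`${offenders.length} row(s) still carry a lifecycle stage`);
  }
}
```

Throwing instead of silently dropping values in the final check means a mapping change that reintroduces the column stops the run rather than corrupting stages.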

Rollback that actually works

Before the import, capture a list of all contact IDs that exist. After the import, the diff is the new records. If something goes wrong:

New records: delete via the diff list
Updated records: restore from the pre-import export of those records, property by property
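The new-record side of the rollback is a set difference. A minimal sketch, assuming you have exported the contact IDs before and after the import; `newRecordIds` and `toArchiveBatches` are hypothetical names, and the batches are shaped for the batch archive call in `@hubspot/api-client` (`crm.contacts.batchApi.archive({ inputs })`), using 100 inputs per request on the assumption it shares the batch-read limit.

```javascript
// IDs present after the import but not before are the records the import created.
function newRecordIds(beforeIds, afterIds) {
  const before = new Set(beforeIds.map(String)); // IDs may arrive as numbers or strings
  return [...new Set(afterIds.map(String))].filter((id) => !before.has(id));
}

// Group IDs into batch archive payloads of at most `size` inputs each.
function toArchiveBatches(ids, size = 100) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push({ inputs: ids.slice(i, i + size).map((id) => ({ id })) });
  }
  return batches;
}
```

Anything else that creates contacts during the import window (forms, integrations) also lands in the diff, so check each record's create date or import source before archiving.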

Yes, you have to do a property-by-property restore. There is no transactional rollback. This is why dry-running matters.

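const fs = require("node:fs/promises");
const { Client } = require("@hubspot/api-client");

// Token variable name is an assumption; use a private app access token.
const hsClient = new Client({ accessToken: process.env.HUBSPOT_ACCESS_TOKEN });

// Split an array into arrays of at most `size` items.
function chunks(list, size) {
  const out = [];
  for (let i = 0; i < list.length; i += size) out.push(list.slice(i, i + size));
  return out;
}

// idList: HubSpot IDs of the contacts about to be updated (update.csv).
// properties: only the properties the import will write.
async function preImportSnapshot(idList, properties) {
  const snapshot = [];
  for (const batch of chunks(idList, 100)) { // batch read takes up to 100 IDs per call
    const res = await hsClient.crm.contacts.batchApi.read({
      inputs: batch.map((id) => ({ id: String(id) })),
      properties,
      propertiesWithHistory: [],
    });
    snapshot.push(...res.results);
  }
  const file = `snapshot-${Date.now()}.json`;
  await fs.writeFile(file, JSON.stringify(snapshot, null, 2));
  return file; // keep the path with the import notes
}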

Snapshot only the properties you are about to write. The rollback is mechanical.
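Turning the snapshot back into a restore is a pure transformation. A minimal sketch, assuming the snapshot JSON holds batch-read results shaped like `{ id, properties }`: only the listed properties are written back (batch read also returns read-only fields such as `hs_object_id`, `createdate`, and `lastmodifieddate`), and a value that was empty before the import is restored as an empty string, which is how HubSpot clears a property. `buildRestoreInputs` and `chunk` are hypothetical names.

```javascript
// Build batch-update inputs that put every snapshotted property back.
function buildRestoreInputs(snapshot, properties) {
  return snapshot.map((record) => ({
    id: record.id,
    properties: Object.fromEntries(
      // Empty before the import -> "" now, which clears what the import wrote.
      properties.map((prop) => [prop, record.properties?.[prop] ?? ""]),
    ),
  }));
}

// Split inputs into groups of at most `size` (batch endpoints take 100 per call).
function chunk(list, size) {
  const out = [];
  for (let i = 0; i < list.length; i += size) out.push(list.slice(i, i + size));
  return out;
}

// Usage sketch, with an @hubspot/api-client Client as hsClient:
// for (const inputs of chunk(buildRestoreInputs(snapshot, props), 100)) {
//   await hsClient.crm.contacts.batchApi.update({ inputs });
// }
```

A blind restore also reverts any legitimate edits users made after the import, so run it promptly or filter the snapshot down to the records the sample audit flagged.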

Bottom line

Plan mapping in a spreadsheet with explicit collision rules per column; the import UI is for execution, not thinking.
Dedup hierarchy is record ID, then email, then domain plus last name, then phone plus first name; email-only is not enough.
Normalize phones to E.164 in pre-processing or accept silent duplication.
Run a 100-row sample end to end and inspect manually before the full run.
Snapshot existing records for the properties you will write; that is the rollback.
