What if we want to load more than 5 million records? | CRM Curator

The 5-million-record limit is Data Loader’s per-file cap, not Salesforce’s. The platform itself happily holds billions of rows if you’re paying for the storage. Loading beyond 5M just means choosing a different mechanism.

Options for >5M record loads

1. Split the file and run multiple Data Loader jobs

The simplest workaround: chop the source into chunks under the cap (4-million-row files leave headroom) and run them sequentially or in parallel through Data Loader. Enable Data Loader’s Bulk API setting, and pick serial or parallel mode depending on lock contention.

Useful when:

You’re already comfortable with Data Loader.
The load is a one-off migration.
You have time to babysit the runs.
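
As a rough illustration of the splitting step, here is a minimal Python sketch. It assumes the source is a UTF-8 CSV with a single header row; the function name split_csv and the output file naming are hypothetical choices for this example, not anything Data Loader prescribes.

```python
import csv
from pathlib import Path


def split_csv(source: Path, out_dir: Path, rows_per_chunk: int = 4_000_000) -> list[Path]:
    """Split `source` into numbered chunk files, repeating the header in each one."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    out = None
    writer = None
    rows_in_chunk = 0
    try:
        with source.open(newline="", encoding="utf-8") as src:
            # csv.reader keeps quoted fields with embedded newlines intact,
            # which a naive line-based split would break.
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                return chunks  # empty file: nothing to split
            for row in reader:
                # Start a new chunk file on the first row or when the current one is full.
                if out is None or rows_in_chunk >= rows_per_chunk:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{source.stem}_part{len(chunks) + 1:03d}.csv"
                    out = path.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    chunks.append(path)
                    rows_in_chunk = 0
                writer.writerow(row)
                rows_in_chunk += 1
    finally:
        if out is not None:
            out.close()
    return chunks
```

Calling split_csv(Path("accounts.csv"), Path("chunks")) would write accounts_part001.csv, accounts_part002.csv and so on into the chunks folder, each small enough to feed to a separate Data Loader run.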

2. Bulk API 2.0 directly (or via CLI / tools)

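Bulk API 2.0 is Salesforce’s modern asynchronous data API. Unlike the original Bulk API, it handles batching server-side: you upload a CSV per job and Salesforce chunks it for you. Each job accepts up to 150 MB of data after base64 encoding, so the documentation recommends keeping the raw file at about 100 MB; a load beyond 5M rows therefore usually means several jobs rather than one giant file. You can run many jobs per day, with less orchestration overhead than managing batches yourself.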

Tools that speak Bulk API 2.0:

Salesforce CLI (sf data import bulk)
dataloader.io
Workbench (basic support)
Custom scripts using the REST-based Bulk API 2.0 endpoints

This is the right answer for ongoing large loads.
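
For the custom-script route, the ingest flow is four REST calls: create a job, upload the CSV, mark the upload complete, then poll for the result. The sketch below uses only the Python standard library. The API version string, the function names, and the assumption that you already hold an OAuth access token and instance URL are illustrative choices for this example; error handling and result-file downloads are left out.

```python
import json
import time
import urllib.request

API_VERSION = "v60.0"  # assumption: substitute whichever API version your org supports


def _request(method, url, token, body=None, content_type="application/json"):
    """Build (but do not send) an authenticated request."""
    headers = {"Authorization": f"Bearer {token}", "Accept": "application/json"}
    if body is not None:
        headers["Content-Type"] = content_type
    return urllib.request.Request(url, data=body, headers=headers, method=method)


def create_job(instance_url, token, sobject, operation="insert"):
    """Step 1: open an ingest job. The CSV must use LF line endings to match."""
    payload = {"object": sobject, "operation": operation,
               "contentType": "CSV", "lineEnding": "LF"}
    url = f"{instance_url}/services/data/{API_VERSION}/jobs/ingest"
    return _request("POST", url, token, json.dumps(payload).encode("utf-8"))


def upload_csv(instance_url, token, job_id, csv_bytes):
    """Step 2: send the CSV body for the job."""
    url = f"{instance_url}/services/data/{API_VERSION}/jobs/ingest/{job_id}/batches"
    return _request("PUT", url, token, csv_bytes, content_type="text/csv")


def close_job(instance_url, token, job_id):
    """Step 3: tell Salesforce the upload is finished so processing can start."""
    url = f"{instance_url}/services/data/{API_VERSION}/jobs/ingest/{job_id}"
    body = json.dumps({"state": "UploadComplete"}).encode("utf-8")
    return _request("PATCH", url, token, body)


def job_status(instance_url, token, job_id):
    """Step 4: fetch job info, including its current state."""
    url = f"{instance_url}/services/data/{API_VERSION}/jobs/ingest/{job_id}"
    return _request("GET", url, token)


def run_load(instance_url, token, sobject, csv_bytes, poll_seconds=10):
    """Run the four steps against a live org and return the final job info."""
    def send(req):
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
        return json.loads(raw) if raw else {}  # the upload call returns no body

    job_id = send(create_job(instance_url, token, sobject))["id"]
    send(upload_csv(instance_url, token, job_id, csv_bytes))
    send(close_job(instance_url, token, job_id))
    while True:
        info = send(job_status(instance_url, token, job_id))
        if info["state"] in ("JobComplete", "Failed", "Aborted"):
            return info
        time.sleep(poll_seconds)
```

Keeping the request builders separate from run_load makes it possible to inspect or log each call before anything touches the org. For a multi-file load, you would call run_load once per chunk.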

3. ETL / iPaaS platforms

For recurring multi-million-row syncs, a managed ETL platform handles dependency ordering, retries, and scheduling:

MuleSoft Anypoint (Salesforce’s own iPaaS)
Jitterbit, Informatica Cloud, Boomi, Talend
Hightouch / Census for warehouse-to-Salesforce reverse ETL (Fivetran and Stitch mainly move data the other way, from Salesforce into a warehouse)
Heroku Connect for Postgres-to-Salesforce sync

4. Big Objects

If “load 50 million records” is the actual ask, ask first: does this data need to live in standard Salesforce storage? Standard records consume paid data storage and count against the org’s storage allocation. Big Objects are Salesforce’s archive tier:

Designed for billions of rows.
Queried with SOQL restricted to the index fields, or processed through Bulk API or Batch Apex (Async SOQL has been retired).
Loaded with Bulk API 2.0 or a CSV-based async load.
Don’t support all standard features (no triggers, no flows, limited reporting).

Big Objects are right for audit logs, historical events, IoT/sensor data — read-mostly, append-heavy datasets.

5. External Objects (Salesforce Connect)

If the data lives in another system (Snowflake, Postgres, SQL Server, Oracle), don’t load it. Surface it through Salesforce Connect as External Objects. Records aren’t stored in Salesforce — they’re queried live via OData or a custom adapter. Costs an add-on licence but eliminates the load problem entirely.

Pre-load checklist for very large loads

Defer / disable rollup summaries during the load (where you can) and recalculate after.
Skinny tables / custom indexes: request from Salesforce Support for fields used in WHERE clauses on the loaded object.
Custom permission bypasses in triggers/flows for the integration user.
Bulk API 2.0 for raw speed (it always processes in parallel); the original Bulk API in serial mode if lock contention from rollups or shared parent records hurts.
Off-hours runs to avoid contending with users.
Storage check — confirm the org has enough data storage to hold the result. 5M records of a 2 KB object is ~10 GB.
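
The storage arithmetic above can be sketched as a small helper. It assumes a flat 2 KB per record, as in the checklist figure, and decimal units (1 GB = 1,000,000 KB); the function name is a hypothetical choice for this example, and the real figure should come from the org’s storage usage page.

```python
def estimated_storage_gb(record_count: int, kb_per_record: float = 2.0) -> float:
    """Rough data-storage estimate in decimal GB (1 GB = 1,000,000 KB)."""
    return record_count * kb_per_record / 1_000_000


def fits(record_count: int, free_gb: float, kb_per_record: float = 2.0) -> bool:
    """True if the estimated load fits in the free storage you pass in."""
    return estimated_storage_gb(record_count, kb_per_record) <= free_gb


print(estimated_storage_gb(5_000_000))   # 10.0
print(estimated_storage_gb(50_000_000))  # 100.0
print(fits(50_000_000, free_gb=40))      # False
```

Running the same estimate for the 50-million-record case shows why the Big Objects and Salesforce Connect questions are worth asking before the load starts.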

What an interviewer wants to hear

In one breath: “5M is Data Loader’s per-file cap, not a Salesforce hard limit. For larger loads I’d switch to Bulk API 2.0 through the Salesforce CLI or a tool like dataloader.io. For archive-volume data I’d consider Big Objects or Salesforce Connect so I’m not paying for standard storage.”

Verified against: Bulk API 2.0 Developer Guide, Data Loader Guide, Metadata API Developer Guide. Last reviewed 2026-05-17 for Spring ’26 release.