If accented letters (é, ñ, ü), CJK characters, emoji, or smart quotes turn into garbage like Ã© or ? after a load, you have an encoding problem. Salesforce stores strings as UTF-8, but Data Loader has to be told that the source file is UTF-8 too. Otherwise it reads the bytes assuming Windows-1252 (the legacy default) and ships the wrong code points.
The fix in three steps
- Save the CSV as UTF-8. In Excel: Save As → CSV UTF-8 (Comma delimited) (.csv). In Google Sheets: download as CSV (it’s always UTF-8). In Notepad: Save As → Encoding: UTF-8. In VS Code: bottom-right encoding indicator → reopen / save with encoding UTF-8.
- Open Data Loader → Settings → Settings. Set:
  - Read all CSVs with UTF-8 encoding → checked
  - Write all CSVs with UTF-8 encoding → checked (so success/error files come back correctly too)
- Re-run the job. The special characters now flow through intact.
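If the source file cannot be re-exported from the tool that produced it, the first step can also be done in a script. The following is a minimal sketch, assuming the file was saved as Windows-1252; the function names and the `source_encoding` parameter are illustrative and not part of Data Loader.

```python
from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError if the bytes are not valid in source_encoding,
    # which is preferable to silently writing damaged text.
    return raw.decode(source_encoding).encode("utf-8")


def reencode_csv(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Write a UTF-8 copy of src to dst without touching line endings or delimiters."""
    Path(dst).write_bytes(to_utf8(Path(src).read_bytes(), source_encoding))
```

Working on raw bytes leaves quoting, delimiters, and line endings exactly as they were; only the character encoding changes. If the file is already UTF-8, running this would double-encode it, so check the encoding first.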
Why this matters
Excel on Windows often saves CSVs in Windows-1252 (CP1252) by default. That encoding covers Western European characters but cannot represent anything else, so CJK text, emoji, and other scripts are lost at save time. The opposite mismatch is just as common: the file really is UTF-8, but without the UTF-8 toggle Data Loader decodes it as Windows-1252, so each multi-byte character becomes two or more wrong ones. That is the mojibake you see in Salesforce.
Always insist on UTF-8 in, UTF-8 out.
Quick diagnostic
If you load José and Salesforce shows JosÃ©, you wrote UTF-8 bytes but Data Loader read them as Windows-1252. Tick the UTF-8 read setting and reload.
If you see literal question marks (?), the data was downgraded somewhere, probably saved from Excel as plain CSV without the UTF-8 variant. Re-save the source.
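Both symptoms can be reproduced in a few lines, which helps when deciding which one you are looking at. This is an illustrative sketch only; `repair_mojibake` is a hypothetical helper, and it assumes the damage came from exactly one UTF-8-read-as-Windows-1252 round trip.

```python
original = "José"

# UTF-8 bytes decoded as Windows-1252: each non-ASCII character becomes two.
mojibake = original.encode("utf-8").decode("cp1252")

# Saved in an encoding that cannot represent the character: it becomes "?".
downgraded = original.encode("ascii", errors="replace").decode("ascii")


def repair_mojibake(text: str) -> str:
    """Undo a single UTF-8-read-as-cp1252 mix-up; return text unchanged otherwise."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not produced by that mix-up, so leave it alone.
        return text


print(mojibake)                   # JosÃ©
print(downgraded)                 # Jos?
print(repair_mojibake(mojibake))  # José
```

The mojibake case is reversible because no bytes were lost. The question-mark case is not: the original character is gone, which is why the source file has to be re-saved.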
Special character checklist
- Accented Latin (é, ñ, ü, Ç): the UTF-8 toggle handles it.
- CJK (Chinese, Japanese, Korean): UTF-8 is mandatory; ensure your CSV editor isn’t silently transliterating.
- Right-to-left scripts (Arabic, Hebrew): UTF-8 handles the code points; rendering depends on the field’s display, not the data.
- Emoji and supplementary plane characters (😀, 𝓗): Salesforce supports these in most text fields. Use UTF-8.
- Smart quotes / em-dashes (“ ”, ‘ ’, —): Word and some web tools convert ASCII punctuation into these. Either accept them (UTF-8 carries them fine) or strip them in pre-processing.
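If you choose to strip typographic punctuation in pre-processing, a translation table is usually enough. The sketch below is one possible approach; the mapping and the name `normalize_punctuation` are assumptions, so extend the table to whatever your sources actually produce.

```python
# Map common typographic punctuation to ASCII equivalents.
SMART_PUNCTUATION = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2014": "-",  # em dash
    "\u2013": "-",  # en dash
})


def normalize_punctuation(value: str) -> str:
    """Replace smart quotes and dashes; every other character is left untouched."""
    return value.translate(SMART_PUNCTUATION)
```

Apply this to parsed field values rather than to the raw file, and write the result with a CSV library so that any ASCII double quotes it introduces are escaped correctly.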
Tips for production loads
- Byte Order Mark (BOM) at the start of a UTF-8 file is harmless to Data Loader but breaks some downstream tools — strip it if your validators complain.
- Excel re-opens UTF-8 CSVs in CP1252 unless you explicitly Import from Text. Use a CSV-aware viewer (VS Code, Sublime, Notepad++) to verify before loading.
- Long Unicode strings still count against text-field byte limits, not just character limits: multi-byte characters use more storage.
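The checks in this list can be scripted before a large load. The helpers below are a hedged sketch with hypothetical names; they use only the Python standard library and do not call Salesforce or Data Loader.

```python
import codecs


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def utf8_length(value: str) -> tuple[int, int]:
    """Return (character count, UTF-8 byte count) for a field value."""
    return len(value), len(value.encode("utf-8"))


print(utf8_length("José"))  # (4, 5)
print(utf8_length("😀"))    # (1, 4)
```

Comparing the two numbers from `utf8_length` shows how far a value's byte size exceeds its character count. Which limit applies to a given field is something to confirm against the Salesforce documentation for that field type.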
Verified against: Data Loader Guide (Configuring), Metadata API Developer Guide. Last reviewed 2026-05-17 for Spring ’26 release.