Data Mask for Sandboxes: Configure, Validate, When to Skip

Most teams treat Salesforce Data Mask as the answer to “we have PII in our Full sandbox.” It is part of the answer. The other parts (what to mask, how to validate, when to skip) are what separate a useful sandbox from compliance theater.

What Data Mask actually does

Data Mask replaces sensitive field values with realistic-looking fake values after a sandbox refresh. It runs as a managed package. You define a configuration set per object, choose mask types per field (random name, deterministic hash, pattern), and schedule it to fire post-refresh.

What it does not do:

Mask data inside long text or rich text fields you didn’t explicitly call out.
Mask data in attachments or files.
Mask data inside JSON blob fields.
Re-mask anything after the initial run. The mask runs once, post-refresh, so data loaded or created during later test runs is not touched.

Configure: the fields you must mask

Start with the regulatory baseline. Mask these in every Full sandbox:

Personal identifiers: email, phone, mobile, full name on Contact/Lead/Person Account.
Financial: any field tagged with __financialPii, or any field that contains credit card, bank account, or tax ID data.
Health (if you handle PHI): diagnosis, treatment, provider notes.
Free-text fields known to contain PII: case description, case comments, chatter posts.

For each, pick the mask type carefully:

Random Name — for FirstName / LastName.
Email Pattern — preserves domain shape; useful for testing email routing.
Phone Pattern — preserves country code; tests phone integration paths.
Deterministic Hash — same input always produces same output. Use for external IDs you need to remain joinable.
Custom Pattern — regex-based; for tax IDs, customer numbers.
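To make the "remain joinable" property concrete, here is a minimal Python sketch of a deterministic mask. It illustrates the general idea only and is not Data Mask's actual algorithm. The function name, the secret, and the sample IDs are all hypothetical.

```python
import hashlib
import hmac


def deterministic_mask(value: str, secret: bytes, length: int = 16) -> str:
    """Return a stable pseudonym: the same value and secret always give the same output."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Hypothetical secret and external IDs, for illustration only.
SECRET = b"example-secret-not-for-real-use"

account_key = deterministic_mask("CUST-001234", SECRET)
invoice_key = deterministic_mask("CUST-001234", SECRET)
other_key = deterministic_mask("CUST-009999", SECRET)

# The same source value masks to the same key on both objects, so joins still work.
assert account_key == invoice_key
# Different source values stay distinct.
assert account_key != other_key
```

A random mask would break the first assertion, which is why it is the wrong choice for any field used as a join key.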

Configure: the fields people forget

In our last audit cycle, these were missed in 80% of orgs:

Description and Long Description fields on Lead and Opportunity.
Custom long-text fields like Notes__c or MeetingSummary__c.
Files and attachments — Data Mask does not touch these. You have to delete or replace post-refresh.
Chatter feeds — Data Mask has a Chatter module that you must enable separately.
Custom fields added in the last six months that nobody added to the mask config.

Run a quarterly “field gap” check. Pull all custom fields of type text/long-text added since the last config update and confirm each is intentionally masked or intentionally allow-listed.

# CustomField is a Tooling API object, so the query needs --use-tooling-api.
sf data query --use-tooling-api --query "
  SELECT TableEnumOrId, DeveloperName, DataType,
         CreatedDate, LastModifiedDate
  FROM CustomField
  WHERE DataType IN ('Text', 'LongTextArea', 'Html', 'TextArea')
  AND CreatedDate > LAST_N_DAYS:90
" --target-org prod --result-format csv > new-text-fields.csv

Diff against your mask config. Any field in the result without a mask entry needs an explicit decision.
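The diff itself can be scripted. The Python sketch below assumes the export has TableEnumOrId and DeveloperName columns, and that you can list your masked and allow-listed fields as Object.Field__c names. That list format is an assumption, not something Data Mask exports for you. For custom objects, TableEnumOrId is an ID rather than a name, so those rows would need an extra lookup that this sketch leaves out.

```python
import csv
import io


def find_field_gaps(export_csv: str, masked: set, allow_listed: set) -> list:
    """Return 'Object.Field__c' names from the export that have no mask decision."""
    gaps = []
    for row in csv.DictReader(io.StringIO(export_csv)):
        # DeveloperName comes back without the __c suffix, so add it for comparison.
        field = f"{row['TableEnumOrId']}.{row['DeveloperName']}__c"
        if field not in masked and field not in allow_listed:
            gaps.append(field)
    return sorted(gaps)


# Made-up rows in the shape of new-text-fields.csv; read the real file in practice.
EXPORT = """TableEnumOrId,DeveloperName,DataType
Contact,Notes,LongTextArea
Lead,MeetingSummary,LongTextArea
Opportunity,RenewalHint,Text
"""

MASKED = {"Contact.Notes__c"}                  # fields already in the mask config
ALLOW_LISTED = {"Opportunity.RenewalHint__c"}  # reviewed, intentionally unmasked

print(find_field_gaps(EXPORT, MASKED, ALLOW_LISTED))  # ['Lead.MeetingSummary__c']
```

Keeping the allow-list as an explicit set means "intentionally unmasked" is a recorded decision, not an omission.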

Validate: the post-refresh smoke test

After every refresh + mask run, an automated test must validate the result. Without it you’re hoping.

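// SeeAllData=true is required: by default Apex tests cannot see org records,
// so the queries below would return nothing and the checks would be meaningless.
@IsTest(SeeAllData=true)
public class DataMaskValidationTest {

  @IsTest
  static void noRealEmailsLeak() {
    List<Contact> sample = [
      SELECT Email FROM Contact
      WHERE Email != null
      ORDER BY CreatedDate DESC
      LIMIT 1000
    ];
    // An empty sample would let the loop pass without checking anything.
    System.assert(!sample.isEmpty(),
      'No Contacts with Email found to validate');
    // (?i) makes the match case-insensitive. Replace the domains with your own.
    Pattern realDomainPattern = Pattern.compile(
      '(?i)@(yourcompany\\.com|partner1\\.com|partner2\\.com)'
    );
    for (Contact c : sample) {
      Matcher m = realDomainPattern.matcher(c.Email);
      System.assert(!m.find(),
        'Real domain leaked through mask: ' + c.Email);
    }
  }

  @IsTest
  static void deterministicHashesAreStable() {
    List<Account> sample = [
      SELECT ExternalId__c FROM Account
      WHERE ExternalId__c != null
      LIMIT 100
    ];
    // Count how often each hashed value appears in the sample.
    Map<String, Integer> counts = new Map<String, Integer>();
    for (Account a : sample) {
      Integer c = counts.containsKey(a.ExternalId__c)
        ? counts.get(a.ExternalId__c) + 1
        : 1;
      counts.put(a.ExternalId__c, c);
    }
    // Sanity check only: hashed values shouldn't all collapse to one value.
    // This does not prove the hash is stable across mask runs.
    System.assert(counts.size() > 10,
      'Hash mask is producing too few unique values');
  }
}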

Run this as part of your refresh runbook. Failure halts sandbox availability.
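The Apex test above checks Email and one external ID. Free-text fields can be checked the same way off-platform, against a sample of exported rows. The Python sketch below uses the same placeholder domains; the record shape (a list of dicts with an Id key) and the function name are assumptions for illustration.

```python
import re

# Same placeholder domains as the Apex test; replace with your real ones.
REAL_DOMAIN_RE = re.compile(
    r"@(yourcompany\.com|partner1\.com|partner2\.com)\b", re.IGNORECASE
)


def find_free_text_leaks(records: list, text_fields: list) -> list:
    """Return (record Id, field name) pairs whose text still mentions a real domain."""
    leaks = []
    for record in records:
        for field in text_fields:
            text = record.get(field) or ""  # treat missing or null fields as empty
            if REAL_DOMAIN_RE.search(text):
                leaks.append((record.get("Id"), field))
    return leaks


# Hypothetical sample of exported Case rows.
cases = [
    {"Id": "500A", "Description": "Customer wrote from jane@YourCompany.com"},
    {"Id": "500B", "Description": "Lorem ipsum dolor sit amet"},
    {"Id": "500C", "Description": None},
]

print(find_free_text_leaks(cases, ["Description"]))  # [('500A', 'Description')]
```

A non-empty result should fail the runbook step, the same as a failed Apex assertion.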

When to skip Data Mask

Not every sandbox needs it. Use a straight refresh (no mask) only in narrow, controlled cases:

Developer Pro / Partial sandboxes that contain only seed data, not a production export.
Sandboxes used solely by employees with prod data access anyway (some staff already see real customer data). The mask adds friction without adding protection.
Performance test sandboxes where mask would distort statistical distributions and produce bogus benchmarks.

In every other case, mask. The cost is real (Data Mask credits + 1-4 hour mask window). The alternative cost — a developer’s laptop with a sandbox export of production PII — is worse.

Refresh and mask sequencing

The standard runbook for a Full sandbox refresh:

Schedule the refresh in a maintenance window (off business hours).
As soon as the refresh completes, auto-trigger Data Mask via an Apex scheduled job.
While mask runs, the sandbox is in restricted mode (admins only) — enforced by a login flow that checks Profile.UserType and gates non-admin login.
On mask completion, run the validation test class above.
On test success, lift the restriction.
Notify users.

The combined window is 4-8 hours for most enterprise Full sandboxes. Plan accordingly.
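The ordering in the runbook above matters more than the tooling: the restriction lifts only after both the mask and the validation succeed. Here is a minimal Python sketch of that gate. Every callable is a hypothetical stand-in for whatever your org uses to restrict logins, run the mask, run the tests, and notify users.

```python
from typing import Callable


def run_post_refresh(
    restrict_logins: Callable[[], None],
    run_mask: Callable[[], bool],
    run_validation: Callable[[], bool],
    lift_restriction: Callable[[], None],
    notify: Callable[[str], None],
) -> bool:
    """Run mask, then validation; open the sandbox only if both succeed."""
    restrict_logins()
    if not run_mask():
        notify("Mask run failed; sandbox stays restricted.")
        return False
    if not run_validation():
        notify("Validation failed; sandbox stays restricted.")
        return False
    lift_restriction()
    notify("Sandbox is masked, validated, and open.")
    return True


# Stub callables that only record what happened, to show the ordering.
events = []
opened = run_post_refresh(
    restrict_logins=lambda: events.append("restrict"),
    run_mask=lambda: True,
    run_validation=lambda: False,  # simulate a failed validation
    lift_restriction=lambda: events.append("lift"),
    notify=lambda message: events.append("notify"),
)
print(opened, events)  # False ['restrict', 'notify']
```

The sandbox fails closed: if any step fails, it stays restricted until someone investigates.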

See the sandbox refresh post-clone checklist for the broader runbook.

Cost note

Data Mask is billed per sandbox per refresh. For orgs that refresh Full sandboxes weekly, this is a real line item. If you do not need a weekly refresh, don’t run one. Refreshing every two weeks, with a careful seeded data overlay between refreshes, is often a better cost-to-fidelity tradeoff.

UX note

On every non-production org, render a persistent banner: “SANDBOX: Data is masked. Do not use for real customer outreach.” The banner is a component injected by the login flow, and it does not render when IsSandbox = false. It saves more PR disasters than you’d guess.

Bottom line

Mask config is a living artifact; review it quarterly against new custom fields.
Long-text and custom text fields are where most leaks survive — audit them explicitly.
Always validate mask completion with an automated test; never trust the manifest alone.
Skip the mask only when no PII is present, when all sandbox users already have prod data access, or when masking would distort performance benchmarks.
The combined refresh + mask + validate window is 4-8 hours — schedule like a deploy.

