Upgrades are not a checklist. They are a rehearsal — performed on a sub-prod stage that genuinely resembles production, scored against criteria written before the curtain rises. Teams that treat upgrades as a checklist discover their regressions during prod cutover. Teams that rehearse find them in dress rehearsal, with time to fix.
The four-stage pattern
A working upgrade rehearsal program uses four sub-prod stages, each with a defined purpose. More stages add coordination cost; fewer stages let critical defects through.
- Smoke — a freshly cloned dev. Verify the upgrade applies cleanly and the platform boots. Hours, not days.
- Functional — a dev or test instance with current update sets applied. Verify in-house customization survives.
- Integration — a UAT-grade instance with live integration endpoints (or near-live mocks). Verify the boundary contracts hold.
- Performance — a clone of production sized to match real load. Verify the perf characteristics did not regress.
You can collapse 1 and 2 if your dev instance is reliable. You cannot collapse 3 and 4; integration regressions and performance regressions have different shapes and different fixes.
The clone discipline
The Performance stage is only useful if it is a faithful clone of production. Clone discipline matters: certain tables must be preserved across the clone, others must be excluded for size or sensitivity. Get this wrong and you are rehearsing on a stage that does not match the actual production state, which means your rehearsal results lie to you.
For preserved-table rules and the conventions to enforce, see our instance clone data preservation rules.
Pre-upgrade snapshot
Before the rehearsal upgrade applies, capture the baseline. This is the only way to claim a regression is in fact caused by the upgrade and not by pre-existing weirdness.
// Background script — baseline capture
var report = {
timestamp: new GlideDateTime().getValue(),
version: gs.getProperty('glide.buildtag'),
metrics: {}
};
// Customization fingerprint
var ga = new GlideAggregate('sys_update_xml');
ga.addAggregate('COUNT');
ga.groupBy('type');
ga.query();
report.metrics.update_xml_by_type = {};
while (ga.next()) {
report.metrics.update_xml_by_type[ga.type.toString()] = parseInt(ga.getAggregate('COUNT'));
}
// Active scheduled jobs
var sj = new GlideRecord('sysauto_script');
sj.addQuery('active', true);
sj.query();
report.metrics.active_scheduled_scripts = sj.getRowCount();
// Active business rules by table
var br = new GlideAggregate('sys_script');
br.addQuery('active', true);
br.addAggregate('COUNT');
br.groupBy('collection');
br.query();
report.metrics.active_brs_by_table = {};
while (br.next()) {
report.metrics.active_brs_by_table[br.collection.toString()] = parseInt(br.getAggregate('COUNT'));
}
gs.info('BASELINE: ' + JSON.stringify(report));
Run this on every stage, before and after the upgrade. Diff the JSON. Anything that changed without an obvious reason gets investigated.
The skipped-record review is the upgrade
When the upgrade applies, the platform produces a list of skipped customizations — places where your modifications collided with platform changes and the platform chose to preserve your customization rather than merge. This list is where 80 percent of upgrade defects live.
Three rules for the skipped review:
- Every skipped record gets a verdict: keep customization, revert to platform, manual merge, or delete customization. No “decide later.”
- The verdict is recorded with author and date in a tracking table, not just in the skip-record itself. You will need this audit trail.
- The merge is performed in a separate update set named with the upgrade target version, so the merged work travels with the upgrade.
The skipped-review queue is the long pole of any real upgrade. Plan for 2 to 4 engineer-weeks for an average mid-sized tenant, more for heavily customized ones.
ATF coverage gate
The Automated Test Framework exists for this. The discipline:
- Build ATF coverage for the workflows you most need to survive: top 20 catalog items, top 20 incident flows, the major integrations’ happy paths.
- Run the full ATF suite on Functional and on Integration stages, before and after the upgrade.
- Treat any test that passed before and fails after as a release-blocking regression.
Most teams underinvest in ATF until an upgrade hurts. The math is straightforward: a four-hour test suite that catches one critical regression has paid for two years of authoring effort.
For the prioritization question — which tests to write first when you cannot write them all — see our ATF test coverage prioritization.
Integration verification
Integrations break on upgrade in surprising places. The contract may not have changed but the wire-level details often do — header casing, default timeouts, deprecation of legacy auth schemes, subtle changes to date formatting in REST payloads.
The integration-stage protocol:
- Identify every outbound integration. For each, write a one-page contract describing the endpoint, the auth, the payload, and the expected response shape.
- Run the integration against the live partner endpoint (or a high-fidelity mock) on the upgraded stage.
- Diff the request and response bytes against a baseline captured before the upgrade.
- Investigate every diff, even cosmetic ones. A cosmetic diff today is a parser breakage tomorrow.
This is tedious and unavoidable. The teams that skip it are the teams whose Tuesday morning after cutover starts with the finance integration failing silently.
Performance verification
Performance regressions are usually not catastrophic at first — they are 8 percent slower on the home page, 12 percent more memory on a worker, a missed index on a new internal table. Individually invisible. Cumulatively damaging.
The performance stage protocol:
- Define 10 to 15 critical user transactions. Each gets a name and a transaction trace.
- Run a synthetic load that approximates real concurrent users — not a stress test, just a realistic load.
- Capture mean and p95 latency for each transaction, before and after.
- A 10 percent regression on any p95 metric is a release-blocking finding until investigated.
The “investigated” verdict can still result in shipping — sometimes the regression is real, expected, and acceptable for a feature you wanted. The discipline is to know, not to pretend it did not happen.
Cutover day
The production cutover is the boring part. If the rehearsal was honest, the cutover is mostly waiting. The runbook:
T-7 days: All open update sets frozen. Only upgrade-related changes accepted.
T-2 days: Final clone of prod to Performance stage. Confirm baseline metrics match prod.
T-1 day: Communications go out. Banner on portal. Stand down non-essential changes.
T-0: Maintenance window opens. Upgrade applies. Smoke tests run from runbook.
T+1 hour: ATF suite runs. Critical-path manual checks performed.
T+4 hours: Integrations verified end-to-end.
T+24 hours: Performance dashboard reviewed against baseline.
T+7 days: Post-upgrade review, with skipped-record audit and lessons learned.
The T+7 review is the part most teams skip. Skip it and you will repeat every mistake on the next upgrade.
What you should not rehearse
You will be tempted to rehearse the Now Assist regeneration of search indexes, the rebuild of the analytics caches, and the dozen other “soft” operations that happen post-upgrade. Do not. They are time-dependent and will behave differently on the production data volume than on a clone, no matter how faithful. Verify in production with a tight watchlist instead.
UI on cutover night
Pin a one-page dashboard for the cutover lead: ATF pass rate, integration green/red, top 5 platform errors per minute, instance node health, current logged-in users. Anything that requires more than one click to find is not on the dashboard. The cutover lead’s attention is the scarcest resource of the night; design for it.
Plugin and store app considerations
Plugins and Now Store apps follow their own upgrade cadence and rarely sync perfectly with the platform release. The rehearsal must cover the version combinations actually deployed in production. The frequent surprise: a store app worked on the previous platform release but throws a parser warning on the new one because a deprecated API path is finally enforced.
For each plugin or store app:
- Confirm the version is supported on the target platform release.
- Test the plugin’s flows and UI in the Functional stage.
- Check the plugin’s change log for any “breaking changes” notes.
- If the plugin owner has not certified the new platform release, raise a flag and consider deferring the upgrade.
Plugin compatibility is the single most common reason upgrades get delayed. Get the inventory right before you commit to a date.
The skip-level question
Some teams consider skipping a release to save effort — going from N to N+2 instead of stopping at N+1. The platform supports it within limits, but the rehearsal cost grows non-linearly. Each skipped release adds skip-record volume, plugin compatibility checks, and deprecation flags to chase.
For most tenants, the right answer is to stay current with one release per cycle. Skip-level upgrades are appropriate only when the intermediate release contained no features you wanted, the upgrade window has slipped, and you can absorb the higher rehearsal cost in a single push.
Tradeoffs to be honest about
A four-stage rehearsal is expensive. It costs sub-prod instance time, engineering attention, and weeks of calendar. The alternative is faster, cheaper, and routinely produces a Monday-after-cutover crisis that costs more than the rehearsal would have. The math is in favor of the rehearsal for any tenant beyond toy size.
Do not over-rehearse. Two full dress rehearsals is enough; the third rarely finds anything the second did not, and the calendar cost grows fast.
Key takeaways
- Four stages — Smoke, Functional, Integration, Performance — with distinct purposes. Do not collapse Integration and Performance.
- Baseline capture before the upgrade is what makes “regression” a measurable claim instead of a feeling.
- The skipped-record review is the upgrade. Plan 2 to 4 engineer-weeks for mid-sized tenants.
- ATF coverage on top workflows pays back its authoring cost the first time it catches a regression.
- The T+7 post-upgrade review is the meeting that prevents the next upgrade from being worse than this one.