Credential Vault Rotation: A Zero-Downtime Pattern That Actually Works

Credentials need to rotate. Rotating them without a maintenance window is harder than it should be, because most integration patterns assume a single credential per connection and treat rotation as a cutover event. The pattern that actually works in production uses a brief dual-credential overlap window and a clean rollback path.

The naive approach and what it costs

The naive rotation is: generate a new credential, update the vault record, and let the integration start using the new one. The cost: any call made between the moment the partner accepts the new credential and the moment the platform has fully propagated the change can fail with auth errors. For a high-volume integration, “every in-flight call for 30 seconds” can be hundreds of failures.
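
To put a number on that, a back-of-the-envelope estimate multiplies the call rate by the propagation gap. The sketch below is plain JavaScript; the function name and the figure of 10 calls per second are illustrative assumptions, not measurements.

```javascript
// Rough estimate of auth failures during a naive cutover.
// callsPerSecond and gapSeconds are illustrative inputs, not measured values.
function estimateCutoverFailures(callsPerSecond, gapSeconds) {
    return Math.round(callsPerSecond * gapSeconds);
}

// Hypothetical integration: 10 calls per second, 30-second propagation gap.
var failures = estimateCutoverFailures(10, 30);
```

At that assumed rate the 30-second gap costs about 300 failed calls, which is the scale the dual-credential pattern is meant to avoid.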

In practice teams handle this with a maintenance window, which is operationally cheap for low-volume integrations and politically expensive for high-volume ones. The dual-credential pattern eliminates the window.

The dual-credential overlap

The pattern in five steps:

1. Provision a second credential on the partner side. Now both old and new credentials are accepted.
2. Add the new credential to the vault alongside the old one, and set the u_state flag to rolling_forward.
3. Configure the integration to prefer the new credential but fall back to the old on auth failure.
4. Wait for one full rotation cycle (the integration’s longest cache or session timeout, plus a safety margin).
5. Deactivate the old credential on the partner side. Remove it from the vault.
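
The wait for one full rotation cycle can be computed rather than guessed. A minimal sketch in plain JavaScript, assuming you know every cache or session timeout involved; the function name, the timeout values, and the 25% safety margin are all hypothetical.

```javascript
// Overlap window = longest cache/session timeout + safety margin.
// timeoutsSeconds: every cache or session lifetime that can hold a credential.
// marginFraction: extra headroom, e.g. 0.25 for 25% (an assumed value).
function overlapWindowSeconds(timeoutsSeconds, marginFraction) {
    if (!timeoutsSeconds.length) {
        throw new Error('At least one timeout is required');
    }
    var longest = Math.max.apply(null, timeoutsSeconds);
    return Math.ceil(longest * (1 + marginFraction));
}

// Hypothetical timeouts: 5-minute token cache, 1-hour session, 15-minute pool.
var windowSeconds = overlapWindowSeconds([300, 3600, 900], 0.25);
```

The result is a natural value for the gap between u_rollover_started and u_rollover_expires.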

The “fall back to the old on auth failure” step is what makes the overlap safe. During the overlap, a call that is rejected with the new credential, for example because part of the partner side has not picked it up yet, retries with the old one instead of erroring.

The vault model for overlap

The vault record needs to express “active” and “pending” simultaneously:

Table: u_integration_credential
  u_name              (String)
  u_partner_endpoint  (Reference)
  u_active_secret     (Encrypted)
  u_pending_secret    (Encrypted, nullable)
  u_state             (Choice: stable, rolling_forward, rolling_back)
  u_rollover_started  (Datetime)
  u_rollover_expires  (Datetime)

u_active_secret is the primary. u_pending_secret is the rollover candidate. During stable state, pending is null. During rolling_forward, both are set; the integration uses pending and falls back to active. During rolling_back, both are still set but the order reverses: the integration uses active and falls back to pending. After a successful cycle, pending is promoted to active and the pending field is cleared.
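
The forward transitions can be sketched as plain functions over an in-memory record. This is an illustration rather than platform code: the object stands in for a u_integration_credential row, and newCredentialRecord, startRollForward, and promotePending are hypothetical names.

```javascript
// In-memory stand-in for a u_integration_credential row.
function newCredentialRecord(name, secret) {
    return {
        u_name: name,
        u_active_secret: secret,
        u_pending_secret: null,
        u_state: 'stable',
        u_rollover_started: null,
        u_rollover_expires: null
    };
}

// stable -> rolling_forward: stage the candidate and open the overlap window.
function startRollForward(rec, newSecret, nowMs, overlapMs) {
    if (rec.u_state !== 'stable') {
        throw new Error('Rollover already in progress for ' + rec.u_name);
    }
    rec.u_pending_secret = newSecret;
    rec.u_state = 'rolling_forward';
    rec.u_rollover_started = nowMs;
    rec.u_rollover_expires = nowMs + overlapMs;
    return rec;
}

// rolling_forward -> stable: promote the candidate once the window has elapsed.
function promotePending(rec, nowMs) {
    if (rec.u_state !== 'rolling_forward') {
        throw new Error('Nothing to promote for ' + rec.u_name);
    }
    if (nowMs < rec.u_rollover_expires) {
        throw new Error('Overlap window still open for ' + rec.u_name);
    }
    rec.u_active_secret = rec.u_pending_secret;
    rec.u_pending_secret = null;
    rec.u_state = 'stable';
    rec.u_rollover_started = null;
    rec.u_rollover_expires = null;
    return rec;
}
```

Refusing to promote before u_rollover_expires is a design choice in this sketch: it keeps an impatient operator from closing the window while caches may still hold the old credential.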

The auth flow with fallback

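// Credential resolver used by the integration's REST wrapper
var CredentialResolver = Class.create();
CredentialResolver.prototype = {
    initialize: function() {},

    // Returns { primary, fallback } for the named credential,
    // or null if no matching record exists.
    resolve: function(credName) {
        var c = new GlideRecord('u_integration_credential');
        c.addQuery('u_name', credName);
        c.setLimit(1);
        c.query();
        if (!c.next()) return null;

        // Copy values out of the record instead of returning field objects.
        // Assumes the secret fields are Password (2 Way Encrypted), which
        // expose getDecryptedValue(); adjust for other field types.
        var state = c.getValue('u_state');
        var active = c.u_active_secret.getDecryptedValue();
        var pending = c.u_pending_secret.nil() ? null : c.u_pending_secret.getDecryptedValue();

        if (state == 'rolling_forward' && pending) {
            return { primary: pending, fallback: active };
        }
        if (state == 'rolling_back' && pending) {
            // Rollback swaps the order: old credential first, candidate second
            return { primary: active, fallback: pending };
        }
        return { primary: active, fallback: null };
    },

    type: 'CredentialResolver'
};

// Integration wrapper with fallback.
// makeRequest and recordFallbackUse are helpers defined elsewhere in the integration.
function callPartner(endpoint, payload) {
    var creds = new CredentialResolver().resolve('partner-x');
    if (!creds) {
        throw new Error('No credential record found for partner-x');
    }
    var result = makeRequest(endpoint, payload, creds.primary);
    if (result.status == 401 && creds.fallback) {
        gs.warn('Auth failed with primary, retrying with fallback');
        result = makeRequest(endpoint, payload, creds.fallback);
        if (result.status < 400) {
            recordFallbackUse('partner-x');
        }
    }
    return result;
}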

The recordFallbackUse call writes to a counter table. During a rotation, fallback use should taper from non-trivial to zero over the cycle. If it stays non-zero, something is still rejecting the new credential while accepting the old one, typically a stale cache, and it needs investigation before the old credential is deactivated.
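
One way to turn the counter into a go/no-go signal is to require a run of quiet intervals before closing the window. A minimal sketch in plain JavaScript; the function name, the per-interval count array, and the choice of three quiet intervals are assumptions for illustration.

```javascript
// counts: fallback uses per monitoring interval, oldest first.
// quietIntervals: how many trailing zero intervals count as "drained".
function fallbackDrained(counts, quietIntervals) {
    if (counts.length < quietIntervals) return false;
    for (var i = counts.length - quietIntervals; i < counts.length; i++) {
        if (counts[i] !== 0) return false;
    }
    return true;
}

// Hypothetical hourly counts during a rotation.
var healthyTaper = fallbackDrained([12, 5, 1, 0, 0, 0], 3);
var stillLeaking = fallbackDrained([12, 5, 1, 0, 2, 0], 3);
```

Here healthyTaper is true and stillLeaking is false: a single late fallback resets the clock, which is the conservative choice before deactivating the old credential.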

The rotation calendar

A working rotation cadence for most integrations:

Service accounts and machine identities: every 90 days
API keys and OAuth client secrets: every 180 days
Long-lived shared secrets (legacy partners): every 365 days, with quarterly review for the ones that should be retired entirely

The cadence is per-integration, not per-tenant. Schedule the rotations in a calendar so you do not end up rotating five integrations in the same week with nothing to fall back on if one goes sideways.
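
A calendar check along these lines can be sketched in plain JavaScript. The credential kind keys, function names, and schedule shape are hypothetical; weeks here are simple 7-day buckets counted from the Unix epoch, not calendar weeks.

```javascript
// Cadence in days, taken from the list above. The keys are assumed names.
var CADENCE_DAYS = {
    service_account: 90,
    api_key: 180,
    oauth_client_secret: 180,
    legacy_shared_secret: 365
};
var DAY_MS = 24 * 60 * 60 * 1000;

// Next due date (epoch ms) given the last rotation and the credential kind.
function nextRotation(lastRotatedMs, kind) {
    var days = CADENCE_DAYS[kind];
    if (!days) throw new Error('Unknown credential kind: ' + kind);
    return lastRotatedMs + days * DAY_MS;
}

// schedule: [{ name, dueMs }]. Returns the name lists of every 7-day bucket
// that holds more than one rotation.
function findCrowdedWeeks(schedule) {
    var byWeek = {};
    schedule.forEach(function (item) {
        var week = Math.floor(item.dueMs / (7 * DAY_MS));
        (byWeek[week] = byWeek[week] || []).push(item.name);
    });
    return Object.keys(byWeek)
        .filter(function (w) { return byWeek[w].length > 1; })
        .map(function (w) { return byWeek[w]; });
}
```

Any bucket that findCrowdedWeeks returns is a candidate for moving one rotation earlier, never later, so the cadence is not exceeded.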

The rollback path

The pattern’s killer feature is that rollback is trivial. If the rolling_forward step produces unexpected behavior:

1. Set u_state to rolling_back.
2. The resolver returns active as primary and pending as fallback (swap).
3. The integration uses the old credential as primary; new auth failures fall back to the candidate.
4. Investigate the cause. When clear, either retry the rollover or remove the candidate entirely.

Rollback takes seconds, requires no partner-side action, and produces no user-visible failures. The whole point of the dual-credential model is that you never have to commit to a one-way door.
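
The rollback transitions can be sketched as pure functions over an in-memory record. This is a plain JavaScript illustration, not platform code; startRollBack, credentialOrder, and resolveRollBack are hypothetical names, and the record object stands in for a u_integration_credential row.

```javascript
// rolling_forward -> rolling_back: a single field change, no partner action.
function startRollBack(rec) {
    if (rec.u_state !== 'rolling_forward') {
        throw new Error('Can only roll back from rolling_forward');
    }
    rec.u_state = 'rolling_back';
    return rec;
}

// Maps state to credential order; rolling_back swaps the rolling_forward order.
function credentialOrder(rec) {
    if (rec.u_state === 'rolling_forward') {
        return { primary: rec.u_pending_secret, fallback: rec.u_active_secret };
    }
    if (rec.u_state === 'rolling_back') {
        return { primary: rec.u_active_secret, fallback: rec.u_pending_secret };
    }
    return { primary: rec.u_active_secret, fallback: null };
}

// After investigation: retry the rollover, or drop the candidate entirely.
function resolveRollBack(rec, retry) {
    if (rec.u_state !== 'rolling_back') {
        throw new Error('Record is not rolling back');
    }
    if (retry) {
        rec.u_state = 'rolling_forward';
    } else {
        rec.u_pending_secret = null;
        rec.u_state = 'stable';
    }
    return rec;
}
```

Because both secrets stay in the record throughout, every transition is reversible until the candidate is removed.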

Audit trail discipline

Auditors want to see, for every credential, when it was rotated, by whom, and that the old credential is genuinely no longer accepted. The audit trail to maintain:

Table: u_credential_audit_event
  u_credential        (Reference)
  u_event_type        (Choice: provisioned, rolled_forward, rolled_back, decommissioned)
  u_performed_by      (Reference to user)
  u_performed_at      (Datetime)
  u_evidence          (String — partner-side confirmation token or ticket)
  u_notes             (String)

The u_evidence field is the one auditors care about most. A claim that “we decommissioned the old credential” without proof from the partner side is not an audit trail. Capture the partner’s confirmation — an email screenshot, a ticket number, a partner-portal action log — and link it.
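
The evidence rule is easy to enforce before an audit event is saved. A minimal sketch in plain JavaScript, assuming the event is available as a simple object with the fields above; validateAuditEvent is a hypothetical helper, and on the platform the same check would more likely live in a business rule.

```javascript
var EVENT_TYPES = ['provisioned', 'rolled_forward', 'rolled_back', 'decommissioned'];

// Returns a list of problems; an empty list means the event is acceptable.
function validateAuditEvent(evt) {
    var problems = [];
    if (EVENT_TYPES.indexOf(evt.u_event_type) === -1) {
        problems.push('unknown event type');
    }
    if (!evt.u_credential) problems.push('missing credential reference');
    if (!evt.u_performed_by) problems.push('missing performer');
    if (!evt.u_performed_at) problems.push('missing timestamp');
    // A decommission claim without partner-side proof is not an audit trail.
    var evidence = String(evt.u_evidence || '').trim();
    if (evt.u_event_type === 'decommissioned' && !evidence) {
        problems.push('decommissioned event requires partner-side evidence');
    }
    return problems;
}
```

Only decommissioned events are forced to carry evidence in this sketch; whether the other event types should require it too is a policy decision.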

What to do when the partner cannot dual-credential

Some partners do not support overlapping credentials. They have one slot per integration, and “rotate” means “delete and replace.” For these partners, the maintenance window is unavoidable. Minimize it:

1. Pre-stage the new credential value on your side, marked as pending but not yet activated.
2. Coordinate with the partner for the swap moment. A two-minute window is typical.
3. At T-0, the partner activates the new credential and the platform flips u_active_secret to the pending value.
4. Verify with a synthetic call within 30 seconds of the swap.
5. If verification fails, the partner reverts and you investigate.

This is not zero-downtime but is the best you can do when the partner constraint is real. The discipline is the same; the overlap window just shrinks to a coordinated handoff.
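
The platform side of the coordinated handoff can be sketched as a flip, a verification, and a local revert. This is plain JavaScript over an in-memory record; coordinatedSwap is a hypothetical name, and verify stands in for whatever synthetic call you run against the partner.

```javascript
// rec: in-memory stand-in for the credential record, with the new value
//      pre-staged in u_pending_secret.
// verify: function(secret) -> boolean, a synthetic call against the partner.
function coordinatedSwap(rec, verify) {
    if (!rec.u_pending_secret) {
        throw new Error('No pre-staged credential to swap in');
    }
    var previous = rec.u_active_secret;

    // T-0: flip the active secret to the pre-staged value.
    rec.u_active_secret = rec.u_pending_secret;
    rec.u_pending_secret = null;

    if (verify(rec.u_active_secret)) {
        return { swapped: true };
    }

    // Verification failed: restore the old value locally and keep the
    // candidate staged. The partner must revert on their side as well.
    rec.u_pending_secret = rec.u_active_secret;
    rec.u_active_secret = previous;
    return { swapped: false };
}
```

Keeping the candidate staged after a failed swap means the next attempt does not need a new credential from the partner.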

Service account hygiene

Credential rotation only works if you know what credentials exist. The companion discipline:

Every service account is registered with an owner, a purpose, a partner endpoint, and an expected rotation cadence.
Service accounts not used in 90 days are flagged for deprovisioning.
Service accounts with elevated permissions (admin-equivalent) are reviewed quarterly.
Service accounts authenticated with passwords (rather than mTLS or OAuth) are tagged as legacy and prioritized for replacement.

A vault full of unrotatable, unowned, legacy credentials is the security debt you do not want. Rotation programs surface it; service-account hygiene prevents it from recurring.
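
The hygiene rules above translate directly into a flagging function. A minimal sketch in plain JavaScript; the account object shape, field names, and flag names are hypothetical, and the 90-day threshold comes from the list above.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// account: { owner, purpose, endpoint, cadenceDays, lastUsedMs,
//            elevated, authMethod }  (an assumed shape for illustration)
function hygieneFlags(account, nowMs) {
    var flags = [];
    if (!account.owner || !account.purpose || !account.endpoint || !account.cadenceDays) {
        flags.push('incomplete_registration');
    }
    if (nowMs - account.lastUsedMs > 90 * DAY_MS) {
        flags.push('deprovision_candidate');
    }
    if (account.elevated) {
        flags.push('quarterly_review');
    }
    if (account.authMethod === 'password') {
        flags.push('legacy_auth');
    }
    return flags;
}
```

Run against the full registry on a schedule, the output is the worklist for the hygiene review.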

For the related machine-identity discussion, see our piece on ServiceNow Vault and Machine Identity Console.

UI for the rotation operator

The rotation operator needs a single view that shows, for every credential under management: current state, age of active secret, scheduled next rotation, recent fallback events, and the owner. Pin it. Color-code: green for stable, amber for rolling_forward, red for rolling_back. Sort by age descending so the most overdue rotations appear first.

A red row that persists for more than 24 hours is a rollback that was started and not resolved. Treat it as an alert.
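
The view logic is small enough to sketch in plain JavaScript. The input shape is an assumption: activeSinceMs is when the active secret was set, and stateSinceMs is a hypothetical field recording when the record entered its current state, which the vault model above does not include.

```javascript
var STATE_COLOR = { stable: 'green', rolling_forward: 'amber', rolling_back: 'red' };
var HOUR_MS = 60 * 60 * 1000;

// creds: [{ name, state, activeSinceMs, stateSinceMs }]
// Returns rows sorted by age of the active secret, oldest first.
function buildOperatorView(creds, nowMs) {
    return creds.map(function (c) {
        return {
            name: c.name,
            color: STATE_COLOR[c.state],
            ageDays: Math.floor((nowMs - c.activeSinceMs) / (24 * HOUR_MS)),
            // A rollback left unresolved for more than 24 hours is an alert.
            alert: c.state === 'rolling_back' && nowMs - c.stateSinceMs > 24 * HOUR_MS
        };
    }).sort(function (a, b) { return b.ageDays - a.ageDays; });
}
```

The alert flag is computed independently of the sort order, so an unresolved rollback is caught even when the credential itself is young.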

OAuth-specific patterns

OAuth client secrets rotate using the same dual-credential model, with a small twist: the platform’s OAuth provider record can hold two client secrets for a brief overlap window if you model it explicitly. Many platforms do not expose this through the standard OAuth profile UI; you may need a custom provider configuration.

For OAuth access tokens (as opposed to client secrets), rotation is handled by the OAuth flow itself — short token lifetime, refresh on expiry. The discipline here is to ensure the refresh path is robust:

Token refresh failures should fall back to a fresh authorization, not crash the integration.
Token storage should be encrypted at rest, and the encryption key should rotate on its own cadence.
Refresh tokens should be treated with the same care as long-lived secrets; they often have effective lifetimes measured in months.

A common failure: an OAuth integration that worked for a year suddenly stops because the refresh token expired and nobody noticed. The defense is monitoring; the prevention is refresh-token expiry alerts at T-7 days.
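
Both halves of that advice fit in a few lines. A minimal sketch in plain JavaScript; refreshTokenAlertDue and getAccessToken are hypothetical helpers, and refresh and authorize are caller-supplied functions standing in for your OAuth client, with refresh assumed to throw on failure.

```javascript
var DAY_MS = 24 * 60 * 60 * 1000;

// True once the refresh token is within leadDays of expiry (default 7).
function refreshTokenAlertDue(expiresAtMs, nowMs, leadDays) {
    var lead = (leadDays === undefined ? 7 : leadDays) * DAY_MS;
    return nowMs >= expiresAtMs - lead;
}

// Try the refresh path first; on failure, fall back to a fresh
// authorization instead of letting the integration crash.
function getAccessToken(refresh, authorize) {
    try {
        return refresh();
    } catch (e) {
        return authorize();
    }
}
```

The alert check belongs in a scheduled job; the fallback belongs in the request path.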

Partner notification protocol

When you rotate, the partner needs to know. Some partners let you manage API tokens yourself in their portal and need no explicit notification. Others need a heads-up so their security team can verify the change is legitimate.

Maintain a per-partner runbook with rotation contacts, notification SLA, and the partner’s preferred verification method. Without it, every rotation involves an archaeology project to find the right contact.

Tradeoffs to be honest about

The dual-credential pattern requires partner-side support and adds complexity to your vault model. Both are real costs. For low-volume integrations where a 5-minute maintenance window is acceptable, the simpler pattern (vault swap + cutover) is fine. For high-volume integrations or anything customer-facing where downtime is visible, the dual-credential investment pays back the first time you avoid a Sunday-night rotation incident.

The other honest tradeoff: the fallback path masks slow partner-side propagation. If you see steady fallback usage long after the rotation window closed, you have a partner-side caching issue that the fallback is hiding. Investigate it; do not let the fallback become a permanent compensating control.

Bottom line

The dual-credential overlap pattern eliminates the rotation maintenance window when the partner supports it.
Vault model needs explicit active and pending fields plus a state flag. Anything simpler forces a cutover.
Fall back to the old credential on auth failure during rollover; record fallback usage to detect stale caches.
Audit trail must capture partner-side confirmation, not just internal events. Auditors want evidence.
Rotation works only if you know what credentials exist. Pair the rotation program with service-account hygiene.
