
Twenty-three minutes into a P1, the on-call major incident manager is staring at a blank email composer in Freshservice, trying to remember whether legal wants “service degradation” or “service interruption”. This is not the moment for prose. Comms during a major incident are a templated state machine, and Freshservice gives you enough hooks to make it run itself if you set it up before the next outage rather than during it.

The five comms you actually need

Most teams over-engineer this. The full set is:

  1. Initial acknowledgement (within 15 minutes of declaration).
  2. Investigating update (every 30 minutes during active impact).
  3. Mitigation in progress (when a workaround is in flight).
  4. Service restored (impact ended, root cause pending).
  5. Post-incident summary (within 48 hours, links to PIR).

That is five templates per audience tier, not 47 bespoke emails per incident. If your runbook lists more, you have ceremony, not comms.

Audience tiers and what they need

Three audiences cover almost everything:

  • Affected end users: short, factual, blameless. No internal jargon. No vendor names. Tell them what is broken, what they can still do, when the next update lands.
  • Internal stakeholders: more detail, including system names and a confidence level on ETA. They will forward your message — write it knowing that.
  • Executives: business impact in money or customer count, current status, decision points needed from them. Three bullets, no more.

Build a 5 x 3 matrix of templates. That is your entire major incident comms library.
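
To make the size of that library concrete, here is a minimal Python sketch that enumerates the matrix. The enum names and the `mi-<comm>-<audience>` key format are hypothetical naming choices for illustration, not anything Freshservice prescribes.

```python
from enum import Enum
from itertools import product


class Comm(Enum):
    """The five comm types, in the order they normally fire."""
    ACKNOWLEDGED = "acknowledged"
    INVESTIGATING = "investigating"
    MITIGATING = "mitigating"
    RESTORED = "restored"
    SUMMARY = "summary"


class Audience(Enum):
    """The three audience tiers."""
    END_USERS = "end_users"
    STAKEHOLDERS = "stakeholders"
    EXECUTIVES = "executives"


def template_key(comm: Comm, audience: Audience) -> str:
    """Build a hypothetical lookup key such as 'mi-restored-executives'."""
    return f"mi-{comm.value}-{audience.value}"


# The whole library: every comm type crossed with every audience tier.
TEMPLATE_KEYS = [template_key(c, a) for c, a in product(Comm, Audience)]
```

If a template someone proposes does not map to one of these 15 keys, that is a good prompt to ask whether it is really needed.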

Wiring it into Freshservice

Freshservice exposes major incident notifications through the Status Page module and through the Workflow Automator acting on the parent incident ticket. Use both. Status Page handles the public-ish broadcast. Workflow Automator handles the internal email blasts triggered by status transitions on the ticket.

The trick is to store templates as Solution Articles in a private “Comms Templates” category and have the workflow inject them by reference rather than hardcoding the email body inside the workflow node. When legal asks for a wording change, you edit the solution article, not 15 workflow nodes.
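
The inject-by-reference idea can be sketched in a few lines of Python. This is an illustration of the pattern only: `fetch_article_body` is a hypothetical callable standing in for however you read the article body, the `ARTICLES` dict stands in for the knowledge base, and the `$name` placeholders are Python's `string.Template` syntax rather than Freshservice's merge tag syntax.

```python
from string import Template
from typing import Callable, Mapping


def render_by_reference(
    template_ref: str,
    fields: Mapping[str, str],
    fetch_article_body: Callable[[str], str],
) -> str:
    """Resolve a template reference to its current body, then fill the fields."""
    body = fetch_article_body(template_ref)
    # substitute() raises KeyError when a placeholder has no value, so a
    # broken template fails loudly instead of sending a half-filled email.
    return Template(body).substitute(fields)


# Stand-in for the private "Comms Templates" category (hypothetical content).
ARTICLES = {
    "kb:5012-mi-acknowledged": (
        "We are investigating $incident_headline. "
        "Next update at $next_update_at."
    ),
}

email_body = render_by_reference(
    "kb:5012-mi-acknowledged",
    {"incident_headline": "slow logins", "next_update_at": "02:30 UTC"},
    ARTICLES.__getitem__,
)
```

Because the body is looked up at send time, editing the stored article changes the next email without touching the code that sends it.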

{
  "workflow": "MI_Comms_Cadence",
  "trigger": "Ticket Updated",
  "conditions": [
    { "field": "priority", "operator": "is", "value": "Urgent" },
    { "field": "tag", "operator": "contains", "value": "major_incident" }
  ],
  "actions": [
    {
      "type": "send_email",
      "audience": "[email protected]",
      "template_ref": "kb:5012-mi-acknowledged",
      "merge_fields": ["ticket.subject", "ticket.created_at", "ticket.id"]
    },
    {
      "type": "wait",
      "duration_minutes": 30
    },
    {
      "type": "send_email_if",
      "condition": { "field": "status", "operator": "is", "value": "Open" },
      "template_ref": "kb:5013-mi-investigating",
      "merge_fields": ["ticket.id", "ticket.last_update_summary"]
    }
  ]
}

The structure repeats: wait, check status, fire next template. Status transitions to “Pending” or “Resolved” short-circuit the cadence into the restoration template.
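
The same loop, written out as a small Python sketch so the short-circuit is explicit. Each item in `status_checks` represents the ticket status read after one wait. The reference `kb:mi-restored` is a hypothetical name for the restoration template; the other two references come from the workflow above.

```python
from typing import Iterable, List

ACKNOWLEDGED = "kb:5012-mi-acknowledged"
INVESTIGATING = "kb:5013-mi-investigating"
RESTORED = "kb:mi-restored"  # hypothetical reference, not from the workflow

# Statuses that end the update drip and fire the restoration template.
SHORT_CIRCUIT_STATUSES = {"Pending", "Resolved"}


def templates_fired(status_checks: Iterable[str]) -> List[str]:
    """List the templates sent, given the status seen after each wait."""
    fired = [ACKNOWLEDGED]  # sent once, when the incident is declared
    for status in status_checks:
        if status in SHORT_CIRCUIT_STATUSES:
            fired.append(RESTORED)
            break  # the cadence stops here; later checks never run
        if status == "Open":
            fired.append(INVESTIGATING)
    return fired
```

For example, `templates_fired(["Open", "Open", "Resolved"])` yields the acknowledgement, two investigating updates, and then the restoration template.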

The 30-minute timer is non-negotiable

The single most damaging thing a major incident comms loop can do is go quiet. Customers and execs interpret silence as “nothing is happening” or, worse, “they don’t know what is happening”. Even if the engineering update is “still investigating, no new findings”, you send it.

Encode this into the workflow with a recurring wait node. Do not rely on a human remembering. Humans in P1 incidents are not remembering anything other than the firehose.
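
If you want a check outside the workflow that the cadence is being honoured, the logic is small. The following Python sketch assumes you can read the time of the last sent update from somewhere; the function names and the fallback wording are illustrative, not part of Freshservice.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

UPDATE_INTERVAL = timedelta(minutes=30)
NO_NEWS = "Still investigating, no new findings."


def next_update_due(last_sent_at: datetime) -> datetime:
    """Time by which the next update must go out."""
    return last_sent_at + UPDATE_INTERVAL


def is_overdue(last_sent_at: datetime, now: datetime) -> bool:
    """True once the 30-minute window has elapsed without a new update."""
    return now >= next_update_due(last_sent_at)


def update_text(findings: Optional[str]) -> str:
    """Use the engineering update if there is one, else the no-news line."""
    if findings and findings.strip():
        return findings.strip()
    return NO_NEWS
```

The point of `update_text` is that an empty engineering update never blocks the send: the fallback line goes out instead.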

Merge fields that hurt you

Freshservice merge tags will happily inject raw values. That sounds convenient. It is a trap. The ticket.subject field is whatever the originating alert system put there — often something like prod-eu-west-1: pg_replication_lag > 30s on db-shard-7. You do not want that in an email to a CFO.

Use a small set of curated merge fields written by humans:

  • incident_headline — short, plain-English title.
  • customer_impact — one sentence on visible symptoms.
  • eta_confidence — low, medium, high, with a default of low.
  • next_update_at — explicit ISO timestamp.

Map these onto custom fields on the incident ticket. The major incident manager owns them. Templates render them. The raw alert subject never reaches a customer.
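
One way to keep those four fields honest is to validate them before any template renders. A minimal Python sketch follows; the class name and the validation rules (rejecting unknown confidence values and timezone-naive timestamps) are assumptions layered on top of the field list above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

ALLOWED_CONFIDENCE = ("low", "medium", "high")


@dataclass(frozen=True)
class CuratedFields:
    """The only values a comms template is allowed to render."""
    incident_headline: str
    customer_impact: str
    next_update_at: datetime
    eta_confidence: str = "low"  # default stays pessimistic

    def __post_init__(self) -> None:
        if self.eta_confidence not in ALLOWED_CONFIDENCE:
            raise ValueError(f"eta_confidence must be one of {ALLOWED_CONFIDENCE}")
        if self.next_update_at.tzinfo is None:
            # Assumption: a timestamp without a timezone is ambiguous to readers.
            raise ValueError("next_update_at must be timezone-aware")

    def as_merge_dict(self) -> dict:
        """Flatten to strings, with the timestamp in ISO 8601 form."""
        return {
            "incident_headline": self.incident_headline,
            "customer_impact": self.customer_impact,
            "eta_confidence": self.eta_confidence,
            "next_update_at": self.next_update_at.isoformat(),
        }
```

Note that the raw ticket subject is not a field here at all, so there is no path for it to reach a rendered email.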

Public status page nuance

If you also run a public status page through Freshservice, remember the two streams are not the same audience. Public posts:

  • Strip every internal system name.
  • Use the company’s public service taxonomy (Login, Checkout, API) not your internal one (auth-gateway, pay-svc-v3).
  • Lag the internal stream by 60–90 seconds when severity is uncertain. Update slightly behind ground truth, never ahead.

Build a second template family pub_* for the status page and route them through a different workflow that only fires when the major incident manager flips a public_comms toggle on the ticket. Default off. You promote to public when you have signal, not before.
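
The gate and the name translation can be sketched in Python as follows. The pairing of internal names to public ones in `PUBLIC_NAMES` is an assumed example built from the names mentioned above, and the ticket is modelled as a plain dict with a `public_comms` key; neither reflects a real Freshservice data structure.

```python
from typing import Mapping, Optional

# Assumed example mapping from internal service names to the public taxonomy.
PUBLIC_NAMES = {
    "auth-gateway": "Login",
    "pay-svc-v3": "Checkout",
}


def to_public_text(text: str, public_names: Mapping[str, str] = PUBLIC_NAMES) -> str:
    """Swap each known internal system name for its public service name."""
    for internal, public in public_names.items():
        text = text.replace(internal, public)
    return text


def public_post(ticket: Mapping[str, object], text: str) -> Optional[str]:
    """Return the public-safe post, or None while the toggle is off."""
    if not ticket.get("public_comms", False):  # default off
        return None
    return to_public_text(text)
```

A plain string replacement only catches names you have listed, so treat it as a safety net behind the pub_* templates rather than a substitute for them.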

Internal escalation and the “boss man” loop

Executives have a different cadence. They get an update on declaration, on every severity change, and on resolution. They do not get the 30-minute drip — they will phone you. Wire a separate workflow on the same incident ticket that watches priority and status transitions and sends to an exec_distro mailing list with a tighter template.
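
The exec trigger logic is a comparison between the previous and current ticket state. Here is a hedged Python sketch, with tickets modelled as dicts carrying `priority` and `status` keys and priority used as the stand-in for severity; the reason strings are hypothetical labels.

```python
from typing import Mapping, Optional


def exec_update_reason(
    previous: Optional[Mapping[str, str]],
    current: Mapping[str, str],
) -> Optional[str]:
    """Return why execs should be emailed now, or None to stay quiet."""
    if previous is None:
        return "declared"  # first sight of the major incident
    if current["status"] == "Resolved" and previous["status"] != "Resolved":
        return "resolved"
    if current["priority"] != previous["priority"]:
        return "severity_changed"
    # Anything else, including the 30-minute drip, does not go to execs.
    return None
```

Resolution is checked before severity so that a ticket resolved and downgraded in the same update produces one resolution email, not a severity notice.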

For escalations beyond the immediate exec list, see freshdesk-sla-escalation-rules for the same pattern applied in Freshdesk — the hierarchy mechanics translate cleanly.

Post-incident: closing the loop

The fifth comm is the post-incident summary. Send within 48 hours, link to the formal PIR/RCA when ready, and include:

  • Final impact (duration, users affected, transactions lost).
  • Root cause summary, one paragraph, blameless.
  • Three concrete actions with owners and dates.
  • An invitation to direct follow-up questions to a named human.

Templated, but with three human-written paragraphs. Do not automate the prose here. The willingness to write specifics is the thing that maintains trust after the outage.

What to drill before the next P1

Run the comms workflow in your sandbox quarterly. Trigger a synthetic major incident ticket and walk through every template firing on the live cadence. The first time you discover a merge field is broken should not be at 02:00 with customers reading the result. See freshworks-sandbox-environment-strategy for the safe way to do this without polluting production data.
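
Part of that drill can run as a static check before anything is sent. The Python sketch below scans template bodies for placeholders that are not in the curated set. It assumes templates use Python `string.Template` placeholders (`$name` or `${name}`), which is an illustration choice and not Freshservice's merge tag syntax; the template references in the usage note are made up.

```python
from string import Template
from typing import Dict, List, Mapping, Set

CURATED = {"incident_headline", "customer_impact", "eta_confidence", "next_update_at"}


def placeholders(body: str) -> Set[str]:
    """Collect every $name or ${name} placeholder found in a template body."""
    names = set()
    for match in Template.pattern.finditer(body):
        name = match.group("named") or match.group("braced")
        if name:
            names.add(name)
    return names


def drill(templates: Mapping[str, str]) -> Dict[str, List[str]]:
    """Map each failing template ref to its non-curated placeholders."""
    problems = {}
    for ref, body in templates.items():
        unknown = placeholders(body) - CURATED
        if unknown:
            problems[ref] = sorted(unknown)
    return problems
```

Running `drill({"ok": "Update on $incident_headline", "bad": "Alert: $ticket_subject"})` flags only the second template, because `ticket_subject` is not a curated field.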

Bottom line

Major incident comms is templating plus discipline. Five templates, three audiences, one cadence timer, curated merge fields, and the major incident manager owning the human-readable variables. Set it up cold and your worst hours stop producing your worst emails.
