A Flow connects CRM to your billing provider. It works for two months. The provider has a 30-minute outage at 11 AM Tuesday. Flow tries to push 47 invoices during that window. All 47 fail. Flow marks them errored and stops. Nobody re-runs them. Three weeks later finance asks why revenue is short by $90k. Welcome to default error handling.
Flow’s built-in retry is minimal. Production-grade patterns layer on top. Three of them, in order: retry with backoff, idempotency, and a circuit breaker.
What Flow gives you out of the box
- One automatic retry on transient errors for some triggers
- Manual re-run from the Flow history UI
- Error emails to the Flow owner
- A history list you can query
That’s it. No exponential backoff. No idempotency tracking. No automatic detection that the downstream system is sick and Flow should stop hammering it.
Pattern 1: retry with exponential backoff
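Wrap the external call in a Deluge custom function inside the Flow rather than using a native action, so the function can manage retry state. Read the listing as a sketch of the control flow: Deluge has no while loop or sleep call, so in a deployed function the loop becomes a for each over a list of attempt numbers, and the wait between attempts moves to a Flow delay step or to the retry queue described later.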
// flow_call_with_retry: callable from any Zoho Flow as a custom function
// Args: url, method, body, max_attempts (default 5), base_delay_ms (default 1000)
Map flow_call_with_retry(map args)
{
    url = args.get("url");
    method = ifnull(args.get("method"), "POST");
    body = args.get("body");
    max_attempts = ifnull(args.get("max_attempts"), 5).toLong();
    base_delay = ifnull(args.get("base_delay_ms"), 1000).toLong();
    attempt = 0;
    last_error = "";
    while(attempt < max_attempts)
    {
        response = invokeurl
        [
            url: url
            type: method
            parameters: body == null ? "" : body.toString()
            headers: {"Content-Type": "application/json"}
            detailed: true
        ];
        // With detailed: true, invokeurl returns responseCode alongside the body
        status = response.get("responseCode");
        // Success
        if(status >= 200 && status < 300)
        {
            return {"ok": true, "status": status, "body": response, "attempts": attempt + 1};
        }
        // Don't retry on client errors except 408, 429
        if(status >= 400 && status < 500 && status != 408 && status != 429)
        {
            // Include "error" so callers that log result.get("error") see the cause
            return {"ok": false, "status": status, "body": response, "error": response.toString(), "attempts": attempt + 1, "retried": false};
        }
        // Retryable: 408, 429, 5xx
        attempt = attempt + 1;
        last_error = response.toString();
        if(attempt < max_attempts)
        {
            // Exponential backoff with jitter: 1s, 2s, 4s, 8s plus up to 500 ms
            jitter = math.random() * 500;
            delay = (base_delay * math.pow(2, attempt - 1)) + jitter;
            thread.sleep(delay.toLong());
        }
    }
    // All attempts used up on retryable errors
    return {"ok": false, "error": last_error, "attempts": attempt, "retried": true};
}
Three rules embedded here:
- Only retry on retryable status codes (408, 429, 5xx). A 400 won’t get better.
- Exponential backoff with random jitter prevents synchronized retries from a queue of jobs.
- Capped attempts. Five is plenty; ten is hiding a real problem.
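To make the first two rules concrete, here is a minimal sketch of the retry decision and the delay arithmetic. It is written in Python rather than Deluge so it can be run locally; the function names `is_retryable` and `backoff_delay_ms` are hypothetical, and the defaults mirror the 1000 ms base delay and 500 ms jitter used in the listing above.

```python
import random

RETRYABLE_CLIENT_CODES = {408, 429}


def is_retryable(status: int) -> bool:
    """Retry on 408, 429, and any 5xx; every other status is treated as final."""
    return status in RETRYABLE_CLIENT_CODES or 500 <= status < 600


def backoff_delay_ms(attempt: int, base_delay_ms: int = 1000,
                     max_jitter_ms: int = 500, rng=random.random) -> float:
    """Wait before the next try; `attempt` counts failed tries so far (1-based)."""
    return base_delay_ms * 2 ** (attempt - 1) + rng() * max_jitter_ms


# Jitter-free schedule for five attempts (four waits between them).
schedule = [backoff_delay_ms(n, rng=lambda: 0.0) for n in range(1, 5)]
print(schedule)  # [1000.0, 2000.0, 4000.0, 8000.0]
```

With five attempts the worst case adds about 15 seconds of waiting plus jitter, which is why raising the cap to ten rarely helps: the later waits grow long enough that a queue is the better place for them.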
Pattern 2: idempotency
Retries are dangerous without idempotency. If the first call succeeded but the response was lost, the retry double-charges. Generate a deterministic key per logical operation and pass it.
// In your flow's Deluge step, before calling flow_call_with_retry
deal_id = input.deal_id;
amount = input.amount;
operation_date = zoho.currentdate.toString("yyyy-MM-dd");
// Deterministic key per logical operation. Build it once and reuse it on every
// retry; recomputing it after midnight would produce a different key.
idempotency_key = "invoice_" + deal_id + "_" + operation_date;
payload = Map();
payload.put("amount", amount);
payload.put("customer_id", input.customer_id);
payload.put("idempotency_key", idempotency_key);
result = flow_call_with_retry({
    "url": "https://api.billing-provider.com/v1/invoices",
    "method": "POST",
    "body": payload
});
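A downstream provider that supports idempotency keys (Stripe does; check your ERP’s API) recognizes a repeated key and returns the original response instead of creating a second invoice. Where the key goes depends on the provider: Stripe, for example, reads it from an Idempotency-Key request header rather than the body. Always include it on writes.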
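The provider-side behavior that makes a retry safe can be sketched as follows. This is a Python stand-in rather than Deluge, so it can be run locally; `FakeBillingProvider` and its fields are hypothetical and do not describe any real provider's internals.

```python
class FakeBillingProvider:
    """Hypothetical stand-in for a provider that honors idempotency keys."""

    def __init__(self):
        self._responses = {}  # idempotency key -> response from the first call
        self.invoices = []    # invoices that were actually created

    def create_invoice(self, idempotency_key: str, amount: int) -> dict:
        # A repeated key returns the stored response and creates nothing new.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        invoice = {"invoice_id": len(self.invoices) + 1, "amount": amount}
        self.invoices.append(invoice)
        self._responses[idempotency_key] = invoice
        return invoice


provider = FakeBillingProvider()
key = "invoice_" + "D-1001" + "_" + "2024-03-05"  # same shape as the key above
first = provider.create_invoice(key, 1900)
replay = provider.create_invoice(key, 1900)  # retry after a lost response
print(first == replay, len(provider.invoices))  # True 1
```

The replay only deduplicates because the key is identical on both calls, which is why the key has to be derived from the operation and not from the attempt.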
Pattern 3: circuit breaker
If the downstream system is dead, stop hammering it. Open the circuit. Try again later.
// circuit_check: call before any external call
// Returns false if circuit is open (skip the call)
boolean circuit_check(string circuit_name)
{
    state = zoho.crm.searchRecords(
        "Circuit_State",
        "(Circuit_Name:equals:" + circuit_name + ")"
    );
    if(state.size() == 0) { return true; } // no state, assume closed (ok)
    c = state.get(0);
    current_state = c.get("State");
    if(current_state == "closed") { return true; }
    if(current_state == "open")
    {
        open_until = toDateTime(c.get("Open_Until"));
        if(zoho.currenttime > open_until)
        {
            // Cool-down elapsed: move to half-open and let calls through as probes
            zoho.crm.updateRecord("Circuit_State", c.get("id"), {"State": "half_open"});
            return true;
        }
        return false; // still open
    }
    // half_open (or an unrecognized state): allow the call. circuit_record
    // closes the circuit on success or reopens it on the next failure.
    return true;
}
// circuit_record: call after the external call to update state
void circuit_record(string circuit_name, boolean success)
{
    state = zoho.crm.searchRecords(
        "Circuit_State",
        "(Circuit_Name:equals:" + circuit_name + ")"
    );
    if(state.size() == 0)
    {
        zoho.crm.createRecord("Circuit_State", {
            "Circuit_Name": circuit_name,
            "State": "closed",
            "Consecutive_Failures": 0,
            "Updated_At": zoho.currenttime
        });
        return;
    }
    c = state.get(0);
    failures = ifnull(c.get("Consecutive_Failures"), 0).toLong();
    threshold = 5;
    open_duration_min = 10;
    if(success)
    {
        zoho.crm.updateRecord("Circuit_State", c.get("id"), {
            "State": "closed",
            "Consecutive_Failures": 0,
            "Updated_At": zoho.currenttime
        });
    }
    else
    {
        failures = failures + 1;
        new_state = "closed";
        open_until = null;
        if(failures >= threshold)
        {
            new_state = "open";
            open_until = addMinute(zoho.currenttime, open_duration_min);
        }
        zoho.crm.updateRecord("Circuit_State", c.get("id"), {
            "State": new_state,
            "Consecutive_Failures": failures,
            "Open_Until": open_until,
            "Updated_At": zoho.currenttime
        });
    }
}
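The state transitions in circuit_check and circuit_record are easier to verify when traced in memory. The sketch below mirrors them in Python rather than Deluge so it can be run locally; the `Circuit` class is hypothetical, time is a plain number of minutes, and the threshold of 5 failures and 10-minute cool-down are taken from the listing above.

```python
from dataclasses import dataclass
from typing import Optional

THRESHOLD = 5           # consecutive failures before the circuit opens
OPEN_DURATION_MIN = 10  # cool-down before calls are allowed again


@dataclass
class Circuit:
    state: str = "closed"
    failures: int = 0
    open_until: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Mirror of circuit_check: False only while the cool-down is running."""
        if self.state == "open":
            if now <= self.open_until:
                return False
            self.state = "half_open"
        return True

    def record(self, success: bool, now: float) -> None:
        """Mirror of circuit_record: reset on success, count and trip on failure."""
        if success:
            self.state, self.failures, self.open_until = "closed", 0, None
            return
        self.failures += 1
        if self.failures >= THRESHOLD:
            self.state = "open"
            self.open_until = now + OPEN_DURATION_MIN
        else:
            self.state, self.open_until = "closed", None


circuit = Circuit()
for minute in range(5):        # five failures in a row, minutes 0 to 4
    circuit.record(False, now=minute)
print(circuit.state, circuit.open_until)  # open 14
```

Because the failure count is not reset when the circuit goes half-open, a single failed probe pushes it straight back to open for another cool-down.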
Use them together:
// In your Flow step
if(!circuit_check("billing_provider"))
{
    // Defer to a retry queue, don't call now
    zoho.crm.createRecord("Flow_Retry_Queue", {
        "Operation": "invoice_create",
        "Payload": payload.toString(),
        "Retry_After": addMinute(zoho.currenttime, 10),
        "Reason": "circuit_open"
    });
    return;
}
result = flow_call_with_retry({...});
circuit_record("billing_provider", result.get("ok"));
if(!result.get("ok"))
{
    // Log to a quarantine table for manual review
    zoho.crm.createRecord("Flow_Failures", {
        "Operation": "invoice_create",
        "Deal_Id": deal_id,
        "Payload": payload.toString(),
        "Error": result.get("error"),
        "Failed_At": zoho.currenttime
    });
}
The retry queue
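The Flow_Retry_Queue is a custom module. A scheduled function drains it every 5 minutes: it picks up rows whose Retry_After has passed and checks the circuit. If the circuit is closed, it retries the operation from the stored payload; if it is still open, it leaves the row for the next tick.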
This is the difference between losing 47 invoices to a 30-minute outage and queueing them up, riding out the outage, and processing them clean once the provider is back.
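One tick of the drain can be sketched like this. It is Python rather than Deluge so it can be run locally; `drain_retry_queue`, the `retry_after` field name, and the two callables are hypothetical stand-ins for the scheduled function, the Retry_After column, circuit_check, and the replayed call.

```python
def drain_retry_queue(rows, now, circuit_is_closed, retry):
    """Run one tick and return the rows that stay queued."""
    if not circuit_is_closed():
        return list(rows)      # still open: leave everything for the next tick
    remaining = []
    for row in rows:
        if row["retry_after"] <= now and retry(row):
            continue           # replay succeeded: drop the row
        remaining.append(row)  # not due yet, or the replay failed
    return remaining


queue = [
    {"operation": "invoice_create", "retry_after": 10},
    {"operation": "invoice_create", "retry_after": 20},
]
left = drain_retry_queue(queue, now=15,
                         circuit_is_closed=lambda: True,
                         retry=lambda row: True)
print(len(left))  # 1
```

Rows whose replay fails stay queued in this sketch; a fuller version would count attempts per row and move repeat offenders to the quarantine table.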
Alerting that doesn’t spam
Don’t alert on every retry. Alert on:
- Circuit opens (downstream system declared unhealthy)
- Circuit stays open more than 30 minutes (real incident)
- Quarantine table grows beyond a threshold (something the auto-retry can’t fix)
- Per-day failure count above baseline
Tune these. A Cliq channel ping for every retry will be ignored within a week.
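The four conditions can be expressed as a single check that runs on a schedule. This is a Python sketch rather than Deluge so it can be run locally; `alerts_due`, the alert names, and the default limits are hypothetical, with only the 30-minute figure taken from the list above.

```python
def alerts_due(just_opened: bool, minutes_open: float, quarantine_rows: int,
               failures_today: int, quarantine_limit: int = 100,
               daily_baseline: int = 10) -> list:
    """Return the alerts that should fire; an empty list means stay silent."""
    alerts = []
    if just_opened:
        alerts.append("circuit_opened")
    if minutes_open > 30:
        alerts.append("circuit_open_over_30_min")
    if quarantine_rows > quarantine_limit:
        alerts.append("quarantine_over_limit")
    if failures_today > daily_baseline:
        alerts.append("daily_failures_above_baseline")
    return alerts


print(alerts_due(just_opened=False, minutes_open=0,
                 quarantine_rows=3, failures_today=2))  # []
```

Individual retries never appear as an input, so a burst of retries that eventually succeeds produces no alert at all.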
What goes in the quarantine table
The Flow_Failures table is where ops looks for manual intervention. Each row has:
- Operation type
- Source record IDs
- Payload (so the call can be re-issued exactly)
- Error message
- Failed at, retry count
- Status (open, resolved, ignored)
Ops reviews daily and resolves or escalates each row. Don’t let this table grow past 100 rows; a backlog that size means automation isn’t catching what it should.
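As a rough picture of the row shape, the fields listed above map onto a record like the one below. This is a Python sketch rather than a Zoho module definition; `FlowFailure`, its field names, and `backlog_too_large` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FlowFailure:
    operation: str            # operation type, e.g. "invoice_create"
    source_record_ids: list   # records the call was made for
    payload: str              # exact request body, so the call can be re-issued
    error: str                # error message from the last attempt
    failed_at: datetime
    retry_count: int = 0
    status: str = "open"      # open, resolved, or ignored


def backlog_too_large(rows, limit: int = 100) -> bool:
    """True when the number of unresolved rows exceeds the limit."""
    return sum(1 for row in rows if row.status == "open") > limit


row = FlowFailure("invoice_create", ["D-1001"], '{"amount": 1900}',
                  "HTTP 503", datetime(2024, 3, 5, 11, 12), retry_count=5)
print(row.status)  # open
```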
Visibility for the Flow owner
The Flow itself should report two metrics:
- Success rate over rolling 24 hours
- Average duration end-to-end
If success rate drops below 95% or duration spikes above baseline, alert. Otherwise, silence.
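Both metrics and the alert rule fit in a few lines. The sketch below is Python rather than Deluge so it can be run locally; `flow_health`, `should_alert`, the tuple layout of a run, and the `spike_factor` of 2.0 are hypothetical, while the 95% floor comes from the rule above.

```python
from datetime import datetime, timedelta


def flow_health(runs, now, window=timedelta(hours=24)):
    """runs: iterable of (finished_at, succeeded, duration_seconds) tuples."""
    recent = [run for run in runs if now - run[0] <= window]
    if not recent:
        return None  # nothing ran inside the window
    success_rate = sum(1 for run in recent if run[1]) / len(recent)
    avg_duration = sum(run[2] for run in recent) / len(recent)
    return success_rate, avg_duration


def should_alert(success_rate, avg_duration, baseline_duration, spike_factor=2.0):
    """Alert below 95% success or when duration exceeds the baseline by the factor."""
    return success_rate < 0.95 or avg_duration > baseline_duration * spike_factor


now = datetime(2024, 3, 5, 12, 0)
runs = [(now - timedelta(hours=1), True, 2.0)] * 19
runs.append((now - timedelta(hours=2), False, 4.0))
runs.append((now - timedelta(days=2), False, 9.0))  # outside the window, ignored
rate, duration = flow_health(runs, now)
```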
For the broader workflow options that Flow competes with, see Zoho Flow vs Workflow rules. For the rate-limit interplay that pairs with retries, see Zoho Deluge rate limit survival guide.
Bottom line
Default Flow error handling is a try-once-and-give-up pattern. Production needs retry with exponential backoff and jitter, idempotency keys for safe replay, and a circuit breaker that opens when downstream is sick. Layer them: retry first, with idempotency on writes, behind a breaker. Drain a retry queue every 5 minutes. Quarantine what can’t be auto-recovered. Alert on circuit opens, not on retries. The 30-minute provider outage becomes a non-event instead of a finance fire drill.