
Predictive Intelligence is the easiest ML on the platform to ship and the easiest to forget about. Two years after launch, the classification model is still predicting with a vocabulary the business no longer uses, and its confidence scores look fine because nothing monitors whether predictions agree with actual outcomes.

Treat the model like an application

A PI model is software. It has a training set, a validation set, a deployment, and an end-of-life. None of these happen by accident.

The minimum lifecycle

  1. Train: define the training set with an explicit Created on filter, never “all records”
  2. Validate: require at least 1,000 records per target class and an F1 score above 0.75 before shipping
  3. Deploy: run in shadow mode for two weeks before writing predictions to the field
  4. Monitor: every 30 days, track agreement between predicted and actual values
  5. Retrain: when agreement drops by 5 points or every 6 months, whichever comes first
  6. Retire: when the business taxonomy changes, delete the model and start over
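The Validate and Retrain steps are mechanical enough to script. This is a minimal sketch in plain JavaScript with no platform APIs, so it can be tested outside an instance; the function names and input shapes are hypothetical, agreement is assumed to be expressed in percentage points (0 to 100), and "6 months" is approximated as 182 days.

```javascript
// Hypothetical lifecycle gates; the thresholds are taken from the list above.
var MIN_RECORDS_PER_CLASS = 1000;
var MIN_SHIP_F1 = 0.75;
var RETRAIN_DROP_POINTS = 5;   // agreement in percentage points (0-100)
var MAX_MODEL_AGE_DAYS = 182;  // assumption: "6 months" approximated as 182 days
var DAY_MS = 86400000;
var EPSILON = 1e-9;            // absorbs floating-point noise when subtracting percentages

// Validate: every target class needs at least 1,000 records and F1 must exceed 0.75.
// classCounts is a plain object, e.g. { network: 1200, hardware: 950 }.
function canShip(classCounts, f1) {
  var reasons = [];
  for (var cls in classCounts) {
    if (Object.prototype.hasOwnProperty.call(classCounts, cls) &&
        classCounts[cls] < MIN_RECORDS_PER_CLASS) {
      reasons.push('class "' + cls + '" has only ' + classCounts[cls] + ' records');
    }
  }
  if (!(f1 > MIN_SHIP_F1)) {
    reasons.push('F1 ' + f1 + ' is not above ' + MIN_SHIP_F1);
  }
  return { ok: reasons.length === 0, reasons: reasons };
}

// Retrain: agreement has dropped by 5 points since the baseline, or the model is
// about 6 months old, whichever comes first.
function shouldRetrain(baselineAgreement, currentAgreement, trainedOn, now) {
  var ageDays = (now.getTime() - trainedOn.getTime()) / DAY_MS;
  var dropped = baselineAgreement - currentAgreement >= RETRAIN_DROP_POINTS - EPSILON;
  return dropped || ageDays >= MAX_MODEL_AGE_DAYS;
}
```

For example, `canShip({ network: 1200, hardware: 950 }, 0.81)` returns `ok: false` with a single reason: the hardware class is short of 1,000 records even though F1 clears the bar.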

Shadow mode is non-negotiable

The platform supports shadow predictions: the model runs and the prediction is recorded, but no field is updated. Run in shadow mode for at least 14 days, then compare the recorded predictions with what humans actually chose. If the model disagrees with humans on more than 25% of records, it is not ready.
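The go/no-go call at the end of a shadow run can be written down as a check. This is a minimal sketch, assuming the shadow predictions have been exported as hypothetical `{ predicted, actual }` objects (how you extract them from your instance is left open). Records with no human-chosen value yet are skipped rather than counted as disagreements.

```javascript
// Hypothetical shadow-mode readiness check; thresholds come from the text above.
var MIN_SHADOW_DAYS = 14;
var MAX_DISAGREEMENT = 0.25;
var MS_PER_DAY = 86400000;

// records: [{ predicted: 'network', actual: 'hardware' }, ...]
// startedOn / now: Date objects bounding the shadow run.
function shadowReadiness(records, startedOn, now) {
  var compared = 0;
  var disagreements = 0;
  for (var i = 0; i < records.length; i++) {
    var actual = records[i].actual;
    if (actual === null || actual === undefined || actual === '') continue; // no human answer yet
    compared++;
    if (records[i].predicted !== actual) disagreements++;
  }
  var daysRun = (now.getTime() - startedOn.getTime()) / MS_PER_DAY;
  var rate = compared > 0 ? disagreements / compared : null;
  var reasons = [];
  if (daysRun < MIN_SHADOW_DAYS) {
    reasons.push('shadow run is only ' + daysRun.toFixed(1) + ' days old');
  }
  if (rate === null) {
    reasons.push('no records with a human-chosen value to compare');
  } else if (rate > MAX_DISAGREEMENT) {
    reasons.push('disagreement ' + (rate * 100).toFixed(1) + '% exceeds 25%');
  }
  return { ready: reasons.length === 0, disagreementRate: rate, compared: compared, reasons: reasons };
}
```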

The drift query you owe yourself

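// Average confidence for one capability over the last 30 days (verify table and field names on your instance).
var ga = new GlideAggregate('ml_capability_definition_log');
ga.addQuery('capability_definition', 'incident_categorization');
ga.addQuery('sys_created_on', '>', gs.daysAgo(30));
ga.addAggregate('AVG', 'confidence_score');
ga.query();
if (ga.next()) {
  gs.info('30-day average confidence: ' + ga.getAggregate('AVG', 'confidence_score'));
}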

Run it every month and keep the results: average confidence sliding down by 8 points across a quarter is a strong signal of vocabulary drift.
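Because the query covers a single 30-day window, spotting the quarterly slide means comparing several windows. This sketch does that bookkeeping in plain JavaScript, assuming hypothetical `{ confidence, createdOn }` records and a 0 to 100 confidence scale (multiply 0 to 1 values by 100 first); the 8-point threshold is the one stated above.

```javascript
// Hypothetical drift check: average confidence per 30-day window, oldest window first.
var WINDOW_DAYS = 30;
var WINDOW_COUNT = 3;  // three windows, roughly one quarter
var DRIFT_POINTS = 8;
var ONE_DAY_MS = 86400000;

function windowAverages(records, now) {
  var sums = [];
  var counts = [];
  for (var w = 0; w < WINDOW_COUNT; w++) { sums.push(0); counts.push(0); }
  var windowMs = WINDOW_DAYS * ONE_DAY_MS;
  for (var i = 0; i < records.length; i++) {
    var age = now.getTime() - records[i].createdOn.getTime();
    var idx = Math.floor(age / windowMs); // 0 = most recent window
    if (age < 0 || idx >= WINDOW_COUNT) continue; // future-dated or older than the quarter
    sums[idx] += records[i].confidence;
    counts[idx]++;
  }
  var averages = [];
  for (var k = WINDOW_COUNT - 1; k >= 0; k--) {
    averages.push(counts[k] > 0 ? sums[k] / counts[k] : null); // null = no predictions in window
  }
  return averages;
}

// Flags drift when the newest window sits at least 8 points below the oldest.
function confidenceDrift(averages) {
  var oldest = averages[0];
  var latest = averages[averages.length - 1];
  if (oldest === null || latest === null) return { drop: null, drifting: false };
  var drop = oldest - latest;
  return { drop: drop, drifting: drop >= DRIFT_POINTS };
}
```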

Common training set mistakes

  • Including closed-cancelled records (poisoned labels)
  • No date floor (training on 2019 data in 2026)
  • Class imbalance worse than 10:1 with no rebalancing
  • Letting the model indirectly see the field it is supposed to predict (for example, through another field populated from it)
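The first three mistakes can be caught mechanically before a training run, and the fourth partly so. This sketch assumes hypothetical `{ state, createdOn, label }` records, a `closed_cancelled` state value, and a hand-maintained `derivedFromTarget` list of fields you know are populated from the target; it cannot discover subtler leakage on its own.

```javascript
// Hypothetical pre-training check for the mistakes listed above.
var MAX_IMBALANCE_RATIO = 10;

// records: [{ state, createdOn: Date, label }, ...]
// options: { createdAfter: Date, target: 'category', features: [...], derivedFromTarget: [...] }
function checkTrainingSet(records, options) {
  var floor = options.createdAfter.getTime();
  var kept = [];
  var counts = {};
  for (var i = 0; i < records.length; i++) {
    var r = records[i];
    if (r.state === 'closed_cancelled') continue; // cancelled records carry poisoned labels
    if (r.createdOn.getTime() <= floor) continue; // enforce the explicit date floor
    kept.push(r);
    counts[r.label] = (counts[r.label] || 0) + 1;
  }
  var problems = [];
  var max = 0;
  var min = Infinity;
  for (var cls in counts) {
    if (!Object.prototype.hasOwnProperty.call(counts, cls)) continue;
    max = Math.max(max, counts[cls]);
    min = Math.min(min, counts[cls]);
  }
  if (kept.length > 0 && max / min > MAX_IMBALANCE_RATIO) {
    problems.push('class imbalance ' + (max / min).toFixed(1) + ':1 is worse than 10:1; rebalance first');
  }
  // Leakage: neither the target nor any field known to be derived from it may be a feature.
  var banned = [options.target].concat(options.derivedFromTarget || []);
  for (var j = 0; j < options.features.length; j++) {
    if (banned.indexOf(options.features[j]) !== -1) {
      problems.push('feature "' + options.features[j] + '" exposes the target');
    }
  }
  return { kept: kept, classCounts: counts, problems: problems };
}
```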

When to give up

If the F1 score will not climb above 0.6 after three retraining attempts, the predicted field is not predictable from the available features. Stop and redesign the form; do not torture the model.
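The give-up rule needs an F1 figure for each attempt. This sketch computes macro-averaged F1 (the unweighted mean of per-class F1) from predicted and actual labels; macro averaging is an assumption, and the figure your platform reports may be computed differently. The pair shape and function names are hypothetical.

```javascript
// Macro-averaged F1: per-class F1 = 2TP / (2TP + FP + FN), then the unweighted mean.
function macroF1(pairs) {
  var tp = {}, fp = {}, fn = {}, classes = {};
  for (var i = 0; i < pairs.length; i++) {
    var p = pairs[i].predicted;
    var a = pairs[i].actual;
    classes[p] = true;
    classes[a] = true;
    if (p === a) {
      tp[p] = (tp[p] || 0) + 1;
    } else {
      fp[p] = (fp[p] || 0) + 1; // predicted p, but the record was not p
      fn[a] = (fn[a] || 0) + 1; // the record was a, but a was not predicted
    }
  }
  var total = 0;
  var n = 0;
  for (var c in classes) {
    if (!Object.prototype.hasOwnProperty.call(classes, c)) continue;
    var t = tp[c] || 0;
    total += (2 * t) / (2 * t + (fp[c] || 0) + (fn[c] || 0));
    n++;
  }
  return n > 0 ? total / n : 0;
}

// Give up when at least three attempts have been made and none climbed above 0.6.
var GIVE_UP_F1 = 0.6;
function shouldRedesign(attemptF1s) {
  if (attemptF1s.length < 3) return false;
  return Math.max.apply(null, attemptF1s) <= GIVE_UP_F1;
}
```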

What to do this week

Pick your top PI capability. Pull the last 90 days of predictions and compare them with the field’s actual value at closure. If agreement is below 80%, schedule a retrain with a refreshed training window.
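This week's check reduces to one ratio. A minimal sketch, assuming the predictions have been exported as hypothetical `{ predicted, closedValue, createdOn }` objects; records that have not closed yet are left out of the comparison instead of being counted as misses.

```javascript
// Hypothetical 90-day agreement check against the field's value at closure.
var LOOKBACK_DAYS = 90;
var RETRAIN_BELOW = 0.8;
var DAY_IN_MS = 86400000;

function ninetyDayAgreement(records, now) {
  var cutoff = now.getTime() - LOOKBACK_DAYS * DAY_IN_MS;
  var compared = 0;
  var agreed = 0;
  for (var i = 0; i < records.length; i++) {
    var r = records[i];
    if (r.createdOn.getTime() < cutoff) continue; // outside the 90-day window
    if (r.closedValue === null || r.closedValue === undefined || r.closedValue === '') continue; // still open
    compared++;
    if (r.predicted === r.closedValue) agreed++;
  }
  var agreement = compared > 0 ? agreed / compared : null;
  return {
    compared: compared,
    agreement: agreement,
    scheduleRetrain: agreement !== null && agreement < RETRAIN_BELOW
  };
}
```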
