Predictive Intelligence is the easiest ML on the platform to ship and the easiest to forget about. Two years post-launch, the classification model is using a vocabulary the business no longer uses — and confidence scores look fine because there is no monitoring on actual outcome agreement.
Treat the model like an application
A PI model is software. It has a training set, a validation set, a deployment, and an end-of-life. None of these happen by accident.
The minimum lifecycle
- Train: define the training set with an explicit Created on filter, never “all records”
- Validate: minimum 1,000 records per target class, F1 above 0.75 to ship
- Deploy: shadow mode for two weeks before the model writes to any field
- Monitor: track agreement between predicted and actual values every 30 days
- Retrain: when agreement drops 5 percentage points or every 6 months, whichever comes first
- Retire: when the business taxonomy changes, delete and start over
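The ship and retrain gates in this list are simple enough to write down as checks. The following is a minimal sketch in plain JavaScript, not platform code: the function names, the `classCounts` shape, and the idea of passing agreement as a 0 to 100 percentage are all assumptions made for illustration.

```javascript
// Thresholds taken from the lifecycle list above.
var MIN_RECORDS_PER_CLASS = 1000;
var MIN_F1_TO_SHIP = 0.75;
var RETRAIN_DROP_POINTS = 5;
var RETRAIN_MAX_AGE_MONTHS = 6;

// classCounts: hypothetical object mapping class label -> validation record count.
function readyToShip(classCounts, f1) {
    for (var label in classCounts) {
        if (Object.prototype.hasOwnProperty.call(classCounts, label) &&
                classCounts[label] < MIN_RECORDS_PER_CLASS) {
            return false; // at least one class is too thin to validate
        }
    }
    return f1 > MIN_F1_TO_SHIP;
}

// Agreement values are percentages (0-100); ageMonths is months since last training.
function needsRetrain(baselineAgreement, currentAgreement, ageMonths) {
    var dropped = (baselineAgreement - currentAgreement) >= RETRAIN_DROP_POINTS;
    var stale = ageMonths >= RETRAIN_MAX_AGE_MONTHS;
    return dropped || stale; // whichever comes first
}
```

Keeping the thresholds as named constants makes it obvious when someone quietly loosens them.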
Shadow mode is non-negotiable
The platform supports shadow predictions — the model runs, the prediction is recorded, but no field is updated. Run for at least 14 days and pull the agreement rate against what humans actually chose. If it disagrees with humans on more than 25% of records, the model is not ready.
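As a rough illustration of the agreement check, the sketch below assumes the shadow predictions have already been exported into an array of pairs. The property names `predicted` and `actual` and both function names are hypothetical; how the pairs are pulled from the instance is left out.

```javascript
// Each pair holds the shadow prediction and the value a human actually chose.
function agreementRate(pairs) {
    if (pairs.length === 0) {
        return null; // nothing to judge yet
    }
    var agree = 0;
    for (var i = 0; i < pairs.length; i++) {
        if (pairs[i].predicted === pairs[i].actual) {
            agree++;
        }
    }
    return agree / pairs.length;
}

// Not ready if the model disagrees with humans on more than 25% of records.
function shadowVerdict(pairs) {
    var rate = agreementRate(pairs);
    if (rate === null) {
        return 'no data';
    }
    return (1 - rate) > 0.25 ? 'not ready' : 'ready';
}
```

Returning `null` for an empty sample keeps "no predictions yet" from being mistaken for 0% agreement.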
The drift query you owe yourself
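// Average prediction confidence for one capability over the last 30 days.
var ga = new GlideAggregate('ml_capability_definition_log');
ga.addQuery('capability_definition', 'incident_categorization');
ga.addQuery('sys_created_on', '>', gs.daysAgo(30));
ga.addAggregate('AVG', 'confidence_score');
ga.query();
// Read the single aggregate row and log the result.
if (ga.next()) {
    gs.info('Avg confidence, last 30 days: ' + ga.getAggregate('AVG', 'confidence_score'));
}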
Average confidence sliding down by 8 points across a quarter is a strong signal of vocabulary drift.
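To turn that rule of thumb into a check, one option is to store the monthly average and compare the oldest and newest values in the quarter. This is only a sketch: the function names are hypothetical, and it assumes confidence is reported on a 0 to 100 scale so that "8 points" means a difference of 8.

```javascript
var DRIFT_DROP_POINTS = 8;

// monthlyAverages: average confidence per month, oldest first, 0-100 scale (assumed).
function confidenceDrop(monthlyAverages) {
    if (monthlyAverages.length < 2) {
        return 0; // a single month cannot show a trend
    }
    return monthlyAverages[0] - monthlyAverages[monthlyAverages.length - 1];
}

function looksLikeDrift(monthlyAverages) {
    return confidenceDrop(monthlyAverages) >= DRIFT_DROP_POINTS;
}
```

Comparing only the endpoints is deliberately crude; a noisy month at either end can trigger or hide the signal, so treat a positive result as a prompt to look at agreement, not as proof.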
Common training set mistakes
- Including closed-cancelled records (poisoned labels)
- No date floor (training on 2019 data in 2026)
- Class imbalance worse than 10:1 with no rebalancing
- Letting the model see the field it is supposed to predict, indirectly
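The first three mistakes can be caught mechanically before training. The sketch below assumes the candidate training records have been exported as plain objects; the property names `label`, `state`, and `createdOn`, and the function name, are hypothetical. The fourth mistake, indirect leakage of the target field, still needs a human to review the input fields.

```javascript
// records: array of { label, state, createdOn }, with createdOn as an ISO date
// string such as '2025-03-14'. dateFloor uses the same format.
function auditTrainingSet(records, dateFloor, cancelledState) {
    var counts = Object.create(null); // no inherited keys to collide with labels
    var cancelled = 0;
    var tooOld = 0;
    for (var i = 0; i < records.length; i++) {
        var r = records[i];
        counts[r.label] = (counts[r.label] || 0) + 1;
        if (r.state === cancelledState) { cancelled++; }
        // ISO dates in the same format compare correctly as strings.
        if (r.createdOn < dateFloor) { tooOld++; }
    }
    var max = 0;
    var min = Infinity;
    for (var label in counts) {
        if (counts[label] > max) { max = counts[label]; }
        if (counts[label] < min) { min = counts[label]; }
    }
    var ratio = records.length === 0 ? 0 : max / min;
    return {
        cancelled: cancelled,        // poisoned labels
        tooOld: tooOld,              // records below the date floor
        imbalanceRatio: ratio,       // largest class / smallest class
        imbalanced: ratio > 10       // worse than 10:1
    };
}
```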
When to give up
If the F1 score will not climb above 0.6 after three retraining attempts, the predicted field is not predictable from the available features. Stop and redesign the form; do not torture the model.
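For readers who want to reproduce the number by hand, F1 for a single class is 2·TP / (2·TP + FP + FN). The sketch below pairs that standard formula with the give-up rule; both function names are hypothetical, and it assumes you record one F1 value per retraining attempt.

```javascript
// Standard single-class F1 from true positives, false positives, false negatives.
function f1Score(tp, fp, fn) {
    var denom = 2 * tp + fp + fn;
    return denom === 0 ? 0 : (2 * tp) / denom;
}

// attempts: F1 from each retraining attempt, in order.
// Give up only after at least three attempts, none of which got above 0.6.
function shouldGiveUp(attempts) {
    if (attempts.length < 3) {
        return false;
    }
    for (var i = 0; i < attempts.length; i++) {
        if (attempts[i] > 0.6) {
            return false;
        }
    }
    return true;
}
```

For example, 60 true positives with 20 false positives and 20 false negatives gives 120 / 160 = 0.75.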
What to do this week
Pick your top PI capability. Pull the last 90 days of predictions and compare to the field’s actual closing value. If agreement is below 80%, schedule a retrain with a refreshed training window.