There is a recurring conversation we have with manufacturing operations directors. They have piloted predictive maintenance. The model works. The alerts fire. And yet the factory floor has not improved. Downtime is roughly the same. Maintenance cost is roughly the same. The pilot is judged a partial success, technically, and is quietly deprioritized. This outcome is common enough that it should be treated as the default, and the reason has nothing to do with the model.
The model is the easy part
Predictive maintenance algorithms are mature. Vendors deliver sensor packages with pre-trained models for common equipment classes. The accuracy is high enough that false positives are usually a configuration problem, not a model problem. McKinsey research consistently shows that predictive maintenance can reduce downtime by 30 to 50 percent and increase machine life by 20 to 40 percent [1], and the cited results are achievable. The technology works. What does not work, in most pilots, is what happens after the alert fires.
What the alert needs to be
An alert that says 'bearing X on pump Y is showing vibration anomaly Z, with 73 percent confidence of failure in the next 168 hours' is technically excellent and operationally useless. The maintenance technician receiving it needs to know which pump exactly and where it is physically, what parts to bring, which procedure to follow, who has authority to schedule the work, what production impact taking the pump offline now would have compared with taking it offline later, and what evidence to log during the repair so the model improves next time. None of that is in the alert. All of it has to be in the workflow around the alert. When that workflow does not exist, the technician opens the alert, mentally translates it into action, and the value of the prediction collapses into the existing reactive-maintenance routine.
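To make the gap concrete, here is a minimal Python sketch, assuming the model emits a flat record and that site context lives in an asset registry keyed by asset ID. The field names (`location`, `procedure_ref`, `scheduling_authority`, and so on) and the sample values are hypothetical. It merges the two and reports which of the technician's questions the workflow still cannot answer.

```python
# Hypothetical context fields, one per question the technician has to answer.
REQUIRED_CONTEXT = (
    "location",                # where the asset physically is
    "parts",                   # what to bring
    "procedure_ref",           # which procedure to follow
    "scheduling_authority",    # who can approve taking it offline
    "downtime_cost_per_hour",  # production impact of stopping now vs later
    "evidence_to_log",         # what to record for the feedback loop
)


def enrich_alert(alert: dict, asset_registry: dict) -> tuple[dict, list[str]]:
    """Merge a raw model alert with registry context and list what is missing."""
    context = asset_registry.get(alert["asset_id"], {})
    missing = [key for key in REQUIRED_CONTEXT if key not in context]
    return {**alert, **context}, missing


# Hypothetical example data.
raw = {
    "asset_id": "pump-Y",
    "component": "bearing-X",
    "anomaly": "vibration-Z",
    "failure_probability": 0.73,
    "horizon_hours": 168,
}
registry = {
    "pump-Y": {
        "location": "Hall 2, line 3",
        "parts": ["bearing kit"],
        "procedure_ref": "PROC-017",
    }
}
enriched, missing = enrich_alert(raw, registry)
print(missing)
```

Anything left in `missing` is a question the technician would otherwise answer from memory, which is exactly where the prediction's value leaks away.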
The five workflow elements
A working predictive maintenance deployment has five workflow elements that the model itself does not provide. First, asset registration: every alert routes to a specific physical asset with a location, parts list, and procedure reference. Second, severity calibration: alerts are tiered, because not all of them are urgent, and routing logic respects the tier. Third, scheduling integration: alerts feed into planned-maintenance window selection, not into a separate queue. Fourth, mobile context: the technician arriving at the asset sees the alert history, the model's reasoning, and the relevant manual sections on their device. Fifth, feedback capture: the repair record states what was found, what was done, and what the actual time to failure was. The fifth element is the one that keeps the model from degrading, and it is the one most often skipped.
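Severity calibration and scheduling integration can be sketched together. The sketch below assumes hypothetical thresholds (probability of 0.9 or more within 24 hours is urgent, 0.6 and above is planned work) and a list of already-planned maintenance windows; a real deployment would calibrate both per asset class. Planned-tier alerts are folded into the earliest existing window that opens before the predicted failure horizon rather than into a separate queue.

```python
from datetime import datetime, timedelta
from typing import Optional


def severity_tier(probability: float, horizon_hours: int) -> str:
    """Tier an alert. Thresholds are hypothetical and need site calibration."""
    if probability >= 0.9 and horizon_hours <= 24:
        return "urgent"
    if probability >= 0.6:
        return "planned"
    return "watch"


def route_alert(
    alert: dict, now: datetime, planned_windows: list[datetime]
) -> tuple[str, Optional[datetime]]:
    """Return (tier, start time); planned-tier work goes into existing windows."""
    tier = severity_tier(alert["failure_probability"], alert["horizon_hours"])
    if tier == "urgent":
        return tier, now          # dispatch now
    if tier == "watch":
        return tier, None         # keep monitoring, no work order yet
    deadline = now + timedelta(hours=alert["horizon_hours"])
    # Earliest planned window that opens before the predicted failure horizon.
    in_time = sorted(w for w in planned_windows if now <= w < deadline)
    if in_time:
        return tier, in_time[0]
    return "urgent", now          # no planned window in time: escalate


now = datetime(2024, 5, 6, 8, 0)
windows = [datetime(2024, 5, 15, 6, 0), datetime(2024, 5, 8, 6, 0)]
print(route_alert({"failure_probability": 0.73, "horizon_hours": 168}, now, windows))
```

Choosing the earliest window is the conservative option; a site that wants to use more of the component's remaining life could pick the latest window before the horizon instead, at higher risk.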
Why digital work management comes first
McKinsey's analysis of successful maintenance digitization consistently identifies digital work management as a prerequisite for predictive maintenance to work at scale [2]. The logic is straightforward. The data needed to retrain predictive models comes from the ground-truth labels generated during repairs. If repairs are documented on paper or in free-text fields, the labels are not usable. If they are captured as structured data in a digital work management system, the labels feed back into the model and accuracy compounds. Most stalled predictive maintenance pilots break down at exactly this loop. The model fires, the technician fixes, the label never reaches the data layer, and after six months the model is no better than it was at deployment. Building the digital work management layer first, even before deploying the predictive model, sequences the investment correctly.
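The feedback loop depends on repair records being parseable as labels. A minimal sketch, assuming hypothetical work-order fields (`alert_id`, `failure_confirmed`, `finding_code`, and an optional `hours_to_failure`): structured records become training labels, including confirmed false positives, while a free-text-only record yields nothing the model can learn from.

```python
from typing import Optional

# Hypothetical structured fields a digital work order might capture.
LABEL_FIELDS = ("asset_id", "alert_id", "failure_confirmed", "finding_code")


def to_training_label(work_order: dict) -> Optional[dict]:
    """Return a model label from a closed work order, or None if unusable."""
    if not all(key in work_order for key in LABEL_FIELDS):
        return None  # paper or free-text-only records cannot be labels
    return {
        "alert_id": work_order["alert_id"],
        "label": 1 if work_order["failure_confirmed"] else 0,  # 0 = false positive
        "finding_code": work_order["finding_code"],
        "hours_to_failure": work_order.get("hours_to_failure"),  # if recorded
    }


orders = [
    {"asset_id": "pump-Y", "alert_id": "A-101", "failure_confirmed": True,
     "finding_code": "BRG-WEAR", "hours_to_failure": 150},
    {"asset_id": "pump-Y", "notes": "replaced bearing, looked worn"},
    {"asset_id": "fan-3", "alert_id": "A-102", "failure_confirmed": False,
     "finding_code": "NO-FAULT"},
]
labels = [lbl for o in orders if (lbl := to_training_label(o)) is not None]
print(len(labels))
```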
The cost of unplanned downtime, properly accounted
Industry analyses, including those cited by McKinsey, place the cost of unplanned downtime at an average of USD 260,000 per hour across industrial operations [3]. For Belgian heavy industry, the equivalent figure varies by sub-sector but is of the same order. What tends to be missing from these accountings is the cascading cost: restart cycles, quality losses during stabilization, penalties on downstream commitments, and the management attention consumed by the recovery. Honest accounting of the full cost typically adds 25 to 40 percent to the headline figure. This matters for ROI calculations, because predictive maintenance investments often look marginal when judged against the headline downtime cost alone and clearly positive when judged against the full cascade.
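The arithmetic behind the cascade argument is simple enough to sketch. The USD 260,000 headline and the 25 to 40 percent uplift come from the text above; the programme cost (USD 1,000,000 per year) and the 4 avoided downtime hours per year are hypothetical, chosen only to show how the same investment moves from marginal to clearly positive. With these inputs the benefit ratio is about 1.04 on the headline figure alone and about 1.46 with a 40 percent cascade.

```python
HEADLINE_USD_PER_HOUR = 260_000  # industry average cited in the text [3]


def full_cost_per_hour(headline: float, cascade_uplift: float) -> float:
    """Headline hourly cost plus cascading costs (restarts, quality, penalties, attention)."""
    return headline * (1 + cascade_uplift)


def benefit_ratio(hours_avoided: float, cost_per_hour: float, annual_investment: float) -> float:
    """Avoided downtime cost per unit of annual spend; above 1.0 the avoided cost exceeds the spend."""
    return hours_avoided * cost_per_hour / annual_investment


# Hypothetical programme: USD 1,000,000 per year, avoiding 4 hours of unplanned downtime.
for uplift in (0.0, 0.25, 0.40):
    cost = full_cost_per_hour(HEADLINE_USD_PER_HOUR, uplift)
    ratio = benefit_ratio(4, cost, 1_000_000)
    print(f"uplift {uplift:.0%}: USD {cost:,.0f}/h, benefit ratio {ratio:.2f}")
```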
When predictive maintenance is the wrong investment
There are three cases where predictive maintenance is not the right place to start. First, when the binding constraint is changeover time rather than unplanned downtime. The OEE loss breakdown will tell you this: changeovers and breakdowns both show up as availability losses, so check how availability splits between them, and if availability losses as a whole are dwarfed by performance losses, fix performance first. Second, when the equipment is old enough that instrumenting it with sensors costs nearly as much as replacing it outright, at which point the marginal economics fail. Third, when the maintenance team is already running at sustained high utilization on planned work. Predictive alerts will then be queued rather than acted on, and the lack of execution capacity destroys the value of the prediction. The right diagnostic is to measure the maintenance team's backlog before the model goes live: if it exceeds two weeks of work, fix the capacity problem first.
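The three screening checks can be written down as a pre-deployment test. This is a sketch under assumptions: loss inputs are in equivalent hours from an OEE loss breakdown, performance losses are treated as dominant whenever they exceed availability losses (a simplification of "dwarfed"), and "approaches replacement cost" is given a hypothetical threshold of 80 percent. The two-week backlog rule is taken directly from the text.

```python
def predictive_maintenance_blockers(
    breakdown_hours: float,         # unplanned downtime (availability loss)
    changeover_hours: float,        # setups and changeovers (availability loss)
    performance_loss_hours: float,  # speed losses and minor stops, equivalent hours
    sensor_cost: float,             # cost to instrument the asset
    replacement_cost: float,        # cost to replace the asset outright
    backlog_hours: float,           # open maintenance work
    weekly_capacity_hours: float,   # maintenance team hours available per week
    cost_ratio_limit: float = 0.8,  # hypothetical reading of "approaches"
) -> list[str]:
    """Return reasons to fix something else before deploying predictive maintenance."""
    blockers = []
    availability_loss = breakdown_hours + changeover_hours
    if changeover_hours > breakdown_hours:
        blockers.append("changeovers, not breakdowns, dominate availability loss")
    if performance_loss_hours > availability_loss:
        blockers.append("performance losses exceed availability losses")
    if sensor_cost >= cost_ratio_limit * replacement_cost:
        blockers.append("instrumentation cost approaches replacement cost")
    if backlog_hours > 2 * weekly_capacity_hours:
        blockers.append("backlog exceeds two weeks of maintenance work")
    return blockers


print(predictive_maintenance_blockers(
    breakdown_hours=40, changeover_hours=120, performance_loss_hours=60,
    sensor_cost=50_000, replacement_cost=400_000,
    backlog_hours=300, weekly_capacity_hours=120,
))
```

An empty list means none of the three cases applies and the model is worth deploying; any entry names the problem to fix first.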
What the workflow looks like when it is working
In a deployment that is working, the operations director can answer specific questions about any asset on the floor: when it was last serviced, what was found, what its current condition score is, and what the projected window for the next intervention is. Answering takes under thirty seconds. The technician working on the asset arrives knowing what to bring and what to expect. The maintenance manager looking at next week's schedule sees prediction-driven items mixed into planned work under consistent prioritization logic. None of this is glamorous. All of it is what turns the model's predictions into actual downtime reduction. The pattern is consistent across the successful deployments we have seen and absent from the stalled ones.
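When work orders and model outputs sit in the same structured store, the director's four questions reduce to a single lookup. A minimal sketch, assuming hypothetical record shapes (`closed_at` and `finding_code` on work orders, `condition_score` and `horizon_hours` on the latest model output per asset); the projected intervention deadline is simply the current time plus the model's horizon.

```python
from datetime import datetime, timedelta
from typing import Optional


def asset_status(asset_id: str, work_orders: list[dict], model_outputs: dict, now: datetime) -> dict:
    """Answer the four questions from structured work orders plus the latest model output."""
    history = sorted(
        (wo for wo in work_orders if wo["asset_id"] == asset_id),
        key=lambda wo: wo["closed_at"],
    )
    last: Optional[dict] = history[-1] if history else None
    latest = model_outputs.get(asset_id, {})
    horizon = latest.get("horizon_hours")
    return {
        "last_serviced": last["closed_at"] if last else None,
        "last_finding": last["finding_code"] if last else None,
        "condition_score": latest.get("condition_score"),
        "intervene_before": now + timedelta(hours=horizon) if horizon is not None else None,
    }


# Hypothetical example data.
orders = [
    {"asset_id": "pump-Y", "closed_at": datetime(2024, 3, 2), "finding_code": "SEAL-LEAK"},
    {"asset_id": "pump-Y", "closed_at": datetime(2024, 4, 18), "finding_code": "BRG-WEAR"},
    {"asset_id": "fan-3", "closed_at": datetime(2024, 4, 20), "finding_code": "NO-FAULT"},
]
outputs = {"pump-Y": {"condition_score": 62, "horizon_hours": 168}}
print(asset_status("pump-Y", orders, outputs, datetime(2024, 5, 6, 8, 0)))
```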
