5 Ways to Improve Availability and Reliability With Proactive Maintenance

SafetyChain Software
Contributing Writer

How do we balance time and money when it comes to plant maintenance? Many organizations rely on reactive maintenance, fixing problems as they occur. While this method can seem cost-effective initially, costs increase over time through extended downtime and greater unpredictability. Proactive maintenance carries a higher upfront cost, but it can lower overall maintenance costs, reduce equipment and employee downtime, and significantly increase asset availability.

However, many organizations do not address the maintenance function holistically. The challenge is to think about maintenance’s partnership with other groups and departments: who grants access to equipment, who participates in scheduling, how blame is assigned, and what happens when preventive maintenance tasks are delayed. On average, maintenance owns only about 17% of asset reliability. This percentage varies from plant to plant, so each organization should consider what makes the most sense for its team and resources.

Consider this: Engineers own the largest share of asset reliability. If the asset isn’t installed or designed correctly, maintenance must deal with the fallout. After maintenance comes operations. But if operations is not operating the equipment correctly, then no amount of maintenance will keep it running correctly. Even sales and marketing can directly impact maintenance activities or the reliability of the assets, as management may prefer to prioritize sales over preventive maintenance that may result in downtime. Spare parts also factor into the whole picture. Some repairs may rely on spare parts obtained through an increasingly volatile and unpredictable supply chain. 

The Consequences of Run-to-Failure

Proactive maintenance is about finding a way to balance risk. Let’s look at an example. 

Example: What Is the Risk You Are Willing to Accept? 

In a piece of equipment, a vibration analysis detects a misalignment. The proactive cost to perform the alignment is $625. But management decides to continue to run the equipment. Over time, the equipment still performs, but the misalignment damages the bearings. At this stage, the repair costs to replace the bearings and complete the alignment are now $2,355. However, the equipment still functions, so management is still unwilling to shut down production for the time it would take to perform the repairs. Finally, additional secondary failures occur and result in catastrophic equipment failure. This run-to-failure results in a forced shutdown of production, unscheduled downtime, overtime, and equipment repair or replacement. The cost is now $23,495 and has impacted the facility’s ability to fill a customer order. 
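The escalation in this example can be made concrete with a little arithmetic. The sketch below uses only the three dollar figures from the example; the stage labels and variable names are illustrative.

```python
# Repair cost at each stage of the misalignment example (USD).
stage_costs = {
    "alignment only": 625,
    "bearings plus alignment": 2_355,
    "catastrophic failure": 23_495,
}

baseline = stage_costs["alignment only"]

# Express each stage as a multiple of the proactive repair cost.
multiples = {stage: cost / baseline for stage, cost in stage_costs.items()}

for stage, cost in stage_costs.items():
    print(f"{stage}: ${cost:,} ({multiples[stage]:.1f}x the proactive cost)")
```

Waiting until the bearings are damaged costs roughly 3.8 times the original alignment, and running to failure costs roughly 37.6 times as much, before counting the impact of the missed customer order.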

From a business perspective, what risk are you willing to accept? From a maintenance perspective, the above example is horrifying at first glance. But management must also weigh the full picture. While running to failure may result in more than $23,000 in costs, doing so may generate hundreds of thousands or millions of dollars in profit, depending on market conditions. If it were your business, what would you choose given the profit potential?

5 Tips to Improve Proactive Maintenance

1.) Identifying the Can vs. Want

Let’s imagine we have a pump. The process requires 100 gallons per minute from the pump (the “want”). The pump can transfer 120 gallons per minute (the “can”). The maintenance function lives between the “can” and the “want.” Maintenance must own the capacity of the equipment. Operations owns the equipment itself and, with it, the decision to accept risk.
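The gap between the “can” and the “want” can be expressed as a spare-capacity margin. A minimal sketch using the pump figures above; the function name `capacity_margin` is made up for illustration.

```python
def capacity_margin(can_gpm: float, want_gpm: float) -> float:
    """Return spare capacity as a fraction of what the process requires."""
    if want_gpm <= 0:
        raise ValueError("want_gpm must be positive")
    return (can_gpm - want_gpm) / want_gpm


# Pump example: the process wants 100 GPM, the pump can deliver 120 GPM.
margin = capacity_margin(can_gpm=120, want_gpm=100)
print(f"Spare capacity: {margin:.0%}")
```

Here the pump has 20 percent of headroom. That headroom is the space maintenance has to work in before the process stops getting what it needs.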

One of the first challenges is to paint a picture, so people have the knowledge and information to make the right choices with respect to risk. While education cannot always prevent a break-fix scenario, it can reduce the likelihood of this outcome and allow management to make better business choices. One option is to create a joint operations and maintenance team of managers and supervisors who agree on how to operate the facility, how people will get access to the equipment, and similar ground rules. Operations members can educate maintenance on what is needed for success and vice versa.

The goal is to help people progress along the learning curve with training. At the beginning of training, there is ignorance. From ignorance, people can move to awareness and eventually literacy. However, at best, training is only half of the equation. The other half comes with coaching, which identifies opportunities to improve and supports people as they apply what they have learned. Finally, certification can pinpoint the competency of an individual on the subject. If the employee doesn’t meet the metrics, a competency development process can help identify where additional coaching may be needed. Certification validates the knowledge and helps build competency.

2.) Precision—Right Maintenance, Right Way, at the Right Time

The idea of understanding equipment failure has roots in the aviation industry. In the late 1950s and early 1960s, there were about 60 crashes per one million takeoffs—the equivalent of two 747s a week crashing in today’s numbers. In 1978, Nowlan and Heap came out with their report on Reliability Centered Maintenance. 

Their assessment indicated that many failures believed to be age-related were actually occurring more randomly. It’s not about just doing more maintenance; it’s about doing the right maintenance, at the right time, in the right way. More does not equal better.

In Don’t Just Fix It, Improve It: A Journey to Precision Domain, Winston Ledet talks about three domains we can apply to maintenance:

  • The reactive domain: Many of us have come from the reactive domain, and we’re trying to get to the planning and scheduling domain.

  • The planning and scheduling domain: While many people target this domain, it is very unstable. The goal is not about planning and scheduling more work, but to eliminate the need to do the work in the first place. 

  • The precision domain: Purposeful action integrates new ways of working into the everyday functions of the workers.

What Is Reliability-Centered Maintenance?

First developed in the aviation industry, reliability-centered maintenance (RCM) seeks to identify potential problems with an organization’s assets and establish a process for ensuring the assets can continue to operate at maximum capacity. Unlike preventive maintenance, reliability-centered maintenance is selective. Performing intrusive maintenance can disturb an otherwise stable system. In the airline industry, analysts found that 11 percent of parts failed in age or wear-related patterns. That means that 89 percent of parts failed randomly. There is a danger that if we overhaul something “just in case” it might fail after this point, the overhaul itself could cause the item to fail. These failures come from:

  • Poor design

  • Poor cleanliness 

  • Improper operation

  • Incorrect parts or materials

  • Improper training or lack of training

  • Inappropriate maintenance strategy

  • Failure to adhere to best practices

  • Human error

According to Ledet, poor work habits cause 84 percent of failures. Fixed-interval preventive maintenance (PM) is only valid for equipment that suffers from specific age-related failure modes. On-condition tasks aim to find items in the act of failing and then take corrective action, when needed, as planned and scheduled work.

Maintenance Strategies

  • Reliability-Centered Maintenance (RCM): The most structured choice is reliability-centered maintenance. Many organizations shy away because they view this approach as a resource-consuming monster. However, if people are adequately trained and understand how to facilitate groups, reliability-centered maintenance is an effective strategy. Even if you are not going to use it, taking an introductory course in RCM provides significant insight into the options that follow for developing or improving your maintenance strategies. Even preventive maintenance optimization (PMO) requires understanding the failure mode. From there, you can choose the correct strategy.

  • Failure Modes and Effects Analysis (FMEA): This approach is largely based on severity, occurrence, and detection ratings, which can be subjective. Many organizations try to streamline the analysis, but there’s a risk of leaving things out.

  • Preventive Maintenance Optimization (PMO): This method may still require identifying failure modes.

  • Original Equipment Manufacturer (OEM) Manual: Many maintenance strategies taken from the original equipment manufacturer manual are not written to a specification. 

  • No scheduled maintenance: Choosing to do nothing is a legitimate choice. However, no scheduled maintenance is not the same as run-to-failure. An organization can still plan for the failure and stock the parts for repair, for example.
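The severity, occurrence, and detection scoring mentioned for FMEA in the list above is commonly combined into a risk priority number (RPN), the product of the three ratings. A minimal sketch, assuming the common 1-to-10 rating scale; the failure modes and their ratings are invented for illustration, and choosing them is exactly where the subjectivity enters.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 (negligible) to 10 (hazardous)
    occurrence: int  # 1 (rare) to 10 (frequent)
    detection: int   # 1 (almost certain to be detected) to 10 (undetectable)

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1-to-10 scale.
        for field in ("severity", "occurrence", "detection"):
            value = getattr(self, field)
            if not 1 <= value <= 10:
                raise ValueError(f"{field} must be between 1 and 10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes for a single pump.
modes = [
    FailureMode("bearing seizure", severity=8, occurrence=3, detection=4),
    FailureMode("seal leak", severity=5, occurrence=6, detection=2),
    FailureMode("coupling misalignment", severity=6, occurrence=4, detection=7),
]

# Highest RPN first: these are the modes to address first.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for mode in ranked:
    print(f"{mode.name}: RPN {mode.rpn}")
```

With these made-up ratings, coupling misalignment ranks first even though bearing seizure is more severe, because it is both more likely and harder to detect.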

Reliability-centered maintenance analysis shows that 40 to 60 percent of preventive maintenance tasks add no value. Organizations are sending technicians out to PM assets but are getting nothing in return at the end of the day. Organizations often do not have a good process for updating preventive maintenance tasks or for performing the root cause analysis that should drive those updates. Consider condition monitoring, such as inspections, and predictive tools like vibration sensors, infrared, or ultrasound. Additionally, consider the preventive maintenance tasks themselves. Are they indicators of failure or indicators of potential failure?

The P-F Curve

Organizations can use the P-F Curve with on-condition inspections and predictive tools to determine when to take corrective action. The “P” represents the potential for failure. That said, recognize that when the P is found, the item is already in the act of failing. The “F” is the point of functional failure when the item no longer meets the requirements for the process. Using the “want” from above, the “F” is a total failure of 0 GPM or partial failure of less than 100 GPM. The goal of the maintenance organization should be to find the “P”, not the “F.” 
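The distinction between “P” and “F” can be sketched as a simple classification. The 100 GPM requirement comes from the pump example; the vibration reading and its 4.5 mm/s alarm level are hypothetical stand-ins for whatever on-condition indicator a plant actually monitors.

```python
REQUIRED_GPM = 100.0  # the "want" from the pump example


def classify_pump(flow_gpm: float, vibration_mm_s: float,
                  vibration_alarm_mm_s: float = 4.5) -> str:
    """Place a pump reading on the P-F curve.

    The vibration alarm level is an assumed, illustrative threshold.
    """
    if flow_gpm <= 0:
        return "functional failure (total)"
    if flow_gpm < REQUIRED_GPM:
        return "functional failure (partial)"
    if vibration_mm_s >= vibration_alarm_mm_s:
        # Still meeting the requirement, but already in the act of failing.
        return "potential failure (P): plan corrective action"
    return "healthy"


print(classify_pump(flow_gpm=118, vibration_mm_s=2.0))
print(classify_pump(flow_gpm=110, vibration_mm_s=6.0))
print(classify_pump(flow_gpm=90, vibration_mm_s=6.0))
print(classify_pump(flow_gpm=0, vibration_mm_s=0.0))
```

The second reading is the one that matters: the pump still delivers more than the process wants, so operations sees no problem, yet the indicator shows that failure has begun and corrective work can still be planned.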

3.) Estimating Accuracy—Improve with Continuous Feedback

One of the challenges in many organizations is that scheduling is not very accurate. Ideally, the planner spends time in the field and walks down jobs to determine the task steps. This job walk-down, combined with analytical estimating, is key to a more accurate estimate of total job duration. Many planners tend to estimate jobs in buckets of shifts, e.g., a whole shift or half a shift, when in reality the job may take only two and a half hours. Even that figure is an average: one technician may perform the job frequently and complete it in two hours, while another with less experience may require three. Accurate estimates also ensure that technicians have the correct workload scheduled. The goal isn’t to make them work harder; it is to ensure their workday has fewer avoidable delays, like looking for parts and waiting for equipment to become available. Ideally, the planner would spend about one-third of the day in the field and have an average estimating accuracy of +/- 10-15 percent.
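Estimating accuracy can be tracked by comparing each estimate with the actual duration. A minimal sketch using the figures from the paragraph above; the 8-hour shift length (so that half a shift is 4 hours) and the function names are assumptions.

```python
from statistics import mean


def estimate_error(estimated_hours: float, actual_hours: float) -> float:
    """Signed estimating error as a fraction of the actual duration."""
    if actual_hours <= 0:
        raise ValueError("actual_hours must be positive")
    return (estimated_hours - actual_hours) / actual_hours


def within_target(estimated_hours: float, actual_hours: float,
                  tolerance: float = 0.15) -> bool:
    """True if the estimate is within +/- tolerance of the actual duration."""
    return abs(estimate_error(estimated_hours, actual_hours)) <= tolerance


# Two technicians take 2 and 3 hours, so the average actual is 2.5 hours.
actual_average = mean([2.0, 3.0])

bucket_error = estimate_error(4.0, actual_average)  # half of an assumed 8-hour shift
walked_error = estimate_error(2.5, actual_average)  # estimate from a job walk-down

print(f"Half-shift bucket: {bucket_error:+.0%}")
print(f"Walked-down estimate: {walked_error:+.0%}")
```

Under these assumptions the half-shift bucket overstates the job by 60 percent, well outside the 10 to 15 percent target, while the walked-down estimate matches the average.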

4.) Planned and Kitted Work Available for Execution

By kitting the materials during the planning process and having them at the ready, maintenance can reduce the avoidable delay of searching for parts. The supervisor should have some knowledge of the scheduled work and be ready to schedule the backlog. The backlog is important because if something happens and some scheduled work can’t be done, work from the backlog can be pulled in. While equipment breakdown takes precedence, using “ready” work can effectively leverage downtime windows. Separately, remember that it should be planning, then scheduling, not planning and scheduling. 
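Pulling “ready” work from the backlog into a freed-up window can be sketched as a simple selection. This is one possible greedy approach, not a prescribed method; the `Job` fields, the priority convention, and the sample backlog are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    hours: float
    priority: int  # lower number = more urgent
    kitted: bool   # True when parts and materials are staged


def fill_window(backlog: list[Job], window_hours: float) -> list[Job]:
    """Pick kitted backlog jobs, most urgent first, that fit the window."""
    selected = []
    remaining = window_hours
    for job in sorted(backlog, key=lambda j: j.priority):
        if job.kitted and job.hours <= remaining:
            selected.append(job)
            remaining -= job.hours
    return selected


backlog = [
    Job("replace belt", hours=1.5, priority=2, kitted=True),
    Job("rebuild gearbox", hours=6.0, priority=1, kitted=True),
    Job("lubricate conveyor", hours=1.0, priority=3, kitted=True),
    Job("swap valve", hours=2.0, priority=1, kitted=False),
]

# A scheduled job fell through, freeing a 3-hour window.
ready = fill_window(backlog, window_hours=3.0)
print([job.name for job in ready])
```

The gearbox rebuild is too long for the window and the valve swap is not kitted, so the belt replacement and conveyor lubrication are pulled in instead. Work that is planned but not kitted cannot use the window, which is why kitting belongs in the planning step.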

5.) Critical Events

Finally, look at the big picture when failure occurs. Critical events are those that take the process down for, say, 2 hours or more. When one occurs, a root cause investigation is initiated. It can be as simple as the 5 whys or as formal as a full root cause analysis. The appropriate approach is chosen, and the resulting actions are tracked through implementation. The goal is not to assign blame, but to understand why the failure happened and prevent repeat occurrences. Are you determining the root causes or just finding the symptoms? One example would be when a motor fails. Maintenance can replace the motor, but it’s important to identify why it failed. Performing an “autopsy” can help determine what caused the failure of the part. Then, validate the process.
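Flagging critical events is a matter of applying the downtime threshold consistently. A minimal sketch using the 2-hour example threshold from above; the event names and durations are invented for illustration.

```python
from datetime import timedelta

# Example threshold from the text; each plant would set its own.
CRITICAL_THRESHOLD = timedelta(hours=2)


def needs_root_cause(downtime: timedelta,
                     threshold: timedelta = CRITICAL_THRESHOLD) -> bool:
    """True if the event took the process down long enough to be critical."""
    return downtime >= threshold


# Hypothetical downtime log.
events = {
    "motor failure on line 3": timedelta(hours=3, minutes=20),
    "jammed feeder": timedelta(minutes=25),
    "conveyor belt tear": timedelta(hours=2),
}

critical = [name for name, downtime in events.items() if needs_root_cause(downtime)]
print(critical)
```

An event that lasts exactly 2 hours counts as critical here because the rule is “2 hours or more.”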


Applying these five approaches will significantly improve asset reliability and drive a more proactive maintenance approach. Not sure where to start? Reach out and we can help you with solutions that address the issues.