An effective reliability program provides the right maintenance on the right assets at the right time.
Rather than performing a critical evaluation of their reliability maintenance plans, many companies in the chemical process industries (CPI) replace malfunctioning or dated equipment with the latest technology. However, new technology only enables an ineffective maintenance program to be more efficient at being ineffective. The goal of a maintenance and reliability program is to deliver a proper balance of maintenance activities — primarily those aimed at identifying impending failures — to allow for timely corrective actions. The optimal reliability maintenance program for a plant provides the right maintenance on the right assets at the right time.
After years of operation, many CPI plants lose track of the main goals of their maintenance program. For example, after experiencing a rare equipment failure, a plant instituted frequent inspections aimed at preventing the recurrence of that failure, which placed a significant burden on the maintenance staff. Thus, the maintenance program changed from one of coordinated and well-thought-out preventive behavior to one of knee-jerk reactionary behavior.
By understanding which assets are most important (right assets), what should be done to restore or maintain the inherent reliability of those assets (right maintenance), and at what frequency those actions should be taken (right time), a maintenance organization can become more effective and more efficient — and often reduce costs significantly.
This article looks at how plant maintenance organizations in the CPI can re-evaluate and restructure their reliability maintenance plans to incorporate industry best practices.
Establishing or re-establishing a maintenance and reliability program involves the following basic steps:
- Create a snapshot of the organization.
- Develop a plan and a strategy for implementing it.
- Prioritize assets within the plant.
- Define maintenance requirements.
- Allocate personnel resources.
- Deploy tools (technology and/or software).
- Measure success and continuously improve.
These actions serve as the building blocks to redefine the plant’s maintenance program and to help build a sustainable proactive organization to minimize costs and improve reliability and availability of plant assets.
Create a snapshot of the organization
To determine how to get to your destination, you need to know where you are starting. A plant can gauge the current status of its maintenance program in many ways, such as through industry benchmarking, surveys, or a trial-and-error approach.
One way is to measure performance against a set of best practices, as illustrated by the example in Figure 1. This organization’s performance on 21 elements of a best-practice maintenance program was rated on a scale of 1 to 10, and these ratings were plotted on a spider chart. In this assessment, scores of 8–10 are considered world-class performance.
An analysis such as this is typically conducted by a third party, who facilitates a series of interviews with personnel at all levels of the corporation. The evaluator asks questions about each of the elements, documents the answers, compares answers across interviewees, and determines an average rating for each element. The results are compared to a set of standards that has been developed to describe a best-practice organization. This is an excellent way for management to get feedback on the real issues, both good and bad, affecting the organization’s maintenance program.
The facility described by Figure 1 has some room for improvement in its use of work management systems and information integration systems. It seems to have good maintenance and diagnostic technologies in place, but its integration of that information into the identification, planning, scheduling, and execution of work needs improvement. With regard to management and work culture, accountability and organizational skills are good, but there is room for improvement in goal setting, global metrics, and developing continuous improvement plans. Finally, more emphasis should be placed on personnel training, which may have a significant positive impact on some of the other areas. The gaps between the plant’s ratings and the values corresponding to world-class performance (gray areas) represent opportunities for improvement.
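The gap analysis described above can be sketched in a few lines of Python. This is only an illustration: the element names and ratings below are hypothetical (the 21 elements in Figure 1 are not reproduced here), and the function name is an assumption. The only value taken from the assessment is the world-class threshold of 8.

```python
WORLD_CLASS_MIN = 8  # from the assessment: scores of 8-10 are world-class

def improvement_gaps(ratings, target=WORLD_CLASS_MIN):
    """Return (element, gap) pairs for elements rated below target,
    ordered from the largest gap to the smallest."""
    gaps = {name: target - score
            for name, score in ratings.items()
            if score < target}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)

# Hypothetical ratings on the 1-10 scale, for illustration only.
ratings = {
    "personnel training": 4,
    "goal setting": 5,
    "accountability": 8,
    "diagnostic technologies": 9,
}

for element, gap in improvement_gaps(ratings):
    print(f"{element}: {gap} point(s) below world-class")
```

Sorting by gap size gives a first-pass ordering of improvement opportunities. The plan itself should still reflect the direction and goals of the company.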
Develop a plan for improvement
By focusing on the lower ratings in its snapshot, the organization can develop a plan to close the gaps and raise its performance on key metrics, thereby improving its overall performance. This step may include planning for training, developing and optimizing procedures, improving communications, integrating software systems, realigning organizational structures, reallocating manpower, etc. The plan should rank the plant’s assets based on how critical they are to the business, and then optimize the associated maintenance tasks based on those rankings. The specifics of the plan will depend on the direction and goals of the company.
An action plan for the example in Figure 1 might include:
- Prioritize all plant assets by assigning each an objective and relative ranking based on what is business-critical to plant operations and asset availability.
- Optimize current preventive maintenance (PM) plans based on the relative criticality of assets to support plant availability needs.
- Optimize the workflow and define responsibilities for identifying and planning work.
- Provide training to enable employees to better utilize the computerized maintenance management system (CMMS) and integrate diagnostic-technology data into the CMMS.
- Establish a feedback loop for all work orders to evaluate lessons learned and to help refine standard job plans for continuous improvement.
- Develop goals that are aligned with maintenance objectives and relevant to the workforce. Make those goals visible to employees.
- Provide refresher training to ensure that all personnel have up-to-date skills.
Prioritize plant assets
The next step is to execute the plan, which typically starts with taking an objective look at the relative criticality of assets in a particular area or system. Operations and maintenance personnel should collaborate to define the basic functional systems and the associated assets (or group of assets) that work together to achieve a specific function. Fortunately, much of this work is often completed as part of a CMMS implementation.
One way to prioritize plant assets for maintenance is to assign a reliability optimization ranking (ROR) to each asset, as shown in Figure 2. This method starts by determining the system-criticality ranking (SCR) of each subsystem based on the impact its failure would have on safety, the environment, annual repair costs, product throughput, and product quality. The SCR is an average of these five factors, and is a number from 1 to 10, with 10 being the most critical subsystem (failure would have a big impact) and 1 being the least critical (failure would have little to no impact) to plant output.
Then, determine the production-criticality ranking (PCR) of each asset within the subsystem. The PCR is the impact the asset has on the function of its parent system. This will help you define where you have single-point failures in your plant (versus redundancy). The PCR is a ranking between 1 and 10, with 10 being an asset with high criticality to the functioning of its parent system.
Next, calculate the business-criticality ranking (BCR) of each asset by multiplying the subsystem’s SCR by the asset’s PCR. After a functional failure occurs, the BCR will enable you to prioritize repair work. The assets with the highest BCR values should be repaired first.
The assets that rank highest on the BCR list might not be the assets that require the most maintenance. Not all assets have the same likelihood of failure. Before making an investment to prevent a failure, you need to understand where the highest probability of failure lies. One way to measure this is to evaluate where you are currently spending your resources in an effort to prevent failures through inspection, preemptive change-out, or other maintenance activities. Also consider the frequency of failures of equipment that is not subject to scheduled preventive action. By considering these factors, you can determine the asset failure likelihood factor (AFLF). For each asset, assign an AFLF ranking from 1 to 10.
Multiplying the BCR by the AFLF for each asset yields the reliability optimization ranking (ROR). You can then use the ROR to prioritize assets and identify the best opportunities for optimizing preventive maintenance strategies.
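The ranking arithmetic can be summarized as SCR = average of the five impact factors, BCR = SCR × PCR, and ROR = BCR × AFLF. The following Python sketch applies those formulas to two hypothetical assets. The class name, field names, asset names, and scores are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Asset:
    name: str
    # Five 1-10 impact scores for the parent subsystem: safety, environment,
    # annual repair costs, product throughput, product quality.
    subsystem_impacts: tuple
    pcr: float   # production-criticality ranking, 1-10
    aflf: float  # asset failure likelihood factor, 1-10

    @property
    def scr(self) -> float:
        """System-criticality ranking: average of the five impact scores."""
        return sum(self.subsystem_impacts) / len(self.subsystem_impacts)

    @property
    def bcr(self) -> float:
        """Business-criticality ranking: SCR x PCR."""
        return self.scr * self.pcr

    @property
    def ror(self) -> float:
        """Reliability optimization ranking: BCR x AFLF."""
        return self.bcr * self.aflf

# Hypothetical assets and scores, for illustration only.
assets = [
    Asset("feed pump (no spare)", (8, 6, 4, 9, 8), pcr=10, aflf=3),
    Asset("cooling fan", (5, 5, 5, 5, 5), pcr=6, aflf=9),
]

# After a functional failure, repair the highest-BCR asset first.
by_repair_priority = sorted(assets, key=lambda a: a.bcr, reverse=True)
# For optimizing preventive maintenance, start with the highest ROR.
by_pm_opportunity = sorted(assets, key=lambda a: a.ror, reverse=True)

for a in by_pm_opportunity:
    print(f"{a.name}: SCR={a.scr:.1f} BCR={a.bcr:.0f} ROR={a.ror:.0f}")
```

In this made-up case the feed pump has the higher BCR, but the cooling fan has the higher ROR because it is judged more likely to fail. This is the distinction drawn above between the most critical assets and the assets that need the most maintenance attention.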
Define maintenance requirements
To ensure that you have a viable maintenance strategy, first examine the scheduled maintenance activities that are currently being performed. Make sure there is an appropriate balance of conditional inspections, pre-emptive replacements, monitoring technology applications, and/or run-to-failure strategies based on observed asset behavior.
To determine the need to perform a maintenance activity, you need to take an objective look at why you would perform a task. Objective evidence of need is crucial to defining a correct strategy.
United Airlines was one of the first companies to establish a method to define maintenance needs in order to create a viable maintenance plan. In the late 1960s, with the introduction of the Boeing 747, the airline industry needed to re-evaluate its maintenance philosophy. At that time, most maintenance involved replacing components based on the number of hours flown. However, as aircraft became increasingly complex, this approach was not economically feasible. United Airlines formed a committee to determine the reasons to do, or not to do, maintenance. The committee developed a single statement that explained why maintenance is performed: “We do maintenance because hardware reliability degrades with age, but we can do something to restore or maintain the original reliability that pays for itself.” Within this statement are three hypotheses:
- Hypothesis 1: Hardware reliability is known to degrade with age.
- Hypothesis 2: Maintenance tasks can restore or maintain original reliability.
- Hypothesis 3: Maintenance pays for itself (i.e., the value of the maintenance exceeds its cost).
These three hypotheses explain why maintenance is done, and they eventually led to the development of Reliability-Centered Maintenance (RCM). The underlying principle of RCM is that a certain maintenance task should be performed only if the hardware is degrading (i.e., a failure is probable), the task is applicable to the potential failure mode, and it is a reasonable task. Before starting maintenance, ask the following three questions (Figure 3):
- Is a failure probable? According to RCM principles, all maintenance actions should address specific modes of asset failure. Consult with operations and maintenance personnel to determine the actual failure modes of your plant assets, rather than their theoretical behavior. This will help you eliminate needless repair tasks that are aimed at failures that are not often seen. You must validate that the maintenance task in question addresses a failure mode that is probable.
- Is the task applicable? If the maintenance task addresses a probable failure mode, you can then determine whether that maintenance task will help remedy or mitigate that failure mode. Before deeming a maintenance task applicable, ask questions such as: Will this task mitigate the risk to an acceptable level without being intrusive? Will it restore or maintain the inherent reliability of the asset?
- Is the task reasonable? Finally, if the maintenance task addresses a probable failure and helps restore or maintain the equipment’s original reliability, you need to evaluate whether that task is worth doing based on the consequences of the failure if it were to occur. Where safety is an issue (the failure may cause personnel injury), does performing the action reduce the probability of failure to an acceptable level? If operational impact rather than safety is the main concern (e.g., reduced product output), does the action reduce the risk to an acceptable level? If neither safety nor operations is at risk, the task is reasonable only if performing the action costs less than the consequences of the failure.
If the answer to all three questions is yes, then the maintenance action is warranted. In many cases, one of these criteria will not be satisfied, which indicates that the maintenance task is not the best way to prevent an asset failure. In these cases, consider a different maintenance plan.
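The three-question screen can be written as a small decision function. The Python sketch below is one possible encoding, not a formal RCM implementation: the names, the three consequence categories, and the cost figures are assumptions. Judging whether a risk is "acceptable" remains a decision for plant personnel, so it is passed in as a yes/no input.

```python
from enum import Enum

class Consequence(Enum):
    SAFETY = "safety"            # failure may cause personnel injury
    OPERATIONAL = "operational"  # e.g., reduced product output
    ECONOMIC = "economic"        # neither safety nor operations at risk

def is_reasonable(consequence, risk_acceptable_after_task=False,
                  task_cost=0.0, failure_cost=0.0):
    """Question 3: is the task worth doing given the failure consequence?"""
    if consequence in (Consequence.SAFETY, Consequence.OPERATIONAL):
        # The task must bring the probability or risk to an acceptable level.
        return risk_acceptable_after_task
    # Otherwise the task must cost less than the failure it prevents.
    return task_cost < failure_cost

def task_warranted(failure_probable, task_applicable, task_reasonable):
    """A maintenance task is warranted only if all three answers are yes."""
    return failure_probable and task_applicable and task_reasonable

# Hypothetical example: a $400 inspection against a $250 failure consequence.
reasonable = is_reasonable(Consequence.ECONOMIC,
                           task_cost=400.0, failure_cost=250.0)
print(task_warranted(True, True, reasonable))
```

In this hypothetical case the failure mode is probable and the task is applicable, but the task costs more than the failure it prevents. The screen therefore rejects it, and a different strategy, such as run-to-failure, would be considered.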
Allocate personnel and deploy tools
Next, determine the annual workload required to carry out the maintenance plan. The frequency and complexity of each task will determine how many man-hours are required for maintenance each month.
If technology was implemented to optimize an existing strategy, the total workload might be reduced. In some cases, the technology may need to be re-allocated from low-criticality assets to higher-criticality assets.
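As a rough illustration of the workload estimate, the Python sketch below multiplies each task's annual frequency by its labor hours and sums the results. The task list, frequencies, and durations are hypothetical, and labor hours per occurrence is used here as a stand-in for task complexity.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    times_per_year: int
    hours_per_occurrence: float  # total labor hours across the crew

def annual_man_hours(tasks):
    """Total labor hours per year required by the maintenance plan."""
    return sum(t.times_per_year * t.hours_per_occurrence for t in tasks)

# Hypothetical maintenance plan, for illustration only.
plan = [
    Task("vibration route", 12, 4.0),
    Task("lubrication inspection", 52, 0.5),
    Task("heat exchanger cleaning", 2, 16.0),
]

annual = annual_man_hours(plan)
monthly_average = annual / 12
print(f"{annual:.0f} man-hours per year, {monthly_average:.1f} per month on average")
```

A monthly average hides peaks, such as the two exchanger cleanings in this example, so a real staffing plan would also look at when the tasks fall due.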
Measure success and continuously improve
Your maintenance plan identifies the right assets, the right activities, and the right timing. Now you can implement the plan, evaluate its effectiveness, and work toward continuously improving the reliability maintenance plan and plant reliability.
For example, after evaluating and redefining its maintenance plan, a plant determines that it is operating at an average level — production is average, costs are average, and failures are average. In this case, it would seem that the plant has achieved its goal of performing the right maintenance on the right assets at the right time. However, the plant is likely performing maintenance actions too often. Without further improvement and evaluation, it may never realize the potential savings in performing maintenance at longer or optimal intervals.
If you ask a mechanic doing a maintenance task what he or she finds each time a monthly task is performed and the answer is, “Every third inspection, I find some issues,” then the time interval at which the task is performed is too short and a change to every two months may be warranted.
This kind of feedback allows you to narrow down the best time interval for maintenance. By reducing unnecessary actions, you allow the redeployment of those maintenance workers to tasks that would otherwise have to be deferred.
Unlike capital expenditures that have a 1–2-year payback, investments in a revamped reliability maintenance program can have a 2:1 to as much as a 7:1 return in the first year alone. These savings typically consist of reductions in overtime, reductions in equipment expenditures as the emphasis shifts from scheduled replacements to inspections, and an increase in throughput due to a decrease in unscheduled downtime.
On an annual basis, the numbers are usually significant. An optimized reliability maintenance plan makes the entire organization part of the solution — as workers at every level have a role in the evaluation process.
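The first-year return can be expressed as total savings divided by the program investment. The Python sketch below uses the three savings categories named above. The dollar figures are hypothetical and chosen only to show the arithmetic.

```python
def first_year_return_ratio(investment, overtime_savings,
                            equipment_savings, throughput_gain):
    """Return first-year savings per dollar invested (3.5 means 3.5:1)."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    total_savings = overtime_savings + equipment_savings + throughput_gain
    return total_savings / investment

# Hypothetical figures, for illustration only.
ratio = first_year_return_ratio(
    investment=100_000,
    overtime_savings=80_000,
    equipment_savings=120_000,
    throughput_gain=150_000,
)
print(f"First-year return: {ratio:.1f}:1")
```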
Without a plan or process to manage reliability or reliability maintenance programs, culture change will be impossible. The real gains from instituting programmatic change are not only the short-term gains associated with the so-called low-hanging fruit, but also the long-term gains that come from changing the way the organization approaches and deals with maintenance. When you change behavior, you get a long-lasting and sustainable benefit, which allows you to continuously improve. Conducting a formal assessment and evaluation like the one outlined in this article is a good first step.