Making sense of maintenance metrics: MTBF. Everything you need to know about mean time between failures

July 25, 2019

Welcome to our series of blog posts about maintenance metrics. This post outlines everything you need to know about mean time between failures (MTBF): The MTBF formula, how it’s different from mean time to failure and mean time to defect, and how to improve MTBF. Click here to see the rest of the series.

What is MTBF?

The best-before date on food containers has probably saved more than a few people from a bout of food poisoning. Sure, you could give that milk a sniff or submit your yogurt to the eye test, but that’s no guarantee. It’s best to trust the numbers put there after thousands of tests by trained experts and plan to pick up a replacement before you actually need it.

Mean time between failures is your maintenance department’s best-before date for equipment. MTBF measures the average operating time between one breakdown and the next. In other words, it’s a measure of reliability: how long an asset typically works before it goes kaput. It helps you make data-driven decisions on maintenance scheduling, safety, inventory management, and equipment design without relying on subjective observations.

MTBF formula

The MTBF formula divides an asset’s total number of operational hours in a period by the number of failures that occurred on that asset in that period. MTBF is most often measured in hours.

Before you start calculating MTBF, you need to understand failure. When most people think of failure, they think of a completely broken machine—a car unable to drive or a computer with a blank screen. However, failure isn’t black and white. It occurs in multiple stages. Basically, failure is when a system or part no longer produces exact and required results. A car can still drive with a flat tire and a computer can still function with a few letters missing from the keyboard. But they don’t work the way they’re supposed to. This is failure. In manufacturing, failure might look like a machine that is unable to meet the required level of production per minute, per shift, or per day because of a problem with one of its parts.

The MTBF calculation takes into account all types of failure as defined above, but it does not count scheduled maintenance, like inspections, recalibrations, or preventive parts replacements, as failures.

MTBF = Number of operational hours ÷ Number of failures

For example, let’s say you have 10 identical pumps at your facility. The pumps operated for 100 hours each over the course of a year, totalling 1,000 operational hours. The pumps failed 16 times in total over that year. This means that the mean time between failures for these pumps is 62.5 hours.

MTBF = (10 pumps x 100 operational hours each) ÷ 16 failures

MTBF = 1,000 operational hours ÷ 16 failures

MTBF = 1,000 ÷ 16

MTBF = 62.5 hours
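
The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above, not part of any particular maintenance system; the function name `mtbf` and its parameters are assumptions made for the example.

```python
def mtbf(operational_hours: float, failures: int) -> float:
    """Mean time between failures: operational hours divided by failure count."""
    if failures <= 0:
        # With no recorded failures the ratio is undefined, so refuse to guess.
        raise ValueError("MTBF is undefined when no failures were recorded")
    return operational_hours / failures


# Pump example from the text: 10 pumps x 100 hours each, 16 failures in total.
pump_hours = 10 * 100
pump_mtbf = mtbf(pump_hours, 16)
print(f"MTBF = {pump_mtbf} hours")  # prints "MTBF = 62.5 hours"
```

Raising an error when there are no failures is a design choice for this sketch; a real report might instead show "no failures in period".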

World-class MTBF is difficult to establish as every piece of equipment has different expectations. It’s best to investigate each asset’s MTBF and the standard for your industry before setting a benchmark for your team.

How to improve MTBF

Failure leads to two things: lost production and more maintenance. Both mean higher costs and less money going to the bottom line. That’s why reducing instances of failure is crucial for manufacturers and why figuring out how to improve MTBF should be one of your top priorities. Fortunately, there are a number of ways to chip away at the root cause of failure and boost your MTBF in the process.

Improve preventive maintenance processes

If done well, preventive maintenance has the potential to drastically increase MTBF. Being proactive can stop equipment issues before they even begin. However, if the process is lacking, PMs can have the opposite effect. If manuals are missing, checklists are vague or non-existent, or if technicians aren’t trained properly, preventive maintenance can actually lead to a speedier breakdown. To improve your PM process, focus on providing the right resources to your maintenance team and making these resources as accessible as possible.

Conduct a root cause analysis

Understanding everything you can about why an asset failed can help you prevent that failure from happening again or, at least, make it happen less often. The best way to get to the bottom of failure is through a root cause analysis using the 5 Whys approach. This allows you to move past just fixing an immediate problem and towards a long-term solution. Instead of just replacing a defective part, you can find out whether a higher-quality part can be ordered, and why that part wasn’t ordered in the first place. Not only does this improve MTBF on one asset, but it has the potential to improve MTBF across the board by creating better processes.

Tracking MTBF helps you make data-driven decisions on maintenance scheduling, safety, inventory management, and equipment design without relying on subjective observations or often-inaccurate manuals.

Work towards condition-based maintenance

If you can build an early-warning system to find equipment issues before they lead to failure, you can increase MTBF and reduce unplanned downtime. Another name for this warning system is condition-based maintenance (CBM). It can be a long road to establishing CBM at your facility, but there are some building blocks you can begin to implement right away. These steps include establishing total productive maintenance (TPM), defining failure modes, charting the P-F curve for critical assets, and connecting your maintenance software to other technologies like PLCs, SCADA, and equipment sensors.
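
One of the simplest forms an early-warning check can take is comparing a sensor reading against an acceptable limit. The sketch below is purely illustrative: the readings, the limit, and the function name `first_warning_index` are hypothetical, and it does not call any real PLC, SCADA, or maintenance-software API.

```python
from typing import Optional, Sequence


def first_warning_index(readings: Sequence[float], limit: float) -> Optional[int]:
    """Return the position of the first reading above the limit, or None."""
    for index, value in enumerate(readings):
        if value > limit:
            return index
    return None


# Hypothetical hourly motor temperatures (deg C) and a hypothetical limit.
motor_temps_c = [61.0, 63.5, 66.0, 71.2, 78.9]
TEMP_LIMIT_C = 70.0

warning_at = first_warning_index(motor_temps_c, TEMP_LIMIT_C)
if warning_at is not None:
    print(f"Reading {warning_at} exceeded {TEMP_LIMIT_C} C: schedule an inspection")
```

In practice the limit would come from the asset's defined failure modes rather than a hard-coded constant.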

MTBF vs. MTTF vs. MTTD: What’s the difference?

Mean time between failures sounds a lot like mean time to failure (MTTF) and mean time to defect (MTTD). But they’re not the same. Understanding the difference between these three maintenance metrics and when to use each is important for creating a data-based maintenance strategy.

What is MTTF?

Mean time to failure (MTTF) measures the average time from when a non-repairable asset begins operating to when it fails. The key phrase is ‘non-repairable asset.’ MTTF covers the entire lifespan of the equipment: when it fails, it’s replaced. This is how MTTF differs from MTBF: the former deals with assets that aren’t repairable and the latter deals with assets that are repairable.

There are a number of reasons an asset might not be repaired, but the most common rationale is that it costs less and takes less time to replace the asset. For example, replacing a motor that costs a few hundred dollars in parts and labour is probably more cost-effective than removing that motor and trying to rebuild it. It just isn’t worth the effort or the cost.

MTTF is calculated by dividing the total number of hours the assets operated before failing by the number of assets being tracked. For example, three identical fans run for a combined total of 60 hours before they fail. The MTTF for these fans is 60 divided by three, or 20 hours. This figure can tell you many things, the most valuable being how often maintenance is required and when to purchase inventory.
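
The fan example can be sketched in Python as follows. The text only gives the combined total of 60 hours, so the per-fan split used here is a made-up assumption, as is the function name `mttf`.

```python
from typing import Sequence


def mttf(hours_to_failure: Sequence[float]) -> float:
    """Mean time to failure: total hours run before failure / number of assets."""
    if not hours_to_failure:
        raise ValueError("MTTF needs at least one tracked asset")
    return sum(hours_to_failure) / len(hours_to_failure)


# Three fans with a combined 60 hours of operation.
# The individual values are hypothetical; only the total comes from the text.
fan_hours = [15, 20, 25]
fan_mttf = mttf(fan_hours)
print(f"MTTF = {fan_mttf} hours")  # prints "MTTF = 20.0 hours"
```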

What is MTTD?

Mean time to defect (MTTD) measures the average time between identifiable issues that lead to equipment failure. An asset can continue to run after a defect is discovered. That’s what makes MTTD and MTBF different: MTTD tracks the prelude to failure, while MTBF tracks failure itself. For example, the temperature of a motor may rise above an acceptable level, but the motor keeps going for two more hours until it breaks.

The MTTD formula divides an asset’s total number of operational hours by the total number of defects identified in that time. For example, if a piece of equipment runs for 80 hours and you identify five defects in that time, the mean time to defect is 16 hours (80 ÷ 5).
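
The 80-hour example works out to operational hours divided by the number of defects, which can be sketched in Python as below. The defect log, with the hour mark at which each defect was spotted, is hypothetical, and so is the function name `mttd`; only the 80 hours and the five defects come from the example.

```python
from typing import Sequence


def mttd(operational_hours: float, defect_log: Sequence[float]) -> float:
    """Mean time to defect: operational hours divided by defects identified."""
    if not defect_log:
        raise ValueError("MTTD is undefined when no defects were identified")
    return operational_hours / len(defect_log)


# Hypothetical hour marks at which five defects were identified in an 80-hour run.
defect_hours = [12, 30, 47, 61, 78]
asset_mttd = mttd(80, defect_hours)
print(f"MTTD = {asset_mttd} hours")  # prints "MTTD = 16.0 hours"
```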

MTTD is used primarily to schedule preventive maintenance and establish condition-based maintenance for assets. If you are able to track defects and understand the failure modes of an asset, you have most of the tools necessary for a P-F curve. The P-F curve is a key part of condition-based maintenance and allows you to schedule maintenance on an asset at the best possible time without using unnecessary resources.

Boost your business with MTBF

Failure isn’t just an issue for the maintenance department. It affects the whole organization, especially when it comes to its bottom line. However, the maintenance team are almost always the ones charged with preventing breakdowns. Tracking mean time between failures can be a great weapon in this battle against unplanned downtime. It can help you maximize your maintenance and get the most from your processes so you can stop feeling the pain of failure.
