Mean time to detect (MTTD)
What is MTTD and how to use it
What is mean time to detect (MTTD)?
Mean time to detect (MTTD) is one of the main key performance indicators in incident management. It measures the average time between the start of an incident and the moment the organization identifies it.
Organizations use MTTD to gauge the effectiveness of an individual's or a team's monitoring and management systems, as well as the communication processes between users, customers, and those in charge of fixing the problem. Problems can be detected by people, such as end users reporting a software outage, or by system monitoring and management tools.
MTTD is a good way to test the effectiveness of a new tool or operational process. Combined with other metrics such as mean time to repair, it shows the overall timeline of incident response. The sooner an organization finds out about a problem, the better. Problems are cheaper to fix the sooner you find them, and any unplanned downtime adds to the financial loss.
How to calculate MTTD:
MTTD can be calculated by adding up the time between failure and detection for every incident, then dividing that total by the number of failures. The resulting MTTD can then be compared to that of a previous time period to gauge performance.
MTTD = (total time between failure and detection) / (number of failures)
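As a minimal sketch of this calculation, the Python snippet below averages the gap between failure and detection over a list of incidents. The function name mean_time_to_detect and the incident timestamps are hypothetical, made up purely for illustration.

```python
from datetime import datetime, timedelta

def mean_time_to_detect(incidents):
    """Return the average gap between failure and detection as a timedelta.

    `incidents` is a list of (failed_at, detected_at) datetime pairs.
    """
    if not incidents:
        raise ValueError("need at least one incident to compute MTTD")
    # Sum the individual detection delays, starting from a zero timedelta.
    total = sum((detected - failed for failed, detected in incidents), timedelta())
    return total / len(incidents)

# Hypothetical incidents: (time the failure began, time it was detected)
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 20)),     # 20 minutes
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 15, 0)),  # 60 minutes
    (datetime(2024, 1, 21, 2, 30), datetime(2024, 1, 21, 2, 40)),  # 10 minutes
]

mttd = mean_time_to_detect(incidents)
print(mttd)  # 0:30:00, i.e. 90 minutes of total delay over 3 failures
```

Running the same function over the incidents from an earlier period gives a figure to compare against.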
When identifying failures, some organizations omit outliers that might skew the mean. Tiering incidents by severity is also useful for prioritizing them. Calculating a separate MTTD for each tier can help determine how to make the best use of resources when resolving issues. For example, a high MTTD for security problems would be a greater priority to fix than a high MTTD for minor production issues.
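One way to sketch per-severity MTTD with outliers left out is shown below. The severity labels, the sample detection times, and the fixed max_minutes cutoff are all assumptions for illustration; an organization might define outliers differently, for example with a percentile rule.

```python
from collections import defaultdict
from statistics import mean

def mttd_by_severity(incidents, max_minutes=None):
    """Average detection time per severity tier.

    `incidents` is a list of (severity, minutes_to_detect) pairs.
    If `max_minutes` is given, detection times above it are treated
    as outliers and excluded from the mean (an assumed, simple rule).
    """
    groups = defaultdict(list)
    for severity, minutes in incidents:
        if max_minutes is not None and minutes > max_minutes:
            continue  # skip outliers that would skew the mean
        groups[severity].append(minutes)
    return {severity: mean(values) for severity, values in groups.items()}

# Hypothetical data: (severity tier, minutes from failure to detection)
incidents = [
    ("security", 120), ("security", 180),
    ("minor", 10), ("minor", 20), ("minor", 600),
]

print(mttd_by_severity(incidents))                   # all incidents included
print(mttd_by_severity(incidents, max_minutes=480))  # 600-minute outlier dropped
```

With this sample data, the single 600-minute incident pulls the "minor" mean from 15 minutes up to 210, which shows why some teams exclude outliers before comparing tiers.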
Why keeping a low MTTD is important
The sooner you learn about issues inside your organization, the sooner you can fix them. A lower MTTD means you’re discovering and solving problems quickly. Fixing problems as quickly as possible not only stops them from becoming an even bigger issue, but it’s also easier and cheaper than waiting until it’s too late.
Additionally, tracking and improving your organization’s MTTD can be a great way to evaluate the effectiveness of your incident management processes, including your log management and monitoring strategies. A low MTTD reflects strong incident management capabilities, while a high MTTD means the monitoring approach is lacking and incidents are taking longer to discover than they should.
People are always the first line of defense when it comes to reducing MTTD. Across the organizational layers, key stakeholders need to deeply understand the processes and technologies involved in incident management in order to detect and respond to failures quickly.