Why DORA Metrics Are Crucial for High-Performing Tech Teams

Giulio Rusciano
8 min read · Jun 6, 2023

In 2018, the DevOps Research and Assessment (DORA) group published its “State of DevOps” report, drawing on survey responses collected from over 31,000 professionals worldwide over six years. It identified four key metrics for measuring DevOps performance, known today as the DORA metrics:

  • Deployment Frequency (DF)
  • Change Lead Time (CLT)
  • Change Failure Rate (CFR)
  • Mean Time to Recover (MTTR)

The DORA metrics not only give DevOps teams a framework for measuring their performance but also give engineering leaders a way to assess the effectiveness of their software development and delivery processes. By optimizing for these metrics, teams can improve their development practices and accelerate software delivery, and DORA’s data-driven insights and best practices help organizations achieve these goals and improve their delivery outcomes. In addition, DORA defines performance benchmarks for each metric, so organizations can see how they compare against Elite, High-Performing, Medium, and Low-Performing teams and use that comparison as a baseline for improvement. Although no longer independent (it was acquired by Google in 2018), DORA remains dedicated to providing these insights and best practices to help organizations of all sizes and types achieve DevOps success.


Why DORA metrics are so important

To ensure consistent and effective measurement of DevOps team performance, a standardized framework is essential. In the past, organizations often created their own metrics, making it challenging to benchmark performance, compare teams, or track progress over time.

The DORA metrics offer a clear and widely accepted framework that enables DevOps and engineering leaders to measure software delivery speed and quality. By providing a standard set of metrics, development teams can gain insight into their current performance and identify areas for improvement. These metrics also allow leadership to evaluate the performance of their organization’s DevOps practices, provide reports to management, and make data-driven decisions to optimize their processes.

Additionally, the DORA metrics can help organizations ensure they are meeting customer requirements. Using these metrics, development teams can improve the quality and speed of their software releases, delivering greater business value.

The DORA benchmarks provide specific objectives that can be broken down into metrics to track key results. Having concrete goals to work towards, and, more importantly, measurable evidence of progress, can motivate teams to keep striving towards those goals.

Moreover, DORA metrics offer valuable insights into team performance. For example, monitoring Change Failure Rate and Mean Time to Recover can help leaders ensure that their teams are building robust services with minimal downtime. Similarly, tracking Deployment Frequency and Mean Lead Time for Changes can show whether the team is working efficiently.

Taken together, the metrics can offer a balanced view of the team’s speed and quality.

1. Deployment Frequency (DF)

Deployment Frequency (DF) is a metric that measures the rate at which code is successfully deployed to a production environment over a period of time. It is a crucial measure of a team’s average throughput and can be used to evaluate how often an engineering team is delivering value to customers. High-performing DevOps teams prioritize frequent, smaller deployments over large releases deployed during fixed windows. These teams deploy at least once a week, with top-performing teams deploying multiple times per day.

Low performance on this metric can indicate the need to improve automated testing and validation of new code. Other areas to focus on include breaking changes into smaller chunks and creating smaller pull requests (PRs), or improving overall Deploy Volume.

A closely related metric, Mean Lead Time for Changes (MLTC), measures the time it takes for a change to reach a production environment from the first commit in a branch. It is a useful metric for understanding the efficiency of the development process, with peak performers able to go from commit to production in less than a day.

Engineering leaders should have an accurate understanding of how long it takes their team to get changes into production, as deployments can be delayed for various reasons, such as batching up related features or waiting out ongoing incidents. To improve on this metric, leaders can analyze metrics corresponding to the stages of their development pipeline, such as Time to Open, Time to First Review, and Time to Merge, to identify bottlenecks in their processes. Teams can then consider breaking work into smaller chunks to reduce the size of PRs, making their code review process more efficient, or investing in automated testing and deployment.
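
As an illustration, here is a minimal sketch of how those pipeline-stage metrics could be computed from pull request timestamps. The record fields (first_commit_at, opened_at, first_review_at, merged_at) and the stage boundaries are assumptions made for this example rather than any specific tool’s API; in practice you would pull these timestamps from your Git hosting platform.

```
from datetime import datetime
from statistics import mean

# Hypothetical PR records; field names are illustrative, not a real API.
pull_requests = [
    {
        "first_commit_at": datetime(2023, 5, 1, 9, 0),
        "opened_at": datetime(2023, 5, 1, 15, 0),
        "first_review_at": datetime(2023, 5, 2, 10, 0),
        "merged_at": datetime(2023, 5, 2, 17, 0),
    },
    {
        "first_commit_at": datetime(2023, 5, 3, 11, 0),
        "opened_at": datetime(2023, 5, 4, 9, 0),
        "first_review_at": datetime(2023, 5, 5, 16, 0),
        "merged_at": datetime(2023, 5, 8, 10, 0),
    },
]

def mean_hours(deltas):
    """Average a collection of timedeltas and express the result in hours."""
    return round(mean(d.total_seconds() for d in deltas) / 3600, 1)

# Stage boundaries here are one possible interpretation; tools may define them differently.
time_to_open = mean_hours([pr["opened_at"] - pr["first_commit_at"] for pr in pull_requests])
time_to_first_review = mean_hours([pr["first_review_at"] - pr["opened_at"] for pr in pull_requests])
time_to_merge = mean_hours([pr["merged_at"] - pr["first_review_at"] for pr in pull_requests])

print(f"Time to Open: {time_to_open}h, "
      f"Time to First Review: {time_to_first_review}h, "
      f"Time to Merge: {time_to_merge}h")
```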

How to measure Deployment Frequency

While deployment frequency is a straightforward metric to track, the data can be challenging to analyze. Simply looking at the volume of deployments per day or week can give a misleading picture of deployment frequency, since it doesn’t take into account how many unique changes are being deployed. To address this, the DORA group recommends categorizing deployment frequency into buckets based on the number of successful deployments per week. For example, if an organization has a median of more than three successful deployments per week, it falls into the Daily deployment bucket. If it deploys successfully in more than five out of 10 weeks, it falls into the Weekly deployment bucket.
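
As a rough sketch, that bucketing could be implemented as follows. It assumes you already have timestamps for successful production deployments; the data and thresholds simply mirror the description above and would need adjusting to your own definitions.

```
from collections import Counter
from datetime import datetime
from statistics import median

# Hypothetical timestamps of successful production deployments.
deployments = [
    datetime(2023, 5, 1, 10, 30),
    datetime(2023, 5, 2, 9, 15),
    datetime(2023, 5, 2, 16, 45),
    datetime(2023, 5, 9, 11, 0),
    datetime(2023, 5, 16, 14, 20),
]

def deployment_frequency_bucket(deployments, total_weeks=10):
    """Classify deployment frequency into the buckets described above.

    total_weeks is the observation window; weeks with no deployments
    count as zero when taking the median.
    """
    # Count successful deployments per ISO (year, week).
    per_week = Counter(d.isocalendar()[:2] for d in deployments)
    weekly_counts = list(per_week.values()) + [0] * (total_weeks - len(per_week))

    if median(weekly_counts) > 3:
        return "Daily"
    if sum(1 for c in weekly_counts if c > 0) > total_weeks / 2:
        return "Weekly"
    return "Monthly or less often"

print(deployment_frequency_bucket(deployments))
```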

It’s important to note that determining what constitutes a successful deployment can be complex and varies by organization. For instance, if a canary deployment is only exposed to 5% of traffic, some may consider it successful, while others may not. Similarly, if a deployment runs without issues for several days before encountering an error, opinions may differ on whether it should be counted as successful. Ultimately, the definition of a successful deployment depends on the organization’s specific goals and objectives.

2. Change Lead Time (CLT)

Change Lead Time is a critical metric for DevOps teams, as it measures the duration from when a change is committed until it is delivered to customers in production. Longer lead times (typically measured in weeks) can indicate process inefficiencies or bottlenecks in the development or deployment pipeline, leading to delayed releases and slower feedback loops. Shorter lead times (on the order of hours or even minutes), on the other hand, suggest a streamlined and efficient development process, which enables teams to deliver value to customers faster and more frequently. By tracking lead time, DevOps teams can gain insight into the effectiveness of their development process and identify areas for improvement in their software delivery pipeline.

How to measure Change Lead Time

To measure change lead time accurately, it is essential to track two pieces of data: the time when a commit is made and the time when the deployment that includes that commit is completed. This information helps determine how quickly code changes are being delivered to production.

To calculate the metric, it is necessary to maintain a list of all the changes that are included in each deployment, with each change linked back to a specific commit by its SHA identifier. By keeping track of this data, you can join changes to deployments, compare timestamps, and calculate the lead time.
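
Here is a minimal sketch of that join, assuming a commit log keyed by SHA and deployment records that list the SHAs they shipped (the field names are illustrative):

```
from datetime import datetime
from statistics import mean

# Hypothetical commit log keyed by SHA.
commits = {
    "a1b2c3d": {"committed_at": datetime(2023, 5, 1, 9, 0)},
    "e4f5a6b": {"committed_at": datetime(2023, 5, 1, 14, 30)},
    "c7d8e9f": {"committed_at": datetime(2023, 5, 3, 11, 15)},
}

# Hypothetical deployments listing the commits they shipped.
deployments = [
    {"deployed_at": datetime(2023, 5, 2, 10, 0), "shas": ["a1b2c3d", "e4f5a6b"]},
    {"deployed_at": datetime(2023, 5, 4, 9, 30), "shas": ["c7d8e9f"]},
]

# Join each deployed change back to its commit and compute per-change lead time.
lead_times = [
    deploy["deployed_at"] - commits[sha]["committed_at"]
    for deploy in deployments
    for sha in deploy["shas"]
]

mean_lead_time_hours = mean(lt.total_seconds() for lt in lead_times) / 3600
print(f"Mean change lead time: {mean_lead_time_hours:.1f} hours")
```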

Additionally, it is important to ensure that this data is consistently and accurately tracked to avoid any inaccuracies or errors in the lead time calculation. This will help teams to understand how quickly code changes are being delivered to production, and identify opportunities for improvement in the development process.

3. Change Failure Rate (CFR)

The Change Failure Rate (CFR) is a crucial metric that measures the quality of the code being deployed to production environments. It is the percentage of deployments that result in a failure or production incident, which in turn reflects how much of the team’s time goes into fixing failures rather than delivering new work. According to DORA research, high-performing teams keep their change failure rate between 0% and 15%, and that range is the target engineering leaders should aim for.

Since software bugs are inevitable, it’s important to avoid blaming individuals or teams when these incidents occur. However, leaders must still monitor the frequency of these incidents. This metric is a valuable counterpoint to metrics like DF and MLTC, which only measure speed and throughput. Successful DevOps teams must deliver quality code that is both stable and fast.

To improve this metric, teams can take measures like reducing work-in-progress (WIP) in their iterations, improving code review processes, or investing in automated testing. This metric is crucial for building confidence in software releases and ensuring customer satisfaction.

How to measure Change Failure Rate

To calculate the change failure rate accurately, it is important to define what constitutes a failure in your organization. This definition will depend on the specific goals and context of your team or project. Once you have established it, you need to track both the total number of attempted deployments and the number of deployments that meet your criteria for failure.

You can record deployment incidents in a variety of tools, from a simple spreadsheet or bug tracker to a dedicated incident management system (for example, incidents tracked as labeled GitHub issues). It’s essential to map each incident to the ID of the corresponding deployment so you can identify the percentage of deployments that had at least one incident, which is the change failure rate. While there is no universal definition of a successful or failed deployment, having a clear, agreed-upon definition within your organization will keep your metrics consistent and accurate.
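
A minimal sketch of the calculation, assuming each incident record carries the ID of the deployment that caused it (IDs and field names are illustrative):

```
# Hypothetical data: every attempted deployment, plus incidents mapped back
# to the deployment that caused them.
deployments = ["deploy-101", "deploy-102", "deploy-103", "deploy-104", "deploy-105"]

incidents = [
    {"id": "inc-1", "deployment_id": "deploy-102"},
    {"id": "inc-2", "deployment_id": "deploy-102"},  # multiple incidents, one deployment
    {"id": "inc-3", "deployment_id": "deploy-105"},
]

# A deployment counts as failed if it has at least one mapped incident.
failed_deployments = {inc["deployment_id"] for inc in incidents}
change_failure_rate = len(failed_deployments & set(deployments)) / len(deployments) * 100

print(f"Change failure rate: {change_failure_rate:.0f}%")  # 2 of 5 deployments -> 40%
```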

4. Mean Time to Recover (MTTR)

Mean Time to Recover (MTTR) is a crucial metric for DevOps teams, as it measures how long it takes to restore a system to its usual functionality after a failure. This metric matters because failures are inevitable, and the speed at which your team recovers from them can have a significant impact on your organization’s profitability and competitive advantage.

Elite DevOps teams are able to recover from failures in under an hour, while other high-performing teams can recover in under a day. In contrast, lower-performing teams may take days or even weeks to recover. Improving MTTR requires teams to focus on observability, so that failures can be identified and resolved quickly. It also helps to have an action plan in place for responders to consult, so that everyone understands the process for addressing failures, and to improve related metrics like Mean Lead Time for Changes, since recovering usually means getting a fix into production. By taking these steps, teams can work towards recovering in well under an hour, the threshold DORA associates with elite performance.

How to measure Mean Time to Recover

To calculate mean time to recover (MTTR), you need to track two key timestamps: when an incident is first created and when it is resolved by a new deployment. This data can be retrieved from any incident management system, or even a simple spreadsheet, as long as each incident is linked to a specific deployment. MTTR is the average time it takes to recover from an incident, and it is a crucial metric for understanding the efficiency and reliability of a team’s incident response process.
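
As a sketch, assuming incident records with created and resolved timestamps linked to the resolving deployment (field names are illustrative):

```
from datetime import datetime
from statistics import mean

# Hypothetical incident records linked to the deployment that resolved them.
incidents = [
    {"created_at": datetime(2023, 5, 2, 10, 15),
     "resolved_at": datetime(2023, 5, 2, 11, 0),
     "resolving_deployment": "deploy-106"},
    {"created_at": datetime(2023, 5, 5, 14, 0),
     "resolved_at": datetime(2023, 5, 5, 18, 30),
     "resolving_deployment": "deploy-110"},
]

# MTTR is the average time from incident creation to resolution.
recovery_times = [inc["resolved_at"] - inc["created_at"] for inc in incidents]
mttr_hours = mean(rt.total_seconds() for rt in recovery_times) / 3600

print(f"Mean time to recover: {mttr_hours:.2f} hours")
```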


Giulio Rusciano

With 20+ years in technology leadership, design, development & entrepreneurship, I’m a creative technologist who blends design & tech to bring innovative ideas to life.