DORA Metrics Explained: The Five Measures of Software Delivery Performance

William Sigsworth

If you've spent any time around engineering leadership, you've probably heard someone mention DORA metrics. They're the closest thing the software industry has to a shared scoreboard for delivery performance. They come from the DevOps Research and Assessment (DORA) programme, now part of Google Cloud, which has spent over a decade studying what separates high-performing technology teams from the rest.

The framework originally launched with four key metrics—and most people still encounter it that way—but now includes five.

Instead of arguing about whether a team "feels" fast or stable, DORA metrics give engineering leaders and platform teams a small set of numbers to point to.

They don't replace judgement, but they replace guesswork with something measurable, making conversations between engineering and the rest of the business a lot more productive.

This article covers what the DORA metrics are, how to calculate them, what good performance looks like, and some practical ways feature flags can help teams move two of them in the right direction.

What are DORA metrics?

DORA metrics are five measurements used to assess the software delivery process, capturing two things at once: how fast a team ships changes, and how stable those changes are once they're out.

They were developed by the DevOps Research and Assessment group, now part of Google Cloud, and have become one of the standard reference points engineering teams use to talk about delivery performance and make sure they’re moving fast without breaking things.

The five metrics are split into two categories.

Lead time for changes, deployment frequency, and failure recovery time measure throughput: how quickly a team can move changes into production.
Change failure rate and deployment rework rate measure instability: how well deployments go once they're out.

The combination is more important than any single number: a team that ships constantly but breaks things every other release isn't actually performing well, and neither is a team that ships rarely but cautiously.

Tracking DORA metrics helps teams because it forces you to look at speed and stability together, rather than letting one mask problems in the other.

The five DORA metrics

Deployment frequency

Deployment frequency measures how often an organisation successfully releases code to production.

Elite performers deploy on demand, often multiple times a day, treating deployment as a routine, low-drama event. Lower-performing teams might deploy monthly, or even less often, usually because each release requires significant manual coordination, sign-off, or testing effort.

This metric is really a proxy for something deeper: how much friction sits between the code being "ready" and the code being live.

High deployment frequency combined with solid stability metrics means the path to production is short, automated, and trusted enough that nobody needs to hold their breath every time something ships.

By contrast, a team deploying constantly with a high change failure rate and slow recovery time is just shipping problems faster.
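As a rough illustration, deployment frequency can be derived from nothing more than a list of deployment timestamps. The sketch below groups them by ISO week. The function name `deployments_per_week` and the sample timestamps are invented for the example, and it assumes the list has already been filtered to successful production deployments.

```python
from collections import Counter
from datetime import datetime


def deployments_per_week(deploy_times: list[datetime]) -> dict[tuple[int, int], int]:
    """Count deployments per ISO (year, week number)."""
    counts: Counter = Counter()
    for ts in deploy_times:
        iso = ts.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)


# Hypothetical sample: timestamps of successful production deployments.
deploys = [
    datetime(2026, 3, 2, 10, 15),
    datetime(2026, 3, 3, 16, 40),
    datetime(2026, 3, 5, 9, 5),
    datetime(2026, 3, 10, 14, 30),
]
weekly = deployments_per_week(deploys)
print(weekly)
```

Weekly buckets are only one reasonable choice. A team deploying many times a day might prefer daily counts, while a team on monthly releases would need a longer window to see any signal.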

Lead time for changes

Lead time for changes is the time it takes from a code commit being made to that code running successfully in production. It's one of the cleanest ways to measure how efficiently a team moves from finished code to delivery, because it captures everything in between: code review, testing, approvals, and the deployment pipeline itself.

A short lead time usually points to a healthy development process: small batches of work, fast code reviews, and a deployment pipeline that doesn't require a human to babysit it.

A long lead time, on the other hand, often signals bottlenecks that have nothing to do with how good the code is—things like slow review cycles, manual testing gates, or infrequent release windows.
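Put in concrete terms, lead time for a single change is the deployment timestamp minus the commit timestamp. The sketch below summarises a set of changes with the median. The function name `median_lead_time` and the sample data are hypothetical, and using the median rather than the mean is an assumption made here so that one unusually slow change does not dominate the result.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time from commit to production over (committed_at, deployed_at) pairs.

    Raises statistics.StatisticsError if `changes` is empty.
    """
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


# Hypothetical sample data: (committed_at, deployed_at) for three changes.
changes = [
    (datetime(2026, 3, 2, 9, 0), datetime(2026, 3, 2, 11, 0)),   # 2 hours
    (datetime(2026, 3, 2, 13, 0), datetime(2026, 3, 2, 18, 0)),  # 5 hours
    (datetime(2026, 3, 3, 10, 0), datetime(2026, 3, 4, 12, 0)),  # 26 hours
]
lead_time = median_lead_time(changes)
print(lead_time)  # 5:00:00
```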

Change failure rate

Change failure rate is the percentage of deployments that cause a failure in production, requiring a hotfix, rollback, or patch. It's the natural counterbalance to deployment frequency: shipping often is only a good thing if what you're shipping mostly works.

This metric keeps deployment frequency honest. A team could technically increase how often it deploys by cutting corners on testing, but that would show up immediately as a rising change failure rate.

Reading the two metrics together is what makes DORA metrics useful, rather than just a vanity measure of how busy a team looks.
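The arithmetic behind change failure rate is simple: failed deployments divided by total deployments, expressed as a percentage. Here is a minimal sketch with made-up numbers. The function name is hypothetical, and what counts as a "failed" deployment is something each team has to define for itself.

```python
def change_failure_rate(total_deployments: int, failed_deployments: int) -> float:
    """Percentage of deployments that needed a hotfix, rollback, or patch."""
    if total_deployments == 0:
        return 0.0  # no deployments in the window, so nothing could have failed
    return 100.0 * failed_deployments / total_deployments


# Hypothetical month: 40 deployments, 3 of which caused a production failure.
rate = change_failure_rate(total_deployments=40, failed_deployments=3)
print(f"{rate:.1f}%")  # 7.5%
```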

Failure recovery time

Often called time to restore service, this metric measures how long it takes a team to recover when a production failure happens.

It's a measure of resilience rather than prevention: every team will eventually ship something that breaks, and this metric asks how effectively they handle it when that happens.

Teams with fast recovery times tend to have good monitoring, clear ownership of incidents, and a quick path to either roll back or disable the problematic change. Teams with slow recovery times are often missing one of those three things, which turns small mistakes into prolonged outages.
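Recovery time can be estimated from incident records that carry a start and a resolution timestamp. The sketch below assumes a hypothetical `Incident` record with those two fields and reports the median. Real incident tools expose this data under their own field names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Incident:
    started_at: datetime   # when the failure began (or was first detected)
    resolved_at: datetime  # when service was restored


def median_recovery_time(incidents: list[Incident]) -> timedelta:
    """Median duration from failure to restoration."""
    seconds = [(i.resolved_at - i.started_at).total_seconds() for i in incidents]
    return timedelta(seconds=median(seconds))


# Hypothetical incidents from one month.
incidents = [
    Incident(datetime(2026, 3, 4, 14, 0), datetime(2026, 3, 4, 14, 20)),   # 20 minutes
    Incident(datetime(2026, 3, 9, 8, 30), datetime(2026, 3, 9, 9, 10)),    # 40 minutes
    Incident(datetime(2026, 3, 17, 22, 0), datetime(2026, 3, 18, 1, 0)),   # 3 hours
    Incident(datetime(2026, 3, 25, 11, 0), datetime(2026, 3, 25, 12, 0)),  # 1 hour
]
recovery = median_recovery_time(incidents)
print(recovery)  # 0:50:00
```

In this sample the three-hour outage barely moves the median, so it is worth looking at the slowest recoveries separately as well.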

Deployment rework rate

Deployment rework rate, the newest DORA metric, measures the proportion of deployments that were unplanned and happened in response to an incident in production.

Where change failure rate captures deployments that immediately require intervention, rework rate captures the reactive deployment work that follows—the patches, fix-forwards, and emergency releases that consume engineering time that was supposed to go elsewhere.

A high rework rate is a signal that a meaningful proportion of deployment activity is driven by firefighting rather than feature delivery. It sits alongside change failure rate as the second instability metric, the two together providing a clearer picture of how much of a team's shipping capacity is eaten by fixing the consequences of previous releases.
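To make the distinction between the two instability metrics concrete, the sketch below computes both from the same list of deployments. The `Deployment` record and its two boolean fields are hypothetical. In practice a team would need its own convention, such as a label or ticket link, for marking a deployment as an unplanned fix.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    caused_failure: bool = False  # this deployment needed a hotfix, rollback, or patch
    unplanned_fix: bool = False   # this deployment exists only to fix a production incident


def percentage(part: int, whole: int) -> float:
    return 100.0 * part / whole if whole else 0.0


def instability(deployments: list[Deployment]) -> tuple[float, float]:
    """Return (change failure rate, deployment rework rate) as percentages."""
    total = len(deployments)
    failed = sum(d.caused_failure for d in deployments)
    rework = sum(d.unplanned_fix for d in deployments)
    return percentage(failed, total), percentage(rework, total)


# Hypothetical fortnight: 20 deployments, 2 caused failures, 3 were unplanned fixes.
deployments = (
    [Deployment() for _ in range(15)]
    + [Deployment(caused_failure=True) for _ in range(2)]
    + [Deployment(unplanned_fix=True) for _ in range(3)]
)
cfr, rework_rate = instability(deployments)
print(cfr, rework_rate)  # 10.0 15.0
```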

Why DORA metrics are important

The five software delivery performance metrics capture speed and stability at the same time, without forcing teams to trade one off against the other.

By splitting the process into five metrics, engineering teams can identify bottlenecks in the software development process and potential issues in the production environment.

For example, if you're launching software at a good pace and rarely causing failures in production, but the occasional failure takes too long to fix, the problem lies in your failure recovery time.

DORA's research shows that these performance metrics predict better organisational performance and wellbeing for team members, which is a useful thing to be able to say to a sceptical stakeholder who thinks metrics like this are just an engineering vanity project.

One of the key findings from the research is that elite performing teams tend to score well across all five metrics rather than excelling at just one.

Low performers, by contrast, tend to do poorly across the board, which suggests that speed and stability genuinely reinforce each other in practice, rather than living in permanent tension.

That single point is probably the most useful thing to take away from the whole framework: a team that's "fast but reckless" or "careful but slow" usually isn't actually elite by either measure once you look closely.

For engineering managers, this gives a genuinely useful shared language with the rest of the business. Instead of saying "the team is doing well," you can point to specific, comparable numbers that mean roughly the same thing across different organisations and industries.

How to calculate DORA metrics

You don't need an enormous amount of infrastructure to start measuring DORA metrics, but you do need data from a handful of different places, since deployment, change, and incident data usually live in separate systems.

In practice, that means pulling from your CI/CD pipeline for deployment timestamps, your version control system for commit history (to calculate lead time), and your incident management tools for recovery records.

None of these are hard to track individually, but stitching them together so the numbers line up correctly usually takes the most setup effort.

You don't need perfect precision to get started. A rough baseline, even one built from a few weeks of manual data collection, is enough to see roughly where you stand and start tracking whether things are improving.
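The stitching step described above often comes down to joining records from two systems on a shared key. The sketch below assumes that key is the commit SHA and that both exports have already been loaded into plain Python structures. The data shapes, the SHAs, and the function name `lead_times` are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical export from version control: commit SHA -> commit timestamp.
commits = {
    "a1f3": datetime(2026, 3, 2, 9, 0),
    "b7c9": datetime(2026, 3, 2, 15, 0),
    "c2e4": datetime(2026, 3, 3, 10, 0),
}

# Hypothetical export from CI/CD: each deployment and the commits it shipped.
deployments = [
    {"deployed_at": datetime(2026, 3, 2, 17, 0), "shas": ["a1f3", "b7c9"]},
    {"deployed_at": datetime(2026, 3, 3, 12, 0), "shas": ["c2e4", "ffff"]},
]


def lead_times(commits, deployments):
    """Join the two sources on commit SHA.

    Returns per-commit lead times plus any SHAs that had no matching commit.
    """
    matched: dict[str, timedelta] = {}
    unmatched: list[str] = []
    for deploy in deployments:
        for sha in deploy["shas"]:
            if sha in commits:
                matched[sha] = deploy["deployed_at"] - commits[sha]
            else:
                unmatched.append(sha)  # surface gaps instead of silently dropping them
    return matched, unmatched


matched, unmatched = lead_times(commits, deployments)
print(matched)
print(unmatched)  # ['ffff']
```

Keeping the unmatched list is a deliberate choice for a rough baseline: it shows how much of the data failed to line up, which is usually the first thing to fix.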

DORA performance tiers

DORA's research has historically grouped teams into four performance tiers—elite, high, medium, and low—based on how they score across the metrics.

The 2024 Accelerate State of DevOps Report, the most recent to publish this model in full, found that elite performers made up roughly 19% of respondents, deploying on demand with under a day's lead time and recovering from failures in under an hour.

If you're benchmarking yourself against these tiers, the boundaries aren't fixed numbers that stay the same every year. They shift based on how the wider industry is performing, since the tiers come from clustering survey responses rather than from predetermined targets.

The 2024 report, for instance, found the high-performing tier had shrunk noticeably compared with the year before, while the low-performing tier had grown, a reminder that industry-wide delivery performance isn't a one-way ratchet upward.

DORA's most recent research has moved away from this four-tier model in favour of a more nuanced set of team profiles that account for human factors like burnout alongside the raw delivery numbers.

The tiers are still a genuinely useful mental model for getting oriented, but treat the exact thresholds as a snapshot from a particular year rather than a permanent yardstick.

How feature flags support DORA metrics

Feature flags are a complement to good DevOps practice, not a replacement for the underlying engineering work that DORA metrics actually measure.

A team with a slow, manual release process won't fix that by bolting on a feature flagging tool. However, for two of the five metrics specifically, flags genuinely help.

The mechanism is decoupling deployment from release.

Feature flags enable teams to push code to production without immediately switching it on for every user. That separation supports higher deployment frequency, as shipping code no longer has to mean exposing it to everyone at once, lowering the perceived risk of each individual deployment.

That same decoupling helps with change failure rate, too. Risky or incomplete changes can be rolled out gradually, perhaps to a small percentage of users first, so that if something does go wrong, it affects a much smaller slice of the user base before the team catches it.

Recovery time is where the effect is most direct. When something goes wrong with a flagged feature, switching it off is typically a matter of seconds, not a full rollback or redeploy. That speed difference improves failure recovery time specifically, since the whole point of that metric is how quickly a team can get back to a stable state.
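To show the mechanics of a percentage rollout and a kill switch, here is a minimal in-memory sketch. It is not Flagsmith's API or implementation: the `Flag` class, the `bucket` and `is_enabled` functions, and the hashing scheme are all assumptions made for illustration. The idea is that each user is hashed into a stable bucket from 0 to 99, so the same user gets the same answer on every request, and a single boolean turns the feature off for everyone without a redeploy.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    name: str
    enabled: bool = True        # kill switch: False turns the feature off for everyone
    rollout_percent: int = 100  # share of users who see the feature while enabled


def bucket(flag_name: str, user_id: str) -> int:
    """Map a user to a stable bucket in [0, 100) for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    if not flag.enabled:
        return False
    return bucket(flag.name, user_id) < flag.rollout_percent


# Hypothetical rollout of a new checkout flow to roughly 10% of users.
new_checkout = Flag("new_checkout", rollout_percent=10)
users = [f"user-{i}" for i in range(1000)]
exposed = [u for u in users if is_enabled(new_checkout, u)]
print(f"{len(exposed)} of {len(users)} users see the new checkout")

# Something went wrong: flip the kill switch, no redeploy needed.
new_checkout.enabled = False
exposed_after = [u for u in users if is_enabled(new_checkout, u)]
print(f"{len(exposed_after)} users see it after the kill switch")
```

Including the flag name in the hash input means different flags split users differently, so the same small group of users is not always first in line for every risky change.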

Lead time for changes is the one metric flags don't meaningfully move on their own. That number is mostly a function of code review speed, testing rigour, and how the development process itself is structured, none of which a feature flag changes directly.

Flagsmith is a feature flag management platform built around this kind of progressive rollout and kill switch functionality, letting teams wrap changes in flags and adjust who sees them without needing a new deployment.

Common challenges when adopting DORA metrics

A few obstacles tend to come up once teams start trying to measure this properly.

The most common is fragmented data. Deployment records, commit history, and incident logs often live in entirely separate tools that were never designed to talk to each other, which makes pulling together an accurate picture more work than people expect going in.

Teams can also end up optimising for the numbers rather than the underlying outcome the numbers were meant to represent.

Setting the metrics themselves as goals, for example by demanding that every application deploy multiple times a day by year's end, increases the likelihood that teams will try to game the metrics rather than genuinely improve.

If deployment frequency becomes a target in itself, you can hit the number by deploying trivial, low-value changes more often without actually getting better at anything that matters.

DORA metrics are diagnostic tools for finding where to focus improvement effort, not a scorecard for ranking individual engineers or comparing unrelated teams against each other. Used as a scorecard, they tend to create exactly the kind of defensive, blame-driven culture that hurts the stability metrics they're supposed to track.

Conclusion

DORA metrics give engineering teams a genuinely useful, shared way to measure and improve both delivery speed and stability, rather than treating the two as competing priorities.

Deployment frequency and lead time for changes tell you how quickly ideas reach users. Change failure rate, deployment rework rate, and failure recovery time tell you how well those changes hold up once they're there.

If deployment frequency and change failure rate are two numbers you're looking to improve, feature flags are a practical place to start. Flagsmith lets teams decouple deployment from release and switch problematic changes off in seconds rather than hours, which helps to get those two specific metrics moving in the right direction.

Try Flagsmith for free to see how progressive delivery and kill switches fit into your existing deployment process.

DORA metrics FAQs

What is a good DORA metrics score?

There isn't a single universal "good" score, since DORA groups teams into performance tiers rather than setting one fixed target.

Broadly, elite performance means deploying on demand, recovering from failures in under an hour, and keeping change failure rate low. The more useful question for most teams is, "Are we improving year over year?"

Who created DORA metrics?

DORA metrics were created by the DevOps Research and Assessment team, founded by Dr Nicole Forsgren, Gene Kim, and Jez Humble. The group is now part of Google Cloud, and its research has been published annually for over a decade.

How often should DORA metrics be measured?

Most teams benefit from tracking DORA metrics continuously, since deployment and incident data accumulate naturally as part of normal operations.

Reviewing trends monthly or quarterly is usually more useful than reacting to single-day spikes, since the real value is in spotting whether performance is moving in the right direction over time.
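One way to review trends rather than daily noise is to roll raw records up by calendar month before looking at them. The sketch below does this for change failure rate. The function name `monthly_failure_rate` and the sample records are hypothetical.

```python
from collections import defaultdict
from datetime import date


def monthly_failure_rate(records: list[tuple[date, bool]]) -> dict[str, float]:
    """Change failure rate (%) per calendar month from (deploy_date, failed) records."""
    totals: dict[str, int] = defaultdict(int)
    failures: dict[str, int] = defaultdict(int)
    for day, failed in records:
        month = day.strftime("%Y-%m")
        totals[month] += 1
        failures[month] += int(failed)
    return {m: 100.0 * failures[m] / totals[m] for m in sorted(totals)}


# Hypothetical records: 10 deployments in each of two months.
records = (
    [(date(2026, 1, 5), False)] * 8 + [(date(2026, 1, 20), True)] * 2    # January: 2 of 10 failed
    + [(date(2026, 2, 3), False)] * 9 + [(date(2026, 2, 18), True)] * 1  # February: 1 of 10 failed
)
trend = monthly_failure_rate(records)
print(trend)  # {'2026-01': 20.0, '2026-02': 10.0}
```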

About the author

Head of Organic Growth at Flagsmith
