How Prometheus, Flagsmith, and Some Good Old-Fashioned Compression Helped Us Solve Customer Pain

Matt Althauser

Part of our Tales from the Sprint series.

TL;DR

Problem: Customers wanted to store more data on our edge network, but we were nearing the limits imposed by DynamoDB constraints.
Solution: The team explored rearchitecting but realized that compression could solve the problem.
Rollout Strategy: A phased rollout starting at 10%, increased gradually as observability metrics confirmed success.
Technology Used: Prometheus metrics, a Grafana dashboard, and Flagsmith for feature management.
The initial rollout to 10% of organisations revealed a critical bug. We were able to identify it, roll back, and re-ship the improvement before any negative impact on our users.

The challenge

To give customers additional targeting and logic, our product stores everything about your feature flags, segments, and identity overrides in an environment document. Each document is stored as JSON in DynamoDB. While our customers love this for speed, scale and reliability, it introduces a hard size limit of 400 KB. To keep customers well clear of that limit, we set lower, artificial limits and show warnings before they are reached.

For most customers, this limit is more than enough. They’ll never get anywhere near it. But as organisations grow—more flags, more segments, more complex JSON configuration values—those documents tend to grow too.

While this was something we monitored, the real impact to our team was happening in support.

As customers reached the artificial limit, they would contact us to ask if we could manually raise it. We would do so on request, and a few weeks later they’d be back again: bad for customers and bad for our team. After enough of these tickets, we realized that we needed a real fix.

The solution: Adding a compression step

As we set out to solve this as a team, our first instinct was to rethink the data architecture. That would have meant decoupling segments from the environment document and rewriting how data is stored. However, this is the kind of project that takes months and carries significant risk of its own.

After thinking it over, Gagan Trivedi, one of our most tenured backend engineers, pointed out something that became obvious in retrospect: JSON compresses extremely well.

The data is full of repetitive keys, similar structures, and envelope content that gzip handles incredibly efficiently. Rather than restructure everything, why not just compress it?

We tested the approach on a real-world customer document of 355 KB, which was nearing our hard cap.

After we compressed it with gzip, it came out at 37 KB, a reduction of roughly 90%. The DynamoDB limit that had been a hard constraint became, effectively, a non-issue. The next challenge was rolling it out safely.
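
To see why repetitive JSON shrinks so much, here is a minimal sketch using only Python's standard library. The document shape below (`feature_states`, `feature`, `enabled` and so on) is a made-up stand-in, not our real environment document schema, so the exact ratio it prints will differ from the 355 KB to 37 KB result (which works out to 1 - 37/355, or about 0.90).

```python
import gzip
import json

# Hypothetical environment document: many flags that all share the same keys.
document = {
    "feature_states": [
        {
            "feature": {"id": i, "name": f"feature_{i}", "type": "STANDARD"},
            "enabled": i % 2 == 0,
            "feature_state_value": None,
        }
        for i in range(2000)
    ]
}

raw = json.dumps(document).encode("utf-8")
compressed = gzip.compress(raw)

# Decompression must return exactly the bytes that were written.
assert gzip.decompress(compressed) == raw

reduction = 1 - len(compressed) / len(raw)
print(f"{len(raw)} bytes -> {len(compressed)} bytes ({reduction:.0%} smaller)")
```

The repeated key names are what gzip exploits: after their first occurrence, each one can be encoded as a short back-reference rather than spelled out again.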

The solution’s key success metrics

90%: A 90% reduction in document size, meaning DynamoDB limits are far in the distance.
71%: Weekly read capacity unit consumption fell from roughly 58 million per day to 17 million per day.
66%: DynamoDB read latency has dropped from ~4.7 ms to ~1.6 ms on average.

The rollout strategy: phased rollout with observability metrics

The plan was to add compression and decompression to critical write and read paths of our architecture, which meant we couldn’t afford to break anything. We had to roll out the change gradually, and we needed additional observability to confirm that the solution was, in fact, working for our customers.

Luckily, the team had recently spent time on our Grafana and Prometheus integration.

Although it is often paired with Grafana, Prometheus is open source and its metrics format is widely supported, so it is not tied to any one vendor's tooling.

For the compression rollout, we built two custom histograms so every single write was captured:

One tracking document sizes at write time, labelled by whether the document was compressed or not, with buckets running up to the 400 KB DynamoDB limit
One tracking the compression ratio itself
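
As an illustration of how a labelled size histogram with cumulative buckets behaves, here is a standard-library-only sketch. It is a stand-in, not our production code: in practice this role would normally be filled by a client library such as `prometheus_client` and its `Histogram` type. The class name `LabelledHistogram` and the bucket boundaries are assumptions chosen for the example; only the 400 KB upper bound comes from the DynamoDB limit.

```python
from bisect import bisect_left
from collections import defaultdict

KB = 1024
# Hypothetical upper bounds, ending at the 400 KB DynamoDB item limit.
SIZE_BUCKETS = [1 * KB, 10 * KB, 50 * KB, 100 * KB, 200 * KB, 300 * KB, 400 * KB]


class LabelledHistogram:
    """Counts observations per label value in cumulative upper-bound buckets."""

    def __init__(self, buckets):
        self.buckets = sorted(buckets)
        # One extra slot per label for observations above the last bound (+Inf).
        self._counts = defaultdict(lambda: [0] * (len(self.buckets) + 1))

    def observe(self, label, value):
        # bisect_left finds the first bucket whose upper bound is >= value.
        self._counts[label][bisect_left(self.buckets, value)] += 1

    def cumulative(self, label):
        # Buckets are reported cumulatively: each one includes all smaller ones.
        total, result = 0, {}
        bounds = self.buckets + [float("inf")]
        for bound, count in zip(bounds, self._counts[label]):
            total += count
            result[bound] = total
        return result


document_size = LabelledHistogram(SIZE_BUCKETS)
document_size.observe("compressed", 37 * KB)
document_size.observe("uncompressed", 355 * KB)

print(document_size.cumulative("compressed")[50 * KB])     # 1
print(document_size.cumulative("uncompressed")[300 * KB])  # 0
print(document_size.cumulative("uncompressed")[400 * KB])  # 1
```

Labelling by compressed or uncompressed is what makes the before-and-after comparison possible on a single metric while only a fraction of writes are compressed.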

We also built a small estimation utility to calculate what the uncompressed byte count would have been, meaning we could compare the two paths directly, even when we were only compressing a fraction of documents.

The result was a dedicated Grafana dashboard called “DynamoDB Compression Rollout” showing document size distribution as before-and-after heatmaps, compression ratios across the fleet, adoption rate, and, most usefully, Flagsmith annotations marking every time the rollout percentage changed.

You could look at the timeline and see exactly what happened when we bumped from 10% to 20%, or rolled back to zero.

Kim Gustyr, the engineer who led the rollout strategy, describes it as “coming up with metrics up front and thinking of your impact up front.”

Annotations denote flag rollout changes

Building the rollout: Flagsmith using Flagsmith

The compression feature was gated behind a Flagsmith feature flag called compress_dynamo_documents, evaluated per organisation using our own SDK in local evaluation mode. Local evaluation meant that we could target specific customers, or a percentage of all customers, and change that percentage without any redeployment.
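
A percentage rollout of this kind is typically built on deterministic hashing, so each organisation always lands in the same place on a 0 to 100 scale. The sketch below shows the general idea only. It is not Flagsmith's actual evaluation algorithm, and `rollout_bucket` and `is_enabled` are hypothetical names.

```python
import hashlib


def rollout_bucket(flag_name: str, organisation_id: int) -> float:
    """Map a (flag, organisation) pair to a stable value in [0, 100)."""
    key = f"{flag_name}:{organisation_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a uniform fraction of 2**32.
    return int(digest[:8], 16) / 0x100000000 * 100


def is_enabled(flag_name: str, organisation_id: int, rollout_percentage: float) -> bool:
    return rollout_bucket(flag_name, organisation_id) < rollout_percentage


flag = "compress_dynamo_documents"
enabled_at_10 = {org for org in range(1000) if is_enabled(flag, org, 10)}
enabled_at_20 = {org for org in range(1000) if is_enabled(flag, org, 20)}

# Raising the percentage only ever adds organisations; nobody flips back off.
print(enabled_at_10 <= enabled_at_20)  # True
print(f"{len(enabled_at_10) / 1000:.1%} of organisations enabled at 10%")
```

Because the bucket depends only on the flag name and the organisation, changing the percentage is a pure configuration change, and setting it to 0 disables the feature for everyone.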

The code path was simple:

Check the flag for that organisation on every environment document write.
If enabled, compress and write the compressed document; if not, write as normal. Reads always handle both formats regardless.
The kill switch was instant—flip the flag off, all subsequent writes revert to uncompressed immediately.
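
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not our production code: `compression_enabled` stands in for the flag check, and detecting compressed data by the gzip magic bytes is one possible way for reads to handle both formats (the article does not describe how stored documents are actually marked).

```python
import gzip
import json
from typing import Callable

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def encode_document(
    document: dict,
    organisation_id: int,
    compression_enabled: Callable[[int], bool],
) -> bytes:
    """Write path: compress only when the flag is on for this organisation."""
    raw = json.dumps(document).encode("utf-8")
    if compression_enabled(organisation_id):
        return gzip.compress(raw)
    return raw


def decode_document(stored: bytes) -> dict:
    """Read path: accept both formats, whatever the flag says right now."""
    if stored[:2] == GZIP_MAGIC:
        stored = gzip.decompress(stored)
    return json.loads(stored)


document = {"feature_states": [{"feature": "dark_mode", "enabled": True}]}

compressed = encode_document(document, 1, lambda org_id: True)
plain = encode_document(document, 1, lambda org_id: False)

print(decode_document(compressed) == decode_document(plain) == document)  # True
```

Keeping the read path independent of the flag is what makes the kill switch safe: documents written while compression was enabled stay readable after the flag is turned off.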

Before we touched the rollout percentage at all, we needed to know compression was working, which meant we needed to be able to see it working.

The rollout, including what went wrong

On March 9th, we enabled compression for 10% of SaaS organisations. Within hours, Sentry flagged errors.

A customer had a data inconsistency that existed quietly in their environment and had never caused a problem until compression brought it into contact with strict Pydantic validation.

Immediately, the rollout went back to 0%. No additional deployments. The fix was in production the same day.

On March 10th, we went back to 10% and hit a different issue: the segments endpoint was reading compressed binary fields without decompressing them. We rolled back to 0%, fixed it the same day, and, with zero errors showing, bumped the rollout up to 20%.

By March 12th, we were at 100%!

Learnings

The best part of all this was that the compression bugs during rollout never affected a single customer. Both were flagged and caught within minutes of enabling compression for 10% of organisations.

Without the phased rollout using feature flags, both would have hit 100% of customers simultaneously.

The results of the rollout

The impact beyond fixing the original problem has been substantial.

DynamoDB read latency dropped from around 4.7 ms before the rollout to 1.6 ms in steady state—a 66% reduction and a direct byproduct of smaller documents being faster to retrieve.
Weekly read capacity unit consumption fell from roughly 58 million per day to 17 million per day, a 71% reduction that translates directly to lower infrastructure costs.
The 355 KB document that was close to the DynamoDB limit is now 37 KB.
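
As a quick check, the percentages quoted above follow directly from the before-and-after figures. The helper name `reduction` is just for this example.

```python
def reduction(before: float, after: float) -> float:
    """Percentage decrease from before to after."""
    return (before - after) / before * 100


print(round(reduction(4.7, 1.6)))  # 66  (read latency, ms)
print(round(reduction(58, 17)))    # 71  (read capacity units, millions per day)
print(round(reduction(355, 37)))   # 90  (document size, KB)
```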

Crucially, the limits we had imposed on customers—100 features and 100 segments per project in SaaS—will be raised significantly or removed from the UI entirely, because they no longer mean anything for the vast majority of users.

Conclusion

The solution was simple. The confidence to ship it safely was the hard part. Feature flags gave us that confidence.

Flagsmith is market-leading feature flag software. If you want to use Flagsmith to run your own observability-driven rollouts, in the back end or anywhere else, you can get started for free.

And for those interested, here’s what the entire dashboard looks like:

About the author

Partner at Polychrome.
