TABLE OF CONTENTS

Text Link

Industry/News Company Updates Best Practices and How To Languages & Technologies Product Customer Stories

The actual infrastructure costs of running SaaS at scale (billions of requests/month)

Ben Rometsch

This is a two-part series. Find the second article here.

In this article we are going to share three main things:

The evolution of our infrastructure over the last few years
The actual costs of our SaaS offering at Flagsmith
Our opinion on the role of feature flag services

Before we dive in, one important call-out: We provide our feature management product to customers in three ways depending on how they want to have it managed: Fully Managed SaaS API, Fully Managed Private Cloud SaaS API and Self-Hosted. The infrastructure costs that we are sharing is for our customers that leverage our Fully Managed SaaS API offering (try it free: https://www.flagsmith.com/) which represents a portion of the total flags being delivered among those three product offerings.

The Evolution (and costs) of our SaaS Infrastructure

Late last year, the Flagsmith SaaS API crossed the threshold of a billion flags served in one month.

To help others learn from our mistakes and successes, we’re sharing how our infrastructure changed over this period, the money we spent, the choices we made, and why we made them.

Infra Round 1: 2 hours of effort

Flagsmith started as a side project of Solid State Group - the agency I started 21(!) years ago. We had been building side projects for a while and had perfected the art of building as much as possible as quickly as possible. That meant leaning on frameworks like Django and Django Rest Framework to do as much heavy lifting as possible. But it also meant that our infrastructure was optimised for 2 things: 1) cost and 2) minimal effort to stand up.

When you have zero customers, lean into that.

We already had a Hetzner server kicking around that wasn’t doing much, so we installed dokku; a great project that gives you a self-hosted Heroku-like deployment platform but with minimal costs. We set up a super simple CICD via GitLab CI that would push changes to our `main` branch to dokku as we went. Postgres was up and running in 2 minutes. Sorting the DNS and LetsEncrypt SSL certificate was pretty easy. And that was it. Total time spent was maybe 2 hours.

We had no redundancy, very limited backups, no elastic scaling and a single box that was a big single point of failure. Like I said, we also had zero customers, so who cares?

Oh, actually now we have actual customers LOL

Pretty soon after releasing the platform and trying to drum up some traffic, we noticed a huge increase in traffic to our API. Back then we were hacking Google Analytics to track our API volume - it was free which was basically 90% of the decision making process. After some further investigation we realised a fairly large UK lottery operator was using the platform on some highly trafficked pages. This was great: people are using our platform! but also terrifying: people are using our platform!

We were super busy building out basic features at that point in time; we really didn't want to invest any further effort into our infrastructure. People don’t buy your infrastructure, they buy your product. But we had the problem; what we refer to as “the B2B2C problem”.

It goes something like this. If we sign up a customer, and they put our JS SDK on their website, we aren’t serving that 1 customer; we are serving all of their customers. So 1 new person using our platform could mean suddenly serving 100 million additional flags a month. If you are direct B2C, your scaling looks way less crazy, as additional users come in one at a time. But if you are B2B2C, you might suddenly have to triple your requests served literally from one second to the next.

And this is actually harder when you are small! When you are large, one large customer might mean an additional 10% load on your API, but when you are small, one large customer might mean 10x your load! It’s the power of small numbers. Or something.

Anyway, once we saw this happening in real life we decided it was time to put our professional trousers on. Hetzner and dokku had been great, but it was making us super nervous.

Round 1 Wrap Up

Total cost: ~$40 per month
Peak Flags served per month: 20 million
Anxiety Levels: 5/10 anxious but this all still feels like a side project

Infra Round 2: 2 days of effort

We know how to do this; we do it for our clients all the time! But in this case no-one was paying for it, so we wanted it to be clinically quick. We already had a Heroku-like deployment process. Elastic Beanstalk was pretty close to that, but with things like auto-scaling and managed databases; professional trousers stuff.

So we went with AWS Beanstalk, EU London region (it was in the EU at the time #facepalm). It’s sort of in the middle of the world, and we care about latency a little. RDS Postgres works. It does automated backups and security updates.

Beanstalk has loads of horrible sharp edges it turns out, but we plowed on and had things up and running in a couple of days. We stood up an Aurora Postgres database, and copied the data across. We had around 30 seconds of downtime where we finished migrating any new data, then pointed our Dokku instance over to AWS and updated our DNS records. Yes, we could have spent another month engineering a zero downtime migration, but these are the hard choices you have to make.

Elastic Beanstalk is actually great, but a bit terrifying

We were actually surprised how far this setup got us. We only recently migrated off it, props to Beanstalk - it might be the most unloved AWS product there is, but it works and it’s stable.

It does have a propensity, when you are doing something unusual like you know, deploying your software, where it sort of gives up and then sits in a weird state for half an hour whilst you are crying, but it almost always righted itself and redeployments then worked. We never really got to the bottom of why this happened, but in almost all cases it didn’t drop any requests, which was rapidly becoming our most important consideration.

Beanstalk also makes it super easy to scale out your application server layer. This worked perfectly, and was the main reason why we stayed on it for so long. At this point, the bottleneck of the platform started moving from the application server to the database and the number of connections it was having to service. At peak we got up to around 70 small app server instances. By this point our concurrent database connections were becoming our bottleneck.

Round 2 Wrap Up

Total Cost: Elastic costing eh! Anything from $300 to $800 per month - Cloudfront is expensive and there’s nothing you can do about it
Peak Flags Served per month: 600 million
Anxiety Levels: Still high as hell during deployments but getting a text in the middle of the night is less stressful

Infra Round 3: Containerisation Firecrackers

Over time, Docker and the containerisation of everything meant that our production SaaS infrastructure was looking less and less like that of our Enterprise customers as time went by. We also realised that Beanstalk was basically on autopilot, and there were some python dependencies we really needed that weren’t available on the platform.

So we made the decision to move to AWS ECS and Fargate. We were already building Docker images for our open source and Enterprise users, so it shouldn't be too tricky, right? By this time our application had grown to dozens of environment variables and the Beanstalk deployment had grown some hairballs here and there to kludge some stuff like SSL rewrites.

Luckily we didn’t have to move the data, which is almost always the hard part. So we stood up a new ECS cluster and used the excellent support AWS load balancers provide for splitting traffic between our two different sets of infrastructure. Over the course of a couple of weeks we migrated 100% of the traffic over to ECS and the Beanstalk infra dropped away like a well used rocket first stage.

What do we think of ECS?

ECS is superb and we love it a lot.

Round 3 Wrap Up

Total Cost: Actually dropping on a $-per-request basis. We upgraded to python 3.10 and more modern gunicorn, and the Docker images are more performant than the Beanstalk infra. We are now spending around $1,200 per month which splits roughly evenly between RDS, ECS and Cloudfront/bandwidth.
Peak Flags Served per month: Well into the billions
Anxiety Levels: Actually pretty low. ECS is rock solid and has blue/green deployments built in. We’ve yet to get it into the fugue state that we often managed to with Beanstalk.

Infra Round 4: What’s next

It’s coming up! We’re about to migrate our SDK traffic to a globally distributed and replicated Edge API. We’ve been working on it way too long but we’re going to deploy it very soon, and will be writing that up next!

If you are curious about our performance, please check out our status page. If you want to support us, please give us a star on Github! If you need feature flags and want to partner with a team that is in it for the reasons I outlined above; feel free to sign up for a free account and give it a try!

Lets go!

This is a two-part series. Read the second article about the infrastructure costs of running a global Edge API here.

About the author

Flagsmith co-founder. Besides Flagsmith, Ben has founded several other companies, and he currently serves on the Governance Board of OpenFeature, a CNCF Sandbox Project. He's an advocate for open standards and open source and also hosts “The Craft of Open Source" podcast, where he interviews creators and maintainers from the open-source community.

July 28, 2026

How to Choose the Best Feature Flags for Your AI Company

William Sigsworth

July 27, 2026

The Best A/B Testing Tools for Websites and Product Teams in 2026

Matt Althauser

July 24, 2026

Dogfood Testing: Why We're Running Experimentation on Our Signup Page

Wadii Zaim

July 22, 2026

Feature Toggle Management: A Practical Guide for Engineering Teams

Matt Althauser

July 21, 2026

The 7 Key Phases of the Software Development Lifecycle

William Sigsworth

July 16, 2026

How to Build a Software Rollback Strategy for Your Deployments

William Sigsworth

July 14, 2026

Server Side Testing: What It Is and How to Do It Right With Feature Flags

William Sigsworth

July 13, 2026

Alpha vs. Beta Testing: What’s the Difference and When Should You Use Each?

William Sigsworth

July 9, 2026

What Is Continuous Testing: The Ultimate Guide for Dev Teams

William Sigsworth

July 7, 2026

Regression Testing: Your Safety Net Before Code Reaches Users

William Sigsworth

July 6, 2026

What Is a Software Release? The Ultimate Guide

William Sigsworth

July 1, 2026

DORA Metrics Explained: The Five Measures of Software Delivery Performance

William Sigsworth

June 30, 2026

Explaining The Ring Deployment Model: Safer Releases, Ring by Ring

William Sigsworth

June 24, 2026

Feature Flags in DevOps: What They Are, Why You Need Them

Asaph Kotzin

June 22, 2026

What Is a Dark Launch? The Ultimate Software Development Guide

William Sigsworth

June 15, 2026

What Is Product Lifecycle Management?

William Sigsworth

June 9, 2026

What GitLab Feature Flags Can Do for Your Release Workflow

William Sigsworth

June 3, 2026

The Engineering Team's Guide to Release Strategies That Actually Work

William Sigsworth

June 1, 2026

You Can Now Integrate Flagsmith with GitLab! Here's How You Do It

Asaph Kotzin

May 27, 2026

The Benefits of A/B Testing, and Why Feature Flags Make It Even Better

William Sigsworth

May 20, 2026

The Developer's Playbook for Beta Testing That Actually Works

William Sigsworth

May 20, 2026

Code References: See Exactly Where Your Feature Flags Live in Your Codebase

Evandro Myller

May 18, 2026

What Is Blue-Green Deployment? The Complete Guide

William Sigsworth

May 12, 2026

Smoke Testing Explained: Catch Build Failures Before They Reach Your Users

William Sigsworth

May 7, 2026

When Canary Alerts Go Wrong: How We Fixed It and Doubled Down on OSS

Kim Gustyr

May 6, 2026

Release Testing: A Complete Guide for Development Teams

William Sigsworth

May 5, 2026

What Is a Kill Switch in Software and Why Do Developers Need Them?

William Sigsworth

April 29, 2026

How to Implement CI/CD: A Practical Implementation Guide

William Sigsworth

April 27, 2026

What Is CI/CD? A Plain-English Guide to Faster, Safer Software Delivery

William Sigsworth

April 21, 2026

Rolling Deployment Vs. Blue-Green: Which Strategy Fits Your Pipeline?

William Sigsworth

April 20, 2026

What Is Feature Management and Why Does It Matter?

William Sigsworth

April 15, 2026

What Is Trunk-Based Development? A Complete Guide

William Sigsworth

April 13, 2026

Deployment Frequency: The Metric That Reveals How Fast Your Team Really Ships

William Sigsworth

April 9, 2026

OpenTelemetry, without the vendor lock-in: Introducing full observability for Open Source and Self-Hosted Flagsmith customers

Kim Gustyr

April 7, 2026

How to Migrate from LaunchDarkly to OpenFeature in 6 Steps

Tanaaz Khan

March 31, 2026

How Prometheus, Flagsmith, and Some Good Old-Fashioned Compression Helped Us Solve Customer Pain

Matt Althauser

March 30, 2026

Feature Flag Testing: How Enterprise Teams Build Real Product Learning Loops

Asaph Kotzin

March 26, 2026

Trunk-Based Development vs. Gitflow: Choosing the Right Branching Strategy

Mia Loiselle

March 25, 2026

Why OpenAI Paid $1.1 Billion for a Feature Flag Company

Matthew Elwell

March 20, 2026

The Engineering Leader's Guide to Scaling Feature Flags

Tanaaz Khan

March 19, 2026

6 Tips to Reduce and Manage Technical Debt in 2026

Tanaaz Khan

February 24, 2026

Three teams. Eight hours. Three amazing features: Flagsmith’s 2026 Lisbon Offsite and Hackathon

Adrian Gregory

February 17, 2026

Vibe Coding and Feature Flags: The New PM Playbook for Faster Product Validation

Asaph Kotzin

February 9, 2026

10 Best Practices to Build and Ship AI Features With Minimal Risk

Tanaaz Khan

January 29, 2026

Tracking Feature Flag Changes and Evaluation with Flagsmith and Sentry

Daniel Efe

November 28, 2025

We Built Our Own MCP Server for Engineers & Release Managers

Adrian Gregory

November 21, 2025

7 PostHog Alternatives for Feature Flag Management

Tanaaz Khan

November 12, 2025

Why LaunchDarkly Went Dark During the AWS Outage—And Why Flagsmith Didn’t

Matthew Elwell

November 7, 2025

Statsig Alternatives: 8 Best Feature Flag Platforms Compared

Tanaaz Khan

November 5, 2025

Integrating Datadog Workflows with Flagsmith for Automated Reliability

Daniel Efe

October 24, 2025

Progressive Delivery for Building LLM-Powered Features

Pete Hodgson

October 23, 2025

What is the Four Eyes Principle? A Developer's Guide to Safer Flag Changes

Tanaaz Khan

October 17, 2025

De-Risking AI Adoption: How Feature Flags Help Enterprises Move Fast Without Breaking Trust

Adrian Gregory

October 7, 2025

Monitoring Feature Flag Performance with Flagsmith, Prometheus, and Grafana

Daniel Efe

September 25, 2025

What is Release Management and How Does it Work in Regulated Industries?

Tanaaz Khan

September 17, 2025

Banking and Modern Observability: Dynatrace Insights

Andreas (Andi) Grabner

September 4, 2025

No More Hardening Phases: Testing in the Age of Continuous Deployment

Pete Hodgson

September 1, 2025

How Modernisation is Changing Open Source Banking

Rob Moffat

August 5, 2025

Use Grafana to Track Feature Health in Flagsmith

Mia Loiselle

August 28, 2025

6 Lessons From the World's Best Open-Source Founders

Ben Rometsch

August 27, 2025

Feature Toggles and Feature Flags: Understanding the Key Differences

Tanaaz Khan

August 25, 2025

8 Types of Deployment Strategies (And How Feature Flags Help)

Ben Rometsch

July 31, 2025

Moving to Progressive Delivery with Feature Flags

Ben Rometsch

July 11, 2025

Top 7 Feature Flag Tools for Enterprises in 2026

Tanaaz Khan

June 3, 2025

Moving Fast, Without Breaking Things: Modern Software Delivery with Feature Flags

Pete Hodgson

June 4, 2025

TypeScript Feature Flags: A Next.js Example

Michael Dinerstein

May 14, 2025

Embracing Modernisation in Banking Through Platform Engineering

Benjamin Brial

May 9, 2025

Transitioning to Modern Authorisation Management

Alex Olivier

April 22, 2025

What Are Feature Flags? Everything Engineering Teams Need to Know

Ben Rometsch

April 7, 2025

A Conversation with Komerční Banka's Chief Software Architect

Mia Loiselle

March 26, 2025

GitOps for Feature Flags Using Terraform and Terrateam

Malcolm Matalka

March 25, 2025

Why It’s Time to Test in Production: Best Practices

Tanaaz Khan

January 22, 2025

How We Improved Our Docker Image Security Using Chainguard's Wolfi

Kim Gustyr

January 7, 2025

6 Best Enterprise-Grade Harness Alternatives & Competitors

Tanaaz Khan

October 28, 2024

How to Roll out Pricing Changes With Zero Customer Complaints

Matthew Elwell

September 16, 2024

How to Use Feature Flags for Trunk-Based Development

Kyle Johnson

August 21, 2024

7 Best LaunchDarkly Alternatives & Competitors

Tanaaz Khan

August 12, 2024

How Global Banks Use Feature Flags to Stay Competitive

Tanaaz Khan

July 24, 2024

How To Guide: Flagsmith Grafana Integration

Pradumna Saraf

July 23, 2024

New in Flagsmith: 2024 Feature Roundup

Matthew Elwell

July 23, 2024

Don’t Let a Flawed Release Take Your Company Down

Ben Rometsch

June 26, 2024

How to Guide: Flagsmith GitHub Integration

Pradumna Saraf

May 28, 2024

6 Best Firebase Remote Config Alternatives & Competitors

Tanaaz Khan

May 16, 2024

How to Transition to Modern Feature Management in Banking

Ben Rometsch

March 21, 2024

5 Feature Flag Management Pitfalls To Avoid To Keep Your Flags in Check

Tanaaz Khan

February 29, 2024

The Best Thing about Founding a Remote-First Company? Pickled Onion Monster Munch and The Beautiful Game

Ben Rometsch

February 28, 2024

Flagsmith Jira Integration Guide: A Comprehensive How-to Guide

Abhishek Agarwal

February 16, 2024

Guide: How to Create Observability-Driven Development with Feature Flags

Savan Kharod

January 31, 2024

Build vs. Buy for Feature Flags: My Experience as a CTO with a 20+ Engineer Team

Daniel Engelke

January 16, 2024

Announcing the Flagsmith Referral Programme

Anna Redbond

January 15, 2024

How We Measure Feature Flags’ Success

Kyle Johnson

December 20, 2023

Customer Story: Serenis

Anna Redbond

December 7, 2023

Announcing the Flagsmith Jira Integration

Anna Redbond

June 6, 2024

Spring Boot Feature Flags: A Step-by-Step Implementation Guide with a Working Java Spring Boot Application

Abhishek Agarwal

November 22, 2023

Employees on Bootstrapping

Anna Redbond

November 14, 2023

Our POV: When Bootstrapping Works (and When It Doesn't)

Anna Redbond

October 25, 2023

How to Onboard Feature Flag Management Tools

Anna Redbond

October 12, 2023

When is it time to move to feature flag software?

Olga Diaz

September 26, 2023

Why We Bootstrap

Ben Rometsch

September 6, 2023

The Enshittification of Basically all Digital Design. But in this Case, Specifically, the Slack Redesign.

Ben Rometsch