The actual infrastructure costs of running a global Edge API (part 2)

Ben Rometsch

This is a follow-up post to part 1, where we looked at how much it costs to run a feature flagging platform at scale.

Since we wrote part 1, a lot of changes have been afoot at Flagsmith Infrastructure HQ! Most notably, we launched our Global Edge API into production, and have been migrating existing customers and onboarding new customers directly onto the Edge platform.

We carried out all this Edge API work to solve some big topics around hosting, infrastructure and scaling:

We wanted to provide global low latency for all our customers wanting to serve flags to their applications.
We didn’t want to worry about scaling our SDK endpoints, ever.
We wanted global failover in the event of an entire AWS region outage.
We wanted to take control of our costs and tie them directly to our API traffic.

Our Edge API - a technical overview

The Flagsmith API is split into two logical groups of tasks:

Serving flags to our customers’ applications.
These are the most critical and represent about 99.99% of our traffic.
Serving requests from our dashboard.
Actions like creating flags, managing segments, adding Flagsmith users, etc. Most of this traffic comes from our React frontend, but some comes from our REST API as well.

These two groups of tasks have quite different requirements when it comes to uptime, global responsiveness and complexity:

Serving flags:
This requires as many 9s of uptime as we can manage, it requires global low-latency, and is quite simple in terms of data and transactions.
Serving dashboard requests:
Can realistically manage more downtime and doesn’t have hardcore latency requirements.
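To make the split concrete, here is a minimal sketch of how requests could be sorted into the two groups by path. The path prefixes (`/api/v1/flags/`, `/api/v1/identities/`), the `Route` type and the backend labels are assumptions for illustration, not a description of how Flagsmith actually routes its traffic.

```python
from dataclasses import dataclass

# Hypothetical path prefixes for SDK (flag-serving) traffic; everything else
# is treated as dashboard/admin traffic.
SDK_PREFIXES = ("/api/v1/flags/", "/api/v1/identities/")


@dataclass(frozen=True)
class Route:
    group: str               # "sdk" or "dashboard"
    needs_global_edge: bool  # should this be served from the nearest region?
    backend: str             # illustrative backend label


def classify(path: str) -> Route:
    """Sort a request path into one of the two logical API groups."""
    if path.startswith(SDK_PREFIXES):
        # Flag serving: simple reads, maximum uptime, global low latency.
        return Route(group="sdk", needs_global_edge=True, backend="edge")
    # Dashboard: more complex transactions, tolerant of a single home region.
    return Route(group="dashboard", needs_global_edge=False, backend="core")
```

The point of the split is that only the first branch has to be replicated globally; the second can stay in one region with a conventional database.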

After a lot of R&D, we settled on the following platform to power our Edge API:

AWS Global Accelerator for latency-based routing and regional failover
Lambda for our Edge compute
DynamoDB global tables for our Edge datastore
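Global Accelerator makes the routing decision for us, but the behaviour we rely on can be modelled in a few lines: send each client to the healthy region with the lowest latency, and fall through to the next-best region when one is unhealthy. This is a toy model under that assumption, not Global Accelerator's actual algorithm; the region names and latency figures below are hypothetical.

```python
from typing import Mapping, Set


def pick_region(latency_ms: Mapping[str, float], unhealthy: Set[str]) -> str:
    """Return the lowest-latency region that is currently healthy.

    latency_ms maps region name -> measured client round-trip time (ms).
    Raises RuntimeError if every region is unhealthy.
    """
    healthy = {r: ms for r, ms in latency_ms.items() if r not in unhealthy}
    if not healthy:
        raise RuntimeError("no healthy regions available")
    # Compare regions by their latency value and return the fastest one.
    return min(healthy, key=healthy.__getitem__)


if __name__ == "__main__":
    probes = {"eu-west-2": 12.0, "eu-central-1": 20.0, "us-east-1": 85.0}
    print(pick_region(probes, unhealthy=set()))          # eu-west-2
    print(pick_region(probes, unhealthy={"eu-west-2"}))  # eu-central-1
```

Because every region holds a full replica of the flag data, failing over is just picking the next entry; there is no cache to warm.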

This meant a fundamental change to how we pay for serving our requests.

Our prior infrastructure was based around a chonky RDS instance in the AWS London region, and elastic scaling of our ECS instances based on CPU load. This meant we had a big upfront fixed cost (the RDS reserved instance) and then smaller variable costs for our ECS cluster.

With the introduction of our Edge API, we were effectively going to a fully serverless architecture, both on the compute side and the data side.
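The shift from "big fixed cost plus small variable cost" to "everything is variable" can be written as two simple monthly cost functions. This sketch compares them and finds the traffic level at which they cost the same; every input is a parameter you would fill in from your own bill, and none of the values come from this post.

```python
def provisioned_monthly_cost(fixed_usd: float, variable_usd_per_million: float,
                             requests_millions: float) -> float:
    """Reserved database plus autoscaled compute: fixed base + per-request part."""
    return fixed_usd + variable_usd_per_million * requests_millions


def serverless_monthly_cost(usd_per_million: float, requests_millions: float) -> float:
    """Fully serverless: cost scales linearly with traffic, with no fixed base."""
    return usd_per_million * requests_millions


def break_even_millions(fixed_usd: float, variable_usd_per_million: float,
                        serverless_usd_per_million: float) -> float:
    """Monthly traffic (millions of requests) at which both models cost the same.

    Below this, serverless is cheaper; above it, the provisioned model wins,
    ignoring the step changes when the provisioned database must be upgraded.
    """
    marginal_gap = serverless_usd_per_million - variable_usd_per_million
    if marginal_gap <= 0:
        return float("inf")  # serverless never costs more per request
    return fixed_usd / marginal_gap
```

The break-even point is simply the fixed cost divided by the difference in marginal cost per million requests.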

Lambda Compute. What we pay!

We started off working with Lambda@Edge, but a number of caveats eventually led us to give up on it. Instead, we decided to work with Lambda directly, using GitHub Actions, the Serverless Framework and Pulumi to roll out updates to 8 regions (why 8? It’s DynamoDB related, as discussed below!).

You really have two dials you can twiddle to optimise your Lambda costs:

Memory size
CPU architecture
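Both dials feed the same billing formula: Lambda charges per request plus per GB-second of execution (allocated memory in GB multiplied by billed duration), and the GB-second rate is lower on arm64 than on x86_64. The sketch below turns that into a cost per million invocations. The rates are illustrative on-demand list prices for us-east-1 and should be checked against current AWS pricing; provisioned concurrency is billed separately and is not modelled.

```python
# Illustrative on-demand Lambda rates (us-east-1, first pricing tier).
# These are assumptions for the example; check current AWS pricing.
REQUEST_USD_PER_MILLION = 0.20
GB_SECOND_USD = {
    "x86_64": 0.0000166667,
    "arm64": 0.0000133334,
}


def lambda_cost_per_million(memory_mb: int, avg_duration_ms: float,
                            architecture: str = "arm64") -> float:
    """Approximate on-demand cost (USD) of one million invocations.

    Excludes provisioned concurrency, data transfer and free-tier credits.
    """
    gb = memory_mb / 1024
    seconds = avg_duration_ms / 1000
    compute = 1_000_000 * gb * seconds * GB_SECOND_USD[architecture]
    return compute + REQUEST_USD_PER_MILLION
```

Because the compute term is linear in both memory and duration, halving memory only saves money if the average duration does not double as a result, which is why sizing has to be measured rather than guessed.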

We started out using 2GB instances on the amd64 architecture. Sizing lambdas is interesting: the more memory you allocate, the more compute you get, but the more you pay. Because we cared a lot about the latency and performance of these lambdas, we went live with 2GB. Once we had a decent level of traffic and data, we dialed the memory size down bit by bit to see how much it affected performance. We settled on 1GB as the right balance between oversizing the lambdas and making them so small that they impacted the overall latency of the response.
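Dialing memory down step by step amounts to picking the smallest memory setting whose measured latency still meets a budget. Here's a minimal sketch of that selection step, assuming you already have a p95 latency measurement per memory size; the function name and its inputs are hypothetical, and how the measurements are gathered (load tests, production metrics) is out of scope.

```python
from typing import Mapping, Optional


def smallest_size_within_budget(p95_ms_by_memory_mb: Mapping[int, float],
                                budget_ms: float) -> Optional[int]:
    """Return the smallest memory size (MB) whose p95 latency is within budget.

    Returns None if no tested size meets the budget.
    """
    # Walk sizes from smallest to largest and stop at the first acceptable one.
    for memory_mb in sorted(p95_ms_by_memory_mb):
        if p95_ms_by_memory_mb[memory_mb] <= budget_ms:
            return memory_mb
    return None
```

Pairing each candidate size with a per-million cost estimate makes the trade-off between latency and spend explicit.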

We started off running our lambdas on amd64/Intel, but tested arm/Graviton, liked what we saw, and moved over once we were happy with the tests we carried out. This reduced our costs by about 30% with no loss of performance!

We are currently paying ~$3 USD per million requests served for our Lambda Compute. Because we care a lot about latency, we have provisioned concurrency enabled for these lambdas to reduce cold starts. We do expect this number to come down as we scale, but not by much.

DynamoDB Global Tables. What we pay!

Because we needed an Edge solution to achieve low latency, we decided to replicate our data around the world. This avoided a bunch of complexity around caching, cache invalidation and all those hard problems. We evaluated a bunch of options and settled on DynamoDB global tables.

One oddity of this solution is that global tables are only available in 11 AWS regions, and when we launched, only 8 regions were available. We decided to deploy our data and compute in all 8 of these regions and see what the costs came out at with our production traffic workload.

How much does it cost? Pretty much exactly double our Lambda Compute costs, so ~$6 USD per million requests served. You can see how this splits out in the image below. Because our platform is very, very read-heavy, the bulk of the cost is in reads and replication.
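Why reads and replication dominate follows roughly from how global tables are billed: reads are served from the local replica and billed once, while each write is replicated and billed as a replicated write in every replica region. Here's a back-of-the-envelope sketch under that assumption; every price and per-request ratio is a parameter, and none of them are Flagsmith's actual figures.

```python
from typing import Dict


def dynamodb_cost_per_million(reads_per_request: float,
                              writes_per_request: float,
                              regions: int,
                              read_usd_per_million: float,
                              replicated_write_usd_per_million: float) -> Dict[str, float]:
    """Rough DynamoDB global-table cost per million API requests.

    Assumes reads hit only the local replica, while each write is billed as
    a replicated write in every replica region. Storage, streams and
    cross-region data transfer are ignored.
    """
    reads = reads_per_request * read_usd_per_million
    writes = writes_per_request * regions * replicated_write_usd_per_million
    return {"reads": reads, "replicated_writes": writes, "total": reads + writes}
```

Because only the write term scales with `regions`, adding replicas is comparatively cheap for a read-heavy workload, while the read term is paid on every single request.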

There’s one thing we plan on implementing in the near future: DAX (DynamoDB Accelerator), a transparent caching layer that will hopefully bring these read costs down as we scale.
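DAX sits in front of DynamoDB as a managed cache, so application code barely changes. The saving comes from the read-through pattern: repeated reads of the same key within a TTL are answered from memory instead of from the table. The sketch below illustrates only that pattern in plain Python; it is not how DAX is implemented, and `fetch` stands in for a hypothetical DynamoDB read.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to `fetch` on a miss or expiry."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}  # key -> (expiry, value)
        self.backend_reads = 0  # reads that reached the (billed) datastore

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # cache hit: no billed read
        value = self._fetch(key)  # miss or expired: read from the datastore
        self.backend_reads += 1
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a hit rate h, billed reads fall to roughly (1 - h) of the original, which then has to be weighed against the cost of running the cache nodes and the staleness allowed by the TTL.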

Global Accelerator. What we pay!

We get a lot out of Global Accelerator. It’s a great product and we really love it: we get latency-based routing and global failover without ever having to worry about it! What does that cost? About 20 cents per million requests!

What about serving all that data?

Data transfer. That’s where AWS always gets people, right? For our workload, it’s pretty reasonable. Generally our responses are fairly small, and we do all the good stuff like gzipping and whatnot. What does data transfer cost us? About $1.50 per million requests.
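Compression matters because data transfer is billed per byte, and flag payloads tend to be repetitive JSON that compresses well. Here's a small sketch using Python's standard `gzip` and `json` modules to measure the effect; the payload shape is a made-up stand-in for a flags response, not Flagsmith's actual schema.

```python
import gzip
import json
from typing import Tuple


def compressed_size(payload: object) -> Tuple[int, int]:
    """Return (raw_bytes, gzipped_bytes) for a compactly serialised JSON payload."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


if __name__ == "__main__":
    # Hypothetical flags response: many flags with near-identical structure.
    flags = [
        {"feature": {"name": f"feature_{i}", "type": "STANDARD"},
         "enabled": i % 2 == 0, "feature_state_value": None}
        for i in range(200)
    ]
    raw, gz = compressed_size(flags)
    print(f"raw={raw} bytes, gzip={gz} bytes, ratio={gz / raw:.2f}")
```

For very small responses the fixed gzip header can outweigh the savings, so it pays to check the ratio on real payload sizes.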

So, are we happy with that?

Generally, yes! For a small team like ours, never having to worry about scaling or failover ever again has an enormous amount of value. Yes, we could probably power these SDK endpoints more cheaply using things like ECS and RDS, but we would invariably hit scaling limits with our database, meaning upgrades, downtime and a bunch of other hairy problems. Moving to what is essentially a serverless database does cost us more per month, but we’re happy with what that gets us!
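Adding up the approximate per-million figures quoted in this post gives the all-in cost of serving flags from the Edge API. A trivial sketch of that sum, using only the numbers above:

```python
# Approximate per-million-request costs quoted in this post (USD).
COST_PER_MILLION_USD = {
    "lambda_compute": 3.00,
    "dynamodb_global_tables": 6.00,
    "global_accelerator": 0.20,
    "data_transfer": 1.50,
}

total = sum(COST_PER_MILLION_USD.values())
print(f"~${total:.2f} per million requests")          # ~$10.70 per million requests
print(f"~${total * 1000:,.0f} per billion requests")  # ~$10,700 per billion requests
```

DynamoDB is the largest single line item, at a little over half of the total.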

About the author

Flagsmith co-founder. Besides Flagsmith, Ben has founded several other companies, and he currently serves on the Governance Board of OpenFeature, a CNCF Sandbox Project. He's an advocate for open standards and open source and also hosts “The Craft of Open Source” podcast, where he interviews creators and maintainers from the open-source community.