Interview with Ari Zilka: CEO,
Ben Rometsch
June 4, 2024
Ben Rometsch - Flagsmith
Host Interview

Observability is cloudy. Cloudy is overloaded. It's such a complicated term. Most people don't seem to know the definition.

Ari Zilka

In this episode of The Craft of Open Source, Ari Zilka describes his transition from his previous projects to founding a new company that aims to improve control over observability data in IT. He identifies issues with current observability tools, such as high costs and lack of flexibility, and proposes using OpenTelemetry to offer a more customizable and manageable system. This new approach gives DevOps teams better tools for handling large amounts of data, ensuring security, and optimizing costs without relying on proprietary solutions. Tune in to learn more about and Ari's outlook on the open-source space!


I have Ari Zilka with me. Welcome, Ari.

It’s nice to meet you, Ben. It’s great to speak to your audience.

Where are you based right now?

I am in the Bay Area, which means the San Francisco Bay Area for people who aren't narcissistic and think they know where we live.

I was doing some research on you and on the company that we were going to talk about. I couldn't help but notice you were the Director of Technology at Sapient from 1999 to 2001. I was a technology leader at a competitor called Rare Medium at the time. I remember those days when everything was hard from an engineering point of view. Have you got a single famous hilarious war story from trying to build websites 25 years ago?

I shouldn't tell it, but I'm going to tell it. They can't do anything about it anymore, but it's a large clothing site. Somehow, I got tied into retail, which is how I ended up at After my success at Walmart, getting them off the ground before I went in-house, I did one other retail project before joining Walmart. That other retailer was a large multinational clothier. They were powered by the stack of one of the operating system vendors out of the Pacific Northwest. Everyone else I was working with was powered by some combination of Sun Microsystems or Red Hat. Red Hat was new at the time, but they either had Sun chips or Intel chips, and a flavor of Unix.

I was wondering. These guys have 500 servers. They're serving 3,000 concurrent users. What is wrong with this stack? I studied it. The page render times were twelve seconds, and rendering a page took a whole box, not a whole CPU. We ended up calling Sun Microsystems. They said they'd buy the whole stack of software, hardware, and everything back for the customer and switch it all to Sun equipment.

The CEO of the clothier yelled at me like, “How dare you do this on my behalf?” I was like, “I'm trying to help.” That was a huge lesson learned for me. Your channel is a lot about open source, and that cemented my journey. I am not interested in jets, private flights, and golf outings. I’m deciding my stack over a bottle of wine.

I worked in a technology agency. I remember them having way too much money, no money, going bust, and how hard everything was. I talk about this a lot. Kids nowadays don't know they've been born, being able to start 1,000 servers in ten minutes as long as they've got the money for it. You've had an interesting career before you got to where you are and what you're working on at the moment. Do you want to provide some background, because it's interesting?

I fell in love with computers way back. I'm not the typical computer guy from the ‘80s. A lot of them were hacking on Texas Instruments and Apple. My family didn't understand computers that well. We were on Atari 800s. My favorite thing to do was “10, print hello, 20 go to 10,” and watch that flow down the screen. I didn't know much. My brother was writing video games on the Atari. He was a genius about it.

High school comes around, and I have access to AP computer science. I took it. Why not take yet another advanced placement class, as we call it in the US? That cemented my future. I learned Pascal. I could consume anything the teacher would throw at me instantaneously. I fell in love with computers. I went to UC Berkeley. I was in engineering. I had all the computing power I wanted. These Sun Microsystems machines and DEC Alphas were blowing my mind. I was like, “What is this? How do you move files around? This is crazy. This shared user environment.” The next thing you know, I found out that the other students outside engineering don't have access to computing.

I found a volunteer organization that had been running for a couple of years. I got voted in as the manager of the organization and ran email for the entire UC campus. It’s a twenty-person volunteer org. That led me to a group of people who were true nerds, brilliant genius-level people who were working at Motorola, building pager technology that went into IndyCar and NASCAR, which led me to super-fast data. I forget the terminology already. The 38,000 baud modems at the time came off of the racetrack. You had to send telemetry from 50 or 43 cars in real time to every pit crew off of the nose cone transponders. It was amazing.

That fast data led me to, where I built the core platform, which led me to spin out Terracotta and start a company that was all about big in-memory data, which helped cement and create the in-memory data grid market, us and a couple of other players. That led to meeting Peter Fenton, which led to meeting Rob Bearden, who pulled Hortonworks and the whole Hadoop team out of Yahoo. We did that together.

After that, I had so much access to rarefied air hanging out with these top-tier VCs that I wanted to do that for myself. I realized that's not what I want to do after several years. I loved investing, but I had to be an operator. I had to be the person making the technical decisions or facilitating the technical decisions, not the advisor role.

That brought me back in-house to one of the big observability vendors, New Relic. It's a little-known fact that Terracotta incubated inside Wily Technology, which was the New Relic founder's first startup. When it came time to go back in, my friends advised me to find and have a role carved out for me. We created something called the incubator at New Relic, which built all their new tech. I did that for the last five years and left before they went private.


It's an interesting career to start. Most of the people that I speak to on this show don't have big AAA VC firms in their employment history. It's interesting to see that you weren't standing on the sidelines. It wasn't something that you could stomach.

I love computing. I love one specific aspect of it, which is big data and fast data. It looks like a varied career then, but it's all focused on millions of transactions per second, terabytes of data per day, and petabytes per month at this point.

Ari’s Journey To The Present Project

That being the context for why we're talking, can you tell me a little bit about the origin story of the project you're working on at the moment?

Without doing a commercial for the company, the origin story is a great way to frame the question. I appreciate that because your audience is demanding insight, not just advertising. I had a great run at New Relic. There were amazing people, great leadership, great teams, and great products. What I liked was that I was allowed to hang out with the biggest customers. I don't know why they let me do that. I was talking to companies with hundreds of thousands of machines and thousands of services. You take my background in big data.

This is the origin of the company I'm working at now. I realized that this class of data is where the observability vendors work. Observability is such a complicated term. No disrespect to anyone, but most people don't seem to know the definition. There's a scientific or engineering definition of observability that doesn't seem to help us in day-to-day IT in the enterprise. The reason it doesn't help us is because there's a data problem. The data is very high volume. You're talking about billions of records a minute. It's the highest volume I've ever worked in.


The observability vendors have gone out and provided you with a completely turnkey end solution that gets that massive amount of data into beautiful little Excel spreadsheet-style charts, but they missed the forest for the trees. What they missed is that in most classes of data, you have governance, control plane, tooling, and maturation before the enterprise goes whole cloth in on this stack.

Warehousing is all about business data, sales data, inventory, and customer data. Lakes are all about log data and semi-structured data where you know you can extract intelligence, but it's expensive in a structured world to do so. Telemetry data is this high-speed streaming data that doesn't fit into, for example, Kafka. It's too high a volume. It's too unknown in its value. You need to pick out the nuggets.

That data is where I'm coming in to give you a solution that brings the problem back on-prem, allows you control before it leaves your boundary, and lets you decide what's important to you, what's not important to you, what to keep, what to throw away. It's those hundreds of thousands of customer instances that I realized they have no control over. I want to put control back in the enterprise's hands, specifically DevOps and site reliability engineering teams. They can't take a black box solution that results in charts. There's so much more to do with this data, and there's so much challenge in managing this data.

When you say no control, can you expand on that a little bit? What exactly do you mean by that?

Let's not focus on my old employer. Let's talk about the space in general. Generally speaking, you drop in a language agent, as we call them. That language agent means Java, Go, C++, Python, or Node. That language agent or that runtime agent instruments most of your application. You can write some custom instrumentation, and it's off to the races. Most servers are services in the cloud. It emits all the telemetry it can record.


The one thing you can control is frequency: emit telemetry every ten seconds or every minute. That's all you can control. The rest goes out to an account somewhere in the cloud. You give it an API key, and you get your charts on the other side. On that side, you can write some filters and drop rules, depending on the vendor stack. You then write queries and get a bill at the end of the month.

Control is not just who you send it to but how many vendors you send it to. There was a use case where I had a company owned by a Chinese parent corporation but run mostly out of the US. The parent company wanted to use the same monitoring, but they couldn't because they couldn't send data past the Great Firewall.

Two versions of the same stack are monitored by two different instances and technology vendors. That's a lack of control. That's one instance of a lack of control. I want to send my telemetry to multiple places. I want two different vendors involved in monitoring. Another example of control is when an application is low value to me. I don't need its data. I had one customer who said, “My DR environment is up and hot. It's emitting telemetry and costing me $10,000 a month.” I could filter out DR until it comes live. But then I have to write a stack of programs that figure out that DR is now in live production, or that it's no longer in production because we have fixed the old problem, and that I should mute it or unmute it. How do I do that?
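To make the muting and multi-vendor problem concrete, here is a minimal Python sketch. It is an illustration only, not MyDecisive's design or any vendor's API: the `Router` class, the `env` field, and the destination names `vendor_a` and `vendor_b` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical router: fans records out to every destination and
    drops records from environments that are currently muted."""
    destinations: dict = field(default_factory=dict)  # name -> list of delivered records
    muted_envs: set = field(default_factory=set)

    def mute(self, env):
        self.muted_envs.add(env)

    def unmute(self, env):
        self.muted_envs.discard(env)

    def route(self, record):
        """Deliver the record to all destinations unless its env is muted.
        Returns how many destinations received it."""
        if record.get("env") in self.muted_envs:
            return 0
        for sink in self.destinations.values():
            sink.append(record)
        return len(self.destinations)


router = Router(destinations={"vendor_a": [], "vendor_b": []})
router.mute("dr")  # DR is hot but idle, so stop paying for its telemetry
dropped = router.route({"env": "dr", "metric": "cpu", "value": 0.12})
sent = router.route({"env": "prod", "metric": "cpu", "value": 0.95})
router.unmute("dr")  # DR has failed over and now serves live traffic
sent_after_failover = router.route({"env": "dr", "metric": "cpu", "value": 0.80})
```

The hard part the speaker points at is deciding when to call `mute` and `unmute` automatically; the sketch leaves that decision to the caller.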

Security use cases. Can I scrub the contents for PII, such as credit card data, government identification data, or email addresses? How do I X out all the email addresses in my production telemetry? This isn't a database. This is a monitoring system. The answer is that you have traces in it. Now you're tracing application invocations, including arguments, and you can't stop it.
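As a rough sketch of what scrubbing PII out of telemetry could look like, the Python below masks email addresses and card-like digit runs in the string fields of a record. The regular expressions are deliberately simple assumptions for illustration, not a complete PII detector, and the record layout is hypothetical.

```python
import re

# Simplified patterns, assumed for illustration; real PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def scrub_text(text):
    """Replace emails and card-like numbers in a string with fixed markers."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return CARD_RE.sub("[REDACTED_CARD]", text)


def scrub_record(record):
    """Return a copy of the record with every string value scrubbed."""
    return {
        key: scrub_text(value) if isinstance(value, str) else value
        for key, value in record.items()
    }


span = {"args": "charge 4111 1111 1111 1111 for bob@example.com", "status": 200}
clean = scrub_record(span)
```

Running the scrub before the data leaves the network boundary is the point of the example; the same function applied at the vendor side would be too late.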

Serious security advisories come out of Google infrequently, but there was one a while ago that was exactly that. They were logging passwords in the clear, which is what you're talking about. That's easy to do without meaning to do it.

I know because I did it. You’re exactly right. I built an entire monitoring system from scratch at, which is how I met the Wily and New Relic founder. He came to sell his wares. Our CTO called me in and said, “Show him what we've got.” He said, “Your system is superior to ours in every way except visuals. You can control what you're storing. You have powerful query capabilities. It's all SQL, not proprietary. How did you get this?”

The answer is I wrote it myself. That's not to say that what I'm doing now comes from the Walmart era of twenty years ago. I had logs fly by that said the user successfully logged in, username X, and password Y. It's like, “Dev, don't do that.” Call all the devs and say, “Don't do these 2,000 things. Here are our rules. Don't log this. Only log that.”

I'll tell you another story. I had a big wireless carrier in the US ask me to consume 50 gigs of data and find in one day's worth of logs the anomalies in their calls to their IBM mainframe. The first thing I did was open the file in my favorite editor. It was full of “hello from line 22.” I did a count of the number of lines. There were millions of lines. I did a grep -v and wrote the remaining lines into a new copy of the file. The 50 gigs became 5.
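The filtering step in that story is small enough to sketch. The Python below behaves like grep -v with a fixed string: it keeps only the lines that do not contain the noise text. The sample log lines are invented for illustration.

```python
def drop_matching(lines, needle):
    """Yield only the lines that do NOT contain needle,
    similar in spirit to grep -v with a fixed string."""
    for line in lines:
        if needle not in line:
            yield line


# Invented sample lines standing in for the carrier's log file.
sample = [
    "hello from line 22",
    "CALL mainframe txn=1 rc=0",
    "hello from line 22",
    "CALL mainframe txn=2 rc=8",
]
kept = list(drop_matching(sample, "hello from line 22"))
```

Because `drop_matching` is a generator, it can be fed an open file object and written straight to a new file without loading 50 gigs into memory.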

Everyone knows how to write. Splunk has been around for years. It was sold to Cisco for a huge number of billions of dollars. Everyone knows you can easily drop those log entries, but that's a trivial example. You have to have control, and you have to have these tools built by developers for developers. You need to be able to inject your own code and logic. It's not always as trivial as hello from line 22. It's sophisticated.

That same carrier has now written off AI from vended solutions in the observability space and is building its own bespoke AI. I spoke to them. How are they going to do it? I asked them, “How do you get a copy of the data from your vendor?” They're like, “We don't know yet.” I'm like, “I know that vendor. You can't get a copy of the data.”

In terms of observability, how do you define it?

Defining Observability

The definitions of observability come from control engineering theory. I want to bring a new definition to the market. The legacy definition was building a system that you can tell what it's doing. I'll speak in plain English. You can tell what a system is doing without having access to its internal source code. To me, the correct definition hinges on grammar. There's observable and observing or the observer. That definition of observability from control theory is the definition of observation. I'm looking at something that is a black box and making sense of what it does.

In software, we have language agents and runtimes. We get deep inside applications. It's not about the injecting of observable information. It is making sure that you understand what a system's purpose is. It's not what call it's handling at the moment. It's handling a request from Ben. He wants to add a basketball to his shopping cart. That's something we can tell, and we've been able to tell for 30-plus years.

You can do that with logs. You don't need language agents, fanciness, and billions of records a minute. What we need to tell is this is an instance of checkout. It is calling a database. It's running in Kubernetes, on Amazon, in us-west-2. The notion of observability is not about turning something observable. It's about forcing that thing to conform with your IT requirements such that you can see where it is, what it is, who put it there, and why it's doing what it's doing.
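One way to read "forcing that thing to conform" is a policy check on the identifying attributes every service must carry. The Python sketch below assumes a hypothetical policy; the attribute keys follow OpenTelemetry's resource naming style, but the choice of which ones are required, and the cluster name `prod-1`, are assumptions made for this example.

```python
# Hypothetical policy: attributes every service must report about itself.
REQUIRED = ("service.name", "cloud.provider", "cloud.region", "k8s.cluster.name")


def missing_attributes(resource, required=REQUIRED):
    """Return the required keys that are absent or empty in the resource."""
    return [key for key in required if not resource.get(key)]


def conforms(resource):
    """True when the resource says what it is and where it runs."""
    return not missing_attributes(resource)


checkout = {
    "service.name": "checkout",
    "cloud.provider": "aws",
    "cloud.region": "us-west-2",
    "k8s.cluster.name": "prod-1",
}
anonymous = {"service.name": "checkout"}

checkout_ok = conforms(checkout)
anonymous_gaps = missing_attributes(anonymous)
```

A control plane could reject or quarantine telemetry from `anonymous` until its owners supply the missing context.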


Consuming 95% CPU is not observation. It's consuming 95% CPU because it's doing more transactions than normal. Observable is what I care about, not observability or observation. It's not about graphs, charts, or massive amounts of data offloaded to a cloud. It's about standards and control over entities so that they do what you tell them to do.

Why do you think this space, in particular, has struggled with open-source winner products?

There was a low root, which is now supplanted by the Grafana ecosystem and OpenTelemetry.

Do you think that's accurate?

No disrespect to the open-source community, but they've got their act together now around OpenTelemetry. This space wasn't considered sexy 25 years ago. Monitoring was something I threw together, and everyone I knew threw together. The rise of the sysadmin in the 1990s led to Nagios and Ganglia, which led to a bunch of automation systems and convergence systems like Puppet, Chef, and Ansible. The sysadmin programmer came around, which we eventually called DevOps. That's my personal view on the history.

Because the Nagios and Ganglia era of tooling came from sysadmins and the founders of the observability space were true software developers, you had polar opposites. You had a dichotomy like, “I'm going to look at what the method calls are. I am Wily, New Relic, AppDynamics. I'm going to look at method calls and give you production stack traces.” You had sysadmins saying, “I need to know CPU and network I/O.” What was missing was the whole middle, the why. What is it all when stitched together, and what's its purpose?

It took twenty years for people to realize that IT professionals need to understand as humans what's going on. The rise of machine learning-based AIOps confused everybody. Everyone thought we were going to take CPU signals and predict outages. No one ever got there. No one has gotten to predicting outages with ML. I use ML carefully.

The theory I would propose to answer your question is you had devs building tools for devs. You had ops building tools for ops. You had this whole market manifest called observability that didn't solve a full steel-threaded end-to-end use case of root cause analysis and predictive outage detection. They're still trying to do it. Everyone got lost.

Out of those ashes, and I call them ashes because a lot of companies have been burned by multiple tens of millions of dollars of costs in observability per year, comes OpenTelemetry, which says, “Let's standardize all of this end to end. Let's make the plumbing configurable and controllable.” I love it because I bet my whole farm on OpenTelemetry. Make the plumbing configurable and controllable, which is dear to my heart. Let's make the agents a commodity. No more agent battles, no more trying to lock customers in by getting the dev to trial me. Once the dev trials me, I'm embedded. You can't get me out.

Let's solve the end-to-end value chain. Let's worry about what the observability vendors offer, like storage, management, and governance, as an afterthought. Let's instead focus on plumbing and instrumentation and make them free and commodity. The problem was that people served multiple masters, had no clear use case and no clear target audience, and started to hold people hostage with proprietary solutions.

It's interesting to hear you describe the history of it, which is fascinating. On the face of it, you would think that this is something that would be a perfect use case for open-source platforms. As far as I understand it from the origin story of OpenTelemetry, that didn't come out of the guy in Nebraska who did it. It came out of a huge, valuable, fully commercial organization, which is fascinating. The whole space is unusual and counterintuitive for someone who comes from all sorts of different segments and areas. Especially in the modern day, it's a natural state to be open from the beginning. It's interesting in that regard.

The origin story aligns with what you're observing here. No pun intended, but the vendors got in the way. It's the cloud. It made observability such an easy button. I grab this agent and drop it in for service. I've got dashboards, and I won't look at it again until I'm down. It's not like Linux or PostgreSQL. I call this thing every single day. I don't want to talk to a vended database company because Walmart paid $100 million a year for the dot-com, and that is in year one. That's for the database.

You're going to move off vended solutions to open source. Open source is going to be built by a person in Nebraska who's frustrated that such an addressable, clearly understood problem space is being held captive by someone. Here we have it again. The understood problem space is held captive by a person who's built a solution or by a set of companies who've built a solution. OTel came from these people. OTel comes from one of these big vendors creating a consortium around themselves. They did it to try to take out the other vendors.

The reason they did it is because once you put in an agent from vendor A, your devs tell you, “Go away and pound sand. I'm not going to put in an agent from another vendor. I love my stack and observability vendor. I've got other things to do for the business. I'm not going to retest my whole application.” These guys come in and say, “Let's dislodge all of our competitors by creating an open-source standard.” They created a whirlwind. They opened Pandora's box, and I know because I have spoken to many companies.

This is also interesting, Ben. Open source is usually a grassroots type of thing, and big companies adopt it late. Take Kubernetes: a multibillion-dollar or trillion-dollar company was still using VMware while Kubernetes was becoming big. Here you have the opposite phenomenon, where the big companies go first. You talk to small companies. They're like, “Observability is free for me. I'm five nodes. I don't have to pay. I'm going to use a vended solution in the cloud, the easy button.”

The big companies are getting bills like $10 million a month. They're turning to open-source standards. I'd rather spend $20 million on a dev team supporting OpenTelemetry than spend $20 million on vendor A. The only problem is OpenTelemetry is a spec and a standard. It's not a product per se. All these vendors have shown up with distributions. I don't want to be yet another OpenTelemetry distribution. I want to be the control plane. I want to be an enterprise-class solution. I'm not trying to compete with OTel. I'm trying to take OTel and leverage it to be total governance and control. That doesn't exist. A standard needs to exist. Not only am I based on open source, we'll be open source. Later in 2024, we are giving it all away.

I hadn't considered it, but all those competitors now who are building on top of OpenTelemetry are building another New Relic, Dynatrace, or AppDynamics. They don't have to worry about any of the super hard IP and engineering that goes into runtime decompiling, bytecode, and doing a load of crazy stuff. It's quite antithetical to open source in general that they're taking those tools and reinventing the wheel in a way.

I was surprised the first time I went to KubeCon in Amsterdam. It felt like half of the commercial hall was made up of people who had built stuff on top of OpenTelemetry. I was with a couple of folks from a big observability company. It’s not that there's anything wrong with that. They're building legitimate businesses and things around it. It's interesting you mention it as a Pandora's box, but do you think it surprised people that OpenTelemetry has gone the way it has and, not blown everything apart, but completely revolutionized or reinvented the space?

It surprised everyone. I'll tell you why. It’s because they thought that the data was not the value. They thought that storage and visualization were the values. What does open source have a plethora of? Storage solutions, data management solutions. You have Prometheus, Presto for query, Impala, the whole Hadoop ecosystem, other ecosystems, and cloud solutions. You could get storage solved, for which we used to have armies of people. You and I used to have armies of people worrying about storage, IO, format, and schema. That's super flexible now.

I think back to Star Trek: The Next Generation, where Geordi, the engineer, was programming by moving chips around. That's what you can do now in the cloud. You can program by being like, “I'll take a little bit of Kubernetes, Aurora, and Prometheus. Thank you very much.” My backend, plumbing, and infrastructure are built for me. OpenTelemetry realized that and said, “I'm going to put something on-prem that takes the data and unshackles it from this proprietary value chain.”

Every single cloud observability vendor, all of them, says, “Here's the solution.” It's not malice. They copied the first guys in. They said, “Take it from the agent and get it to the cloud as fast as you can. The customer doesn't want any infrastructure overhead or complexity.” The equation has flipped on its head. Everyone says, “That solution is expensive, and I have zero control.”

OTel is not about the agents. A lot of people misconstrue it as it's commoditizing the agents. All these vendors gave away their agents under Apache 2.0 license. You can do whatever you want. You can fork, copy, mutate, grow, and extend their agents. It's not that. They've started to give you a foundation on which someone like me can build a control plane for you. That control is the one thing that people covet because they don't have it.

All that being said, what is MyDecisive?

It's an open-source project. It is what we call an observability operating system. What we mean is that it is analogous to general-purpose computing. Think of cloud-vended observability. It's proprietary. It looks an awful lot like a purpose-built computer, a calculator, or a GPS unit. What we are doing is making a standard like Android itself where people can build apps, have consistent access to data, have a security plane, a governance plane, a resource access control plane, and a communication plane. We're building all of those stacked components up just like in a general-purpose operating system.


I'll define it best by example. One of our prospective customers, we're not shipping yet, but they said, “I want to build my own anomaly detectors and root cause analysis. The AI from the observability vendors doesn't work.” The first thing I asked them was, “How are you scheduling that? How are you getting a copy of the data? How are you making sure that's a nonstop environment? How are you storing it? How much data are you working on? Is your working set one hour for anomalies, one day, one week, or one month? Where are you putting it?” They're like, “We are figuring all of that out right now.” I'm like, “That's going to take you two years to harden.”

I have that as a drop-in. I'm not OTel. I'm built on OTel. Unlike all the vendors who say, “Let me get into the OTel fray and let me get into the observability fray,” we've thought beyond that to what the use cases are. They are things like security on-prem, anomaly detection on-prem, and muting a stream of data until an anomaly occurs so that I don't pay the vendor until I need the data. That's impossible today. OTel doesn't help you do it, but I do.

From an engineer's point of view, how's that different? How's that going to make day-to-day jobs different from what they have?

First and foremost, I want to serve multiple target audiences. I want to give operations control that they didn't have before. With MyDecisive, they're going to be able to tell their developers, “Do what you want. Use whatever vendor you want.” I will make sure that your dashboards light up when you go to access them. I will also not be hostage to any vendor when it comes to the monthly consumption bills. Operations get control.

Other examples are security examples. Don't send credit card data. I don't have to file a ticket and wait for devs to fix an app. I X-out that data with a quick mutating pipeline. That's not my favorite example because you could theoretically do it in OTel today. The problem with OTel is you need your devs to do it anyway. Operators don't know OTel. I'd like to have a marketplace of cartridges and pipelines that operators can grab and use as recipes.

On the developer side, it makes a big difference. Those teams said, “I want to write my own root cause analyzer, and I want a copy of Datadog's data.” I will provide you a copy of Datadog's data in OTel format while not perturbing the Datadog monitoring solution that your SREs use. With that format, you can say, “Datadog logs too much. I don't want it all. I want only these five signals from this one service.” You take those five signals and compute your anomalies.
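The "copy without perturbing" idea can be sketched as a tee: every record still goes to the existing vendor untouched, while a filtered copy is handed to the developer. This Python example is an assumption-laden illustration, not MyDecisive's or Datadog's API; the `service` and `signal` field names and the sample records are hypothetical.

```python
def tee(records, vendor_sink, service, signals):
    """Forward every record to vendor_sink unchanged and return a filtered
    copy containing only the chosen signals from one service."""
    wanted = set(signals)
    copy = []
    for record in records:
        vendor_sink.append(record)  # the existing monitoring path is untouched
        if record["service"] == service and record["signal"] in wanted:
            copy.append(dict(record))  # independent copy for the developer
    return copy


stream = [
    {"service": "checkout", "signal": "latency_ms", "value": 120},
    {"service": "checkout", "signal": "debug_noise", "value": 1},
    {"service": "search", "signal": "latency_ms", "value": 40},
]
vendor = []
dev_copy = tee(stream, vendor, service="checkout", signals=["latency_ms"])
```

The vendor still sees all three records, while the developer's anomaly code only has to handle the one it asked for.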

The thing I love about it most, Ben, is that your audience is interested not only in open source but AI as well. Everyone is right now, but different services need different algorithms, which is my belief. We're not at a general-purpose LLM that's going to find the root cause magically on its own. We don't even know, as a community or an ecosystem, how to ask an LLM, “What is the root cause?”

I like not just an ensemble approach. I like the ability to say, “Different horses for different courses.” This one service uses ML. It looks at CPU signals. For this other service, CPU is not an indicator. I want to look at another signal, and I don't want to use ML. I want to use a neural net for this service on these five signals. For this other thing, I'd like to take it all and throw it in an LLM so that in the future, I've built up a library and a corpus. I can start to query it as to what it knows. For example, two weeks after a push to your database, these five services get perturbed and go down.
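"Different horses for different courses" maps naturally onto a registry that picks a detector per service and signal. The Python sketch below uses two simple statistical stand-ins (a z-score check and a fixed ceiling) rather than real ML or a neural net; the service names, signal names, and thresholds are all hypothetical.

```python
from statistics import mean, pstdev


def zscore_detector(threshold=3.0):
    """Flag a value that sits more than `threshold` standard deviations
    from the mean of the recent history."""
    def detect(history, latest):
        spread = pstdev(history)
        if spread == 0:
            return latest != history[0]
        return abs(latest - mean(history)) / spread > threshold
    return detect


def ceiling_detector(limit):
    """Flag any value above a fixed limit, ignoring history."""
    def detect(history, latest):
        return latest > limit
    return detect


# Hypothetical registry: each (service, signal) pair gets its own algorithm.
DETECTORS = {
    ("checkout", "cpu"): zscore_detector(3.0),
    ("search", "queue_depth"): ceiling_detector(1000),
}


def is_anomalous(service, signal, history, latest):
    """Dispatch to the detector registered for this service and signal.
    Pairs with no registered detector are never flagged."""
    detect = DETECTORS.get((service, signal))
    if detect is None:
        return False
    return detect(history, latest)


cpu_history = [0.50, 0.52, 0.48, 0.51, 0.49]
spike = is_anomalous("checkout", "cpu", cpu_history, 0.95)
normal = is_anomalous("checkout", "cpu", cpu_history, 0.51)
backlog = is_anomalous("search", "queue_depth", [], 1500)
```

Swapping in a trained model for one service only means registering a different callable under its key; the other services are unaffected.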

LLMs will figure that stuff out for us. We need to be able to slot in all these cartridges. We need to be like Geordi, plug in all these chips, and solve problems with pipelines that have never been thought of before. We cannot exfiltrate or egress all our data to a cloud vendor and wait for them to build all those plugin points for us. We need to move as an ecosystem and a community right now. Now is the time. People need control. Devs will have the ability to write and express any logic they want on this telemetry data, while ops have total control. This has never been seen before.


From an open-source point of view, how are you approaching the open-source aspects of the AI componentry? That always seems to me to be an interesting dichotomy.

We are going to be a participant in our own ecosystem. I talked about a marketplace of cartridges you could download like apps. We intend to build a platform, provide full, high-value end-to-end solutions on that platform, and sell support in our early days. If you want to run it and you need a throat to choke, you can call us. We will be there 24/7. That matures over time. The community takes over and says, “I love this platform. It's the best encapsulation of value that we've seen in the telemetry space.” It's OTel-based. It's friendly to all of us. It's open and protected through the future by CNCF, but the value we provide is canned solutions.

Perhaps we'll build LLMs in a box that will do root cause for you if you want to pay us. If you want to build your own root cause, go for it. We'll build cloud cost optimizers that do transfer learning, comparing your Kubernetes cluster with someone else's running a Cassandra or a MongoDB cluster. We tell you, “Your Kubernetes cluster config has autoscale set by CPU at 20% utilization. All of your contemporaries in your same industry are at 60% utilization. You're wasting 40% of the box minimum.” I can build that cloud cost controller where you can't because you aren't in all those installations. You can't get that transfer data anonymized from those other contexts. The idea is I build an ecosystem and I participate in it.
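The arithmetic behind the 20% versus 60% example is a comparison of one cluster's autoscale CPU target against what peers run. Here is a minimal Python sketch of that comparison, assuming the peer targets are already available as anonymized numbers. The function name and the use of a median are illustrative assumptions, not a description of the actual cost controller.

```python
from statistics import median
from typing import List


def autoscale_gap(own_target_pct: float, peer_targets_pct: List[float]) -> float:
    """Return how many percentage points our autoscale CPU target sits
    below the peer median. Returns 0.0 if we are at or above the peers."""
    return max(0.0, median(peer_targets_pct) - own_target_pct)


# Hypothetical numbers mirroring the example: we scale out at 20% CPU,
# while peers in the same industry scale out at around 60%.
gap = autoscale_gap(20, [55, 60, 65])
print(f"Autoscale target is {gap:.0f} points below the peer median")
```

The point of the sketch is that the calculation is trivial once the peer data exists. The hard part, as described above, is having anonymized data from other installations to compare against.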

I know you guys haven't launched yet, but can you give us any guidance on what to expect and roughly when?

We are out developing in public. You can find us on GitHub. Our website is a shell of a website. I don't know when this episode will go live, but by the time people check out our website, we will have revamped it. We're starting to move to market. It's spring. In summer, we're at full public beta. In 2024, we're at 1.0/GA of 0.9.

When I think of this stuff, it's not going to be a docker-compose up, or maybe it is. My presumption for these sorts of services is that I'm going to need to give it a good half hour for things to get set up. How are you guys approaching that?

We've forked our efforts. We have what we call a local install and a cloud install. The local install isn't docker-compose. It uses a few more tools, but a couple of Homebrew calls and Docker calls later, you've got it running locally. You can see the console. You can watch your OpenTelemetry pipelines, mutate them, and control them. The local install takes 60-ish seconds to go live. The cloud install takes 30 or 40 minutes.

I don't know what generation your audience is from, but we used to have this toy called a weeble wobble that you couldn't knock over. It was a weighted little oblong egg-shaped thing with a big metal weight at the bottom. You tipped it, and it came back up. Our cloud install is auto-scaling, what we call a weeble wobble. It's nonstop. You can shut down arbitrary pieces straight at the various AWS control consoles or even in Kubernetes. Our staff is monitoring its sidecar combinations of custom cluster controllers. It stands itself back up until your operator comes in and says, “Shut it down.” Your devs can't break it. That is the cloud install.
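The weeble-wobble behavior resembles the reconcile pattern common to cluster controllers: compare what should be running with what is running, and restore the difference unless an operator has asked for shutdown. The following Python sketch shows one pass of that pattern. The component names and the function are hypothetical and are not taken from the actual cloud install.

```python
from typing import Set


def components_to_restart(
    desired: Set[str], running: Set[str], shutdown_requested: bool
) -> Set[str]:
    """Return the components a controller should bring back up on this pass."""
    if shutdown_requested:
        # An operator has asked for shutdown, so nothing is restarted.
        return set()
    # Anything desired but not currently running was knocked over and gets restored.
    return desired - running


# Hypothetical pass: someone killed the collector from a cloud console.
print(components_to_restart({"collector", "console"}, {"console"}, False))
```

Running this check in a loop is what makes the system stand back up after a component is killed, while the `shutdown_requested` flag keeps the final say with operations.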

Before we finish, is there anything else that you want to mention that we haven't gone over yet that people might find interesting?

I'm happy to be working on this problem, and with the maturity of the ecosystem around me at this time. Observability is a big space. OpenTelemetry is finally here, and it GA’d in 2023. We are happy to rely on it. I'm appreciative to have the opportunity to talk to you. I'm amazed at how deep you are in the space. I enjoyed our time together.

One of the main reasons I even knew about it is because I've been working a lot on OpenFeature with a bunch of different companies. Dynatrace is one of the companies that's putting some effort behind OpenFeature, which has reflections of OpenTelemetry in a way, but it's slightly different in other ways. It's interesting because I'm on the dev side of the engineering thing. I want my dashboard and to see my signal, but I don't care how it gets there. Give me the library and API key, and we're good. I've been learning a lot of stuff from a couple of Dynatrace folks. It's been interesting to see.

The term DevOps has become ubiquitous, but I still feel like there are two different camps of people. If you squint, you can tell whether someone is on the dev side or the ops side. Occasionally, you meet people who are legitimately perfect amalgamations of those two worlds, but it's unusual. That's interesting to see. Thank you so much for your time. I'll start watching your GitHub for releases and see what happens in the summer.

It’s great to meet you. Thank you so much for your time.

Thanks again.


Ari Zilka

Mr. Ari Zilka serves as Chief Executive Officer at MyDecisive.AI. He serves as Board Member at Imply. He served as Partner at Khosla Ventures. He served as a Board Member at Koding. He focused on enterprise, infrastructure, cloud computing, data management, security, and programming frameworks and languages. Prior to joining Khosla Ventures, he was the chief technology officer at Hortonworks, the premier commercial vendor of Apache Hadoop, the de facto open-source platform for storing, managing, and processing big data. While there, he helped build the product management, sales, and services teams and led the company to ship its initial product.

He also brokered several initial partner integrations and designs. More recently, he worked closely with customers building multi-thousand node clusters and designing business solutions on the platform and with the founding architects driving new features into the open source core itself. He also helped shepherd the company's successful initial public offering. Previously, he founded Terracotta, a leading data management technology provider, which was later acquired by Software AG. He also served as the founding chief architect of, where he led the innovation and development of the company's new engineering initiatives and built a team of engineers focused on performance management, operations, and cost-saving measures.

Earlier in his career, he served as a consultant at Sapient and PricewaterhouseCoopers, where he managed technology development and advised clients on strategy and deployment. He had successful engagements with and, Harrod's of London, Siemens, Intel, Compaq, and Barnes & Noble, among many others. He began his career as a software engineer for a subsidiary of Motorola. He was responsible for much of the data management infrastructure behind wireless networks, and in the mid-1990s, he invented a new object-relational database that helped shape today's database technology landscape. Since then, his software development accomplishments include projects in statistical analysis and data warehousing.
