We try to move folks away from this notion of a black box task and towards software-defined assets.
Data is the name of the game in today’s world. But with the number of data sources today, how do you sift through and get the data you want? A data pipeline is the answer, and at the heart of that pipeline is a data orchestrator. Today’s guest, Pete Hunt, is the CEO of Elementl, the company behind the open-source orchestration platform, Dagster. He joins Ben Rometsch to tell us all about Elementl and Dagster as well as his career journey that took him across Facebook, Instagram, Smyte, and Twitter. In this current fluctuating environment, we can say it is a feat for a company to be able to raise money. Just this year, Elementl was able to raise a $33 million Series B for Dagster. Find out how they were able to achieve this, what they are doing for data orchestration, and where they are heading in the future. Tune in to this episode to not miss out!
Welcome, Pete Hunt. How are you?
I'm doing great. How are you doing, Ben?
Apart from the weather, I’m good. For some context, do you want to tell us where you are and who you are?
I am calling in from Boston, Massachusetts, in the United States. I spent several years in San Francisco working on a bunch of tech stuff. Throughout that journey, I found myself as CEO of Elementl, and we made an open-source data orchestrator called Dagster. That's where I spend most of my time these days.
You started a small open-source, front-end framework. You were involved in the beginnings of it.
I was a cofounder of the project. I didn't create it, but I was probably the third person to work on it and one of the people who drove the open-source adoption in the early days. React ended up being a little bit more successful.
The instant that someone says that publicly, they get yelled at, dragged through the town square, and thrown in the river. If they float, they are executed. If they sink, they aren't a witch, but it's okay. The story there is that Facebook had this big complex ad creation user interface, where 90% of the revenue came through.
It was important. It could never break, because the company's revenue went to zero if it broke. It was also constantly being iterated on because they were trying to make a lot of money. This was around the time of the IPO. In the lead-up to the IPO, revenue was a big focus. After the IPO, the stock crashed, and they were trying to figure out how to make money.
That team is under a lot of stress. They are iterating rapidly on this thing. What they find is they can't make changes to the UI without breaking it. The front end is notorious because it's difficult to test. A lot of front-end bugs are like, “I click these buttons in this specific order, and the UI doesn't look right.” The problem is that there are a huge number of different orders in which you can click the buttons. Figuring out which ones to test is hard. So is writing test assertions for “looks right.” That's not something computers are good at. That's something humans are good at.
What was that built in originally? Was it a patchwork of everything?
It was built on an internal framework called Bolt JS, which was derived from... Do you remember this thing called webOS?
Palm made this thing called webOS. Facebook had hired or acquired a good chunk of that team. They had taken the stuff they had built for webOS and evolved it into Bolt JS, which the ads creation flow was built on. In the front-end world, it's a small group of people that do everything.
I was working in an agency back then. I do remember using that ad platform. It is a bit like the Google one. It's complicated as hell. The state management for that application must have been ludicrous.
Say I'm creating my ad here in the US, and I want to see how it renders in Arabic, which is a right-to-left language. We've got to mirror the part of the UI that's rendering the ad, but not the rest of the UI. There is stuff like that you might not think about, but it is incredibly complicated. It's one of the most complicated pieces of user interface on the whole site.
That was the genesis of it: trying to solve that brittleness problem.
There was an engineer working on that team who was into functional programming and had this kernel of an idea that was like, “Can we make our user interfaces a function of the underlying state? Can we have a framework to figure out how to manage all the transitions for you?” If you think about how a traditional webpage works that's backed by a database, it’s like a server-rendered app. You do a couple of database queries, generate some HTML, and send it to the browser. When the user loads up a new webpage, that process starts again from scratch.
He did a lot of work underneath the scenes and built this technology that we branded after the fact as the virtual DOM. That enabled that style of programming to be applied to the front end. I came in when that was an early experiment shipped to production. I shipped the second production application built on React. There were a bunch of things that were broken and weren't working well, and I fixed a lot of stuff. That's when I joined the project, and we were off to the races.
Was React novel in that functional approach? Were there any other frameworks approaching the problem in that direction?
No, it was a new idea. You could see some prior art in game development. There's immediate-mode rendering and retained-mode rendering. I'm not a game development expert; the last time I did it was in college. In one approach, you have these stateful objects that you update; that's retained mode. In the other, you redraw the entire screen every frame. React is much more akin to that latter approach. That was a comparison that worked well for us in the early evangelism of the product, but it was novel when applied to the web for sure.
We're going to stop talking about this in a minute, but I'm curious to know what you attribute its longevity to. It's unusual for a framework to bring a new paradigm and not get superseded by some subsequent framework that's taken all the good ideas from it. There was a bit of stuff that, if you did it again, you'd have designed differently. Why is that, do you think?
These things are a combination of quality, skill or competency, and a degree of luck. From the quality side, there were a couple of things working in React's favor. It was a good idea, and it was allowed to grow in the organization. A lot of times, those ideas get shut down early. There was a unique culture at Facebook at the time that allowed an idea like that to take hold. The company continued to fund it but never tried to make money off of it. If it had been started in 2023, the team would have quit Facebook and started a venture-backed company to run it. That didn't happen, which helped get it to a place of real maturity without a lot of compromises.
One of the biggest ones was that the React team had to maintain Facebook.com, which was a big production app that they couldn't break. If they wanted to introduce a breaking API change, the React team went and migrated everybody at the company to it. They would refactor other people's code. This is different from migrations that I've seen at other employers, where the core team that maintains some platform service says, “We're moving from V1 to V2,” and then nags all the other teams to upgrade their stuff.
That creates this problem where those teams don't want to depend on internal services because they change all the time. The team maintaining those internal services doesn't feel the user pain because they can push V2 and let other people do the work to migrate. For React, it is different. That's why you get this technology that's gone through three huge changes, but they've all been backward compatible, and there have been transition paths. That's because that team had to pay that price. Those are the main factors that contributed to its longevity.
That's the end of my React questions; I was curious about that. Thank you for working on the project. React Native was transformative for the agency that I was running at the time because we had never gone down the road of hiring native Java and Objective-C developers. When React Native dropped, we were all over it and bet the company on that platform. We could have had the same conversation about React Native.
React Native is the most common way that our customers build mobile applications. It is unusual. Unless you're building a 100-million-user B2C application that needs incredibly high fidelity, it's amazing. Thank you for that. I want to get onto data because I don't understand that area of the world. You had a fairly storied career working through Facebook, Instagram, and Smyte, which got acquired by Twitter, and now Elementl. Do you want to talk a little bit about your experience with Smyte and how that ended up being acquired?
I was the first engineer to go from Facebook to Instagram when we acquired Instagram. We used React to build a lot of stuff for Instagram. The most important thing that we did for Instagram (I wasn't directly involved in this project; I observed it) was plugging Instagram into the site integrity systems at Facebook. These are the systems that find fake accounts, let you react during a spam attack, and, if there is illegal content, handle the proper process for that: taking it down or not, and reporting it to law enforcement or not.
When you looked at Instagram pre-acquisition, there were all these hard-coded rules in there that were like, if your email address begins with Z and you sign up from Hotmail.com.au, we consider you a spammer. One time, in the middle of the night, someone was creating a bunch of fake accounts. They left that in or forgot to take it out. Some percentage of users were hitting that.
We were like, “You shouldn't have to sell your company to get access to this type of technology.” I got a small group together. We started a company to build what we called trust and safety as a service. This was way back in 2014 before trust and safety was a thing that people knew about. We had timed the market well. Over the intervening years, it became this high-profile thing. The category of our company was always small.
You shouldn't have to sell your company in order to get access to this type of technology.
We ended up starting that. The product was two things. There was a classifier where you would stream us event data, and we would tell you this user ID is bad, this user ID is good, and the reasons why. And there was a UI on top of it that let you explore your data. We ended up getting every marketplace and social network that wasn't owned by Facebook or Google. We had Musical.ly, which turned into TikTok, and other companies like that.
We grew that thing quickly. We ended up selling to our second-largest customer, which was Twitter. We joined the Twitter team. The thing ran at a larger scale than ever, powering Twitter's spam and abuse detection until a couple of months ago. When they shut down their Google Cloud instance over there, we would've been part of that shutdown. I don't have any insider information anymore.
Elon Musk sent some tweets out about it, like the apocryphal million lines of code to stop people with the first name Zed in Australia. He did talk about that a little bit, which must have been a surreal experience for you. Working in trust and safety from 2014 to 2022 was a journey. A lot of people think how it works is there's a big machine learning model that learns patterns of behavior and then can proactively identify stuff.
There's a big team of human moderators that deal with a lot of stuff, but you don't have labeled data to train your ML model on. Because it is an adversarial space, the attack patterns change before you get the labeled data and can retrain the model. What happens is you have models that catch last week's stuff and keep that off the platform, but there's this whole category of emerging threats. That's where you get 100,000 handcrafted rules and regular expressions that catch that stuff.
I don't know if you remember the Ray-Ban spam attack, but it was all over Twitter for a while. It was, like, get a free pair of Ray-Bans by going to this malicious link. There were teams of analysts behind the scenes trying to figure out the patterns the attackers were using. We could catch that in the interim period between when the attack started and when the machine learning models were able to identify the patterns. I mostly spent my time on that stuff.
How did you get onto conceptualizing and starting a data platform?
The problem I described is a huge data problem. At Smyte, we had all these different customers sending us sensitive data that we were analyzing. It was that same problem at Twitter, but at a much larger scale and exposed to global regulators. You always have the Irish DPA coming in asking questions about whether we're using email addresses appropriately.
We were organizing that data and figuring out where it was coming from and where it was going. Are we using it correctly? Are we using it in a cost-effective way? These are problems every organization has; both my tiny little startup and a larger company like Twitter had the same problems. We would routinely stumble into savings of hundreds of thousands of dollars by identifying data sets that were being recomputed all the time that nobody was using. We would get questions from lawyers all the time, like, “The Irish DPA is asking us this question. Can you answer it for us? Where are the European email addresses coming from? Where are they being used?”
Those questions took multiple teams coordinating for multiple weeks to answer. It was incredibly expensive, and it was time that we could have spent building something else. I empathize with this problem. When I was at Facebook, I was working on React while this other guy, Nick Schrock, was sitting next to me working on GraphQL. He started a company to solve this problem, called Elementl, and started the Dagster project in 2018. I invested in the seed round, and in 2022, he finally convinced me to come over and join the company.
I'm interviewing one of the seed investors. That hasn't happened before. Dagster is an open-source project designed to solve the problem that you explained. Elementl is the company behind Dagster. You raised a bunch of money. This is unusual for the show. Normally, everyone I speak to hasn't raised anything in the last couple of years because of all of the fluctuations in that world, but you guys did. Can you talk a little bit about the environment and the world that data lives in? Are you competing with large proprietary platforms? Is it a new segment? Where does it sit within its life and world?
Let's set the table here a little bit for those who are not data experts, starting with the business need and the actual outputs of these things. A lot of businesses call themselves data-driven, and many genuinely are. Most businesses above a couple of people use data in some respect. Whether you're an executive looking at a dashboard or you are building a product like Netflix that has a machine learning model under the hood making recommendations, you're using data to deliver business value somehow.
Where does that data come from? It’s stored in a bunch of different source systems. You've got your eCommerce shop, inventory management system, enterprise ERP, payments provider, and web analytics. There are different places where data lives. In order to get from that stuff to that dashboard or that ML model, you build a thing called a data pipeline. A data pipeline sucks in data from all these different sources, joins it together, transforms it, and produces whatever the output is. We call those inputs, outputs, and intermediate artifacts data assets. It's producing, transforming, and consuming data assets.
There's a thing that runs those data pipelines. That's called a data orchestrator. That's what we do. The simplest version of an orchestrator is cron: run this series of steps every hour. The ops team wants their finance dashboard up to date every hour, so we run a cron job every hour. If you were to deploy cron for this, you would realize that there are a lot of problems with that. Steps can fail. Two numbers can fail to add up in the final product. How do you test these things? If you're developing this thing, how do I develop it on my laptop and push it to production? How do I monitor it? How do I retry from failure?
There's all sorts of stuff that goes from the Python script on your laptop to production and running on a schedule. It's a lot of work. That's what a data orchestrator does. Popular ones include Apache Airflow. That's the most common one that people have generally heard of if they haven't heard of us. I can jump into how we're different. I forget if that was part of the question, but that's the category that we play in.
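The cron-only setup Pete describes could be sketched like this. All names and numbers here are invented for illustration; the point is how much plumbing (retries, failure handling, monitoring) the script author ends up writing by hand.

```python
# A naive "orchestrator": one script, run hourly from crontab, e.g.
#   0 * * * * python refresh_dashboard.py
# Every concern listed above -- retries, partial failure, testing,
# monitoring -- is left entirely to the script author.

import time

def extract():
    # Pretend to pull rows from a source system.
    return [{"sku": "A", "qty": 3}, {"sku": "B", "qty": 5}]

def transform(rows):
    # Join/aggregate step; a bug here silently corrupts the dashboard.
    return {"total_qty": sum(r["qty"] for r in rows)}

def load(summary):
    # Stand-in for writing to a warehouse table.
    return summary

def run_with_retries(step, *args, attempts=3, delay=0.0):
    # Hand-rolled retry logic -- exactly the kind of plumbing a real
    # orchestrator provides out of the box.
    for i in range(attempts):
        try:
            return step(*args)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)

def refresh_dashboard():
    rows = run_with_retries(extract)
    summary = run_with_retries(transform, rows)
    return run_with_retries(load, summary)

print(refresh_dashboard())  # -> {'total_qty': 8}
```

Even this toy version says nothing about where the data lives, whether a step's output is stale, or how to rerun just the failed step, which is where orchestrators come in.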
I was curious about the ecosystem because it's an interesting segment or part of the industry because it's not visible. If I use Twitter, I comprehend that they have a shitload of data in probably doing stuff with it. People don't tend to write blog posts about their data pipelines, not in my experience anyway.
If you're used to the front-end world, there is a thing called data Twitter, and there are blog posts, but it's small. In the front-end world, there are hundreds of thousands or millions of people in the conversation; data Twitter is much smaller.
Dagster is different from Airflow in what ways?
Airflow created the category. They were the first ones to see there's life beyond cron. It was created several years ago, based on a thing at Facebook called Data Swarm; I overlapped with the guy who created it back then, which is a weird coincidence. The year that an open-source project gets product-market fit and lots of traction is the year that its design freezes. There are certain assumptions baked into Airflow that, in hindsight, were the wrong assumptions to make. A good example is that you only get one Python environment across your whole company.
Python dependencies are lovely to work with at the best of times.
Front-end people complain about NPM, but they haven't seen the half of it.
A community member contributed a PR to Flagsmith that moves us from pip-tools to Poetry. Everyone is a little bit waiting for the bomb to go off, where something complicated breaks and we can't figure out why. I can see how one Python environment is not going to get you far in a large organization.
There are a bunch of decisions similar to that, made back at Airbnb, where Airflow was created in 2013. They made sense then, or at least worked, but they don't now. Developers resort to things like building separate Docker containers for every step. They don't run anything in-process; they orchestrate Kubernetes jobs. That introduces a whole slow, complicated Docker build pipeline every time you want to make a change. You can't write tests for these things. You can't develop locally. Staging is a complicated process. The state of the art of observability in 2013 was not the state of the art of observability now.
Dagster originally started to solve those problems. As we started to solve them, we realized that there was a missing core abstraction that unified a lot of this stuff. In Airflow and other orchestrators, a pipeline is a series of tasks. These are functions, shell scripts, or containers that run, do something, and either succeed or fail. That's the contract that they have with the orchestrator.
The orchestrator can tell you this step failed. It can even restart it for you, but it can't tell you why it failed. It can't tell you where the data is. It can't tell you any of the dependencies between things. Oftentimes, you'll have two tasks, and you set up one dependency in the orchestrator that says run task A, then task B. Inside the code for each of those, which run in separate containers, one writes a file to some known URL and the other reads it back out. If that breaks, reasoning about it is challenging.
These are things that could be running for hours before they fail.
They cost a ton of money to run. I could rant about this for a long time. A good example: we were talking to a user of another orchestrator. Somebody kicked off a backfill, which is rerunning a job, a pipeline, over the last several months of data. There was a bug in their code where they did a nested subquery in their data warehouse. They spent $500,000 over Memorial Day weekend, and they didn't catch it until it was too late. It should have been much less than that. There are issues like that where the lack of visibility into what these jobs are doing can hurt you.
I recall the moment at Flagsmith when we realized that we were never going to be able to work on the production database locally again because it was too big. It always felt bad doing it anyway. There's that moment where the organization or the environment you're working in is never going to get a 100% faithful copy of the data. For people who are reading, ours is not even a massive data set. But leaving that one world and wanting to jump into the next one, there's nothing there, tooling-wise, to help you. That's what the world was like pre-Dagster.
Branching is another challenge. If you put up a pull request, how do you know if it works in production? What makes this harder in data is that oftentimes, you're developing machine learning models. That's a common data pipeline; half of Dagster users are using Dagster for that. You can't make test data for that because the output is a statistical model that depends on the statistical properties, often unknown statistical properties, of the underlying dataset. You have to test on production. How do you do that in a way that is safe?
Having black-box tasks means you would have to refactor every step in your pipeline, which may have been written in different languages by different teams, and there are not a lot of shared best practices to support that type of behavior. This speaks to that missing abstraction I mentioned. We try to move folks away from this notion of a black box task towards what we call software-defined assets.
We try to move folks away from this notion of a black box task and towards software-defined assets.
You develop individual data assets as Python code, or as Python code that bridges to another environment like Databricks or dbt. That follows a set of conventions and best practices that enable things like branching, testing, lineage, and observability. You can always see: this table in the data warehouse, where did it come from? When was it last updated? Who's responsible for it?
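In Dagster itself, an asset is declared with the `@asset` decorator, and upstream assets are referenced simply by naming them as function parameters. The following is a toy re-implementation of that idea in plain Python, not Dagster's actual internals, just to show how declaring assets this way gives the framework the full dependency graph for free.

```python
# Toy sketch of the software-defined-asset idea: each asset is a
# Python function, its parameter names declare its upstream assets,
# and "materializing" one asset resolves the dependency graph.
# This mimics the shape of Dagster's @asset API but is NOT Dagster.

import inspect

_ASSETS = {}

def asset(fn):
    # Register the function under its own name; its parameters are
    # treated as its upstream dependencies.
    _ASSETS[fn.__name__] = fn
    return fn

def materialize(name, _cache=None):
    # Recursively materialize upstream assets, then this one,
    # memoizing so shared dependencies compute only once.
    cache = {} if _cache is None else _cache
    if name not in cache:
        fn = _ASSETS[name]
        deps = list(inspect.signature(fn).parameters)
        cache[name] = fn(*(materialize(d, cache) for d in deps))
    return cache[name]

@asset
def users():
    # Source asset: pretend this queries a production system.
    return [{"id": 1, "active": True}, {"id": 2, "active": False}]

@asset
def active_users(users):
    # Downstream asset: depends on `users` just by naming it.
    return [u for u in users if u["active"]]

print(materialize("active_users"))  # -> [{'id': 1, 'active': True}]
```

Because each asset declares what it produces and what it depends on, the framework, rather than hidden file handoffs, owns the lineage, which is what makes the branching, testing, and observability features possible.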
How does the relationship between Elementl and Dagster work as a business model?
We are a venture-backed open-source company, which is Hacker News's favorite thing to hate. It is a topic that we could dig into, because it is a fairly hot topic now. I’m going to tell you why we're maybe one of the good ones, or at least why a lot of the problems that other companies run into are not a problem for us.
Let me start with the business model. First of all, we're an open-source-led business. We have Dagster, which is an open-source project. Many of our biggest logos run Dagster open source. They don't pay us anything. In fact, they cost us a little money because they come into our Slack channel asking questions all the time. It's not some toy version of Dagster. It's the real thing. That is important, especially for a startup at our size and scale. We can credibly go to our commercial customers and say, “Sign up for the commercial product. If you don't like it, you can leave.” You go back to open source, and you can continue to run these critical data pipelines. It is infrastructure.
We have the same thing in place. It is good to be able to say as part of a sales cycle: you can export the data, we'll remove your account from the enterprise image, you can run the open-source one, and you're still good to go. It solves many problems. How are we going to do code escrow? It's all on GitHub.
It's not an issue. That is a great alignment of incentives if you can make a commercial product work on top of that. The other reason why it's good is because a lot of people try the open source project. It can get a lot of traction. It's like $0 customer acquisition cost. They come in and say, “I'm interested in your commercial product. We didn't spend any marketing money to get that.” It is a great alignment of incentives.
The next question is, if the open-source project is great, why would anybody pay you? We have a self-serve cloud product and an enterprise cloud product. Both of them have hybrid or fully hosted options, what we call serverless. We'll host everything for you in a SaaS model. It's like Vercel, where you install the GitHub app, and every time you push to the master branch, it rolls out on our infrastructure.
We also have a hybrid deployment model, which is a little bit unique in this category, where customers run an open-source agent in their infrastructure. It exchanges logs and metadata with our cloud service. We don't have the secrets to their internal databases or data warehouse, but we host the complicated stateful databases. And we have an enterprise product.
Our cloud product, first, hosts stuff for the user. Second, it has team features that they would find valuable. It's not possible in open source to deliver that branching feature in an out-of-the-box way. You can use Dagster’s conventions and build your own configuration management on top to manage your branching. In Dagster Cloud, because we have a GitHub app, we can do that branching for you magically.
You don't have to change your code between those two, but in order to get that super sweet GitHub experience where the bot comments on your PR, that's the thing that you pay us a little bit of money for. On enterprise, there are other things like integration with your SAML-based single sign-on, your role-based access control, your audit trails, and all that stuff that enterprises want and need. Oftentimes, they want somebody to pay and somebody to give them an SLA.
One of the reasons why our model is working, and other venture-backed open source companies have struggled, is because Dagster was started from day one as an independent project and as a project that was meant to be commercialized. It was not a situation where you have an existing open-source project that was incubated by a big company. It got a lot of traction. Users had a lot of expectations.
In order to make a commercial entity work on top, there had to be some real pain inflicted on the open-source users. We had some principles behind it from day one. We're not seeing backlash from the community in terms of launching a commercial offering on top. We haven't made the commercial or the open-source offering worse. We make it better over time.
I don’t know if anyone has said this to you before, but there's quite a lot of similarity between Dagster, Elementl, and Pulumi in some ways. Pulumi is interesting because it wraps Terraform to a large degree, as far as I understand it, but it provides state management as a service. It takes stuff that's declaratively defined in YAML, and you write it in Python or TypeScript instead. It's interesting. We use Pulumi. It feels like there are quite a few similarities there.
I've never heard it explained like that, but I completely agree with you. We always thought that Flagsmith would be a commercial entity, but it was an open-source project, and there wasn't a legal organization behind it for several years. We're always making the open-source thing better. We've never taken something out of the open-source product. Like you guys, we've written stuff like SAML providers and haven't made them public, because we're all trying to build sustainable projects and businesses. There are a number of things going on here, and there was too much money. There was way too much money getting sprayed around.
I'm not saying there is anything wrong with this, but it happens a lot. There seem to be people trying to build a commercial business on top of Jaeger or some other open-source project. That's where the friction starts to come in, because they have to bend the project in a way that bolts a financial aspect onto something that never had that in mind.
You have to have a vision for both the open-source and the commercial offering, either from the start, or because the open-source project gets a lot of traction and you notice a missing piece that could be commercialized. One of the ways people get in trouble is when something gets massive adoption and they say, “We need to find a way to commercialize this thing.” They don't see a missing piece in the ecosystem; they instead try to create one. That gets people in a lot of trouble.
You see the Hacker News title, and before you click on the comments, you know that the first comment is going to be people ranting. Enshittification is the term of the day; that might date this show. Do you think people raising too much money was a large part of the problem?
We raised a bunch of money earlier in 2023, but it was a reasonable amount. If you took a median over the last several years, we're in the band of a reasonable valuation. A couple of years ago, valuations were high. We could talk about the incentives at play here, but what happens is capital markets move around, and money can be cheaper or more expensive. When it is cheap, there's a strong incentive for companies to raise a lot. People wonder why that is.
More money means you can grow faster and build a better business. It's good advice for founders and management teams to be like, “If the money is there at a good price, take it because it might not be there tomorrow.” There's another incentive here. There were a lot of silly secondary transactions happening in 2020 and 2021 where founders and employees could sell shares to the investors and pocket a lot of cash, which is another personal incentive.
If the money is there at a good price, take it because it might not be there tomorrow.
I didn't know that was going on.
It happens during bubbles. If you ask yourself, “Why did company so-and-so raise tons and tons of money? That's irresponsible,” well, everyone's got a price at which they'll take that risk, and the market cleared at that price. It creates this incentive where you have to grow into that valuation within a certain period of time.
What happens if you miss that curve? That's the question.
It depends. A lot of it has to do with the story. You can raise all the money and not spend it. If your board is cool with that, and you can explain to a future investor why you did it, it's a good idea: the money was there, you took it, the company is safe for a long time, and you can still take your time. The problem is that you get a lot of pressure.
The board is usually like, “I deployed my capital into this company instead of ten other options I had. I want you to grow.” The future investor will be like, “Why did it take you so long to grow?” There's a strong incentive to grow the commercial business quickly. They say, “We need to be at this many millions of ARR by this date. Working backward, I have to generate four times that in pipeline.” That's when you start seeing crazy email campaigns and upsells.
There's a ticking clock. There are going to be a bunch of destitute open-source projects. If you overraised, you maybe didn't realize it at the time; you don't feel it until you're in the cold light of 2023 and you don't hit that trajectory. What happens then? Every now and then, I'll look at the RethinkDB repository because I loved that product. I thought it was interesting and innovative. Maybe that's the archetype of what's going to happen to a lot of these projects. What do you think?
I was thinking of RethinkDB. Smyte was built on RethinkDB for a long time. That was quite the journey. There are ways to fix companies. They do these things called recaps, where the company gives the investors some money back and the valuation goes down. When the valuation goes down, the expectations for future revenue go down.
You can correct these mistakes. The problem is if your equity is worth a lot on paper and you took a loan secured by that stock. There's a certain amount of pain that companies go through when they have to do a recap. I personally know 3 or 4 that have done recaps. They usually keep it quiet when they do it. The point I'm trying to make is that a lot of these companies that would've been at risk are getting fixed.
What does the future hold for Dagster and Elementl in the context of that discussion? Where do you see both the project and the business in a few years' time?
I have a lot of principles around this. This is not my first time being a CEO and shepherding an open-source project. I like to think that I'm going to make a whole new set of mistakes this time around. Let me tell you how we're approaching it. First of all, I've got this philosophy of don't get big in terms of headcount.
Why don't people like working at large companies? It's because they get big. The culture or communication gets messed up. There's not enough for people to do, yet for some reason they're still employed there. They get bored, start doing useless work, and build systems that people don't need. Then you have to maintain systems that shouldn't exist. There is something to be said for keeping headcount low, especially R&D headcount. We have great cloud, open-source, copilot, and other AI technologies now. Velocity per engineer should be higher than ever.
The second is that we don't want to market something and get a lot of usage before it's ready. React baked at Facebook for a long time before it came out. With Dagster, we did a lot of experimentation in the open back in the day, but when we went to 1.0, that's when we committed to backward compatibility, which is important. We feel confident that we have product-market fit here. We didn't start evangelizing people to use Dagster until we were at 1.0 and had a GA product, when we felt it was stable and not going to change under people's feet. There's stuff like that where we're a little more conservative than maybe other companies.
We haven't raised a crazy amount of money. We have a low headcount because we have to. I'm constantly pleased and proud of what you can achieve with a small team. It's crazy. There are projects like Terraform where you'd think you need thousands of hours of engineering time to build a scalable platform, but you can do amazing things with a small team. We've stuck to that mantra partly out of necessity. The worst thing that we could do now is to have three times the number of people working in engineering. It would be a catastrophe.
Being able to absorb all those people is hard.
In terms of the project itself, are there any things that you're working on at the moment or thinking in the back of your mind that you want to get out that you want to mention now?
We have a blog post out there called The Dagster Master Plan. That's our roadmap. We try to be open about that. We have a big surprise coming at the end of this quarter or the beginning of the next quarter of 2023. We're at this great phase now. We're growing quickly, but we're being thoughtful about how we do it. We raised a nice, healthy-sized Series B. We're spending it, but we're spending it slowly. We don't want to be the cheapest or easiest product. We want to be the best product. Over time, the best product gets easier to use, and the tent gets bigger. We've built the best product, and we're working on making it easier to use.
Pete, thanks so much for your time. Thank you for React. When we have enough data to work with, we'll check out your platform. Thanks again for your time.
Thanks for having me.
Pete joined Dagster Labs as head of engineering in early 2022, and took over the reins as CEO in November of that year. Pete was previously co-founder and CEO of Smyte, an anti-abuse provider that was acquired by Twitter. Prior to this, Pete led Instagram’s web and business analytics teams, and cofounded Facebook’s React.js open-source project.