Typesense Interview | The Craft of Open Source

Interview with Jason Bosco: Co-Founder, Typesense

Ben Rometsch

September 9, 2021

Episode Transcript

I’m excited to introduce Jason Bosco from Typesense. Jason, it’s great to meet you. Tell us about yourself and your projects and company.

Thanks for having me, Ben. I’m Jason, cofounder of Typesense. We’re an open-source search engine that’s optimized for both performance and ease of use. A simple way to describe Typesense is that it’s an open-source alternative to Algolia and an easier-to-use alternative to Elasticsearch. My cofounder, Kishore Nallan, and I have been working on it since 2015. It started out as a nights-and-weekends project, and we open-sourced it in 2018. In 2020, we posted it on Hacker News. That got a lot of attention, and people were interested and giving us feedback, which was motivating. In 2020, I left my full-time job to work on Typesense full-time. Kishore left his full-time job too and is working on it full-time as well, so we’re excited about that.

* * *

Congratulations. You’ve both taken the leap. It’s interesting: I’m looking at your GitHub repository and it’s unusual to see. The languages are 98% C++ and 2% other. I can’t think of another project I’ve seen like that. That’s old school.

It’s by choice. If we want to optimize for performance, we have to get as low-level as possible, short of writing assembly. Over the years, C++ has become modern enough for us to still be productive. I will also say that C++ has quite a mature ecosystem around it that has been battle-tested in different production settings at scale, and we’re able to take advantage of that. Two examples: the HTTP server we’re using is called H2O, and Fastly uses it in production on all their edge servers. The Raft library we’re using for clustered setups is used by Baidu in production, and they’ve open-sourced it.

We’re able to leverage best-in-class, battle-tested, production-grade libraries for different things, which lets us focus on the search pieces, the part we want to optimize Typesense for. Integrating libraries still takes a lot of work, but we don’t have to deal with all the edge cases that come up when you write an HTTP server or a Raft library yourself. Those are large efforts, and I’m glad that we’re standing on the shoulders of giants on that front.

* * *

I learned C++ at university and wrote a little of it professionally. That’s how old I am. It’s weird, because I’ve never written any Rust, but I probably know more about the Rust ecosystem and packaging than I do about C++’s. C++ gets no attention in any of the online circles I frequent, which is odd.

Rust is doing interesting things in trying to modernize systems programming with a new language, but the Rust ecosystem still has a lot of maturing to do. When I say mature, I’m talking more about community support, the libraries, and people using it in production at scale and ironing out all of the issues. That’s going to take a while. Rust wasn’t very popular when we started out with Typesense, and for something as critical as a search engine, using something too close to the bleeding edge invites its own issues. We chose stability and familiarity with the ecosystem when we started out.

* * *

Rust was super young in 2015, wasn’t it?

I don’t think I had even heard of Rust then. I might have first heard about it in 2018 or so. In any case, I’m happy that we chose C++, and I’m hoping Typesense also shows that C++ has become modern enough to use. I can confidently say that C++ is not going anywhere. It’s still around and people are using it. People might think it’s too low-level a language, but there are enough libraries out there that you can be quite productive in C++. It depends on the type of project you’re working on. I probably wouldn’t build a web app with C++; that’s not what C++ is for. For low-level systems programming, though, C++ is still a great choice, maybe one of the few robust choices out there.

* * *

What was the spark that caused you guys to start a GitHub repository and have a go at this? It’s not a simple problem either, is it?

It is not simple, as we’ve learned over the years since we started out. My cofounder, Kishore, was trying to set up Elasticsearch at one of his previous jobs, and the effort involved was vast. I’ve used Elasticsearch in the past too. You just accept that it’s the best tool you’ve got, deployment complexity and all, and roll with it. At the time, we were like, “Why is this so complicated? Why can’t it be simple? Why aren’t search engines easy to deploy and easy to integrate with?” That’s how it started: “What would it take to build our own search engine? How hard can this be?” It is a lot of work, and I would say a little bit of naivety is what sparked this journey.

We started reading papers on what goes into a core search algorithm and the different ways to build one. In the beginning, we had something that was useful for a small use case of ours. We slowly started adding features, then open-sourced it in 2018, thinking more people might find it interesting. At that time, we described it as a super-lightweight engine for quick text-based search. That was the initial goal of the project when we open-sourced it, and even in 2018, there was a little bit of interest.

People were like, “This is interesting,” and started using it. By 2019, people were already using it in production, which was encouraging. Over the years, people kept asking for more and more features that existed in other search engines. We’ve taken a careful approach of saying no to certain features to rein in the complexity. When we do choose to implement a feature, we rethink it from the ground up: “If no other search engine did this and we were the first to implement it, how would we do it?” We go back to first principles and re-architect each feature.

That’s where we realized there is an opportunity to make things simple. Even when we’re implementing features that other search engines like Elasticsearch have, there’s still an opportunity to simplify them and implement them in the most intuitive way, so that someone looking at an API call quickly understands what’s going on without having to pore through the docs to understand every single parameter.

What we’ve been laser-focused on since then is simplicity and intuitiveness in the API, with sane defaults for everything, and performance. We didn’t want to sacrifice performance. To answer your question, it came out of our own personal frustrations with Elasticsearch at the time, and it has been built up from people asking us for interesting features based on how they were using it. The product is taking the shape of the users who are using it and asking us for interesting things to layer on top.

* * *

Were you hacking away at it between 2015 and 2018? Did you have an idea about it being a business or an open-source project?

We figured at some point we might want to monetize it, but we didn’t know how, and open-source business models weren’t too popular back then.

* * *

When did Algolia launch?

In 2012. Interestingly, I hadn’t heard about Algolia in 2015. It turns out that Algolia was solving a problem similar to the one we saw with Elasticsearch. They’re building a super easy and intuitive search experience, and it’s a fantastic product, but they’re building it in a proprietary way; it’s closed source. Over the years, I did learn about Algolia, though we never set out to build a product similar to it. In 2020, we pitched Typesense as an open-source alternative to Algolia because we had some similar features. That resonated with people a lot, because over the years Algolia has also become pretty expensive.

* * *

I use it for another business I’m involved in. It is expensive, to the point where I kicked the tires of a bunch of open-source projects a couple of years ago when they changed their pricing structure. I was like, “This is nuts.” We don’t use it that much, but the service it provides the business is super valuable. My opinion is that it’s too expensive, but I do like it. It’s a great product and we do pay for it.

I can’t disagree with that. I’ve also used Algolia in the past and it is a fantastic product. They’ve thought about every little thing. There are some quirks as well, like with sorts: you have to create duplicate indices for each sort order, which we’ve solved in Typesense. In general, though, Algolia is fantastic. Pricing is one of those things people have come to accept over the years because of how much value they’re giving. If you look at it through the lens of “I could set up Elasticsearch myself or pay Algolia a bit of a premium,” then maybe the cost is acceptable.
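To illustrate the sorting difference: in Typesense, sort order is chosen per query through the `sort_by` search parameter rather than by maintaining a separate index per sort order. The sketch below only builds request URLs for the search endpoint and sends nothing; the host `http://localhost:8108`, the `products` collection and its `name` and `price` fields are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def search_url(host, collection, q, query_by, sort_by=None):
    """Build a Typesense search URL; sort order is just another query parameter."""
    params = {"q": q, "query_by": query_by}
    if sort_by is not None:
        params["sort_by"] = sort_by  # e.g. "price:asc"; no extra index needed
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


# Hypothetical host, collection and fields, for illustration only.
cheapest_first = search_url("http://localhost:8108", "products", "shoe", "name", "price:asc")
priciest_first = search_url("http://localhost:8108", "products", "shoe", "name", "price:desc")
print(parse_qs(urlparse(cheapest_first).query)["sort_by"])  # ['price:asc']
```

Both URLs target the same collection, and only the `sort_by` value differs. A real request would also need to carry the `X-TYPESENSE-API-KEY` header.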

Now, if you’re starting out and looking for a search service, there are options like Typesense that make you question the value you get from paying Algolia a premium. When we launched Typesense Cloud, based on users asking us for a hosted version, I wanted to make sure we didn’t head down the same pricing path as Algolia. Algolia gets expensive because they charge per record and per operation. With Typesense Cloud, we charge more like an infrastructure provider, because that’s how I see Typesense: as part of a team’s core infrastructure. We charge hourly, so you pick your RAM and CPU, and whether you want a highly available cluster or a geo-distributed cluster.

Based on these configurations, we give you an hourly price and charge for bandwidth, similar to AWS or GCP. This lets you put in as many records as you want. If you want to add more data, you increase your RAM. If you want to support more traffic, you increase your number of CPUs. You can scale traffic (the number of search and indexing operations) and the number of records independently of each other. We don’t charge per operation. It’s not too dissimilar from running the instances yourself.
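The difference between the two pricing models can be sketched as back-of-envelope arithmetic. Every rate and quantity below is a hypothetical placeholder, not an actual Typesense Cloud or Algolia price. The point is only to show which inputs each model depends on.

```python
HOURS_PER_MONTH = 730  # roughly 24 * 365 / 12


def infra_monthly_cost(hourly_rate, bandwidth_gb, per_gb_rate, hours=HOURS_PER_MONTH):
    """Infrastructure-style pricing: cluster hours plus bandwidth.

    Record count and search volume are not inputs; they matter only
    indirectly, through the RAM and CPU you pick (which set hourly_rate).
    """
    return hourly_rate * hours + bandwidth_gb * per_gb_rate


def usage_monthly_cost(records, searches, per_1k_records, per_1k_searches):
    """Usage-style pricing: pay per stored record and per search operation."""
    return records / 1000 * per_1k_records + searches / 1000 * per_1k_searches


# Hypothetical numbers, for illustration only.
print(infra_monthly_cost(hourly_rate=0.10, bandwidth_gb=10, per_gb_rate=0.10))
print(usage_monthly_cost(records=1_000_000, searches=2_000_000,
                         per_1k_records=0.40, per_1k_searches=0.50))
```

Doubling `searches` doubles the search portion of the usage-based total. The infrastructure total changes only if the extra traffic requires more CPUs, and therefore a higher `hourly_rate`.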

The cost model is similar to what you would pay an infrastructure provider, and that gives users a lot of savings over per-record or per-search pricing. I’ve heard feedback from folks that this does make the pricing a little harder to understand and project, because it’s not as simple as, “Here’s how many records I have. Here’s how many search operations I do.” It takes a bit of calculation, which we help people work through. That’s the tradeoff between saving money and a little more complexity in understanding the pricing. It’s easier to educate people about the pricing structure than to charge a premium to make the pricing simple.

* * *

Tell me a bit about that. How did you come to the decision to open source everything?

At the time, we wanted feedback from the community. Having a closed-source product and inviting feedback, especially for a core developer tool, would have been a tall order. Maybe people could sign up, push some data and give us feedback around benchmarks, but we were looking for something more: a stronger partnership with the community. That’s where it started, but over the years, I’ve realized that being open source makes it accessible to more people than a SaaS product would be.

Once you make it a closed-source SaaS product, people quickly think of it as a commercial-only product. One of our goals is to democratize access to instant search technology, which is what we call being able to type a key and get results on every keypress. That’s what we’re aiming for, and if our goal is to democratize access to instant search, open source goes hand in hand with that. Once you go SaaS, you have to charge unless you pay for it yourself. If you give out too generous a free tier, you’re essentially bearing the cost at that point.

Making it self-hostable instead plays to our goal of making it as accessible as possible to teams of different sizes, whether they can pay or not. In the worst case, you run it on your own local machine and give us feedback. That’s the type of partnership we wanted with the community as we built up the product, and it’s one of the reasons we’re happy we decided to open source it back in the day. I’m continuing to see the benefits: people coming in, trying it out and giving us feedback, whether that’s a feature request or pointing out an issue. All of this is a good partnership, and open source is what unlocks it. Collectively, together with us, everyone is helping democratize access to good, easy-to-use search technology. From that perspective, I’m happy that we open-sourced our work.

* * *

What happened when you pushed everything to GitHub? Was it crickets, 1,000 stars straight away or somewhere in between?

It was crickets, initially. We did our first Hacker News post in 2018 or 2019. It didn’t get much traction, but we did land on the front page with about 80 points. There was a little bit of interest, so that’s what sparked it. We probably got 200 stars that day, and then it grew slowly, maybe five stars every two weeks or so. As we kept adding more features, more people became interested in it.

In 2020, when we did our second Hacker News post, we started seeing explosive visibility. The happy coincidence that put us on a collision path with Algolia is that they have been raising prices over the years. To their credit, they’ve also achieved incredible distribution. Probably every developer documentation site uses Algolia for its documentation search, so people are generally familiar with it. That, combined with their price increases over the years, has left enough unhappy folks who wouldn’t mind exploring alternatives, especially an open-source alternative. That’s the happy collision path that helped us gain a lot of visibility in 2020. Growth was slow initially; 2020 is when we exploded, and we’re happy that happened.

* * *

There are 3 or 4 other fairly well-established open-source projects that are aiming to solve the same problems. Do you chat with those guys?

We talk on Twitter and in threads. The most popular one, at this time at least, is MeiliSearch. It’s a Rust-based product, and all the things I mentioned about the Rust ecosystem are, unfortunately, things MeiliSearch is going through. Rust does not yet have a clustering library, which means they have to build their own. Until they do, they don’t have a highly available solution, so you can only run a single-node instance of MeiliSearch.
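Clustering matters for high availability because of Raft’s majority rule: a cluster can elect a leader and commit writes only while a majority of its nodes can reach each other. This small sketch is plain arithmetic, not code from Typesense or any Raft library. It shows how many node failures a cluster of a given size tolerates, and why a single node tolerates none.

```python
def raft_quorum(nodes: int) -> int:
    """Votes needed for a majority in a Raft cluster of `nodes` members."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still available."""
    return nodes - raft_quorum(nodes)


for n in range(1, 6):
    print(f"{n} node(s): quorum {raft_quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

An even-sized cluster tolerates no more failures than the odd size just below it (4 nodes tolerate 1, the same as 3). This is why odd cluster sizes such as 3 or 5 are the usual choice.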

I read one of their interviews. Even to pick something as seemingly simple as an HTTP server, they had to go through so much due diligence to finally settle on what they’re using, and they still might have some issues with it. The unfortunate reality is that, regardless of what libraries you use, your product is your product at the end of the day. If your product runs into issues, even because of third-party libraries, that still reflects on your product from a user’s perspective.

They have some scaling issues they’re working through, and they’re rewriting that piece at the moment. Let’s see if someone from the community comes up with a Raft library for Rust. Writing a Raft library in an academic setting is one thing; you can cover all the cases. Running it in production is another. One time, we found an issue with the Raft library we’re using that we had to report upstream, and they quickly patched it, thankfully. We only figured that out because one of our users was running it at incredibly high load, and the issue only manifests at super high load.

I’m glad we found that out and were able to work with upstream to get it fixed. We’ve uncovered a few more things at scale. With things like this, you have to use it in production to understand it well. Looking back, I’m glad we didn’t use Rust or decide to rewrite everything in Rust. It’s an interesting ecosystem and there’s a lot of enthusiasm around it, but for building a stable product that can be used in production now, I am personally a little hesitant at the moment.

* * *

I can imagine trying to build a business on a clustering library that’s at version 0.1, with an SLA that people are paying you for. It’s the sort of thing that has you waking up in the middle of the night.

That would not have been fun.

* * *

That’s why I find these things interesting: you never quite know what the impetus for writing Meili was. It might have been someone who thought, “I want to learn Rust, so I’ll try to write a search engine,” and then it turned out to be quite good. We’ve built Flagsmith and it’s super boring: Django, Python and Postgres. There’s no whiz-bang stuff, but it’s bombproof and well understood. It’s also interesting because you’re one of the few people I’ve had on the show whose product has to be fast and scale well. That and search quality are the things people care about, and all of them are technically super hard.

If you think about it, it’s a difficult startup product. With a typical startup, you aim for an MVP and say you don’t need to worry about scale yet; just get your first set of users, YAGNI and all of that. When you’re building a core infrastructure product that other developers build on top of, there’s no sacrificing things like performance. For a search engine, if it doesn’t return results as fast as or faster than Elasticsearch, people will just use Elasticsearch. We have to prioritize performance and also ease of use, because that’s what we’re banking on; it’s the primary thing we say is different about Typesense. With every release, we go through extensive benchmarking. We have a benchmarking harness, and I use k6, which is another open-source tool, for load testing.

* * *

They’ve been on the show as well.

I remember seeing that episode.

* * *

I remember them well because of the alligator.

I use k6 and that’s how I found out about your show. I was reading up on k6. I’ve been using them for a while and I was like, “Who are these guys?” I saw them on your show and that’s how I learned about this.

* * *

That was a fun episode.

I use k6, and we benchmark every single release. We have a set of datasets that we run the changes through, and I’m glad we have a benchmarking harness. Changes that seemed small at the time have surprised us, time and time again, with how big a performance impact they can have. At a ten-million-record scale, adding even a millisecond of delay to a search operation becomes significant, and it blows up fast.
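Small changes can have outsized effects, and that is the reason to gate releases on benchmarks. The sketch below is not Typesense’s harness, which uses k6. It is a minimal, hypothetical check that compares the 99th-percentile latency of a candidate build against a baseline and flags a regression beyond a chosen tolerance. The latency samples would come from whatever load-testing tool you run.

```python
import statistics


def p99(latencies_ms):
    """99th-percentile latency; needs at least two samples."""
    return statistics.quantiles(latencies_ms, n=100)[98]


def is_regression(baseline_ms, candidate_ms, tolerance=0.05):
    """True if the candidate's p99 exceeds the baseline's by more than `tolerance`."""
    return p99(candidate_ms) > p99(baseline_ms) * (1 + tolerance)


# Hypothetical samples: the candidate adds 1 ms to every search.
baseline = [2.0 + i / 100 for i in range(1000)]
candidate = [x + 1.0 for x in baseline]
print(is_regression(baseline, candidate))
```

A tail percentile is used because an average can hide a regression that only affects a fraction of queries. The 5% tolerance is an arbitrary placeholder.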

We’ve had to make performance the norm: performance and ease of use are our MVP. If we get a sane default wrong for any of the API settings, that’s going to trip up some users, and they’ll be scratching their heads and consulting the documentation. The two things that take the majority of our time are making sure our changes keep the product performant, and making sure it’s not a head-scratcher for anyone trying the product for the first time.

* * *

I assume that you’ve got a fairly large number of people self-hosting the platform. How do you deal with this? It happens to us from time to time: you get an obscure issue, on GitHub or in our web chat, where someone asks something that will take hours to figure out if it’s something weird. How do you deal with that? Your time is finite. How do you decide how much time to give to that for free?

The way I look at it, if a user reports an issue they’ve run into, it’s an opportunity for us to make something better in the product so that more people don’t run into the same issue in the future. Even if it’s something as simple as a gap in the documentation, which we’ve had a couple of times, people are like, “How do I do this? How do I do that?” It seems simple to us. We could have said, “Go read this little section in the documentation and you’ll figure it out,” but that points out that the documentation is not clear enough and we need to go improve it.

A case in point: someone asked us, “Don’t you guys support filtering of records?” In my opinion, filtering is something basic that a search engine would offer. By filtering, I mean structured filtering, like filtering by color in a product catalog. We supported it, but the fact that someone asked this question, as basic as it is, pointed out to us that filtering was just a little parameter under the search section. You had to read through all the other parameters to understand, “This is how you filter.”
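For reference, structured filtering in Typesense is expressed with the `filter_by` search parameter. For example, `color:=red` requests an exact match. The sketch below only builds the request URL. The host, the `products` collection and its `name` and `color` fields are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def filtered_search_url(host, collection, q, query_by, filter_by):
    """Build a search URL that narrows results with a structured filter."""
    params = {
        "q": q,                  # "*" matches every document
        "query_by": query_by,    # field(s) the text query runs against
        "filter_by": filter_by,  # structured condition, e.g. "color:=red"
    }
    return f"{host}/collections/{collection}/documents/search?{urlencode(params)}"


# Hypothetical host, collection and fields, for illustration only.
url = filtered_search_url("http://localhost:8108", "products", "*", "name", "color:=red")
print(parse_qs(urlparse(url).query)["filter_by"])  # ['color:=red']
```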

Another person asked us, “Do you support aggregations?” If you’re building a filter widget that shows counts next to all the different values you can select, that requires aggregations, or faceting. The fact that someone asked us whether we support aggregations points out that it’s not yet clear in the documentation. I’ve had things like this over and over. People asking us questions is almost like a nice little UX experiment in how good our documentation is on one side, and how intuitive things are on the other.
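Faceting is what produces those counts: for the documents that match a query, count how many have each value of a field. In Typesense this is requested with the `facet_by` search parameter. The sketch below computes the same kind of counts locally over a hypothetical list of matching documents to show what the numbers next to a filter widget represent. It is not how the engine computes them internally.

```python
from collections import Counter


def facet_counts(matching_docs, field):
    """Count each value of `field` across documents that matched the query."""
    return Counter(doc[field] for doc in matching_docs if field in doc)


# Hypothetical documents that matched a search for "shoe".
matches = [
    {"name": "trail shoe", "color": "red"},
    {"name": "running shoe", "color": "blue"},
    {"name": "court shoe", "color": "red"},
    {"name": "shoe laces"},  # no color field, so it is not counted
]
for value, count in facet_counts(matches, "color").most_common():
    print(f"{value} ({count})")
```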

From that perspective, I have spent a lot of our finite time on seemingly simple things, and I’ve always found the return on investment good. Who knows how many future requests we’ve already solved that way. At the end of the day, it also helps build a good community where all types of questions are welcome. That’s the type of ecosystem I want to set up around Typesense: no question is too simple or too redundant to ask. I’m happy to answer as many questions as I can. Of course, if we hit the point where we can’t make any progress on the product because we’re answering questions all day long, we’ll probably have to draw a line somewhere. At the moment, I enjoy talking to folks and answering their questions on different things.

* * *

How do you feel the decision of choosing C++ has affected contributions and stuff?

It definitely affects code contributions to the core search engine. That said, it hasn’t stopped contributions. I count even someone asking for a feature or reporting an issue as a contribution, or a comment like, “This API structure is not clear,” or “This parameter is not clear.” All of these are contributions. Code contributions have been an issue, though, because the C++ community is not very large, popular or trendy.

C++ programmers typically work on embedded systems, more on the hardware side, and not on web things, so we haven’t had too many core contributions. We do have an ecosystem of client libraries: JavaScript, PHP, Ruby, Python and Go. Both the PHP and Go libraries were community contributions that we ended up adopting over time. People have been asking for a JavaScript library, and someone from the community has jumped in to help out. I’m grateful for folks contributing to the ecosystem. And if someone wants to learn C++, explore and work with us, I’d be delighted.

* * *

I’m imagining you and the Meili guys on Twitter, with you making jokes about their compile times and them making jokes about your memory handling.

Not to that level yet, but we do come up often when users are looking at alternatives to Elasticsearch or Algolia. People stumble on one of us and discover the other. At one point, MeiliSearch had a page in their documentation that said, “Typesense is a lightweight engine suited for small projects.” I didn’t realize this, and it was probably out there for more than a year. One user pointed it out to me and I was a little offended by that.

I opened an issue on GitHub, and it had a good outcome. I explained to them, “People are using us in production at scale, with tens of millions of records, etc.” For now, they’ve removed that and said, “We’re going to re-evaluate.” Since then, I’ve added a comparison of Algolia, Elasticsearch, MeiliSearch and Typesense to our site, so people can make more informed decisions for themselves instead of going off what others say. I could be biased too, so I kept it to feature support rather than commentary. Many folks have said it’s been useful to them when deciding which direction to head.

* * *

Do you have any idea about how many people are using the platform self-hosted? Do you have any telemetry?

We do not have telemetry. We only go off GitHub stars, which are at 5,400 now, and on Docker, we track the number of Docker pulls. We only use Docker internally for basic testing every now and then, so those pulls are essentially all public users pulling it, and there we’re at about 32,000. That’s our primary tracking mechanism. We’ve pondered adding telemetry to the self-hosted version, and though that seems to be the trend, there are enough privacy implications that we’ve held off from doing anything of that sort. In October 2020, I pinned an issue on GitHub saying, “If you’re self-hosting, and I know at least 28,000 folks are using it somewhere, tell us if you’re using it in production so others know it’s being used and we can track progress.” Some folks have responded, but if anyone is using Typesense in production, please add a PR to the showcase so we know you’re using it.

* * *

That’s one of those frustrating things. We added simple telemetry to our API that you can opt out of; it sends a heartbeat when it starts up, so we get a bit of an idea. Have you had any people come to you and say, “I tried to index my billionth record and it failed”?
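A startup heartbeat with an opt-out can be quite small. The following is a hypothetical sketch, not Flagsmith’s implementation. The environment variable name `EXAMPLE_TELEMETRY_DISABLED` is invented, and the transport is injected as a `send` callable so the example needs no network access. The payload deliberately carries no hostnames or user data, which reflects the privacy concerns raised above.

```python
import os
import platform

OPT_OUT_VAR = "EXAMPLE_TELEMETRY_DISABLED"  # hypothetical variable name


def telemetry_enabled(env=None):
    """Telemetry is on unless the opt-out variable is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get(OPT_OUT_VAR, "").strip().lower() not in {"1", "true", "yes"}


def startup_heartbeat(version, send, env=None):
    """Send one minimal heartbeat at startup, unless the user opted out."""
    if not telemetry_enabled(env):
        return None
    payload = {"event": "startup", "version": version, "os": platform.system()}
    send(payload)
    return payload


sent = []
startup_heartbeat("1.0.0", sent.append, env={})
startup_heartbeat("1.0.0", sent.append, env={OPT_OUT_VAR: "true"})
print(len(sent))  # 1: only the first call sent a heartbeat
```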

We had someone index hundreds of millions of records, and they were able to do it successfully, but they were surprised by the memory consumption. Something we should perhaps make clearer in the docs is that Typesense’s performance comes from storing all the indices in memory. We use efficient data structures, but the amount of memory Typesense uses is directly proportional to the amount of data you index and the number of fields you index in every document.
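A toy in-memory inverted index shows why memory grows with both the data and the number of indexed fields. In this model, every (field, token) pair maps to the set of documents containing it. It is a simplified illustration, not Typesense’s actual data structures. The posting count serves as a rough proxy for memory use, and the documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs, fields):
    """Map (field, token) -> set of document ids, for the given fields only."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for field in fields:
            for token in str(doc.get(field, "")).lower().split():
                index[(field, token)].add(doc_id)
    return index


def posting_count(index):
    """Total (token, document) entries held in memory: a rough size proxy."""
    return sum(len(ids) for ids in index.values())


docs = [
    {"name": "Red shoe", "description": "comfortable running shoe"},
    {"name": "Blue shoe", "description": "light trail shoe"},
]
print(posting_count(build_index(docs, ["name"])))                 # 4
print(posting_count(build_index(docs, ["name", "description"])))  # 10
```

Indexing the extra `description` field more than doubles the entries even though the documents themselves are unchanged.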

That’s a big surprise, especially for folks coming from Elasticsearch, because Elasticsearch stores indices on disk with a RAM cache. The Typesense binary itself, when you start it up without any data, takes probably 20MB, so it is super lightweight, but when you index data, all of that data starts taking up RAM. Those are the kinds of issues we run into. The scale issues we’ve heard of are from users we work closely with, and we’re able to mitigate them on a one-on-one basis. We’ve been able to do that as part of Typesense Cloud, because we have a lot of visibility into Typesense running in our own cloud environment, which is a nice benefit we didn’t realize we would get when launching Typesense Cloud.

We’re now running Typesense in production ourselves, which gives us so much visibility into its runtime performance characteristics. We’ve taken that, applied it back into the core open-source product and made it available to the community. The performance issues we’ve run into come from people who maybe haven’t chosen the right configuration because it wasn’t clear, or who ran into some bottlenecks.

Those are the things we have been able to address, and are actively addressing, with different Typesense Cloud customers. It’s good that we have a direct line of communication with these folks to help improve the core product, and that makes its way back into the open-source product. From a scale perspective, we haven’t heard of anyone running into surprises other than memory consumption, which is by design, since we store everything in RAM.

* * *

Are you constantly fending off emails from venture capitalists asking to invest? It’s funny how Elastic, the company, comes up in some way, shape or form on pretty much every episode. Setting the licensing question aside, they’ve hit it out of the park with the commercial open-source model. Between them and Algolia, it’s a hot sector and has been for quite a few years.

We’ve had more than a dozen venture capital firms reach out in the last few months, and these are top firms. The VC world seems to think search is becoming a hot space again, especially after Elastic changed its licensing model; there seems to be an increased spotlight on search in the VC community, and we’re continuing to see inbound interest. For the longest time, we were planning to stay bootstrapped. We’re still bootstrapped and haven’t taken external funding. But a friend in the VC world whom I’ve spoken to over time has slowly made me see that there can be a big upside. Raising money, being able to hire folks and moving fast would unlock a big capability for us.

I’ve become more open to the idea of external funding. What I’ve essentially been looking for is the right alignment: if and when we raise funds, whoever we raise from should be aligned with the open-source model we’re going down, and there are a lot of implications that come with that. There’s the commercial layer on top with open source at its core, the ROI timeframes around that, our goal of making search accessible to everyone, and all the implications that follow.

I’ve been talking to VCs from the perspective of finding ones that are well aligned with the commercial open-source model, which is quite different from many other business models out there. It’s a unique thing; folks need to have gone through it and understand the impact it might have on their own investment strategy as a firm if they choose to add commercial open-source companies to their mix. The conversations have been about finding that alignment.

We’re continuing to have those conversations, but at the moment, we’re bootstrapped and are using revenue from other products we’ve built over the years to fund ourselves, so we’re in a comfortable position. We do have a ton of work to get done, and we could definitely use more folks to help out on the project, but there’s also a joy to being bootstrapped. For now, we’re having one-off conversations here and there with VCs.

{{divider}}‍

It’s interesting; we’re very similar. Our project was a spin-off of a side project from the agency that I was running. We funded some other development through that agency. We haven’t raised any money. The other founders and the guys who work on the project were like, “I don’t know what we’d do with a team of engineers.” I like the simplicity of focus that comes with there being two of you. It’s nice because there’s never any, “I don’t know what to work on.” It’s, “This is the number one pressing problem that we need to solve or feature that we need to add.” When that one is done, you’re like, “This is obviously the next one.” If we had five people on it, I’d worry that we’d just be doing stuff for the sake of it. What I’m saying is that it helps with your product direction. That scarcity of resources is powerful.

Being a two-person team is a big competitive advantage for us. If we need to build a feature, we, as the folks who are also implementing it, can talk directly to the user who’s going to be using it, get feedback in real time, implement it, and be done with it. There are no other dependencies between us and shipping those features into the product. That’s like a superpower. We don’t do elaborate roadmap planning; we know intuitively, just based on what users are asking us for. We know what we’re going to be working on in the next three months. We have that in our minds. There are no elaborate meetings for roadmap alignment. Those aren’t bad things, but they’re things you have to do once you have a large team and need to get everyone aligned on similar goals.

Being small helps us focus on one thing at a time because we literally have no choice. Sometimes, when something big is taking too long, we switch to something else, quickly see if we can fix that, and then come back to the first thing. We’re able to do that as well. For example, we had a feature planned for February 2021: GeoSearch. That’s something people have been asking for, but after working on it, we realized that adding it had performance implications, so it was going to take a while for us to figure out the issue. We said, “Let’s pause on that. There are other pressing things that have come through in the meantime,” so we started working on those.

Having the flexibility to move things around without being committed one way or the other, being able to respond to user feedback in real time, and avoiding the communication overhead that larger teams typically inherit: all of that comes with being a small team. It’s been such a nice time saver at this point. You can’t be a two-person team forever, but at this stage, I’m enjoying all of the benefits that come with a small team.

{{divider}}‍

I always think that if you want to climb a tall mountain, there are two styles to approach it. There’s siege style, where you put 100 people at the bottom of the mountain, have teams leave supplies at different camps, and siege-attack the mountain. Then there’s alpine style, where you’re with one other guy, carrying as little stuff as you can, and you try to do it ten times faster. I think about that analogy a lot. As soon as you’ve got more than four engineers, you start hitting this cubic power of communication, with Slack messages and stuff. It can be debilitating.

We hired an intern to help out with integrations, and with more people, I can already see things slowing us down. It’s good that we’re getting a lot of help and we’re able to push forward on multiple fronts. There are pros and cons; it comes with the territory. One model that I’ve been interested in, and that I’m hoping we can use to run Typesense as a project, is how typical open-source projects run. Let’s say it’s a library that someone is working on as a side project and put up online, and people use it. Somehow, open-source projects like the smaller libraries you tend to use manage to survive even though there’s no roadmap planning per se. People aren’t getting on phone calls to discuss things or doing the typical things that come with running a larger team.

I’ve been trying to understand what keeps those libraries ticking with their asynchronous, distributed way of communicating. People don’t even know what time zones the different contributors are in, but the projects still thrive. I’m wondering whether there’s a way for us to adopt a similar model even when we hire full-time, or maybe not even full-time. What if we hire folks in whatever capacity they’re willing to spend on the project, in a distributed setting, regardless of where they are or what time zone they’re in? We’d treat their work like any other open-source contribution, distributed and async. If we can get away with it, we wouldn’t even have Slack conversations; we’d keep everything on GitHub, completely asynchronous, and see if we’re able to pull that off. I’m trying that with our interns, but interns are a different thing because there’s a level of hand-holding needed. We’ve had to be on Slack and talk, etc. Especially for engineers who have worked professionally before, I wonder whether this could be an interesting model for a company to adopt as a way of working.

{{divider}}‍

It reminds me of the way Valve works. I don’t know if they still work like this, but people can work on whatever they want. If you want to work on the physics engine for the game engine, you can go and do that. If you want to work on how a gun works in a certain game, you can go and do that. It’s interesting because you’re right: just because you’re paying someone to work with you, why would you change the model from the one you’re used to in open source anyway?

About what you mentioned regarding the way Valve works, I forget the term for it, but Zappos had it at one point too. They later switched out of it. If you think about how open-source projects work, maintainers typically ask that, instead of showing up with a PR containing a bunch of changes and saying, “Can you merge this in?”, you start with a conversation: “I had this idea. I think it would be interesting if we did it this way. What do you think?” The community then evaluates it. Maybe there’s another sponsor for it who would partner with the folks proposing it.

There is still an element of process, for example, the RFC process that some of the larger open-source projects adopt. But even that process can be asynchronous and distributed, and we can adopt it as a company. We attempted to do this at one of my previous companies, Dollar Shave Club. Internally, we had an RFC process where, if an engineer on the team wanted to change the way we did things or change the architecture, they could write up an RFC and people would comment on it.

We had an internal private repository where people would open RFCs as pull requests. We were inspired by the Ember model: at the time, Ember had an RFC process that used pull requests on a GitHub repository, and we tried to follow that internally. I’m wondering whether that can be a standard way for open-source companies to work, regardless of whether there’s a company behind the project or not. The process still involves the community and gets everyone on board, but in an asynchronous, distributed way.

{{divider}}‍

Jason, we’re coming to the end of our time. Are there any people, features, customers or memes that you want to give a shout-out to?

I want to give a shout-out to three of our contributors. Yuri contributed the original TypeScript definitions, and Nick is now taking them forward. Nick has also offered to convert our whole JavaScript library into a TypeScript code base, which I’m excited about. Art has contributed some significant improvements to our Gatsby plugin so that people can index multiple entities from their Gatsby pages into Typesense. Thank you so much for these contributions. We’re also looking for more contributors.

We’re not experts in every single language and framework out there. Interestingly, people have asked us for a Rust client, so if folks are familiar with Rust, I’d love to have a Rust library. People have asked us for a Dart package as well. Someone from the community said they’d be interested in working on it, so I can put you in touch with them. People have asked us for a WordPress integration, which we’re working on, and for integrations with other frameworks like Rails and Django, or hooks into their ORMs. These are things people have generally gotten used to coming from Algolia. If anyone has worked with these, I’d love to partner. We use Rails for Typesense Cloud, so I should probably build our own integration, but I’d appreciate contributions.

{{divider}}‍

Well done for bootstrapping your business as well. I’ve got a load of respect for that. We could have talked more about its benefits, because it helps me sleep and stay sane.

This was a great conversation. Thank you. I enjoyed talking and thinking through the different things that you asked me. That was fun. Thank you for having me.

{{divider}}‍

Stay safe. We could catch up in a couple of years. Take care. See you.

About

Jason Bosco

Technology Leader and Entrepreneur with experience in building, nurturing and scaling Product, Engineering and Design teams to build delightful technology products. Also a hands-on Software Engineer, Generalist and Polyglot with experience in architecting & building Scalable and Highly Available web-based systems. Currently Co-founder at Typesense, an open source typo-tolerant search engine.

