The Dynamic Duo: Apache Druid and Kubernetes

May 16, 2023
Reena Leone
 

Kubernetes, an open-source container orchestration platform, has been making waves in the Apache Druid community. It makes sense – using Druid with Kubernetes can help you build a more scalable, flexible, and resilient data analytics infrastructure. Yoav Nordmann, a Tech Lead and Architect at Tikal, shares his experience with Apache Druid and Kubernetes. We will dive into Druid’s capabilities, including real-time and batch ingestion features for large volumes of data. We will also explore the benefits of using Kubernetes with Druid, such as the scalability, flexibility, and resilience required for building a data analytics infrastructure. 

Listen to the episode to learn:

  • How Kubernetes benefits the deployment of Druid
  • How to overcome challenges with Druid
  • How using Kubernetes can help solve resource usage issues

About the guest

Yoav Nordmann is a Backend Tech Lead and Architect for Distributed Systems and Data. He is passionate about new technologies, knowledge sharing and open source.

Transcript

[00:00:00.490] – Reena

Welcome to Tales at Scale, a podcast that cracks open the world of analytics projects. I’m your host, Reena from Imply, and I’m here to bring you stories from developers doing cool things with Apache Druid, real-time data and analytics, but way beyond your basic BI. I’m talking about analytics applications that are taking data and insights to a whole new level. Today we’re tackling a topic that’s been popular in the Druid community lately: Druid and Kubernetes. I mean, it makes sense, since using Apache Druid with Kubernetes can help you build a more scalable, flexible, and resilient data analytics infrastructure that you can easily adapt to changing business needs and, of course, different data volumes. With that, I’d like to welcome to the show today’s guest, Yoav Nordmann, tech lead and architect at Tikal. Yoav, welcome to the show.

[00:00:46.160] – Yoav Nordmann

Thank you, great to be here.

[00:00:47.760] – Reena

So I know that you are a technology enthusiast and that you’re passionate about open source. So let’s talk a little bit about you and your journey. Tell me a little bit about how you got to where you are today.

[00:00:58.630] – Yoav Nordmann

So, I’ve actually been doing this for 25 years. I started as a software engineer about 25 years ago in the C++ world: embedded, real-time, doing really low-level stuff. But soon after that, two years later, I found myself working a lot with Java and web services, stuff like that. And from there it just went on: a lot of Java, working with a lot of libraries, different kinds of applications, up to today, where I work as a tech lead and architect across different languages and different areas, distributed programming, microservices, and data. So yeah, it’s been a long but fun and entertaining journey.

[00:01:42.130] – Reena

Can you tell me a little bit more about where you’re at now at Tikal?

[00:01:45.400] – Yoav Nordmann

Sure. So Tikal is a professional services company here in Israel. We actually don’t have a product, although we do have a product: the product of Tikal is us. We’re about 120 people, and there are no managers or anything like that at Tikal. We go to our clients every day and help them build whatever they need built. We define ourselves by having knowledge of the latest and best tech and best practices. So we go to clients, we work in their teams as a team member or as a tech lead, and we enrich them with our knowledge and enrich ourselves with some of theirs. It’s a symbiosis, basically, and it’s a lot of fun. I get to work with different clients and change clients every year, which I did before Tikal anyhow. I was never really able to stay at a place too long; I got bored really fast. So now, every year I switch clients and get to experience different things each time.

[00:02:53.450] – Reena

I bet that gives you an opportunity to try out all kinds of different technologies depending on what you’re trying to build.

[00:03:00.220] – Yoav Nordmann

Yes, definitely. There was even once this company where I was allowed to do a POC on 20 plus databases.

[00:03:08.000] – Reena

Wow!

[00:03:08.670] – Yoav Nordmann

That was really fun. Yeah. Tikal is all about the best and latest technology. Every year we even put out the Israeli technology radar, just like ThoughtWorks; we do it in Israel together with other companies. All of these new and emerging technologies, we have to learn them so we can provide them, and the knowledge, to our clients. So yes, we’re doing a lot of POCs, we’re playing around with a lot of stuff, and it’s a lot of fun. We don’t usually do a lot of work on bugs or anything, because we’re definitely at the edge. It’s great being there.

[00:03:51.870] – Reena

Now, I know that you are an Apache Druid user because you gave an awesome presentation on KUDRAS, which is a Kubernetes Druid Auto Scaler for maximum resource utilization. That was your presentation. Oh, utilization and speed. How could I forget? Speed. But before we dive into that, can you tell me a little bit more about your work with Druid and how long you’ve been a Druid user?

[00:04:16.450] – Yoav Nordmann

Sure. So I went to this client, and the team leader there, he just said, well, we have this great database which we are using. It’s client facing, and it’s an OLAP database. And we really need someone, a tech lead. We need someone to get into it and understand everything and help us out. And that database was Apache Druid. That was something like five, six years ago already. I think I started working on Druid version 17, maybe. Today we are at version 23, if I’m not mistaken.

[00:04:52.070] – Reena

Oh, we’re going up to 26, and whenever this episode comes out, we’ll be almost there.

[00:04:57.700] – Yoav Nordmann

Oh, my God, I’m so out of touch. Sorry.

[00:05:00.640] – Reena

Don’t worry, I will be covering that.

[00:05:03.490] – Yoav Nordmann

Thank you. So I started learning Druid six years ago. I just read through all of the documentation, and I thought the documentation was really great. And then they told me that, since it’s user facing, they really need to know what’s going on inside. There were some queries which got stuck, and we didn’t really know what was happening. So I actually had to extract all the metrics which Druid exports, push them through Kafka, and ingest them back into Druid for monitoring purposes. And I used allegro/turnilo, which is an open source front end for Druid, to view those analytics. It was great. It took me a couple of months. After that, we did a lot of stuff with ingestion, real-time ingestion, batch ingestion. To begin with, we had batch ingestion using EMR, and then we started using the parallel ingestion of Druid itself. And we’re talking about a huge scale. They’re now pushing something like four terabytes of data. It’s a huge Druid cluster. We had nine terabytes. Did I say four terabytes or four petabytes?

[00:06:20.640] – Reena

Terabytes. But petabytes!

[00:06:21.490] – Yoav Nordmann

Petabytes, yes. They’re pushing four petabytes of data now, because every day we add something like nine terabytes of new data to ingest into Druid. So that was a lot of fun, a lot of playing around with internals. Yeah, it was great.
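
For readers who want to picture the metrics feedback loop Yoav describes above, here is a minimal, hedged sketch of submitting a Kafka ingestion supervisor to Druid so that metric events published to a Kafka topic land back in a monitoring datasource. The Overlord address, topic, and datasource names are assumptions, and the spec is trimmed to the essentials rather than a drop-in config.

```python
import json
import requests

OVERLORD = "http://overlord:8090"  # assumed address; a Druid Router can also proxy this API

# Trimmed-down Kafka supervisor spec: topic and datasource names are placeholders.
supervisor_spec = {
    "type": "kafka",
    "spec": {
        "dataSchema": {
            "dataSource": "druid-metrics",
            "timestampSpec": {"column": "timestamp", "format": "iso"},
            "dimensionsSpec": {"dimensions": []},  # empty list: discover dimensions from the data
            "granularitySpec": {"segmentGranularity": "hour", "queryGranularity": "minute"},
        },
        "ioConfig": {
            "topic": "druid-metrics",
            "inputFormat": {"type": "json"},
            "consumerProperties": {"bootstrap.servers": "kafka:9092"},
            "useEarliestOffset": False,
        },
        "tuningConfig": {"type": "kafka"},
    },
}

# Create (or update) the supervisor; Druid then ingests its own metric stream continuously,
# and a UI such as allegro/turnilo can sit on top of the resulting datasource.
resp = requests.post(
    f"{OVERLORD}/druid/indexer/v1/supervisor",
    headers={"Content-Type": "application/json"},
    data=json.dumps(supervisor_spec),
)
resp.raise_for_status()
print(resp.json())
```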

[00:06:37.760] – Reena

I feel like when we talk about data at that scale, my brain is just trying to add zeros to figure out how much that is. And that’s what Druid is great for, is for handling petabytes of data. But just trying to think about that scale sometimes is just my brain is blown.

[00:06:55.250] – Yoav Nordmann

Yeah, it was quite funny. The team lead there, I asked him, how many new events do we have each day? And he says, something like 53 million. And I said, okay, sure. So we made the calculation, and I said, well, this doesn’t seem right, because we used to have something like a million events per second. So I did the math, and it was like, 53 billion, you mean? And he said, oh yeah, sorry, I just forgot a couple of zeros. Like, oh, great. Little bit of a difference.

[00:07:29.840] – Reena

Yeah, we see that with Netflix, which is a big open source Druid user, and you can imagine how many events are created running a streaming platform that size. Speaking of Druid, I know you have a unique perspective because you don’t just work with Druid, but with whatever database suits the needs of your clients. How do you go about choosing the right OLAP database?

[00:07:57.130] – Yoav Nordmann

Right, so that’s actually quite interesting, because you never know exactly what the client needs. And I’ve done some POCs on some really crazy databases. There was this one Chinese database, the documentation was in Chinese; it wasn’t that bad, but we couldn’t really use it. So, first of all, you really have to understand the needs of the client. And then you go to the CAP theorem and look at where to place them, what’s best for them: consistency, availability, or partition tolerance. You can only ever have two. Once you know exactly what they need in terms of the CAP theorem, you look at all the databases in that segment of the CAP theorem, and then you try to figure out from the documentation which database would be best for what exactly they need, and what the support is like. For instance, the current client has a big need for upserts, and OLAP databases are not good with upserts, so we had to build this really intricate system for doing upserts half a year ago, for instance. And also, I think it’s something like three and a half terabytes of data.

[00:09:23.350] – Yoav Nordmann

So again, it’ll take some time, but we do have a system in place to do that. It’s just a matter, every time, of trying to figure out which database will give me the least complications down the road. I mean, most OLAP databases have really low latency for query return time, and still there are subtle differences which make all the difference in the big picture. So every time it’s a POC. Druid is a big name in that industry anyhow, and it does support most use cases. So unless there is really something Druid doesn’t have that they really need, I would definitely go with Druid in most cases, because it’s a proven technology.

[00:10:14.210] – Reena

Yeah, I have a little bit of a biased opinion there, but I know right now there’s been a lot of chatter around, I guess, the three sharing that space: Druid, Pinot, and then ClickHouse are all sort of fighting over the same territory.

[00:10:31.430] – Yoav Nordmann

Yeah, actually there are many more. You have Apache Kylin. There is also Rockset; they’re proprietary, so I know a [yaht po?] tried it out. I don’t know whether they actually ever used it in production, but they tried Rockset. There is a Tree Rock, if I’m not mistaken, or something like that. So there are more and more. And now there’s DuckDB, which is a little bit different, but it’s also an OLAP database, just for much less data. So I think the whole real-time analytics industry is just booming at the moment. So definitely ClickHouse, Pinot is way up there as well. And Druid.

[00:11:13.210] – Reena

Yes, well, when you talk about companies needing the ability to deal with millions and billions of events, it makes sense, because a lot of that is real-time data coming in. And with streaming continuing to grow, I don’t think we’re going to see a shortage. If anything, I think we’re going to see those groups continue to grow and new players enter, like with any hot technology. But speaking of hot technology, I was excited to have you on the show today because something that keeps coming up in the community is Druid and Kubernetes. Very hot topic. I know Kubernetes is great for deploying and managing containerized applications at scale and handling the underlying infrastructure so devs can build their applications. It makes sense, then, that they would pair it with Druid, which is designed for real-time analytics applications. But are there any other reasons you think we’re seeing a lot of talk around Kubernetes in the Druid community?

[00:12:06.160] – Yoav Nordmann

There is this one aspect of Druid, the ingestion aspect, where sometimes, using batch ingestion, you have these peaks of batches coming in, and then you need a lot of horsepower to do the ingestion. Afterwards, there is nothing running. So let’s say I have a Druid cluster running on 20 machines. Those 20 machines which do the ingestion, for instance, would be idle most of the time, and that would cost me a lot of money. Now, if I’m using Kubernetes, I can scale up a couple of machines whenever I need them. That’s actually something I wrote: an external service which would identify whenever we need more machines to do the ingestion, spin up some more machines, and then kill them afterwards. So that’s actually the biggest aspect of using Kubernetes for Druid, the automatic addition of more horsepower. It’s less of an issue for the data nodes of Druid, where the data is stored, because each of those data nodes holds a lot of data on its hard disk. Loading or creating another one of those machines takes time, because it has to copy a lot of data from deep storage, S3 or Azure Blob Storage or whatever, onto the new node’s hard disk.

[00:13:37.200] – Yoav Nordmann

So that takes a lot of time, so it doesn’t make sense there. But for ingestion purposes, it fits like a glove.
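
To make the "add horsepower only when you need it" idea concrete, here is a minimal sketch using the Kubernetes Python client to resize a MiddleManager StatefulSet. The StatefulSet name and namespace are assumptions that depend on how your Druid Helm chart names things.

```python
from kubernetes import client, config

def scale_middlemanagers(replicas: int,
                         name: str = "druid-middlemanager",  # assumed StatefulSet name
                         namespace: str = "druid") -> None:  # assumed namespace
    """Patch the MiddleManager StatefulSet to the desired replica count."""
    config.load_kube_config()  # use load_incluster_config() when running inside the cluster
    apps = client.AppsV1Api()
    apps.patch_namespaced_stateful_set_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

if __name__ == "__main__":
    scale_middlemanagers(10)  # scale up ahead of a batch ingestion peak
    # ... run the heavy ingestion ...
    scale_middlemanagers(2)   # and shrink back down afterwards
```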

[00:13:45.370] – Reena

Are there any challenges that need to be solved with merging the two? I know we get a lot of questions, and it seems to be kind of a newer thing.

[00:13:53.290] – Yoav Nordmann

So going with Kubernetes, first of all, you have to know all the tips and tricks of Kubernetes. So there’s an overhead, right? You have to understand that there’s a lot of DataOps, so you also have to have a really great DevOps person to work with you hand in hand. When you’re setting this up, it takes some time. And afterwards, you do have some funky stuff going on sometimes in Kubernetes; sometimes the network is not 100%. This could happen without Kubernetes too, but it’s just another layer you have to take care of. There was this one really fun time, I think it was around Christmas, when you just couldn’t get any more machines on Amazon. So even if you want to auto scale your machines, you just can’t get any more. Say, for instance, you lose a machine during that time, for whatever reason, and someone else grabs that machine; you’re left for a couple of hours without a machine. That’s unfortunate. So during that time, you have to tackle this a little bit differently. Other than that, I’m even running my own stuff on my laptop on Kubernetes.

[00:15:10.590] – Yoav Nordmann

I’m using Rancher K3s to run Kubernetes on my laptop, and even when I’m trying stuff out with Druid on my laptop, it’s on Kubernetes. Actually, I had Apache Pinot also running on Kubernetes on my laptop. I’m running all of these systems on Kubernetes. It’s just great. I think they’re really made for each other.

[00:15:33.470] – Reena

You also built a Kubernetes auto scaler. Can you tell me a little more about that?

[00:15:38.510] – Yoav Nordmann

Yeah,

[00:15:39.340] – Reena

Because it’s kind of an interesting story of how that works out.

[00:15:43.230] – Yoav Nordmann

The ending is really great. Yeah, a bit sad, but anyhow. So at this client, the one with the large scale of a million events per second, we had peaks of batch ingestion all the time, like, as I said, nine terabytes of new data. So we needed a lot of machines at certain peaks, and then we just didn’t need anything. So what we did with the data nodes is: we’ll have the Historical nodes separate, and the MiddleManagers, which run the peon tasks, the tasks themselves which do the ingestion, we’ll have them separate. And whenever we have a new ingestion coming in, I would check every minute if there is a task pending. And whenever there was a task pending, I would extract the exact memory size and configuration which was needed, and I would spin up the exact number of nodes needed to finish up this task. So I would create 100 little machines, each one doing just one task, and then I would tear them down. So it was actually quite fast, very efficient, very low cost. And we almost made it to production.
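
The polling half of that loop might look roughly like this: ask the Overlord for pending tasks once a minute and size the ingestion workers to match. The capacity math here is deliberately simplified (one task slot per pending task), and scale_middlemanagers is the hypothetical helper sketched earlier.

```python
import time
import requests

# Hypothetical helper from the earlier sketch.
from druid_autoscaler import scale_middlemanagers

OVERLORD = "http://overlord:8090"   # assumed address
MIN_WORKERS, MAX_WORKERS = 2, 100   # clamp to sane bounds

def pending_task_count() -> int:
    """Number of tasks the Overlord has queued but not yet assigned to a worker."""
    resp = requests.get(f"{OVERLORD}/druid/indexer/v1/pendingTasks")
    resp.raise_for_status()
    return len(resp.json())

while True:
    pending = pending_task_count()
    # Simplified sizing: one worker per pending task; the real service also read each
    # task's memory requirements and picked node sizes accordingly.
    desired = max(MIN_WORKERS, min(MAX_WORKERS, MIN_WORKERS + pending))
    scale_middlemanagers(desired)
    time.sleep(60)                  # check for pending tasks once a minute
```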

[00:17:09.360] – Yoav Nordmann

We had a couple of problems here and there. Of course, it’s not that I wrote any bugs. I never write any bugs.

[00:17:15.300] – Reena

Oh, of course not. Never.

[00:17:16.640] – Yoav Nordmann

Yeah. No, definitely. So we almost made it to production. What happened was, I still don’t know who, but this horizontal auto scaling is now embedded into Druid. There is an extension; you can install this extension into Druid and then just use it from within. My service was an external service, right? So someone killed my KUDRAS, my Kubernetes Druid Auto Scaler. They just killed it.

[00:17:50.420] – Reena

Who did it? Who did it? If you’re listening right now, who did it? Raise your hand.

[00:17:55.350] – Yoav Nordmann

Yeah, so I didn’t even know about it. The team lead back there, or the manager at that company, just sent me an email saying, KUDRAS is dead. I said, what? I spent three months on that. And then he sent me the link. But it’s nice. It wasn’t my idea, by the way, I just implemented it, but it’s nice to see that whatever you’re working on, someone else is also thinking about solving the same problem. And I have to say they did a better job. I mean, it’s integrated into Druid as a Druid extension. Yeah, it’s definitely better.

[00:18:35.460] – Reena

You’ve also worked on a POC with Imply, which is my company, and you were able to solve a challenge that you ran into with Druid. Can we talk about that? I heard that some queries were not returning at the sub-second speeds that you expected them to.

[00:18:53.640] – Yoav Nordmann

Yeah, that’s true. So, first of all, I’m actually in two POCs with Imply at two different companies, two different clients of mine. At this one company, we loaded all of this data, three and a half terabytes of it, and the goal was a sub-second query latency, which we achieved. And then afterwards they came back with a really odd query which took 8 seconds to run. So the guys at Imply, Anil and Muthu, said, okay, we have to run flame graphs to see what is going on. And I actually wrote up how to run flame graphs simultaneously on Kubernetes and everything; I wrote a whole Medium post on it. But I said, okay, let me just first try to dissect this SQL and understand what they’re trying to do. Druid, as an OLAP database, is very much built for aggregations, right, analytics and aggregations. And I know for a fact that aggregational queries run much faster than simple select queries, or scan queries as they’re referred to in Druid. This was a scan query. So to begin with, I just rewrote it as a simple aggregational query, and I went down from 8 seconds to 3 seconds immediately. Now, caching is off in this Druid cluster, so I knew I was doing something right.

[00:20:29.360] – Yoav Nordmann

And then afterwards I saw that they did some sort of ordering, and I have no idea why they wanted it; they didn’t need it. So of course I removed that, and I made some more changes, I forget which ones. And then I went back and got this down to something in the range of 1.1 or 1.2 seconds, which is totally fine coming down from 8 seconds. So in the end, we didn’t need to run the flame graphs. It was just understanding the database and using its power, writing the queries the way they were designed to run. And even the guys at Imply looked at me and said, oh wow, that was a great job.
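
To make the before-and-after concrete, here is a hedged sketch of that kind of rewrite against Druid’s SQL API. The datasource and column names are invented; the point is the shape of the query, a GROUP BY aggregation instead of a wide scan with an ordering nobody needed.

```python
import requests

BROKER = "http://broker:8082"  # assumed Broker address (a Router can also serve /druid/v2/sql)

# Slow version: a scan-style SELECT with an ORDER BY the application never needed.
scan_sql = """
SELECT __time, user_id, event_value
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
ORDER BY __time DESC
"""

# Faster version: let Druid do what it was built for and aggregate per group.
aggregate_sql = """
SELECT user_id, COUNT(*) AS events, SUM(event_value) AS total_value
FROM events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY user_id
"""

def run(sql: str):
    resp = requests.post(f"{BROKER}/druid/v2/sql", json={"query": sql})
    resp.raise_for_status()
    return resp.json()

print(len(run(aggregate_sql)), "groups returned")
```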

[00:21:15.210] – Reena

Well, I bring it up not to highlight issues with Druid, but to showcase solutions. Because inevitably, when you’re dealing with open source technology, or even if you’re working with a company like Imply, you’re going to run into some challenges, and it depends on what you’re using it for, what kind of queries you’re running, how much data you’re putting in. And inevitably, sometimes there’s an anomaly, like this weird query that was just taking 8 seconds.

[00:21:42.650] – Yoav Nordmann

Yeah, that is true, but actually Apache Druid has something very unique in that respect, and that is Clarity. So Apache Druid has another Apache Druid, which is called Clarity, for monitoring purposes. I actually implemented some of it on my own at this other company, but Clarity has way more functionality, and the guys at Imply just did a demo, and you can see how you can actually look at each query and, using only that tool, try to understand and see exactly where the problem lies. You can see whether the segment files are too big, whether there are too many of them, too many columns, too many rows, whether you have to partition them differently, the CPU time. So there is a lot of information there. Just using simple Grafana for different databases, it’s just not the same. So this is very unique to Druid, and that’s why I like it so much. You can just see the internals in front of your eyes and then pinpoint the problem.

[00:22:49.720] – Reena

Well, that’s not the only time that you have helped work through an issue. You were also talking about, and I believe you did a blog post on this recently, near real-time backdated data updates. Backdated data updates is a fun thing to say, and I feel like maybe, again, I’m focusing on challenges, but I think it’s important to figure out and talk through how you solve these challenges. Can we talk a little bit about that one? Especially since it deals with upserts, which you brought up, and that’s another popular topic these days.

[00:23:21.910] – Yoav Nordmann

Yeah, definitely. So those upserts in columnar databases are really tough, dating back to Parquet files and Delta Lake and whatever. So the issue is, we knew that we were going to get backdated data updates, we just didn’t know how far back. So we started with: okay, every day we’re going to get 20 million new rows. That’s not a lot. And every two hours we’re going to get backdated data updates. So we said, okay, let’s say we have two weeks of backdated updates. We would save the past two weeks as daily partitions, and every time we get an update, we could just reingest those days. Ingesting 20 million rows takes about three and a half minutes. So that didn’t seem feasible, because we would have to update 20 days, sorry, 14 days, each taking three and a half minutes, and even parallelizing that, it’s just too much. Three and a half minutes per update, that’s not good enough. So there is this way to add new data to Druid, just append it, and then use the latest keyword. In the same table, I would have a new column with a timestamp, and then I would always just get the latest data point.

[00:25:02.770] – Yoav Nordmann

So talking to the guys at Imply, they really didn’t like that too much, because then it meant sending two queries at once just to solve that problem. So what we did in the end is a hybrid approach. We’re going to save the last 14 days as daily partitions. The four weeks before that, we’re going to save as weekly partitions, and everything before that as monthly partitions. And the backdated data updates, we’re just going to ingest them as updates, meaning that from now on we would have to use only aggregational queries, which is totally fine. It’s even better, because, as I said before, Druid was made for aggregational queries. So using this method, we’re just going to ingest the updates, and it’s going to be really fast: every two hours, 20,000 rows, something like a minute and a half for the update. And in the background, we’re just going to do reingestions all the time, just to compact the segment files back and to have, again, a low-latency query response time. So it’s very complex, but it’s all doable. We’re using Airflow to do the whole thing. I think I just finished the implementation of this, and it looks quite nice.
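
The "latest keyword" Yoav mentions maps onto Druid SQL’s LATEST-style aggregators: every update is appended with its own update timestamp, and the query picks the freshest value per key. A minimal sketch with invented table and column names, assuming a Druid version that ships LATEST_BY (older versions have LATEST, which keys off __time).

```python
import requests

BROKER = "http://broker:8082"  # assumed Broker address

# Each backdated update is appended as a new row carrying an update_ts column
# (assumed to be ingested as a timestamp); LATEST_BY resolves the current value per key.
upsert_view_sql = """
SELECT
  order_id,
  LATEST_BY(status, update_ts) AS status,
  LATEST_BY(amount, update_ts) AS amount
FROM orders
GROUP BY order_id
"""

resp = requests.post(f"{BROKER}/druid/v2/sql", json={"query": upsert_view_sql})
resp.raise_for_status()
print(resp.json()[:5])
```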

[00:26:26.600] – Reena

And you used Airflow when you built KUDRAS, right?

[00:26:29.250] – Yoav Nordmann

No, actually, we did use Airflow there, but KUDRAS was a standalone service. It didn’t even need a database, because it worked directly with Druid and also with Kubernetes, not through Helm charts but with Kubernetes itself, watching what’s going on with everything, so it could fall over and come back up. It was just stateless. It was quite nice.

[00:26:55.040] – Reena

I have to say, we’ve been talking a lot about POCs and projects that you’ve worked on in the past, but what are you working on right now? Are you doing anything cool right now?

[00:27:04.630] – Yoav Nordmann

Well, not too cool, I have to say.

[00:27:07.930] – Reena

All of this is cool. Yoav, all this is cool.

[00:27:11.530] – Yoav Nordmann

Yeah. No, I’m having a lot of fun, because at the current client we’re actually extracting data from Snowflake and ingesting it into Druid, because Snowflake doesn’t do clustering the way we need it, or it could, but it would be way more expensive. And it’s just a little bit slower than Druid, for a couple of reasons. So now I’m building an infrastructure on Airflow so that each DAG, each new table which needs to be copied, is just one line. I finished that today, and it’s working. It’s a lot of fun to see your stuff working, and it’s going to be used soon enough. I’m working with another team at that client; they need the data really fast. So that’s one thing we’re doing. We also need to set up a new environment, so we’re doing the whole Helm charts, Druid, Kafka, Airflow thing, stuff which I did in the past already. But it’s always fun again.
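
As a rough illustration of the "one line per table" setup, here is a hedged sketch of an Airflow DAG factory. The Snowflake extract and Druid load are stubbed out, and all names (DAG ids, schedule, the copy_table helper) are assumptions, not the actual pipeline.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def copy_table(table: str, **_):
    # Stub: in a real pipeline this would export `table` from Snowflake
    # (e.g. to object storage) and submit a Druid batch ingestion task for it.
    print(f"copying {table} from Snowflake to Druid")

def make_copy_dag(table: str) -> DAG:
    """Build one daily DAG that copies a single table from Snowflake into Druid."""
    with DAG(
        dag_id=f"snowflake_to_druid_{table}",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="copy",
            python_callable=copy_table,
            op_kwargs={"table": table},
        )
    return dag

# Adding a new table to the pipeline is then a single line per table:
orders_dag = make_copy_dag("orders")
customers_dag = make_copy_dag("customers")
```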

[00:28:14.750] – Reena

Well, it’s always fun when you see something work that you’ve worked so hard on, and you’re like, all right, I’m done, it’s live, it’s going,

[00:28:20.180] – Yoav Nordmann

It’s working, and people are using it. That’s really a great satisfaction.

[00:28:26.630] – Reena

Well, Yoav, I’m so glad that you joined me on the show today. This was amazing. I think, actually, I’ll have to have you back and keep track of what you’re working on, especially since I found that whole Snowflake-to-Druid project very interesting. So I think I may have to do another episode. That’s going to do it for us today. Thank you again for joining me.

[00:28:47.070] – Yoav Nordmann

You’re welcome. It was great. Thank you.

[00:28:49.440] – Reena

If you want to learn more about the things we talked about today, like Apache Druid, you can visit druid.apache.org. And if you want to learn more about my company, Imply, please visit imply.io. Until next time, keep it real.
