APIs, Analytics, and Apache Druid: How Kong Delivers API Observability and Insights with Hiroshi Fukada
On this episode we’re talking APIs and analytics with Kong, a lightweight, fast, and flexible cloud-native API gateway that’s using Apache Druid to power analytics for their platform, Kong Konnect. We’re joined by Hiroshi Fukada, Staff Software Engineer at Kong who talks us through how his team manages customer data through the Kong Gateway and into Kong Konnect, enabling users to view and analyze data.
We’ll dive into their use of Apache Druid for real-time data processing and analytics and the transition from on-premises solutions to using Imply’s managed Druid services to avoid the complexities of infrastructure management. Listen to hear about the features and benefits of Druid, including low latency and ease of use, and stay for Kong’s contributions to the open-source community, such as the DDSketch extension for better handling of long-tail distributions!
Listen to learn more about:
- How Kong Konnect manages API services, enabling developers to search, authenticate, and analyze APIs effectively
- The integration of Apache Druid for real-time data analytics within Kong Konnect, highlighting features like low latency data processing, SQL and native query languages
- Kong’s future plans for Druid, including improved data compaction and querying from deep storage
Learn More
- Community Spotlight: Using Netflix’s Spectator Histogram and Kong’s DDSketch in Apache Druid for Advanced Statistical Analysis
- Introducing Apache Druid® 30.0
- Query from deep storage: Introducing a new performance tier in Apache Druid
About the Author
Hiroshi Fukada is a staff software engineer at Kong. He is a seasoned software engineer with almost a decade of experience in data analytics and real-time data processing. He began his career at DeepField, where he focused on outlier and DDoS detection using data from various networks, contributing to the startup’s success before its acquisition by Nokia. He then joined the Toyota Research Institute’s Autonomous Driving division, where he managed data labeling and cataloging for sensor data from autonomous vehicles, developing innovative methods for efficient data search and categorization.
Outside of his professional work, he’s a fan of the multiplayer online battle arena video game Dota 2, the sequel to Defense of the Ancients, a community-created mod for Blizzard Entertainment’s Warcraft III: Reign of Chaos.
Transcript
[00:00:00.000] – Reena Leone
Welcome to Tales at Scale, a podcast that cracks open the world of analytics projects. I’m your host, Reena from Imply, and I’m here to bring you stories from developers doing cool things with Apache Druid, real-time data and analytics, but way beyond your basic BI. We’re talking about analytics applications that are taking data and insights to a whole new level. We all know that APIs are fundamental to modern software development. That’s where Kong comes in. They provide the foundation that enables any company to become API-first. And the numbers don’t lie. At the time of this podcast, Kong is seeing over 400 billion API calls a day in a community of over 160,000 and counting. Built on the world’s most adopted API gateway, Kong’s unified cloud API platform delivers fast, reliable, and secure digital experiences so developers can focus on other, better things like performance at scale. Joining me to talk about what Kong is working on is Hiroshi Fukada, Staff Software Engineer at Kong. Hiroshi, welcome to the show.
[00:00:59.660] – Hiroshi Fukada
Hi. Thanks for having me.
[00:01:02.320] – Reena Leone
Okay, so I like to start every episode with a little bit about my guest’s background. So tell me how you got to where you are today.
[00:01:09.120] – Hiroshi Fukada
Yeah. So my first dip into the data world was working for a startup called DeepField. They got acquired by Nokia, but they did analytics and data monitoring around routers, NetFlow, internet traffic, basically. We used data collected from different sources, including CDNs, transit networks, even subscriber networks as well. Using that, we built products that were meant to do outlier detection. So if a router went down, you’d see NetFlow go down. We also did some stuff around DDoS detection, and there are certain characteristics that you can see inside of a data stream that you can try to detect before it happens, or before a DDoS goes out of control, to stop it right in its path.
[00:02:01.900] – Hiroshi Fukada
After that, I went to do a little bit of work over at Toyota Research Institute for their Autonomous Driving division and handled a lot of their data labeling, data cataloging, data searching, all that stuff for their enormous amount of data that they collect on these drives. There’s like 20 cars, and they all have 50 billion sensors on them. So categorizing and being able to search all that data is quite a problem as well.
[00:02:30.710] – Reena Leone
So you’re like a data guy, like through and through.
[00:02:33.080] – Hiroshi Fukada
Yeah, I started data probably like, I don’t know, seven, eight years ago, something like that. So I have been around for a while now, I guess, and I’ve seen some crazy stuff.
[00:02:45.900] – Reena Leone
So at Kong, what team or department are you part of?
[00:02:49.760] – Hiroshi Fukada
I am part of the data team at Kong, and we’re mainly in charge of getting customer data up through the gateway and into our platform, [Kong] Konnect, which customers can use to view and analyze the data that we collect.
[00:03:06.210] – Reena Leone
Oh, awesome. And so can you, in your own words, tell me a little bit more about what Kong does?
[00:03:13.290] – Hiroshi Fukada
Yeah, Kong. Well, let’s separate the two different things. I’ll talk about Kong first and then what Konnect is. So Kong is the plumbing for all your APIs. You can have different APIs being served on different clusters, different servers, even different upstreams that aren’t even yours. Kong allows all your services to talk to each other in a unified way. All the authentication, all the authorization is handled through these pipes. It makes it super secure and it’s super, super fast. We’re the fastest API gateway out there today, and we are constantly trying to improve our features and our performance. Separately…
[00:03:51.320] – Reena Leone
Tell me about Kong Konnect.
[00:03:53.490] – Hiroshi Fukada
Yeah, separately, Kong Konnect is the one-stop shop to manage all your plumbing. As companies evolve and have more and more APIs that they want to support, they have different needs. They’re not just about connecting services together. It’s also about enabling others to find your services. So Konnect also allows you to have a catalog of all the things that your services are doing. And Konnect allows different developers or customers to come in and search what you have. Nobody wants to write the same service twice. If somebody wrote a currency converter, you’re going to reuse that. And Kong Konnect allows people to search for those services that exist. And of course, stuff like authentication and authorization is handled across all of that as well, and Konnect helps you manage that. On top of that, we have analytics that comes in, and this is my team’s mojo. It’s basically running everyone’s Grafana and Datadog all at once. And so people should be able to come into Konnect. If something’s going on, they should be able to come into the platform and search what’s going on. We have API request logging, we have analytics. You can get per service, per consumer, any graph that a customer may want to be able to tell a story about what’s going on in their network, who’s doing what, and what’s happening.
[00:05:14.490] – Hiroshi Fukada
So that’s where we’re at today for data analytics in Konnect. Konnect also provides nice federated management across all your APIs as well. So from on high, you may have decrees that certain services be protected in certain ways. And so Konnect allows you to come in and say: I demand that if you’re going to publish an API, you should have these types of security features set, and these types of criteria need to be met. And so Konnect as a platform provides a federated way to manage all your APIs.
[00:05:56.460] – Reena Leone
And I believe, correct me if I’m wrong, the analytics portion of Konnect is where Druid is, right?
[00:06:02.050] – Hiroshi Fukada
Yes, that is exactly right.
[00:06:03.570] – Reena Leone
Okay, perfect. So were you familiar with Apache Druid before working at Kong? Had you used it before?
[00:06:08.870] – Hiroshi Fukada
Yeah, I had played with it a little bit. Back at Toyota Research Institute, we had a ton of driving logs to catalog, and they were all muddled around different places. And I had toyed with the idea of storing all these old car logs inside of Druid, which would have been super nice if we could use it for searching for certain high acceleration and deceleration events, looking for certain objects that do and don’t exist, model confidence, that sort of thing. I didn’t get to finish that project, but I did get to see how powerful Druid was, and it certainly piqued my interest because it was doing a lot of smart things. And as someone who did a lot of dumb things in the previous job, Druid seemed to solve a lot of those issues for me.
[00:07:00.680] – Reena Leone
So going back to Kong Konnect and the analytics portion, were you a part of the process, like searching for a new database, or was there another database before Druid that was being implemented there?
[00:07:15.270] – Hiroshi Fukada
So previously, Kong used to be an on-premises product, in which case everybody used to run their own Postgres instances, or something else.
[00:07:28.820] – Hiroshi Fukada
We support many, many databases, but all the analytics would also just flow to the same database, which doesn’t quite work. And so when we started firing up Konnect, it was immediately apparent that we shouldn’t be running a million instances of Postgres. So we were looking for something else to handle all that. And we weren’t really interested in running a Prometheus for everyone either. And so the two front runners I remember when we were just deciding where we should go were Apache Druid and [Apache] Pinot. I had come with a little bit of experience with Druid, and I felt like we could get off the ground fairly quickly, so we went with Druid.
[00:08:06.840] – Reena Leone
Awesome. Okay, since you know Druid, what were some of the features that you needed for this that Druid was good for?
[00:08:12.860] – Hiroshi Fukada
So we knew that we needed to be low latency from event to being able to display and query. So Druid’s real-time data processing, with native support for [Amazon] Kinesis and [Apache] Kafka, was super important to me. The SQL and native query languages are things that everybody can pick up. I know SQL is super useful and super universal, but it’s not super programmer friendly, to a degree. And so we actually use the native query language almost everywhere when we write our code. So it’s very nice. We were able to get up and running fairly quickly, and it was super easy to tune to what we needed it to be.
[00:08:57.070] – Reena Leone
Not to go off on a tangent, but why do you feel like SQL is not always super friendly?
[00:08:59.970] – Hiroshi Fukada
Because writing code that writes another piece of code is always just difficult if you’re going to try to cover all the generic cases.
[00:09:11.370] – Hiroshi Fukada
That’s why there’s so many ORMs and different query layers, different libraries and stuff that exist to do that for you because it’s hard. It’s legitimately hard. And so we didn’t want to commit to using an ORM or a query writer because we didn’t really know where we were going. So we just use the JSON query language that is provided. And it’s honestly pretty powerful and it gets you a long way. So we are thankful for that.
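For readers unfamiliar with Druid’s native query language: it’s plain JSON, so application code can build queries as ordinary data structures, no ORM or SQL string-building required. A minimal sketch in Python, with a hypothetical `api_requests` datasource and illustrative field names (not Kong’s actual schema):

```python
import json

def build_timeseries_query(datasource: str, interval: str, granularity: str = "minute") -> dict:
    """Build a Druid native timeseries query as a plain dict."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "intervals": [interval],
        "granularity": granularity,
        "aggregations": [
            {"type": "count", "name": "requests"},
            {"type": "doubleSum", "name": "total_latency_ms", "fieldName": "latency_ms"},
        ],
    }

query = build_timeseries_query("api_requests", "2024-06-01/2024-06-02")
payload = json.dumps(query)  # POST this body to the Broker's /druid/v2 endpoint
```

Because the query is just data, it can be composed, validated, and tested like any other value in the codebase, which is the generic-case problem ORMs otherwise exist to solve.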
[00:09:39.590] – Reena Leone
Can you walk me through where Druid sits in your data architecture and what your data stack looks like?
[00:09:44.990] – Hiroshi Fukada
Yeah, I’ll start at the origin. So Kong Gateway sits on our customers’ networks. A request happens, it proxies it. It creates a data point with different attributes, and then it gets sent up to ingest, which is K ingest, which we call King. King sips it up and then immediately puts it on a Kafka topic, and then a Flink job reads that message, enhances it further, enforcing tenant ID and other enhancements that we want to put on. And then it puts it back on another topic, in which case, Druid picks it up and ingests it for us in a few different shapes. From there, we query those different shapes in our API and then expose that through our Explorer API, which is visible inside of Konnect today.
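The enrichment hop Hiroshi describes (Kafka topic in, enforce tenant ID, Kafka topic out) can be sketched very roughly as a pure function over events. The field names and tenant lookup below are hypothetical illustrations, not Kong’s actual schema, and their real job runs in Flink rather than plain Python:

```python
import json

def enrich(raw_event: bytes, tenant_lookup: dict) -> bytes:
    """Enrich one gateway event before it goes back onto the next Kafka topic."""
    event = json.loads(raw_event)
    # Enforce the tenant ID based on which control plane sent the event.
    event["tenant_id"] = tenant_lookup[event["control_plane_id"]]
    event["schema_version"] = 2  # stand-in for other enhancements
    return json.dumps(event).encode()

raw = json.dumps({"control_plane_id": "cp-1", "latency_ms": 12}).encode()
enriched = enrich(raw, {"cp-1": "tenant-acme"})
```

In the real pipeline this function would sit inside a streaming job reading one topic and producing to another, with Druid’s Kafka ingestion supervisors consuming the enriched topic.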
[00:10:37.080] – Reena Leone
Did you say it was called King, so it’s King Kong?
[00:10:40.040] – Hiroshi Fukada
Yeah, it’s basically King Kong.
[00:10:42.060] – Reena Leone
That’s amazing. I don’t know if you have this off the top of your head, but can you tell me ballpark, how much data you’re ingesting on a daily basis? And with that, what are you seeing in terms of query speeds?
[00:10:58.570] – Hiroshi Fukada
So I think we’re on the order of five, six million events a day, multiplied out because we ingest in two or three different shapes. So on the order of 15 to 20 million unique events ingested and queryable. The latencies that we see from event to being able to be displayed in the UI highly depend on people’s networks. But on the low end, it’s a couple of seconds. On the upper end, depending on people’s upload frequency, it could be a minute. But we try to keep that default number fairly low so that people can see their data basically immediately, within 10 seconds. I would say the median is probably around 10 seconds for the first interaction, because we try to be really optimal about when to send up.
[00:11:59.280] – Reena Leone
So you chose the path of using Druid with Imply. Why go with a vendor versus open source? Since you know both.
[00:12:08.300] – Hiroshi Fukada
Yeah, we actually got off the ground using the Druid Helm chart. I actually helped maintain it for a bit.
[00:12:16.830] – Reena Leone
We’ll get to that. We’ll get to your contributions.
[00:12:19.050] – Hiroshi Fukada
That was our proving ground, proof that we could make it work for us. And then we said, let’s go to production with it. Let’s not use a Helm chart. We’ll use Imply. And one of the big reasons is because managing infrastructure is a lot of work. A lot of people don’t give it the credit it deserves. There’s a reason why SRE teams exist in every single company, and we didn’t want to become an SRE team. A lot of the things that are complex about Druid are like: oh, what version are you guys running? What EC2 instances? Can you have zero-downtime upgrade events with what you are running? The answer is probably, but it would take a lot of time, potentially a whole person’s full-time job. And we didn’t want to get into that. We just wanted to get into building the product that we wanted to build. So Imply was right there, and so we came out to talk to you, and here we are.
[00:13:22.570] – Reena Leone
Awesome. I mean, yeah. So you get to do the fun stuff instead of just spending all your time doing the management and maintenance portion.
[00:13:30.670] – Hiroshi Fukada
Yeah. I mean, there’s certainly lots of knobs to turn, even with what we have today. And we’re trying to run as slim as possible because we’re trying to be real budget friendly. And data is one of the most expensive things that we do. It’s not like creating an entity, which is a row in a database, which costs fractions and fractions of a cent. This is actual scale: when you’re running on customer-sized data and they’re doing millions of requests a second and you’re handling N of those, then it becomes a big problem.
[00:14:06.360] – Reena Leone
Are there any other additional use cases or integrations you’re exploring?
[00:14:10.550] – Hiroshi Fukada
Yeah, we’re looking forward to some more options for compaction in the future, and to being able to query from deep storage. One of the limiting factors for Druid, and for what we can provide at the speed that we do, is that you keep a lot of data hot so that it’s immediately available. But we could provide a different experience that queries from deep storage, so that we don’t have to delete the data off the hot set and never see it again. This is one of the things that we’re most looking forward to in the native query engine, because it doesn’t exist there right now. It’s only for SQL, but we are excited for a native way to query history.
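For context, the SQL-only path Hiroshi references is Druid’s asynchronous SQL statements API, which runs queries through the MSQ task engine so segments don’t have to be loaded on historicals. A rough sketch of building such a request, with a hypothetical `api_requests` table and illustrative columns:

```python
import json

def deep_storage_query(sql: str) -> str:
    """Build the request body for Druid's async SQL statements API."""
    return json.dumps({
        "query": sql,
        # ASYNC execution lets the query run against data that is only
        # in deep storage, not loaded on historicals.
        "context": {"executionMode": "ASYNC"},
    })

payload = deep_storage_query(
    "SELECT service, COUNT(*) AS requests FROM api_requests "
    "WHERE __time < CURRENT_TIMESTAMP - INTERVAL '30' DAY GROUP BY 1"
)
# POST payload to /druid/v2/sql/statements, then poll the returned
# queryId for status and fetch results when the task completes.
```

This shape reflects the documented SQL statements endpoint as of recent Druid releases; a native-query equivalent, as Hiroshi notes, does not exist yet.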
[00:14:51.790] – Hiroshi Fukada
And then the other thing is compaction. Compaction has been a mystery for us for a while, because we used a larger data ingest format than we are using today, and so compaction tuning was a bit of a mystery to us. And we would also like to do step-down compaction. So you might ingest at a 10-second granularity, and then a day later, you roll it up to 60 seconds, and then 30 days later, you roll it up to an hour, but all on the same data source, to make it really, really easy. That would be cool.
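The tiered rollup Hiroshi wishes for isn’t expressible as a single Druid auto-compaction config today (auto-compaction targets one granularity per datasource). As a purely hypothetical illustration of the schedule he describes, the selection logic might look like:

```python
# Hypothetical step-down rollup schedule: (age threshold in days, target granularity).
# Raw data is assumed to arrive at 10-second granularity.
ROLLUP_STEPS = [
    (1, "minute"),   # after a day, roll 10-second rows up to 60 seconds
    (30, "hour"),    # after 30 days, roll up again to hourly
]

def target_granularity(age_days: int) -> str:
    """Pick the coarsest granularity whose age threshold has passed."""
    chosen = "second"  # stand-in for the raw 10-second ingest granularity
    for older_than_days, granularity in ROLLUP_STEPS:
        if age_days >= older_than_days:
            chosen = granularity
    return chosen
```

Today the same effect would require running separate compaction tasks with different `granularitySpec` targets per time interval; a first-class schedule like this is wishlist material.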
[00:15:25.410] – Reena Leone
Yeah, that would be… You know what? Part of this show is seeing what’s on people’s wishlists, being connected to the community, and keeping track of what is on the roadmap for subsequent Druid releases. I feel like I’m going to take a look and see what’s going on. All right, let’s shift gears back to open-source Druid, because that’s how you and I actually originally connected: over a community extension contributed by Kong in Druid 29, which was DDSketch, which I thought was Datadog, but it was actually Kong. You were kind enough to walk me through it for an article I did that I’m not going to rehash here, because you can go read it and I will link to it. So thank you again for talking me through it. But for the folks who haven’t seen the article or haven’t heard of DDSketch, can you tell me what it is, who should use it, and how they would benefit from it?
[00:16:14.650] – Hiroshi Fukada
Yeah. DDSketch is a sketch-based algorithm, which means that you’re getting approximate quantiles out of this thing. It is a value-error-guarantee type of sketch, versus a rank-error type of sketch, which is what’s provided natively by Druid itself, exposed through the DataSketches doubles quantiles sketch, that sort of library. The difference between value error and rank error guarantees is that rank error gives you a guarantee that if you ask for something like a P99, or let’s go with P90, and you get a rank error of 1%, then you’re guaranteed to get an answer between P89 and P91. For stuff like network latencies, which we deal with, the hardest case for these types of queries is a distribution with a long tail. So your P50, your median, could sit at 20, 30 milliseconds, but the long tail of P99, and anything above P90, really, can go up to 15 seconds. Let’s say that your P95 is 10 seconds. You’re guaranteed an answer somewhere between, probably, 5 seconds and 15 seconds. And the other thing that was unnatural about the doubles quantiles sketch that Druid provides is that it’s a randomized algorithm, so you could run the query twice and get two different answers out.
[00:17:55.320] – Hiroshi Fukada
When you draw this on a line and you have something like auto-refresh enabled, you get lines that wiggle, and people get confused about that. The nice thing about DDSketch is that it’s a value error guarantee. And so what this means is that, in our previous example, if you have a P95 that is something like 10 seconds and you have a value guarantee of 1%, then you get a guaranteed answer somewhere between 9.9 and 10.1 seconds, which is way more the characteristic that people are looking for in something like a long-tail distribution. The other thing that’s nice about it is it’s tunable. You can change your relative error guarantee, and it also isn’t randomized. So you ask it the same question with the same distribution, and it will give you the same answer back. So you no longer have wiggling lines.
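To make the value-error guarantee concrete, here is a minimal toy DDSketch in Python. It follows the published DDSketch idea (logarithmic buckets sized so every estimate is within a relative error of alpha) rather than Kong’s actual Druid extension, and for brevity it handles positive values only:

```python
import math

class TinyDDSketch:
    """Toy DDSketch: deterministic quantiles with a relative (value) error of +/- alpha."""

    def __init__(self, alpha: float = 0.01):
        self.alpha = alpha
        self.gamma = (1 + alpha) / (1 - alpha)  # bucket growth factor
        self.counts: dict[int, int] = {}        # bucket index -> count
        self.total = 0

    def add(self, value: float) -> None:
        # Bucket i covers (gamma^(i-1), gamma^i]; log-scale indexing.
        index = math.ceil(math.log(value) / math.log(self.gamma))
        self.counts[index] = self.counts.get(index, 0) + 1
        self.total += 1

    def quantile(self, q: float) -> float:
        rank = q * (self.total - 1)
        seen = 0
        for index in sorted(self.counts):
            seen += self.counts[index]
            if seen > rank:
                # Bucket midpoint in relative terms: within alpha of the true value.
                return 2 * self.gamma ** index / (self.gamma + 1)
        raise ValueError("empty sketch")

# Long-tail latencies like the example above: median ~30 ms, tail at 15 s.
sketch = TinyDDSketch(alpha=0.01)
for latency_ms in [20, 25, 30, 200, 15000]:
    sketch.add(latency_ms)
p50 = sketch.quantile(0.5)  # within 1% of the true median of 30 ms
```

Because buckets are fixed by the math rather than by random sampling, re-running the same query over the same data returns the exact same estimate, which is why the plotted lines stop wiggling.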
[00:18:50.220] – Reena Leone
No wiggling lines. I got it. Okay, so I actually want to end this episode with a little bit of a fun thing. So I’ll let you choose your own adventure here. Do you want a music-based question or a video game-based question?
[00:19:03.290] – Hiroshi Fukada
Video game.
[00:19:05.420] – Reena Leone
All right. What’s your go-to video game? What’s your favorite?
[00:19:08.120] – Hiroshi Fukada
I have played Dota 2 since 2013, and I continue to play to this day. I hope to be able to play this game for several more years because it’s my social outlet to talk to my friends that exist in different parts of the country.
[00:19:24.610] – Reena Leone
Oh, my gosh. That’s the same with my former roommate. That’s how he and his best friend hang out because they live in different states. But it’s usually Dark Souls or Overwatch.
[00:19:36.170] – Hiroshi Fukada
Yeah, I need that competitive edge. I need to make the kids cry.
[00:19:43.760] – Reena Leone
Oh, my gosh. That is the perfect way to end this episode. So thank you so much. Thank you, Hiroshi, for coming to do the show. And if you want to learn more about what Kong is doing, please visit konghq.com, and make sure to check out the blog that we have put together on imply.io. If you want more information about open source Druid, please visit druid.apache.org. Until next time, keep it real.