The World of Operational Visibility

Apr 12, 2023
Reena Leone

As organizations in sectors like manufacturing, transportation, and retail transition their operations into digital realms, they face the challenge of managing a vast amount of data originating from various sources, including IoT devices, checkouts, and more. Operational visibility has never been more important.

Operational visibility refers to the ability to monitor, understand, and optimize systems in real time. It involves gaining insights into the performance, health, and behavior of various components within an infrastructure, such as servers, networks, applications, and workflows. By having operational visibility, organizations can track and analyze key metrics, detect anomalies or issues, and make data-driven decisions to improve efficiency, performance, and reliability.

Apache Druid, a high-speed, scalable database that handles both historical and real-time data, allows immediate access to ingested data for queries, analysis, and other operations without the need for prior data preparation. See where we are going here?

Listen to the episode to learn about:

  • Why Apache Druid is the right database for operational visibility
  • When to build your own solution / application and when to buy
  • Companies using Druid for operational visibility, from tech companies to utilities 
  • Where operational visibility is headed, especially in the world of IoT


About the guest

Will To is a Senior Product Evangelist at Imply, where he talks use cases, writes up deep dives on IoT and other industries, and tinkers with the product. In his past lives, he worked as a schoolteacher and a travel agent before making his way into tech with roles at Datadog and MongoDB. Will lives with his family in New York City, where he spends his spare time chasing after his young daughter and building model airplanes.


[00:00:00.970] – Reena Leone

Welcome to Tales at Scale, a podcast that cracks open the world of analytics projects. I’m your host, Reena from Imply, and I’m here to bring you stories from developers doing cool things with analytics way beyond your basic BI. I’m talking about analytics applications that are taking data and insights to a whole new level. On today’s episode, we are going all in on one of Apache Druid’s top use cases: operational visibility. Operational visibility is the ability to monitor and understand and optimize systems, ideally in real time. And if you’re doing monitoring or logging or building or using analytics applications or tools or dashboards, you’re doing operational visibility. I think it’s safe to assume that any of us dealing with technology, whether it’s your whole industry or your job function, understand how important it is to know what’s going on with your infrastructure, your applications, your workflows, et cetera so they work better, and any issues can be detected and handled swiftly. But even that, I mean, sounds kind of basic, but when we dive into, say, the world of IoT, for example, it gets really interesting. To help us understand how operational visibility works across different industries and use cases, I’m joined by Will To, Senior Product Evangelist at Imply.

[00:01:12.830] – Reena Leone

Will, welcome to the show.

[00:01:14.730] – Will To

Hi, everyone. Hi, Reena. Thanks for having me. Yeah, happy to be here. This is a treat because I’ve definitely listened to Reena’s Tales at Scale before, and to be honest, I never thought I’d ever come onto the podcast. But I’m happy to be here.

[00:01:27.790] – Reena Leone

It’s my show and I get to choose who’s on it. And so, Will, I am psyched for you to be here. I like to kick off every episode with a little bit about my guest and how they got to where they are now. So tell me a little bit about your background in the tech space.

[00:01:43.010] – Will To

Sure, yeah, that sounds good. So I bounced around jobs for a bit in my 20s before coming into tech. Worked as a teacher a few times, worked as a travel agent. But finally I got into tech probably around 2018 or so. I started at Datadog, which is an observability company, so it's related to what we're doing now. And then from there, I went to MongoDB, and after MongoDB, I ended up at Imply, and here I am.

[00:02:10.600] – Reena Leone

Awesome. So, a few episodes ago, we ran through kind of the top four use cases for Apache Druid. Operational visibility, rapid data exploration, customer analytics, and then real-time decisioning. The one I want to focus on, as I said, at the top of the show, is operational visibility. But sometimes it gets kind of looped in with application observability, and I think it’s a lot bigger than that. How are these things alike, but how are they also different?

[00:02:40.420] – Will To

That’s a great question, Reena. So something about operational visibility, I would say, is that when you say operational visibility to people, usually what they think of is application observability, because generally, at least at Imply, we’re talking about a database, and people think it’s only for a digital monitoring context, right? That’s the most common context that people think of. But to be honest, operational visibility is pretty broad. Like you said, it’s a pretty big umbrella and a lot of things fall under that umbrella. So for example, application observability falls under that umbrella. Like you said, IoT falls under that umbrella. Any area where you can get visibility into your operations, operational visibility encompasses that. For the purposes of our podcast we’ll probably talk most about IoT and application observability, but it does extend to other spaces as well.

[00:03:30.530] – Reena Leone

So we know that Druid is great for this use case. But if you’re going with Druid, sometimes you come to a question of whether you should build it out yourself or whether there’s already something available, right? Like you mentioned, you came from Datadog; there are other operational visibility tools available in the market. How do you figure out when to build your own tool versus when to buy an existing one?

[00:03:59.310] – Will To

Yeah, that’s a great question. So the build or buy conundrum is pretty big, right? And it’s important because it costs companies money and time. The thing I will say is that it’s very different for each company’s situation, and there’s not really a one-size-fits-all answer. So I think it really depends on a few factors: customization and cost. For example, how much customization do you need? What sort of filters are you going to need when you explore the data that your environment is generating? Are they going to be very granular? Are they kind of random? Are they five days, or five hours, or six months? That degree of customization isn’t always available with every off-the-shelf product that you buy. And sometimes if it is available, it costs more money than your team or company might be willing to pay, right? So that’s one. Another thing, of course, is cost. The tricky part is that a lot of these off-the-shelf solutions have complex pricing models, right? They don’t just charge based on how much data you’re monitoring.

[00:05:16.380] – Will To

For example, there’s a few different models, right? One way they could charge is per unit of monitoring. That could be a virtual machine, a server, a database node, et cetera. But it might not only be that; it could also be one unit of monitoring plus throughput. So your throughput would be, for example, x dollars for 20GB per day or something like that. That’s kind of high end, but you get my meaning. My point is it gets really complicated, and the costs add up, right? Because it’s usually not one or the other; sometimes it’s both, or there are multiple ways to charge. So that’s another issue, right? And if you’re running a massive environment, if your environment is generating, like…

[00:06:04.860] – Reena Leone

Petabytes of data, because that’s what we talk about on the show all the time.

[00:06:08.130] – Will To

If you’re generating petabytes of data, your annual cost could run into millions, right? And at that point, it might just be easier and cheaper to have your people, your teams, build and run your own in-house solution, right?
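To make the pricing math Will describes concrete, here's a rough sketch of how a hybrid per-unit-plus-throughput model compounds at scale. All rates and numbers here are hypothetical, not any vendor's actual pricing:

```python
# Hypothetical monitoring bill: per-unit charge plus data-throughput charge.
# All rates are invented for illustration; real vendor pricing varies widely.

def annual_monitoring_cost(hosts, gb_per_day, per_host_month=15.0, per_gb=0.10):
    """Yearly cost = (hosts x monthly rate x 12) + (daily GB x per-GB rate x 365)."""
    host_cost = hosts * per_host_month * 12
    ingest_cost = gb_per_day * per_gb * 365
    return host_cost + ingest_cost

# A modest environment: 200 hosts, 50 GB/day.
small = annual_monitoring_cost(200, 50)

# A petabyte-scale environment: 5,000 hosts, ~1 PB/day (1,000,000 GB).
large = annual_monitoring_cost(5_000, 1_000_000)

print(f"small: ${small:,.0f}/yr")  # tens of thousands
print(f"large: ${large:,.0f}/yr")  # tens of millions
```

Even with modest per-unit rates, the throughput term dominates once ingestion reaches petabyte scale, which is the point at which building in-house starts to look cheaper.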

[00:06:22.540] – Reena Leone

I mean, those are probably the folks who are listening to this right now and exploring Druid, whether you’re using open source Druid or, say, Imply Polaris, which is a database-as-a-service built from Druid and a little more managed. Either way, you’re probably looking to build these types of tools on top of the database versus buying something that already exists, for the reasons you laid out, including customization and cost. But whether you choose to buy, because you’re a company with, say, all the money and all the budget, or to build, another factor to consider is: are you looking at monitoring, or operational visibility, in terms of a monolithic architecture, or are you looking at microservices? How does your architecture approach change how you do operational visibility?

[00:07:14.580] – Will To

Yeah, that sounds good. Actually, I left out one part of the answer to the previous question. For building or buying, it’s not just cost and customization; it’s cost, customization, but also concurrency. My apologies. Basically, how many people, or how many queries, at once is your data going to face? If you have a cloud data warehouse, they’re great at scaling, but they’re not great at concurrency because they can’t handle 100 loads at once. And sometimes the queries take a while to come back. Or if you’re unlucky, you might get hit with a spinning wheel of death, right, and your query might never complete in the first place. So that’s something else to consider, too.

[00:07:50.280] – Reena Leone

Every time you talk about the spinning wheel of death, I always think about my experience buying San Diego Comic Con passes, and a little part of me feels anxious. But I think all of us on the Internet know the feeling of dealing with that. No one wants to deal with that professionally. But let’s talk about monolithic versus microservice architecture. How does that change the approach to operational visibility?

[00:08:14.200] – Will To

Sure, yeah. So monoliths and microservices, quick background. Monoliths are basically, as the name implies, one giant code base, right? And for many years, until recently, monoliths were the predominant form of code base. So most of the applications you interact with online were monoliths. Let’s use an example: an ecommerce website. Say you’re a retailer. Say Reena is selling her merch online, right?

[00:08:43.660] – Reena Leone

I have cool merch, I have a lot of merch, a lot of people buying.

[00:08:48.630] – Will To

Exactly right. So Reena’s website is really popular and a lot of people buy from her. And there’s a few functions the website needs. The first one would be checkout. Another one would be search, which links to a database, right? And another one would be the button that says add to cart, for example. These are just three components, but let’s focus on these three for now. If it’s a monolithic code base, these three components are all part of that monolith; everything is in there. Now, that has advantages and disadvantages. Monoliths are easy to deploy and easy to monitor, because everything is in that giant code base and you don’t really have to go anywhere else. But they’re hard to scale and hard to update. What I mean by that is if Reena is running a Black Friday sale, then it’s going to be a little tricky, because maybe you’ll have like 3,000 customers with full shopping carts, and you’re going to have trouble scaling only your shopping carts, right?

[00:09:47.400] – Will To

And from there, that’s how microservices were invented. Microservices are basically discrete services for each function. So shopping cart would be a function, search bar would be a function, add to cart would be a function, et cetera, right? Everything would be a function on its own. And the benefit is that everything is really easily scaled. On a side note, this is also where containers come in. So basically Docker containers, and also the software that orchestrates these containers, Kubernetes…

[00:10:21.110] – Reena Leone

Kubernetes is so hot right now. We will do a Druid in Kubernetes episode at some point. I just need to find the right guest.

[00:10:33.850] – Will To

Yeah, for sure. Hopefully, hopefully by then you’ll have someone who’s less of a generalist and more of a specialist.

[00:10:38.440] – Reena Leone

If you listen to this and you want to do it, email me.

[00:10:41.550] – Will To

Exactly, right. So containers are essentially discrete boxes of software, like the shipping containers they’re named after, right? And if you need more of one thing, you just add more containers. Same with microservices: if you need more containerized microservices, you add them, right? If you need more of the checkout button, you add more of the checkout button. If you need more of the shopping cart, you add more shopping carts, more capacity for search, et cetera. And the great news is you can spin them up and spin them down as you need, so you really pay only for what you use; right-sizing, I guess. That’s a great advantage. And because microservices are discrete, they can be updated separately, unlike monoliths, where a lot of times, not always, but sometimes, you have to take them offline and make them unavailable because you have to update everything within the code base. With microservices, that’s not an issue. However, the issue with microservices is that because they’re split up everywhere, they can be hard to monitor without the right tools.

[00:11:45.450] – Reena Leone

Yeah. And the whole point of this is like detecting issues and making things run more efficiently and optimized so that would make sense. It’s like either one thing breaks and takes everything down, or one thing breaks, but you have to find where it is.

[00:12:01.350] – Will To

Yeah, exactly. Actually, on a side note, one of the best tools that I’ve ever seen for monitoring microservices is basically a map. It plots out the flow of data between your microservices. You can highlight how data flows between microservices, and if you click on one, it shows you which other microservices depend on it. But not all monitoring solutions have that, and the ones that do can be expensive, right, depending on your situation.

[00:12:27.410] – Reena Leone

This is actually a really good segue into my next question: if you’re building out, regardless of your architecture setup, let’s just say you have a microservice architecture, a container architecture, what requirements would you need? When you start thinking about building an application for operational visibility, what would you need to start considering? I know we talk about Apache Druid, but a database is just one piece of the puzzle.

[00:12:59.030] – Will To

Yeah, so that’s a great question. I do think this might actually be a better question for Darin [Darin Briskman, Director of Technology at Imply], who has more of a background in this. But from what I understand, other than the database, you definitely need infrastructure, by which I mean, for example, containers, right? If I understand it correctly, one easy way to build your own solution would be to use something like Kubernetes as a service, so you deploy your application on virtual infrastructure. Something like EKS, or Containers as a service; one of those offerings takes away all of the hard work your team would have to do to provision servers for your application. You just spin them up and down as you need. For our context, you probably also need a streaming technology. Amazon Kinesis and Apache Kafka are the two most common ones we deal with. You need that because you need some way to ingest data from your environment, and in operational visibility, by nature, the most common method of data ingestion is streaming, because a lot of the stuff is generated in real time.

[00:14:27.460] – Will To

I mean, there is also batch. As a little bit of background, I am working on an IoT ebook, and some of Imply’s customers do use batch data. For example, sometimes they’ll upload data from customers, put that batch data into an S3 bucket or Hadoop, go out and clean it, and then put it into Druid. But that’s comparatively rare. Most of our customers use Kinesis or Kafka.
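A real pipeline would use a Kafka or Kinesis client feeding a Druid ingestion job, but the shape of streaming ingestion, events arriving one at a time and becoming queryable immediately with no batch preparation step, can be sketched in plain Python. The event schema and rollup here are invented for illustration:

```python
# Minimal stand-in for a streaming ingestion path: events arrive one at a
# time (as they would from Kafka/Kinesis) and are immediately queryable.
# In a real deployment a streaming consumer would feed the database; the
# event fields and the per-service rollup here are hypothetical.
from collections import defaultdict

class TinyIngester:
    def __init__(self):
        self.counts = defaultdict(int)  # rollup: event count per service

    def ingest(self, event):
        # each event is queryable as soon as it lands; no batch prep step
        self.counts[event["service"]] += 1

    def query(self, service):
        return self.counts[service]

stream = [
    {"service": "checkout", "latency_ms": 42},
    {"service": "search", "latency_ms": 17},
    {"service": "checkout", "latency_ms": 55},
]

db = TinyIngester()
for event in stream:
    db.ingest(event)
    # data is available to query mid-stream, not only after a batch job

print(db.query("checkout"))  # 2
```

The contrast with the batch path Will mentions is that batch data sits in S3 or Hadoop until a cleaning job runs, whereas here every `ingest` call makes the event visible to `query` immediately.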

[00:15:02.130] – Reena Leone

Let me put it into some context, right, with use cases. One that we actually covered on a previous episode was Game Analytics, which is an Imply customer. And I’ll just give a little recap for those who haven’t heard that episode or are just joining us and have yet to subscribe; you should probably do that. Okay, so Game Analytics, if you haven’t heard of them, is like the number one analytics tool for game developers. Their monitoring system covers a hundred thousand plus games, and there are about 2 million daily active users playing the games that they monitor. So they’re getting like 20 billion events per day. Before Game Analytics turned to Apache Druid and Imply, they had a homegrown solution, right? So they went with the whole build, but it needed a lot of pre-computation, and they stored all their data in DynamoDB. And DynamoDB is good for fast reads and fast writes, but it’s not great for analytics. There are no group-bys, there’s no filtering. But that was actually only half of their issue. They had two different systems to support real-time data and historical data.

[00:16:10.260] – Reena Leone

And so they were also having issues with high concurrency and challenges with query response time. This was impacting their ability to provide real-time insights to their game developers, game managers, and product managers. So when they made the switch to Druid, they could do real-time and historical analytics in a single system. Fewer systems means lower cost. And Druid also has direct integration with Kinesis, which is what they were using for their real-time pipeline. So when an event would show up in Kinesis, they could query that data in about a second, if not less. And they also stopped needing to do all that pre-computation work, because it’s all done in Druid already. So that was their use case: monitoring what’s going on with video games. But another one is actually Confluent, who I think we’ve talked about on the show a bunch of times. Confluent Cloud moved from a NoSQL database to Apache Druid for all the key reasons Druid even exists, like subsecond query response and the ability to handle massive data sets. But I think you actually know a little bit more about their particular use case when it comes to operational visibility.

[00:17:20.330] – Will To

Yeah. So the Confluent team themselves have talked about this in several contexts. They have a blog post on their site, which we can link in the show description, I assume. And they’ve also done some video presentations at Druid Summit, maybe 2021, I can’t remember exactly. At any rate, the situation that Confluent faced is not uncommon among some of our customers. The first thing is that Confluent, as you said, are the makers of Apache Kafka. And I think one plus for Druid was that with Kafka, as with Kinesis, Druid is natively compatible. So no workarounds, really no extra work for your team to do. Right, but the situation that Confluent faced is similar, right? So, like you said, they used a NoSQL database to query and store data. But the problem is, as their data grew... sorry, to backtrack: Confluent right now uses Druid as the database for some of their key services. That’s Confluent Health+, which does notification and alerting; Confluent Cloud Metrics API, which is basically a customer-facing interface where they surface data to customers; and Confluent Stream Lineage, which is a GUI that helps users explore event streams and relationships between data.

[00:18:44.940] – Will To

Similar to, I guess, what I was saying about mapping out the relationships between services. And the solution that they used for that was, like you said, a NoSQL database, and it couldn’t really accommodate lots of high-cardinality metrics. It also didn’t have support for things like time series data, and it definitely could not return queries quickly. And this is an issue because Confluent Cloud Metrics API and Stream Lineage, if I understand correctly, are both customer-facing products. Well, actually they’re all customer-facing products, because Confluent Health+ is an alerting product. So if your customer is not getting data in time, this is an issue, right? They have to get their data. Basically, it just couldn’t scale and it didn’t have the capabilities that Confluent needed. And the Confluent team, outside of Imply’s own engineers, they’re probably the people who know Druid the best. So right now, if I understand correctly, Confluent has about 100 Historical nodes for older data that comes in through batch processing, and 75 MiddleManager nodes.

[00:19:53.570] – Will To

Those are for real-time ingestion and querying. And they deploy everything on Amazon EKS, that’s Elastic Kubernetes Service, clusters. With this setup, the scale they can pull off is pretty impressive: they’re ingesting over 3 million events per second, and they can serve over 250 queries per second as well. It’s data on a really massive scale. And that’s what Kafka does, and that’s what we do as well.

[00:20:22.710] – Reena Leone

Tech examples are always the easiest to kind of talk about because we’re in this space. But I think it’s important to note that Druid’s operational visibility use case actually works across so many other industries anywhere where you need to monitor a data set. One that I think we were talking about off the show beforehand was fraud detection in terms of financial services. I think you mentioned there was, like, a bank that’s actually doing that right now.

[00:20:54.720] – Will To

Oh, yes. So DBS, the Development Bank of Singapore. DBS actually also did a presentation at Druid Summit, I think ’21. A little bit of background: it’s interesting because DBS, as the name implies, is from Singapore, and I’ve always found them kind of interesting because I think their history and trajectory parallel the history and trajectory of Singapore, the city-state, as well. So a little bit of historical background for our viewers. Put my teacher hat on.

[00:21:27.870] – Reena Leone

You’re putting your teacher hat on. Let’s do it. Let’s learn.

[00:21:30.690] – Will To

So Singapore, if you’ve ever been, is a really awesome place. It’s located in Southeast Asia, near the Strait of Malacca, which is a major shipping thoroughfare. And if my numbers are correct, I think it takes maybe 70% or so of the world’s container traffic, because it’s essentially a choke point between the Pacific Ocean and the Indian Ocean. A lot of inbound and outbound traffic, for example, goods from factories in Asia going elsewhere, empty containers coming back from other ports toward Asia, or containers with raw materials coming back to factories in Asia, can only really go through there. There’s a few choke points like that, like the Strait of Hormuz near Iran and the Persian Gulf; the Strait of Malacca is one. So that’s why, for much of its history, it was contested by a lot of foreign powers, colonial powers, that sort of thing. Singapore was... I mean, I’m not too clear about the early history of Singapore.

[00:22:45.110] – Will To

So when DBS was founded in the late 1960s, it was actually maybe a year or two, very soon, after Singapore separated from Malaysia. First Malaysia won its independence from the United Kingdom. And then, if my timeline is correct, Malaysia expelled Singapore from their federation, and Singapore had to stand as an independent city-state. And the interesting thing is, Singapore doesn’t have many resources, and it’s very tiny, so they really had to work hard to get to where they are today. And DBS was actually a big part of that story, right? That’s my little background. But to come back: DBS is now one of the biggest banks in Southeast Asia.

[00:24:05.480] – Will To

When China opened up in the 1980s, DBS was, I think, one of the first, if not the first, banks to have a foreign branch in China. Today, if my numbers are correct, I believe they have something like $7.3 billion in revenue, et cetera. But anyway, their use case for Druid is anti-money laundering. And anti-money laundering, as the name suggests, is basically stopping parties like criminals, terrorists, et cetera, from washing their money. The money that they get from doing crime has to be washed and made to look legit. Criminals can do that in a few ways. They can set up front businesses, claim that money as revenue, and then basically funnel it into their accounts.

[00:24:52.390] – Reena Leone

This is not a lesson on how to launder money, folks. Just letting you know. Don’t do that. There are tools in place; you will get caught.

[00:25:02.450] – Will To

Speaking of tools in place, this is one of them, right? So for some time, DBS was doing anti-money laundering after the fact. They were reactive, not proactive. Previously, they didn’t necessarily possess the ability to suss out a fraudulent transaction and stop it in its tracks, right?

[00:25:24.070] – Reena Leone

So they were looking at it like, they could see a pattern after the fact, right? It would ping because it detected something was amiss.

[00:25:33.790] – Will To

Yes, exactly. So they would find it after the fact. And also, a lot of the anti-money laundering efforts previously didn’t use streaming technologies. They didn’t use Kinesis or Kafka; they used batch data. And that’s part of the reason it was reactive, because it wasn’t set up for real time, right? So DBS decided that they were going to go real time, they were going to innovate, and they were going to stop problems before they occurred. Now, here’s the thing, though. Going real time brings risks for a sector like banking, because you get a lot of false positives, and you don’t want false positives or false negatives; you want true positives and true negatives. So Druid actually played a big role in this, right? The first thing was they use Kafka, and with Druid, as we said, there’s no need for extra code, no need for complex workarounds with Kafka. You can just plug and play and you’re good to go, right? And the other thing, to use an analogy: things like security operations and IoT are all very high volume, and very fast. Asking a human team to pick out the fraudulent transaction or the potential money launderer is kind of like asking someone to filter out a single drop of water from a fire hose.

[00:26:54.110] – Will To

Humanly, it’s not possible. So what you have to do is use AI, use machine learning. You have to train your AI and machine learning on customer behavior so they can better flag transactions, predict which ones are going to be fraudulent and which ones aren’t, and then identify correlations between the alerts they produce and actual suspicious activity. All of this requires massive data sets, and Druid is great for that. By switching to Druid, they were able to automate more things; they were able to automate essentially all of anti-money laundering, or AML, as they say. And that means faster wire transfers, fewer business delays, smoother and cheaper investigations, and more frequent AI screening for high-risk customers or accounts, so there’s less human intervention overall, and human intervention only at critical junctures, that sort of thing.
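The behavioral flagging Will describes can be illustrated with a deliberately simple sketch: score each new transaction against a customer's historical behavior and flag outliers. Real AML systems use trained machine learning models over many features; the single-feature z-score rule, the threshold, and the amounts here are all invented for illustration:

```python
# Simplistic sketch of behavior-based transaction flagging. Real AML
# systems use trained ML models; the z-score rule and threshold here
# are hypothetical stand-ins for that idea.
from statistics import mean, stdev

def flag_transactions(history, new_amounts, z_threshold=3.0):
    """Flag amounts more than z_threshold standard deviations above
    the customer's historical mean transfer size."""
    mu, sigma = mean(history), stdev(history)
    flags = []
    for amt in new_amounts:
        z = (amt - mu) / sigma if sigma else 0.0
        flags.append(z > z_threshold)
    return flags

# A customer who usually moves roughly $100-$250 per transfer...
history = [120, 180, 95, 240, 160, 210, 130, 175]

# ...then two new transfers arrive: one in-pattern, one far outside it.
print(flag_transactions(history, [200, 5000]))  # [False, True]
```

Tuning `z_threshold` is where the false-positive/false-negative tradeoff Will mentions lives: lower it and you catch more launderers but also flag more legitimate transfers.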

[00:27:49.880] – Reena Leone

That’s a really important use case in terms of making sure that illegal activity isn’t happening. But we’ve been kind of dancing around IoT a little bit, and one of the most interesting use cases is how it can literally affect our regular everyday lives, because IoT is so involved in safety and utilities and transportation nowadays. I know there was a really good use case around utilities actually using IoT and operational visibility with Druid. Can you tell me a little bit about that?

[00:28:30.880] – Will To

Several, actually; there are several, and interestingly, quite a few of them are in renewable energy, right? Solar is the most popular use case. Now, a little bit of background: there’s two types of solar. There’s concentrated solar, basically where you have hundreds of mirrors all beaming the sun at a single point, and at that single point is usually not water, I think it’s molten salt, but some sort of medium that heats up, boils, and turns steam turbines to generate electricity. That’s one, and the other, of course, is just the regular solar panels that you see on your house, right? Both of them require a lot of monitoring, and both of them really depend on the angle of the sun. Concentrated solar needs to get the angle of the sun correct because it has to reflect the sun, and it has to track the sun to do that properly. Conventional solar panels need to track the sun because sunlight hitting photovoltaic cells is how they generate electricity, right?

[00:29:29.240] – Will To

So one really common use case for utilities is picking up sensor and actuator data quickly. Quick backtrack: there’s two types of IoT devices. There are sensors, which pick up, collect, and transmit data, and there are actuators, which execute actions. Some IoT devices are both; smart thermostats are sensors and actuators because they can sense temperatures and change temperatures accordingly. So solar installations need to detect the angle of the sun, they need to detect the angles of their panels, and they need actuators there to move the panels. Another challenge with utilities: rooftop installations are great and they’re an essential part of renewable energy, right? Especially if we want to decarbonize properly. But the big money is really in giant utility-scale installations, I’m talking megawatts, thousands of solar panels, whether concentrated or photovoltaic; we need more of those, and lots of utilities have those. The problem is that because they’re so massive, a lot of them are located in remote locations. And if they’re in remote locations, that means fewer people, but probably also less connectivity. As a result, you’ll have sensors picking up and collecting information, but sometimes the connections will drop out and they won’t be able to transmit.

[00:30:51.320] – Will To

There'll be lags; they won't be able to transmit for anywhere from minutes to months. And that is an issue, because your data is going to come back incomplete. That's another challenge that utilities have to deal with, right?

[00:31:06.470] – Reena Leone

So in terms of building out applications to monitor this: then they know if it's transmitting, if it's tilted in the right direction toward the sun, if it's producing the right amount of energy, and if it's not meeting energy needs, whether they need to build more panels or substitute with other forms of energy. I know with my utility, you can kind of pick your mix now of where your electricity comes from. But I feel like this is where it's really important, because especially if they're remote, how would a solar company know what's going on? And then, I know you're still working on it, but isn't there another one that uses this for energy planning?

[00:31:59.050] – Will To

Yes, they do use this for energy planning. So one issue with electricity, and with renewable energy especially, is loads, right? I think the best way to say this is that physical infrastructure can't scale as easily as digital infrastructure. You can't just snap your fingers, spin up a thousand virtual machines, and spin them back down to zero. It doesn't work that way with utilities. They have to be very careful with planning loads, sometimes months in advance, maybe not years, but definitely months. And so one issue is shortfalls. For example, if you're a utility and your customers are buying more energy from you than you have the capacity to generate, you have to make up that shortfall by purchasing power from other power plants or other utilities. There are a lot of power-sharing agreements, that sort of thing. Most of America's grids are connected to each other, except Texas. So with that in mind, this company uses Druid, and interestingly, they use Druid for batch data, actually.

[00:33:04.240] – Will To

But basically, the utilities would submit their smart meter data to them in batched form, and it would be put into an S3 bucket. Then they would run a Python process on it to extract the GPS data. Everything's anonymized, of course, because it's personal data. So they'd run the Python process, get the GPS data out, put it into Druid, and from Druid organize and process it, that sort of thing, and then display it in dashboards for their utility customers. That way the utilities themselves can see: last year on April 10, our customers used X megawatts, we had a shortfall of Y megawatts, so it would be safe to buy Y megawatts from another power plant or another utility, that sort of thing.
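A minimal sketch of the kind of batch step Will describes: anonymizing meter IDs and totaling daily usage before the rows go into Druid. All names here (field names, functions, the salt) are hypothetical illustrations, not details of the actual pipeline:

```python
import hashlib
from collections import defaultdict

def prepare_meter_batch(records, salt="example-salt"):
    """Anonymize smart-meter records and total usage per day.

    Each record is a dict like:
      {"meter_id": "A1", "date": "2023-04-10", "kwh": 1.2}
    Returns (anonymized_records, daily_totals_kwh).
    """
    anonymized = []
    daily_totals = defaultdict(float)
    for rec in records:
        # Hash the meter ID so personal data never reaches the dashboard layer.
        anon_id = hashlib.sha256((salt + rec["meter_id"]).encode()).hexdigest()[:12]
        anonymized.append({**rec, "meter_id": anon_id})
        daily_totals[rec["date"]] += rec["kwh"]
    return anonymized, dict(daily_totals)

def shortfall_mwh(demand_mwh, capacity_mwh):
    """How much power the utility must buy from elsewhere (0 if none)."""
    return max(0.0, demand_mwh - capacity_mwh)
```

In the real pipeline, the batch would be read from the S3 bucket and the prepared rows ingested into Druid; the aggregation here just stands in for that step.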

[00:33:59.980] – Reena Leone

So what you’re saying is Druid literally helps keep the lights on.

[00:34:03.900] – Will To

Yeah, it does.

[00:34:05.710] – Reena Leone

No marketing speak there. But even if I take it down another level, I think Druid operational visibility use cases can actually be important for safety, right? Transportation always comes to mind, or manufacturing: sensors, alerts, making sure machinery is functioning, having certain triggers in place if it's not. We see this with trains, making sure that they stay on the tracks, which doesn't always happen. And I bet there are probably some Druid use cases in there as well.

[00:34:52.980] – Will To

To the best of my knowledge, actually, I'm not sure if there are any railroads using Druid right now. But maybe a little bit of background, if I can put on my teacher hat.

[00:35:05.770] – Reena Leone

"A little bit of background," Will? You don't say. By the way, I don't know if the show is still on the air, but if Who Wants to Be a Millionaire was still running, you would be my phone-a-friend, because I feel like whatever the topic was, you would have some information on it.

[00:35:22.230] – Will To

I'll do my best, I'll be your lifeline. So IoT definitely has a ton of applications for transportation, especially railroads. Let's use three examples: freight trains, mass transit, and airlines. So for trains, we're not talking about commuter trains, I'm talking about freight trains, right? A lot of the country's cargo is still moved by train because it's efficient. There is long-haul trucking, but from what I understand, it's kind of a hub-and-spoke network: you move big cargo between points by rail, and from those points to the last mile, that's when they switch to trucks. So in East Palestine, Ohio, there was a derailment recently, and the National Transportation Safety Board, the NTSB, which is the lead agency for investigating these disasters, whether it's trains, boats, airplanes, et cetera, recently released a report on the East Palestine disaster. Their conclusion was that an overheated ball bearing caused the train to fail and derail. Now, the issue here is that the overheating in this ball bearing was actually detected several times, at three points if I remember correctly, by built-in sensors.

[00:36:40.130] – Will To

But the issue here is that the threshold at which train crews were to stop was probably higher than it should have been. The ball bearing overheated to about 250 degrees above the outside ambient temperature, which was around ten degrees. This is all Fahrenheit, by the way. And I think the procedure was that the crew only braked when it hit 200 degrees above ambient, and by the time it crossed 200, it was too late. So that's one application, right? On a related note, another application is mass transit. I live in New York, you live in Boston, and one thing that many mass transit systems in America have in common is that their signaling systems are very outdated.
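Boiled down, the detection rule Will describes is a comparison of each sensor reading against ambient temperature plus a stop threshold. A toy sketch, with the 200-degrees-over-ambient figure taken from Will's description and everything else made up:

```python
def bearing_alert(bearing_temp_f, ambient_temp_f, threshold_delta_f=200.0):
    """Return True if the bearing runs hotter than ambient + threshold.

    Will's point: with a 200 F-over-ambient stop threshold, earlier,
    cooler readings never trigger a brake even if the trend is clear.
    """
    return bearing_temp_f - ambient_temp_f >= threshold_delta_f
```

Lowering `threshold_delta_f` (or alerting on the rate of change across successive detectors) would flag the failing bearing sooner; that trade-off is a policy choice, not a database one.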

[00:37:25.950] – Reena Leone

I was just in Tokyo, so it just hurts my heart when I come back.

[00:37:32.610] – Will To

Other countries take mass transit very seriously; it's a point of national pride. Probably the biggest problem with American mass transit is that the signaling systems are outdated. New York City, for example, has the oldest subway in the country, and they still use technology from the 1930s: something called fixed block signaling. With fixed block signaling, they divide the track up into segments. I'm not sure exactly how long, maybe 20 or 30 meters each. Each segment is basically set off with signals that detect when a train is there, and only that train can be in that segment; it's closed off. The actual hardware that runs it is, like, ball bearings, literally. They have workshops where they have to fabricate the parts because the original manufacturers have gone out of business and you can't buy them anymore. Of course, the problem here is that these old signals fail.

[00:38:38.220] – Will To

And the other issue, too, is that because the track is divided up into discrete blocks of however many meters, you can't run the trains too close together. They have to be at least that far apart to stay within their safety margins. Now, of course, there are new technologies to do this: something called communications-based train control, or CBTC. CBTC basically enables moving block signaling, which lets trains maintain a certain distance between them even as they're in motion. They do that through sensors, a lot of them wired sensors if I understand it correctly, on the track and on the wall, so they can fix a train's location pretty accurately. And if two trains come too close to each other, the system can execute actions remotely, that sort of thing. There's also, a bit of a side note, but in 2019 they were testing, maybe not an upgrade to CBTC, but kind of an addendum to it.

[00:39:53.410] – Will To

They're adding something called ultra-wideband sensors. These aren't wired; they run off radio frequency. Supposedly they're easy to install, and they might cut down the cost of installing actual wired sensors on the tracks. I'm not sure; only time will tell. But as of right now, in New York at least, there are only two lines completely fitted out with CBTC. I'm from Queens, and the line I live on is still being fitted out with CBTC. I hope they're done with it soon.
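Moving block signaling essentially comes down to continuously checking that a following train's gap to the train ahead exceeds a speed-dependent safe minimum, and braking if it doesn't. A toy illustration, with made-up numbers standing in for real braking-distance math:

```python
def safe_headway(lead_pos_m, follow_pos_m, follow_speed_ms,
                 min_gap_m=50.0, brake_factor=0.5):
    """Check whether a following train keeps a safe moving-block gap.

    The required gap grows with the follower's speed (a crude stand-in
    for actual braking-distance calculations). Positions are in meters
    along the track; speed is in meters per second. Returns True if the
    current separation is safe, False if the follower should brake.
    """
    required_gap = min_gap_m + brake_factor * follow_speed_ms ** 2
    return (lead_pos_m - follow_pos_m) >= required_gap
```

In a fixed block system the "required gap" is a whole track segment regardless of speed; the moving-block version above is why CBTC can safely run trains closer together.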

[00:40:21.830] – Reena Leone

I’m not going to chime in with what’s going on over here in Boston because nothing was on fire today at the time of this recording. So we’re doing great. We’re kicking off this week well.

[00:40:31.210] – Will To

It's, like, the literal meaning of hot mess.

[00:40:34.270] – Reena Leone

But the New York train system you're dealing with is massive, right? So this is producing a lot of data. You have to know where every single train is, and also where every train is in relation to every other train that's running on the same track, right?

[00:40:54.150] – Will To

Yeah, exactly. One thing with trains, at least, is that a lot of stuff has to be automated. You can't be like, hey, these two trains are too close together, send the data back to the server, and have the server stop them. You don't have enough time, right?

[00:41:13.990] – Reena Leone

There’s no technician there being like, oh, I should press a button. That’s not going to work.

[00:41:19.510] – Will To

It has to be automated, right? It has to happen right there. Did I answer the question? Sorry, was there something else?

[00:41:24.040] – Reena Leone

Yeah, I mean, we were talking about why operational visibility is important in transportation. And the IoT example is probably the most dire one, the most important one, right? Because we're literally talking about train safety. The system has to be able to not only process that data but, to your point, automate the response. So all of that has to be in real time. It can't be sent back and wait for someone to review it, or else there could be a catastrophe.

[00:41:58.540] – Will To

Yeah, exactly. And another example we could use is satellite IoT. Satellite IoT is good for things like logistics: tracking your containers as they make their way from factories in one part of the world to warehouses and buyers in another part of the world. Another one is airlines. There's a company called Inmarsat, and if an airline buys a subscription to Inmarsat, they install a satellite data link on the airplane, and that data link transmits data from the airplane to the satellite. How granular the data is depends on which subscription you buy, et cetera. In the case of Malaysia Airlines Flight 370, which disappeared back in 2014, I don't think it would have made a difference, because somehow the power to the satellite data link was knocked out. But one way they were able to narrow down the search area in the Southern Indian Ocean was that, even though it was knocked out, auxiliary power was restored to MH370 at one point.

[00:43:09.290] – Will To

And the satellite data link actually sent out a bare minimum of information to the satellite. Exactly how they worked out the locations from that is beyond my pay grade.

[00:43:19.330] – Reena Leone

We’re not solving the mystery of the airplane that went down several years ago. That’s okay.

[00:43:27.250] – Will To

But basically, engineers at Inmarsat and the various transportation safety boards, Australia's, Malaysia's, America's, France's, et cetera, figured out from the telemetry data that the satellite data link was sending to the Inmarsat satellite that the wreck of MH370 was probably within a certain area. It's still a massive area, but better than the whole world, right?

[00:43:51.770] – Reena Leone

Okay, so to wrap this up, really simple statement. Why does Druid make sense for operational visibility?

[00:43:59.170] – Will To

Because operations move fast. And if a company wants to see things and react to things as they're happening in real time, you need a solid database for that. Not all databases are built for that, which is fine; databases have lots of different niches, right? Cloud data warehouses, transactional databases, that sort of thing. But if you're doing IoT, and if lives depend on you, or if a lot of revenue depends on how fast you can execute triggers, execute automated actions, or respond, then Druid is ideal for that, because Druid is specifically designed for this type of use.

[00:44:36.910] – Reena Leone

From helping with auctions and ads to saving lives.

[00:44:44.830] – Will To

Actually, one other use is also factories.

[00:44:49.470] – Reena Leone

Oh yeah?

[00:44:50.730] – Will To

So for factories, for example on an assembly line, like with the train sensors in East Palestine, you can set a threshold for your assembly line temperature sensor. And if the temperature passes Y degrees Fahrenheit or Celsius for X seconds, then you can do something like trigger fire suppression. So that's another use that can save lives, and it can save lots of money as well.
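The rule Will describes, above Y degrees for X seconds, is a sustained-threshold check rather than a single-reading alarm. A minimal sketch with hypothetical names and numbers:

```python
def sustained_overheat(samples, threshold_deg, min_seconds):
    """True if temperature stays at or above threshold_deg for min_seconds.

    `samples` is a list of (timestamp_seconds, temperature) pairs in
    time order. A reading below the threshold resets the running streak,
    so brief spikes don't fire the trigger.
    """
    run_start = None
    for ts, temp in samples:
        if temp >= threshold_deg:
            if run_start is None:
                run_start = ts
            if ts - run_start >= min_seconds:
                return True  # e.g. trigger fire suppression here
        else:
            run_start = None
    return False
```

In practice this kind of check would run continuously against streaming sensor data; the duration requirement is what separates a real fire risk from a momentary spike.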

[00:45:13.270] – Reena Leone

Saving lives, saving money. I can't think of two better things, right? That's both sides of the coin. Well, Will, thank you so much for joining me today. I feel like I learned so much, not just about the topic of the show, but also about the history of Singapore and how trains work, and I am very grateful. Your teacher background really came through today. Okay, folks, if you want to learn more about Apache Druid or Imply, visit either site. And until next time, keep it real.
