Securing the “Crown Jewels”: A Journey through Druid Database Security with Carrell Jackson

On this episode, we’re going all in on cybersecurity! Helping us understand which critical aspects of security you need to focus on when building analytics applications is Carrell Jackson, CISO at Imply. Carrell discusses the importance of protecting sensitive data by implementing role-based access control and encryption. He also shares best practices for securing a Druid cluster, including authentication, authorization, and encryption, and highlights the significance of minimizing vulnerabilities and following security protocols.

The episode also dives into Imply’s security-first approach to their products, offering different deployment options like on-premise, hybrid, and Imply Polaris, a database as a service, to meet diverse security needs, data privacy requirements and more.

If the challenges of staying ahead of evolving threats also keep you up at night, this is the episode for you. Listen to learn more about:

Apache Druid’s security best practices include role-based access and encryption.
Network security recommendations for Druid include using firewalls and limiting access.
Imply’s support for single sign-on, role-based access, and data segregation for customer data protection.
How Carrell’s training as a certified ethical hacker enhances Imply’s security practices

About the Guest

Carrell Jackson is a distinguished information security leader with over twenty years of experience, currently serving as the Chief Information Security Officer (CISO) at Imply. His career has been marked by roles that leverage his deep expertise in cybersecurity to enhance organizational security postures significantly. Carrell’s leadership has been instrumental in developing and implementing security strategies that safeguard critical assets, manage risks, and ensure business continuity across various organizations.

Previously, Carrell served in the U.S. Navy before roles at American Savings Bank, MGM Resorts International, and Truvantis, Inc., where he developed and managed extensive compliance and security programs. His technical and leadership skills were further honed through roles that encompassed vulnerability management, incident response, and IT compliance, notably improving operational response times and ensuring regulatory adherence.

He holds numerous professional certifications, including Certified Information Systems Security Professional (CISSP) from ISC²; Certified Data Privacy Solutions Engineer (CDPSE), Certified Information Systems Auditor (CISA), and Certified in Risk and Information Systems Control (CRISC) from ISACA; and Certified Ethical Hacker (CEH) from EC-Council. Carrell is an Executive Member of the CyberEdBoard Community.

Transcript

[00:00:00.000] – Reena Leone

Welcome to Tales at Scale, a podcast that cracks open the world of analytics projects. I’m your host, Reena from Imply, and I’m here to bring you stories from developers doing cool things with Apache Druid, real-time data and analytics, but way beyond your basic BI. I’m talking about analytics applications that are taking data and insights to a whole new level.

[00:00:18.390] – Reena Leone

On today’s episode, we are going all in on security, from authentication and authorization to data protection strategies you’ll need when selecting a database for analytics applications, and when you start your own Druid cluster. We’ll also dive into Imply’s security-first approach to products like Imply Polaris, a database as a service built from Druid that, among other things, was designed to meet stringent security and compliance requirements. On that note, joining me as my security expert today is Carrell Jackson, CISO at Imply. I love that title, CISO. Carrell, welcome to the show.

[00:00:52.360] – Carrell Jackson

Hey, thanks for having me.

[00:00:53.560] – Reena Leone

Okay, so I like to start out a little bit with my guests and how they got to where they are today. So, Carrell, tell me a little bit more about your journey to becoming Imply’s CISO.

[00:01:03.640] – Carrell Jackson

Sure, it would be my pleasure. I have been part of the Imply team for three years now and grown into the position. I was Imply’s first dedicated security hire back three years ago as we began to scale the product offering, and we were venturing into our path to offer a SaaS product. Prior to joining Imply, diverse background, I was an active duty military member. I’m a veteran of the US Navy, served in the Navy for many years, transitioned out of the Navy into private sector, worked in banking, doing security for banking and those pieces, and then moved to Las Vegas, Nevada, where I worked for a large gaming and resort company and did security compliance for them, and then began a bit of consulting after that. And that’s what brought me into working for companies in the Bay Area and working with startups. I was with a SaaS-based learning management system prior to joining Imply. And so it’s been a diverse career. When you think about security and its practices, it’s really given me the ability to work in almost every area of each domain of security, from a federal government perspective all the way through to startups in building applications and software development.

[00:02:26.650] – Carrell Jackson

So it’s really cool.

[00:02:27.860] – Reena Leone

Awesome. Well, first of all, thank you for your service.

[00:02:29.860] – Carrell Jackson

Thank you.

[00:02:30.680] – Reena Leone

And then, you’ve been at Imply for three years now. Had you heard about Apache Druid before joining, or was that your first experience when you started here?

[00:02:40.240] – Carrell Jackson

I never worked in a company that was using direct analytics or data analytics as a tool, so I had not. But when I started researching opportunities and seeing what was built in Apache Druid, it was just phenomenal to me. I remember watching a YouTube video that was done by our Chief Product Officer at the time, comparing query speeds between Apache Druid and another competitor’s data warehousing software. And it was just amazing to see how quick and efficient Druid was. And in that aspect, I was blown away. And I was like, I want to be part of this movement, and it’s something that was really exciting.

[00:03:21.750] – Reena Leone

So I wanted to have you on the show because cybersecurity becomes more and more important every day. I know from speaking with open source Druid users and Imply customers that, along with performance and reliability, security is extremely important. Actually, I’d argue even more important than those things when choosing a database for analytics applications. For someone who is in the early stages of building an app, what should they be looking for in terms of data protection when they’re trying to evaluate a database?

[00:03:52.170] – Carrell Jackson

Great question. I think it starts even earlier than what they’re using it for. And it starts with… It always begins with the data identification. So in information security, we often refer to what we’re protecting as the “crown jewels.” So being able to ask yourself a few common questions, like what type of data will we be processing? And is that data sensitive to our company or is it sensitive to our customers? If the answers to those questions are yes, then it becomes very important to take steps to protect that data. And so if you’re just going to use a solution in a development environment and run it through its paces, security is going to be important, but not as important as if it’s hosting critical data to your organization or to your customers. And I’m sure we’re going to get deeper and deeper into what those pieces are. But common elements you think about when you think about protecting data within a database system is a role-based access to the data. Can the data be encrypted at rest and in transit to communications within the database and things along those lines?

[00:05:02.470] – Reena Leone

As I was doing my research for this around what Druid inherently in the open source version has in place, some of their best practices for security in terms of cluster setup are enabling authentication for production environments if they could possibly be accessed by, say, an untrusted network and not exposing the web console without authorization enabled. Another one is granting users the minimum permissions that they can have. I think that’s actually best practice with anything. Is there anything else folks should consider when setting up their Druid cluster?

[00:05:37.970] – Carrell Jackson

Yeah, so definitely. Apache Druid, the open source community, and the open source solution have done amazing jobs at documenting the processes that they recommend. So if you take an opportunity to take a look at the open source documentation, you’ll find some of the other good pointers or recommendations for best practices. Like you already mentioned, there’s authentication, which is how you’re logging in; authorization, which is what you have access to; and the principle of least privilege, only giving you what you need to have access to in order to accomplish the task. But good security practice also recommends that, if you’re using password-based authentication, you’re not using weak passwords, and also that you’re not providing those passwords in plain text in your configuration specifications within Apache Druid. So with Apache Druid, as a development tool, you have the ability to configure many things through the software coding itself. And so you want to make sure that you’re not hard-coding passwords and secrets into configuration specs that could potentially be exposed. And then disabling JavaScript is also a recommendation within the open source.
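
As a rough illustration of the point about not hard-coding passwords, here is a minimal Python sketch that reads credentials from environment variables instead of writing them into a configuration file. It is not Druid's own mechanism; the variable names `DB_USER` and `DB_PASSWORD` and both helper functions are hypothetical and chosen only for this example.

```python
import os


def load_secret(var_name, env=None):
    """Return a secret from the environment, failing loudly if it is absent."""
    env = os.environ if env is None else env
    value = env.get(var_name)
    if not value:
        raise RuntimeError(f"Secret {var_name!r} is not set in the environment")
    return value


def build_connection_settings(env=None):
    """Assemble connection settings without any literal credentials in code."""
    # DB_USER and DB_PASSWORD are hypothetical names, not Druid properties.
    return {
        "user": load_secret("DB_USER", env),
        "password": load_secret("DB_PASSWORD", env),
    }
```

Failing at startup when a secret is missing is a deliberate choice here: it tends to surface misconfiguration earlier than silently falling back to a default password would.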

[00:06:50.230] – Reena Leone

Okay, same question, but now onto the network where you’re running Druid. You should do things like enable TLS to encrypt communications within the cluster and use an API gateway. But what else should folks think about when they’re setting this up for themselves or for their company?

[00:07:07.030] – Carrell Jackson

Yeah, wherever possible, use firewalls and other network layer filtering to only expose the Druid services and ports specifically required for your use case. For example, only expose your broker ports to downstream applications that execute queries. You can also limit access to specific IP addresses or IP ranges to further tighten and enhance security. Druid also comes with a pretty robust authorization and authentication model, and that allows you to grant write permissions on any data source only to trusted users, like you had mentioned previously, and to grant certain things like state read, state write, config write, and data source write permissions only to highly trusted users. So really limiting user permissions to data sources, and limiting what can be accomplished within them, is going to be paramount for security.
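
The idea of limiting access to specific IP addresses or ranges can be sketched with Python's standard `ipaddress` module. This is only an illustration of the allowlist check itself; in practice the rule would normally live in a firewall or API gateway, and the networks listed below are made-up examples.

```python
import ipaddress

# Hypothetical allowlist: one internal subnet and one single host.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("192.168.1.10/32"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowlisted network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```

For example, `is_allowed("10.0.0.42")` is true because the address is inside `10.0.0.0/24`, while `is_allowed("10.0.1.42")` is false.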

[00:08:04.590] – Reena Leone

Let’s switch gears to a scenario where, let’s say, someone wants all the benefits of Druid but doesn’t have the time or the resources to get up and running on their own. What does Imply offer in terms of protection and security?

[00:08:20.770] – Carrell Jackson

Our product portfolio is diverse and can be customized to meet the needs of any potential customer. We offer an enterprise on-premises solution: if keeping data housed in your own environment, your cloud environment, is of the utmost importance to you, we can host, operate, and manage the data on-prem completely. We have an enterprise hybrid solution, which allows us to own the management functionality, cluster management, and upgrading within our cloud environment, but the data stays within your cloud environment and your data center. And then lastly, we offer Polaris, which is our newest offering, which is a database as a service where you trust us to do everything for you, and we build security around that to where data is super secure.

[00:09:17.810] – Carrell Jackson

Trust is a principal success criterion for us at Imply. No matter what product you’re leveraging, our security-first approach allows you to focus on harnessing the power of data while we do the work of securing it. When you are limited in resources or you may not have the technical expertise that you need, lean on us and we will be there to support you and give you the assurance that you need to be successful in building applications at scale.

[00:09:47.260] – Reena Leone

It’s interesting when you bring up resources because that seems to be something that folks often flag when they come to Imply for support. In terms of securing data, I can’t think of a more valuable reason to have more help when you have fewer resources. Also, another thing: I’ve heard folks talk about hybrid, especially if they are in EMEA and dealing with GDPR, as something that tends to be a good solution there. So speaking of Imply customers, what are some of the things that they are concerned about in terms of security and data protection these days?

[00:10:24.830] – Carrell Jackson

I look at it as, if I was a customer of Imply, what would I be concerned about? And so that’s how we protect our customers and their “crown jewels”, as we talked about previously. And it starts for us with how we build and deploy. So environmental hardening is the beginning. So we look at the configuration of our systems and make sure that they’re meeting industry best practices for hardening guidelines to make sure that insecure processes are not enabled within the back-end supporting infrastructure. We have a robust vulnerability detection and management program. So Imply, as a company, we do a faster release cadence, and we’ll talk more about that later in the episode. But it drives us to be able to solve problems and fix issues sooner, where if you’re leveraging open source Apache Druid, their release cycle is on a quarterly basis. And so you can potentially get fixes faster through Imply’s product offerings. And then you look at the software development lifecycle: be it infrastructure or application security, we’re integrated into all of those processes to build a robust security program to support secure software development, secure application development, and to really just facilitate and foster customer trust in our processes.

[00:11:45.730] – Reena Leone

Are there any industry-specific security compliance requirements that Imply offers?

[00:11:51.170] – Carrell Jackson

Yes, there are, actually. So we go back to building processes and building things that make things secure. We want to make sure that somebody else comes in, takes a look at these things, and says, You know what? Imply, you’re doing it the right way. So external validation of our internal controls is vital. And so for Imply, we landed currently on two standards that we get certified against by external auditors on an annual basis. So we are currently ISO 27001 certified for a scope of our information security management system that includes the people, assets, technology, and processes that are employed by Imply within all of our product offerings and associated tooling. So that certification comes in and looks at not just one piece of what we’re doing, but what we’re doing as a whole across the entire company that’s applicable to those control environments and says, you know what, Imply, you’re meeting the requirements of the standard of controls for ISO 27001, which is an internationally recognized certification, which helps us have a global presence. And then the second is we are also externally validated for a common control environment that is defined by AICPA, which is an auditing certification standard, but it’s for SOC 2, which is common in software service providers.

[00:13:19.090] – Carrell Jackson

We are SOC 2 Type 2 certified for all of our product offerings for the trust service criteria that are applicable to security, availability, confidentiality, and privacy. And so on an annual basis, auditors come in and take a look at our program over a period of operations of a year, and they say, okay, your controls are designed effectively, and they’re also operating effectively. And we share those reports openly with our customers and prospects to show that we are committed to the security controls and commitments that we make to them, so they can have that trust and assurance that we’re handling their information effectively.

[00:13:58.250] – Reena Leone

Speaking of trust, I feel like every day there’s a new threat, a new thing coming down. How do you deal with new threats? What happens if, say, our customers are exposed to a new cybersecurity threat?

[00:14:12.050] – Carrell Jackson

Well, fingers crossed, that never happens.

[00:14:13.320] – Reena Leone

Yeah, one can only hope!

[00:14:17.240] – Carrell Jackson

But we maintain a trained and dedicated security team here at Imply. That team is ready to respond to incidents should any of them arise. Our software lifecycle consists of monthly scheduled releases that get the latest and greatest features and security updates. But if a security vulnerability or threat is identified outside of the normal processes, we’ll initiate our response teams to get a hot fix out as fast as we can. That will come with recommendations for an upgrade or for implementing a mitigation, because maybe not all of our customers want to upgrade, but there could be steps that they could implement in their system without upgrading that would mitigate the risk of the particular threat or vulnerability.

[00:15:02.670] – Carrell Jackson

We saw it, it seems like such a long time ago, but Log4j was a huge one that impacted the global space. It impacted Imply and Apache Druid open source. And we saw a huge effort go into that resolution a couple of years ago, where we had a fix out with mitigations in place in under 24 hours for a global zero day vulnerability, which is pretty amazing, considering the timing of when that particular vulnerability came out, rolling into the holiday periods for a lot of folks.

[00:15:40.430] – Carrell Jackson

It doesn’t matter when these things happen because the adversary never sleeps. And there’s always going to be something that comes out. There’s always going to be the next vulnerability. There’s always going to be the next threat. And so we maintain vigilance and have dedicated teams to address those things as they come up.

[00:15:54.330] – Reena Leone

I also feel like those types of threats never really like to occur during your standard 9:00 to 5:00 business day. It’s great for a Friday evening or something like that. They’re like, Oh, that’s when major outages happen. Not for us specifically, but with a general global cybersecurity issue, I feel like they’re at the most inconvenient times.

[00:16:18.020] – Carrell Jackson

It seems to be the case. I think even if it’s in the middle of the workday, it’s never a convenient time. But that is a great point, that we do actually have a 24/7 security operations team that monitors everything. So regardless of when it happens, if it’s a Friday, a Saturday, a Sunday, or a Monday in the middle of the day, our on-call teams are going to be able to address those things as quickly as we find them, work with our engineering teams to get a fix out, and work with our customer success and experience teams to get out notifications and messaging to our customers so they don’t have to worry or ask questions about it. It becomes a huge team effort within Imply when these things arise.

[00:16:58.990] – Reena Leone

Okay, so speaking of our customers, what support do we have for different authentication roles and authentication schemes to protect data?

[00:17:10.790] – Carrell Jackson

So this is a hugely robust question. And we could spend an entire hour of a podcast talking about authentication roles and authentication schemes. So what I will say is that out of the box, all of Imply’s solutions support single sign-on, and that’s through a variety of integrations, whether it be with LDAP or Okta, or SAML integrations. And then within our SaaS offering, Polaris, we do some pretty amazing things when it comes to protections of data and getting data to our systems for ingestion. And so recently, the product team in engineering released multi-VPC private connectivity in support of Amazon MSK, which is a lot to dive into from a technical perspective, but I encourage all of our listeners to explore our documentation site (http://docs.imply.io) and look for these particular things in detail, because there’s so much information included there, and so much knowledge and thought that goes into the design of our product. We want to make sure that our customers, initially, for authentication, have processes in place that give them a secure method to get into our applications, but also to transmit data and connect to our APIs and have the data transfer be as secure as possible as well.

[00:18:53.130] – Reena Leone

Also, shout out to our docs team because Docs and DevRel are like peanut butter and jelly, so they do an amazing job making sure that our docs are all updated. So another question for you, and hopefully this is not a giant robust question, but can customers segregate their data to control access, to control roles, and things like that?

[00:19:16.690] – Carrell Jackson

Yeah, of course. So our products provide role-based access controls that can be defined in granular ways, enabling our customers to implement the principle of least privilege. And we actually highly encourage that. Like any application, when you build a system, obviously you’re going to have to have some level of large-scale administrative access to start the process. But from there, when it comes to what you’re giving tenants the ability to do, we highly encourage using role-based access and building roles and groups within the product to limit who can access what and how they can interact with that data. So you may have a team within your organization that only needs to be able to query certain sets of data. You can create a group and assign roles to that group to only see that specific data set. And then when they’re running their dashboarding or they’re running the analytic queries on that data, they would only have access to the data that they have permissions to. It’s highly important to limit that depending on the scale and sensitivity of your data.
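
The group-and-role pattern described above can be sketched in a few lines of Python. This is a generic illustration of role-based access control and least privilege, not Imply's or Druid's actual API; every class, function, and data set name below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Permission:
    dataset: str
    action: str  # for example "read" or "write"


@dataclass
class Role:
    name: str
    permissions: set = field(default_factory=set)


@dataclass
class Group:
    name: str
    roles: list = field(default_factory=list)
    members: set = field(default_factory=set)


def can(user, action, dataset, groups):
    """Allow an action only if one of the user's groups holds a matching role."""
    wanted = Permission(dataset, action)
    return any(
        user in group.members
        and any(wanted in role.permissions for role in group.roles)
        for group in groups
    )


# A team that only needs to query one data set gets a read-only role on it.
sales_reader = Role("sales_reader", {Permission("sales", "read")})
analysts = Group("analysts", roles=[sales_reader], members={"alice"})
```

With this setup, `can("alice", "read", "sales", [analysts])` is true, while a write to `sales` or a read of any other data set is denied, because nothing is granted by default.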

[00:20:28.890] – Reena Leone

Are there elements of security that we apply? What policies help keep Polaris secure, for example?

[00:20:37.710] – Carrell Jackson

So within Polaris’ infrastructure, we had the opportunity to build Polaris with a security-first mindset. And in doing that, not only did we build a secure software development lifecycle, but the Polaris infrastructure is continuously monitored for compliance with AWS, as well as the Center for Internet Security, CIS, hardening benchmarks. So my team monitors tools that we have implemented in the cloud infrastructure to be notified of any new resource that’s created that would fall outside of the compliance hardening benchmarks, and then we work with our engineering teams to get that resolved. Additionally, we deploy cutting-edge cloud monitoring and detection tools to provide internal teams with real-time detection and response capabilities for any event that presents as an anomaly. And so there are things that happen all day, every day within a cloud environment, and we have tools in place watching those things. If, for instance, we see a resource get built in a region that hasn’t had a resource built in it before, that would trigger an anomaly event to our security teams, and we would work closely with our engineers to say, Hey, there was something strange that happened here, and we would investigate that.

[00:21:59.070] – Carrell Jackson

And it could be, Hey, you know what? We’re building out yet another new region of support for our different product offerings, which is great. But our tools help us identify those things fast and remediate them quickly if they need remediation, but at least investigate them quickly based on those notifications.

[00:22:18.280] – Reena Leone

I mean, that makes total sense to bring it full circle, because I know that anomaly detection is actually one of the things that like, Druid helps several customers and actually in the open source realm, I believe as well, deal with. So it makes sense that that is something that we do as well. I know there’s often a requirement, you mentioned regions, right? There’s often a requirement of data privacy regulations and that data needs to be stored in that specific region. So how do we handle that?

[00:22:48.810] – Carrell Jackson

Yeah. So regional data protection is a large consideration for our service offerings. I highly recommend that you bring our legal and privacy teams in for a future episode dedicated specifically to privacy and data protection because they are the experts and can give you hours’ worth of content, I’m sure. But from a security perspective, and I work closely with that team, our enterprise hybrid model allows for data to stay in the customer’s designated AWS region, like we talked about, or various regions in those environments. And analytics are processed without data ever leaving the customer’s data center. For Polaris, we’ve enabled geographical regions in support of data being hosted in specific regions, and we continue to expand those regional offerings in Polaris each quarter. Currently, as of today, for enterprise hybrid, we support a whole list of regions. You can find them on our documentation site. But for Polaris, specifically, we support, in AWS, AP-South-1, EU-Central-1, EU-West-1, US-East-1, and US-West-2. Now, the EU ones are going to be important there. You think about Europe, Frankfurt, which is EU-Central-1, and EU-West-1, which is Europe, Ireland.

[00:24:22.710] – Carrell Jackson

That gives you the ability to keep your data in a region that’s specific to the European Union, which could help you meet your GDPR requirements. And then we’re also expanding our service offerings to be GA soon into Microsoft Azure, and we’re starting with US regions, but we’ll be scaling that out to different global regions as well in support of data protection needs for our customers.

[00:24:54.730] – Reena Leone

All the Azure stuff is so exciting.

[00:24:57.340] – Carrell Jackson

It is super exciting, right?

[00:24:59.140] – Reena Leone

I should probably do an episode on that at some point when we are ready. So this is more of a personal question for you. So what keeps you up at night in terms of database security?

[00:25:11.470] – Carrell Jackson

Yeah. So it goes full circle to what we started to talk about. We have to identify our “crown jewels.” And once we’ve identified them, we know for us it’s customer data that we host, customer data that we manage, and that our customers trust us to be responsible with. We work tirelessly to protect them. In our hosted environments, customers’ trust in us is of utmost importance. So, I’ll say adversaries, right? The fact that adversaries have the advantage of time, that’s what keeps me up at night, because we can build systems that are secure. We can protect customer data to the extent that is within the bounds and limits of what we have the capabilities to do. But an adversary could be sitting and waiting and spending time. And that’s why vulnerabilities keep coming out. Some of them are because of software changes in different components, but some of them are also because folks can sit back and spend time just hammering away at systems, trying to find small cracks that they can get into and exploit. And so I know every day I wake up that we build secure applications and we defend our customer data to the best of our ability.

[00:26:35.880] – Carrell Jackson

But the fact that the threat is always evolving keeps me awake at night, and we cannot rest on our laurels. So we have to continue to evolve and stay vigilant to stay up to date with what’s happening in the world.

[00:26:50.330] – Reena Leone

Speaking of this, this is a good segue into my next question. In my research, I saw that you are a certified ethical hacker, which I think is so cool. Can you tell me a little bit more about that?

[00:27:01.060] – Carrell Jackson

So it sounds super cool, and it has its fun aspects. So let’s start with this: ethical hacking is a broad practice that covers a whole bunch of different technologies. The principles behind it and the skills that I’ve developed provide the ability to evaluate nearly any application or infrastructure to identify potential vulnerabilities and to provide recommendations on how those vulnerabilities can be remediated. I pursued that particular certification, along with a whole slew of other certifications in security, because I felt like it was important to be able to put on the lens of what we’re trying to protect against to help solve problems and have a security-first mindset. We build applications and have different system components. And so we, as a company, Imply, contract services from other providers to do penetration tests on an annual basis, and they come in and test our systems within a prescribed period of time, and they find those things. But like I said, the adversary has unlimited time. And so a time-boxed approach to penetration testing is valid and great and good. But having the ability to do this as often as we need to internally, that’s really where the tools learned through becoming a certified ethical hacker come in handy to us as an organization.

[00:28:37.970] – Reena Leone

It is cool. And it’s cool to say. It is. It is cool to say.

[00:28:42.460] – Carrell Jackson

It is an interesting one to say, right? And it’s evolved a lot. We have members of my team that pursue this in their free time as well. And we’ve seen a revolution in the world over the last seven years when it comes to ethical hacking. I mean, there are entire organizations dedicated to funding ethical hackers to help secure systems, and security researchers going and using the skills they’ve learned to help make systems better for people versus actually trying to find exploits that cause havoc. You’re trying to find exploits to identify them, to let folks know that issues exist, and just to help them fix them before they become something dangerous.

[00:29:30.170] – Reena Leone

But that’s still so cool about being a certified ethical hacker, which actually, you know what I think is important to address on this show is what if somebody finds a vulnerability, both maybe if they’re working with Open Source Druid or if they are working with our products, how do they go about reporting that?

[00:29:50.120] – Carrell Jackson

Yeah. So Open Source Apache Druid is supported by and part of the Apache Software Foundation. And so Apache has a dedicated security team that addresses open source vulnerabilities. And so any undisclosed security vulnerability in Apache projects should be reported to security@apache.org. And if you find something in Imply’s service offering, then we encourage you to email security@imply.io and our internal teams will take care of those pieces.

[00:30:30.870] – Reena Leone

Fantastic. Well, Carrell, thank you so much for joining me!

[00:30:33.750] – Carrell Jackson

Yeah, it’s my pleasure. Thank you.

[00:30:35.310] – Reena Leone

This was definitely super informative, and I am glad that you got to be on the show. All right, everyone, if you want to learn more about Apache Druid, please visit druid.apache.org and if you want to learn more about Imply’s products from Imply Polaris to Hybrid to Enterprise and what we’re doing there, please visit imply.io. Until next time. Keep it real.
