Rockset recently published a blog post that compared the performance of Apache Druid 0.18 versus Rockset using the SSB benchmark. Druid 0.18 is about 9 months out of date at this point, so we wanted to revisit the benchmark based on the latest version of Druid (0.20.1), which includes several performance improvements we’ve made over the last few months.
SuperAwesome’s mission is to “make the internet safer for kids”: a safe, effective, and entertaining place that is 100% “Kid Safe”, including an advertising system that protects the personal data of children.
Amobee provides end-to-end advertising campaign and portfolio management across TV, digital and social media for some of the world’s largest brands, from Pringles to Spotify.
We’re thrilled to announce that the fourth edition of Virtual Druid Summit will be taking place on November 18, 2020!
LiquidM provides modular cloud-based software that allows agencies and trading desks to run their adtech activities and campaigns on a customizable, standardized, open platform. LiquidM provides real-time efficiency, control, and insights into media planning and buying.
Apache Druid 0.20.0 contains over 140 updates from 36 contributors, including new features, major performance enhancements (6x-11x on some queries!), bug fixes, and major documentation improvements.
Pollfish, the “easiest and most affordable way to get real-time insights from real consumers”, delivers democratic, real-time insights with an innovative Apache Druid®-powered pipeline that includes microservices leveraging an open-source Scala library for Apache Druid: Scruid. Anastasios Skarlatidis, Director of Data Engineering and Science at Pollfish, tells us more.
Innowatts provides an AI-driven data analytics SaaS platform for power utilities and retailers worldwide. Customers rely on Innowatts and the 40 million plus meters they are managing for the data needed to be more predictive, proactive and connected to their customers and ratepayers, helping them better manage risk, improve profitability, maintain grid reliability and anticipate sustainability trends.
We promised you closure, and we’re offering exactly that in a neatly-packaged blog post. Ben Sykes, Sr. Software Engineer at Netflix, has answered the questions that he wasn’t able to address during his Virtual Druid Summit II session: How Netflix Uses Druid in Real-time to Ensure a High Quality Streaming Experience.
Though you might have heard that Virtual Druid Summit was returning, did you guess that it would be so soon? We’re stoked, too! The third installment of Virtual Druid Summit will be taking place on October 7, 2020.
We recently discussed Mindhouse’s use of Apache Druid for clickstream data analysis and user behavior funnel analysis with Ankur Gupta, the company’s engineering technical lead. Ankur’s team relies on Druid to “segment users and understand how they are using our app”, and finds that “it’s especially helpful when we launch a new feature because we can understand the acceptance of the feature based on current user activity”.
We’ve always believed that community growth and collaboration are critical to the success of Apache Druid. For this reason, we’re excited to announce that last week, the Druid GitHub repository passed 10,000 stars!
Because we care deeply about the health of the community while continuing to deliver the most interesting Apache Druid stories, we’re hosting Virtual Druid Summit II on September 2, 2020.
The Apache Druid community released Druid 0.19 on July 21st, 2020. This release contains over 200 new features, performance enhancements, bug fixes, and major documentation improvements from 47 contributors.
There is no doubt that Apache Druid is a success, or that the benefits of implementing Druid can be huge. It is the leading real-time analytics database on the market. Thousands of companies use it to equip their employees and customers with self-service analytics to make better, faster decisions.
In Apache Druid 0.18/Imply 3.3, we added support for SQL joins in Druid. This capability, which the community has long requested, opens the door to a large number of possibilities in the future. In this blog I want to highlight some of the motivations behind our undertaking the effort and give you, the reader, an understanding of how it can be useful and where we’re going with it.
When an engaged technical audience asks great questions, it’s easy to run out of time during Q&A. And that’s exactly what happened at Virtual Druid Summit! Because our speakers weren’t able to address all of your questions during the live sessions, we’re following up with the answers you deserve in a series of blog posts.
This short post describes how Druid compares against enterprise data warehouses. Druid is not a data warehouse. It's a real-time database for user-facing analytics applications that need sub-second query response at high concurrency.
Thanks again to everyone who attended Virtual Druid Summit, and for being so engaged – as we previously mentioned, our speakers received more than 150 questions across their collective sessions! Unfortunately, there wasn’t enough time to answer all of your very good questions during the live sessions. In an effort to bring you some closure, we’ve invited our esteemed speakers to address the remaining questions in a series of blog posts.
I want first to thank everyone involved with making our first, albeit a tad non-standard, Druid Summit a smashing success. We, and the hundreds of folks who had purchased tickets, were extremely disappointed to postpone our first physical Druid Summit to November, but we were fortunate to have 10 great speakers agree to do their presentations virtually.
I am happy to tell you that 10 days after we officially postponed Druid Summit, we have now launched Virtual Druid Summit, which will take place as a series of online talks on April 15. Each talk will be a spicy 30 minutes of real-world information, followed by Q&A. There are 5 talks from Druid practitioners from a variety of industries and spanning 3 continents. The summit will open with an Apache Druid Roadmap and Vision talk from Apache Druid PMC Chair Gian Merlino, and will close with a 2-way voice-interactive “ask us anything” session featuring Druid authors and contributors.
Tijo Thomas, a Solutions Architect at Imply, recently wrote a reference architecture for Apache Druid on Microsoft Azure that includes some best practices for running on services such as Azure VM, Azure Blob Storage, Azure Database Service and HDInsight.
Analyzing the potential petabytes or more of data from all these devices goes way beyond existing data warehouses or data lakes. Fortunately, companies have already implemented IoT analytics using Imply, the real-time intelligence platform built on Apache Druid, the leading open source real-time analytics database.
Earlier this week, the Apache Druid community released Druid 0.17.0. This is the project’s first release since graduating from the Apache Incubator, and it therefore represents an important milestone.
Muthu Lalapet, a Solutions Architect at Imply, recently wrote a reference architecture for Apache Druid on Google Cloud Platform (GCP) that includes some best practices for leveraging GCP services such as Compute Engine, Cloud Storage and Cloud SQL. The document describes example cluster architectures and their accompanying machine types and configurations. As such, it’s a helpful resource for planning and implementing Druid on GCP.
If you are a Vertica customer, you probably already know this. Vertica is not built for real-time operational analytics at scale. If you do not know Vertica very well, you might be surprised. This statement may seem controversial. It’s not. Nearly ¼ of Imply customers were existing Vertica customers who purchased Imply, a commercially supported version of Apache Druid, because they were trying to implement operational analytics and hit limitations with Vertica. Other Vertica customers also use open source Druid and self-support.
If you are using Apache Druid to analyze customer-oriented data, you are probably familiar with the General Data Protection Regulation (GDPR), which went into effect May 25, 2018. However, you may be less familiar with a new law, the California Consumer Privacy Act (CCPA), which went into effect January 1, 2020 and is likely to become a *de facto* standard in the US.
Nielsen Marketing Cloud uses Druid to profile the various audiences that marketers and publishers would like to target on digital media, activate via various ad networks, and then gain insights on that activation after the fact.
Apache Druid is commonly used for clickstream funnel analysis, and in this blog post we’ll deep dive into how you can collect and analyze funnel data. While there are applications designed for clickstream analysis, such as Google Analytics and Adobe SiteCatalyst (previously Omniture), Druid is ideal when you have significant scale.
Below is a transcript of a short interview we conducted with Chaitanya Bendre, Lead Data Engineer at Zeotap, where we discussed their use of Druid to help address the difficult problem of identity resolution and multi-channel attribution.
Blueshift is an AI-powered customer data activation platform enabling CRM and product marketers to intelligently manage their audiences and orchestrate personalized messaging campaigns at scale. Blueshift offers real-time campaign analytics as a core capability in the platform. Campaign analytics break down engagement metrics like impressions, clicks, conversions, etc. by channel, trigger, experiment, etc. Currently two billion+ user interactions are tracked on a monthly basis.
A recent paper by independent researchers at the University of Minho in Portugal compared the performance of Apache Druid to well-known SQL-on-Hadoop technologies Hive and Presto. In the tests, Druid outperformed Presto by 10X to 59X (90% to 98% speed improvement) and Hive by over 100X.
When Hadoop is pushing data into Druid, Hadoop indexer performance is key and becomes challenging at scale. There are quite a few things to consider when running large-scale Hadoop indexing.
Today, the Apache Druid community released Druid 0.15.0-incubating. Druid is known as an extremely high-performance database and much of the early design work has been focused on providing speed at scale. Lately we have made a pivot towards those “ease of” factors that help users get productive with Druid quickly.
We recently conducted our first Druid community survey. Every so often we’ll be asking our community a short set of questions to understand how they use Druid, and how they would like to see it improved.
Although Druid draws ideas from a number of TSDB concepts, it is designed for a wider range of analytic use cases than those for which a TSDB is usually employed.
Today the Apache Druid community released Druid 0.14.0, our second release under the Apache umbrella and the first major release of 2019. I thought I'd take this opportunity to talk about what's new in this release and what's coming in the future.
If you are reading this because you are considering whether to use Apache Cassandra/DSE/ScyllaDB or Apache Druid/Imply, then you can just stop right now.
Security is a critical requirement in every deployment of a system that holds and processes data. In this blog post, we will discuss how we secured Apache Druid, and validated our implementation.
March 2018 Druid Bay Area Meetup - eBay Monitoring Platform, links to slides
It is now possible to deploy a Druid cluster in a secure setting.
November 2017 Druid Bay Area Meetup video, with talks from MZ and Slack, and a Druid roadmap update from Imply
Our last Druid meetup had great talks about how Druid is used at Branch, the ongoing work around better integrating Druid with the Hadoop ecosystem, and our roadmap plans.
Our last Druid meetup at Sift Science had 3 great talks about use cases with Druid & Imply, and the upcoming Druid roadmap.
With the Druid 0.9.2 release, Druid has added additional column compression methods for longs to significantly improve query performance in certain use cases. In this blog post, we’ll highlight how these various compression methods impact data storage size and query performance.
Today, many companies are turning to streaming solutions which are enabling them to understand and make business decisions from their data immediately, resulting in an operational agility that was unthinkable only a few years ago. The new Kafka indexing service is an exciting milestone in the maturity of Druid's ingestion technology, giving users a way to stream data into Druid with exactly-once correctness.