The easiest way to evaluate Imply is to install it on a single machine. In this quickstart, we'll set up the platform locally, load some example data, and visualize the data.
You will need Java and Node.js installed.
On Linux, your OS package manager should be able to help with both Java and Node.js. If your Ubuntu-based OS does not have a recent enough version of Java, WebUpd8 offers packages for those OSes. If your Debian, Ubuntu, or Enterprise Linux OS does not have a recent enough version of Node.js, NodeSource offers packages for those OSes.
Please note that the configurations used for this quickstart are tuned to be light on resource usage and are not meant for load testing. Optimal performance on a given dataset or hardware requires some tuning; see our clustering documentation for details.
First, download Imply 2.2.3 from imply.io/download and unpack the release archive.
tar -xzf imply-2.2.3.tar.gz
cd imply-2.2.3
In this package, you'll find:
bin/* - run scripts for included software.
conf/* - template configurations for a clustered setup.
conf-quickstart/* - configurations for this quickstart.
dist/* - all included software.
quickstart/* - files useful for this quickstart.
Next, start up the services:

bin/supervise -c conf/supervise/quickstart.conf
You should see a log message printed out for each service that starts up. You can view detailed logs for any service by looking in the var/sv/ directory using another terminal.
Later on, if you'd like to stop the services, CTRL-C the supervise program in your terminal. If you want a clean start after stopping the services, remove the var directory before starting up again.
Congratulations, now it's time to load data!
We've included a sample of Wikipedia edits from June 27, 2016 to get you started with batch ingestion, located at quickstart/wikiticker-2016-06-27-sampled.json. Open the quickstart/wikiticker-index.json ingestion task file to see how Druid can be configured to load this data.
Druid supports an OLAP data model, where you organize your columns into dimensions (attributes you can filter and split on) and metrics (aggregated values; also called "measures"). OLAP data models are designed to allow fast slice-and-dice analysis of data.
Druid supports OLAP data models, but does not require them. You can load your data without OLAP modeling by setting "rollup" to false in your ingestion task, then including all columns you want to index as "dimensions" and leaving your "metricsSpec" empty.
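As a sketch, the relevant portion of an ingestion spec with rollup disabled might look like the following. The column names here are illustrative assumptions, not the exact contents of wikiticker-index.json:

```json
{
  "dataSchema": {
    "dataSource": "wikiticker",
    "granularitySpec": {
      "rollup": false
    },
    "metricsSpec": [],
    "parser": {
      "parseSpec": {
        "dimensionsSpec": {
          "dimensions": ["channel", "page", "user", "added", "deleted"]
        }
      }
    }
  }
}
```

With this configuration, Druid indexes every listed column as a dimension and stores each input row as-is, rather than pre-aggregating rows that share dimension values.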
In the Wikipedia dataset, the configured dimensions are:
Time is a dimension too, but it isn't included in this list because Druid always treats the timestamp as a dimension automatically.
The metrics in the Wikipedia dataset are:
To load this data into Druid, you can submit the ingestion spec that you opened earlier. To do this, run the following command from your Imply directory:
bin/post-index-task --file quickstart/wikiticker-index.json
This will print something like:
Task started: index_wikiticker_2017-03-02T22:09:45.235Z
Task log: http://localhost:8090/druid/indexer/v1/task/index_wikiticker_2017-03-02T22:09:45.235Z/log
Task status: http://localhost:8090/druid/indexer/v1/task/index_wikiticker_2017-03-02T22:09:45.235Z/status
Task index_wikiticker_2017-03-02T22:09:45.235Z still running...
Task index_wikiticker_2017-03-02T22:09:45.235Z still running...
Task finished with status: SUCCESS
You can see more information about ingestion tasks in your cluster by using your overlord console: http://localhost:8090/console.html.
After your ingestion task finishes, the data will be loaded by historical nodes and available for querying within a minute or two. You can monitor the progress of loading your data in the coordinator console, by checking whether there is a datasource "wikiticker" with a blue circle indicating "fully available": http://localhost:8081/#/.
Once the data is fully available, you can immediately query it.
This section showed you how to load data from files, but Druid also supports streaming ingestion. Druid's streaming ingestion can load data with virtually no delay between events occurring and being available for queries.
We've included several different ways you can interact with the data you've just ingested.
Druid supports SQL queries over HTTP and JDBC. You can issue SQL queries using the included dsql command-line tool:
$ bin/dsql
dsql> SELECT page, SUM("count") AS Edits FROM wikiticker WHERE TIMESTAMP '2016-06-27 00:00:00' <= __time AND __time < TIMESTAMP '2016-06-28 00:00:00' GROUP BY page ORDER BY Edits DESC LIMIT 5;
┌──────────────────────────────────────────────────────────┬───────┐
│ page                                                     │ Edits │
├──────────────────────────────────────────────────────────┼───────┤
│ Copa América Centenario                                  │    29 │
│ User:Cyde/List of candidates for speedy deletion/Subpage │    16 │
│ Wikipedia:Administrators' noticeboard/Incidents          │    16 │
│ 2016 Wimbledon Championships – Men's Singles             │    15 │
│ Wikipedia:Administrator intervention against vandalism   │    15 │
└──────────────────────────────────────────────────────────┴───────┘
Retrieved 5 rows in 0.04s.
See the Druid SQL documentation for more details about making SQL queries with Druid.
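The same kind of query can also be sent over HTTP by POSTing a JSON payload to the broker's SQL endpoint. As a sketch (the endpoint path http://localhost:8082/druid/v2/sql/ and port are assumptions based on the default quickstart configuration; check the Druid SQL documentation for your version), the payload would look like:

```json
{
  "query": "SELECT page, SUM(\"count\") AS Edits FROM wikiticker GROUP BY page ORDER BY Edits DESC LIMIT 5"
}
```

The broker responds with the query results as JSON rows, which makes this endpoint convenient for programmatic access from any language with an HTTP client.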
Druid supports a rich family of JSON-based queries. We've included an example topN query in quickstart/wikiticker-top-pages.json that will find the most-edited articles in this dataset:
curl -L -H'Content-Type: application/json' -XPOST --data-binary @quickstart/wikiticker-top-pages.json http://localhost:8082/druid/v2/
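For reference, a topN query has roughly the following shape. This is an illustrative sketch, not the exact contents of wikiticker-top-pages.json; the metric name and threshold here are assumptions:

```json
{
  "queryType": "topN",
  "dataSource": "wikiticker",
  "intervals": ["2016-06-27/2016-06-28"],
  "granularity": "all",
  "dimension": "page",
  "metric": "edits",
  "threshold": 25,
  "aggregations": [
    { "type": "longSum", "name": "edits", "fieldName": "count" }
  ]
}
```

A topN query ranks values of a single dimension ("page") by an aggregated metric and returns the top results up to the given threshold, which is typically faster than an equivalent groupBy for this pattern.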
With Pivot, you explore a dataset by filtering and splitting it across any dimension. For each filtered split of your data, Pivot can show you the aggregate value of any of your measures. For example, on the wikiticker dataset, you can see the most frequently edited pages by splitting on "page" (drag "Page" to the "Split" bar) and sorting by "Edits" (this is the default sort; you can also click on any column to sort by it).
Pivot offers different visualizations based on how you split your data. If you split on a string column, you will generally see a table. If you split on time, you can see either a timeseries plot or a table.
Pivot can be customized through settings, as described on our Pivot configuration page.
There are many more query tools for Druid than we've included here, including other UIs, other SQL engines, and libraries for various languages like Python and Ruby. Please see the list of libraries at the Druid site for more!
So far, you've loaded a sample data file into an Imply installation running on a single machine. Next, you can: