Operations

Service supervision and logging

Imply ships with a supervise command that manages service lifecycles and console logs. It is configured through a single configuration file per machine, and that file can start many services. For example, run the command:

bin/supervise -c conf/supervise/master-with-zk.conf

This tells supervise to use the file conf/supervise/master-with-zk.conf to select which services to run.

You can restart an individual service using its name. For example, to restart the zk service, run bin/service --restart zk from the distribution.

To shut down all services on a machine, kill the supervise process (CTRL-C or kill SUPERVISE_PID both work) or run the command bin/service --down from the distribution.

Logging

By default, logs are written to var/sv/<service>.log in the distribution. You can write these files to any location you want by passing the -d <directory> argument to bin/supervise.

For added convenience, you can also tail log files by running bin/service --tail <service>.

Log files are not automatically rotated. To prevent log files from growing forever, you can periodically truncate the logs using truncate -s 0 <logfile>.
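
One way to do the periodic truncation is a small shell function run from cron. The following is a minimal sketch, not something shipped with Imply: the function name truncate_large_logs and the size threshold are assumptions chosen for illustration, and it only relies on the truncate -s 0 command mentioned above.

```shell
#!/bin/sh
# Sketch: truncate every *.log file in a directory that exceeds a size limit.
# truncate_large_logs is a hypothetical helper, not part of the distribution.
truncate_large_logs() {
  log_dir="$1"     # directory holding the logs, e.g. var/sv
  max_bytes="$2"   # size limit in bytes (illustrative value, pick your own)
  for logfile in "$log_dir"/*.log; do
    [ -f "$logfile" ] || continue              # skip when the glob matches nothing
    size=$(( $(wc -c < "$logfile") ))          # file size in bytes, whitespace stripped
    if [ "$size" -gt "$max_bytes" ]; then
      truncate -s 0 "$logfile"                 # empty the file in place
    fi
  done
}
```

For example, truncate_large_logs var/sv 104857600 would empty any service log larger than 100 MiB and leave smaller logs untouched.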

Customizing supervision

You can modify the provided supervision files or create new files of your own. There are two kinds of lines in supervision files:

  • :verify some-program will run some-program on startup. If the program exits successfully, supervise will continue. Otherwise, supervise will exit.

  • foo some-program will supervise a service named foo by running the program some-program. If the program exits, supervise will start it back up. Its console logs will be logged to a file named foo.log.
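
To make the two kinds of lines concrete, the sketch below writes a two-line supervision file. The helper write_example_conf and the programs bin/check-prereqs and bin/run-foo are hypothetical names used only for illustration; substitute your own programs.

```shell
#!/bin/sh
# Sketch: write a minimal supervision file to the given path.
# bin/check-prereqs and bin/run-foo are hypothetical programs, not part of Imply.
write_example_conf() {
  cat > "$1" <<'EOF'
:verify bin/check-prereqs
foo bin/run-foo
EOF
}
```

After calling write_example_conf with a path of your choice, you could pass that path to bin/supervise -c. Under the rules above, supervise would first run bin/check-prereqs, exit if it fails, and otherwise keep bin/run-foo running as the service foo with its console output in foo.log.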

Updating a cluster

When updating an Imply cluster, you should follow the typical procedure for a Druid Rolling Update. If you have deployed your cluster with the Master, Query, and Data server configuration, then take note of the following:

  • Update Data servers first, then Query servers, then Master servers.
  • Your Data servers run Druid MiddleManagers as part of Druid's Indexing Service. If you have indexing tasks that you do not want a rolling update to interrupt, you can use the Indexing Service - Without Autoscaling method to prepare the MiddleManagers for a clean restart.
  • Your Data servers run Druid Historical Nodes, so you should wait for each server to fully come back online before restarting the next.
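
The ordering in the notes above can be sketched as a shell loop that handles one server at a time. This is only an outline under stated assumptions: the host names, restart_server, and server_is_ready are placeholders you would replace with your own update procedure and readiness check, and none of them are provided by Imply or Druid.

```shell
#!/bin/sh
# Sketch: update Data servers, then Query servers, then Master servers,
# waiting for each server to come back before moving to the next.
# All host names below are hypothetical defaults.
DATA_SERVERS="${DATA_SERVERS:-data1 data2}"
QUERY_SERVERS="${QUERY_SERVERS:-query1}"
MASTER_SERVERS="${MASTER_SERVERS:-master1}"

# Placeholder: replace with however you update and restart one server.
restart_server() {
  echo "restart $1"
}

# Placeholder: return 0 once the server is fully back online.
server_is_ready() {
  return 0
}

# Poll the readiness check until it succeeds.
wait_until_ready() {
  until server_is_ready "$1"; do
    sleep "${POLL_SECONDS:-10}"
  done
}

rolling_update() {
  for host in $DATA_SERVERS $QUERY_SERVERS $MASTER_SERVERS; do
    restart_server "$host"
    wait_until_ready "$host"   # never restart the next server early
  done
}
```

Calling rolling_update with the defaults visits data1, data2, query1, and master1 in that order. Keeping the readiness check in its own function makes it easy to swap in a stricter test for Data servers, where Historicals need time to come fully back online.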

Druid operations

Please see the Druid operations documentation for tips on best practices, extension usage, monitoring suggestions, multitenancy information, performance optimization, and many more topics.
