Imply lookups for enhanced network flow visibility

by Eric Graham · November 26, 2018

Modern-day TCP/IP networks continue to evolve, making management and monitoring ever harder. In addition, stricter uptime SLAs, mission-critical enterprise applications, and the need for fast MTTR make ultra-fast databases and easy-to-use UIs critical for business success. Modern tools need to be flexible, supporting not just basic network flow for IP visibility but also providing visibility into hostnames, microservice names, usernames, and more. Imply and Druid were designed from the ground up to help solve these exact problems.


The founders of Imply built Druid, one of the most popular open source databases available today for operational analytics. Within Druid there are multiple ways to enhance visibility for existing network flow records. This how-to blog covers one of them: Druid lookup tables. You can think of a lookup table as a secondary fact table that you can query based on a key value.

If I want to display hostnames, which are not included in standard network flow records, I can use a lookup table to join the two tables on IP address and assign the latest hostname to a dimension in Pivot, the Imply UI. This works by performing a join at query time that maps an IP to a hostname (the name coming from the secondary lookup table), using IP as the common dimension between the tables.

The following steps define a basic lookup table using a CSV file for input. This how-to assumes you have a data source loaded in Pivot with a dimension that matches the key value in your lookup table.

  1. Include druid-lookups-cached-global in your extension list for Druid. This is defined in imply/conf/druid/_common/common.runtime.properties with the following property.

    druid.extensions.loadList=["druid-lookups-cached-global"]

  2. Create your CSV input file in the following format (key,value). Creating this file could easily be automated using DNS or IPAM systems. Your primary table should already include the IPs you are mapping.

     192.168.1.30,my_mac_laptop
     192.168.1.29,dns_server
     192.168.1.28,dhcp_server
    
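As a sketch of the automation mentioned above, the following Python generates the key,value CSV from reverse DNS. The helper names (`format_lookup_csv`, `resolve_hostnames`) are hypothetical, not part of Imply or Druid, and the fallback behavior is one possible choice.

```python
# Sketch: generate the lookup CSV from reverse DNS. Hypothetical helpers,
# assuming your resolver has PTR records for the hosts you care about.
import socket

def format_lookup_csv(mappings):
    """Render {ip: hostname} pairs as key,value lines with no header row."""
    return "\n".join(f"{ip},{name}" for ip, name in mappings.items())

def resolve_hostnames(ips):
    """Best-effort reverse DNS; fall back to the IP itself on failure."""
    result = {}
    for ip in ips:
        try:
            result[ip] = socket.gethostbyaddr(ip)[0]
        except OSError:
            result[ip] = ip  # no PTR record; keep the key usable
    return result
```

Writing `format_lookup_csv(resolve_hostnames([...]))` to a file on a cron schedule would keep the CSV in sync with DNS, and the pollPeriod in step 3 takes care of propagating changes into Druid.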
  3. Create a JSON configuration file that tells Druid to import the CSV into your lookup table.

     {
       "__default": {
         "iana-ports": {
           "version": "v1",
           "lookupExtractorFactory": {
             "type": "cachedNamespace",
             "extractionNamespace": {
               "type": "uri",
               "uri": "file:/Users/egraham/Downloads/lookups/servports2.csv",
               "namespaceParseSpec": {
                 "format": "csv",
                 "columns": [
                  "key",
                  "value"
                 ]
               },
               "pollPeriod": "PT30S"
             },
             "firstCacheTimeout": 0
           }
         }
       }
     }
    
    • iana-ports defines the lookup table name.
    • version defines the update revision in Druid. When updates are made to the lookup table this should be incremented.
    • type under extractionNamespace defines how you will be importing mappings. In this case we are using “uri” to define a file location.
    • uri defines the location of the csv file you created in step 2 that includes your mappings.
    • format defines what file format you will be importing. In this case we created a csv.

    • Leave the header row out of your CSV; the columns are defined here in the JSON spec instead.
    • columns can be left as key and value.
    • pollPeriod defines how often Druid polls the source file for updates.

    For more information see the documentation for lookups in general and the lookups-cached-global extension specifically.
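If you are generating this spec programmatically, the structure above can be built as a plain dictionary. This is a minimal sketch; `build_lookup_spec` is a hypothetical helper of mine, and the keys it emits simply mirror the cachedNamespace spec shown above.

```python
# Sketch: build the step-3 lookup spec in Python rather than by hand.
# The key names mirror the cachedNamespace JSON above.
import json

def build_lookup_spec(name, csv_uri, version="v1", poll_period="PT30S"):
    return {
        "__default": {
            name: {
                "version": version,
                "lookupExtractorFactory": {
                    "type": "cachedNamespace",
                    "extractionNamespace": {
                        "type": "uri",
                        "uri": csv_uri,
                        "namespaceParseSpec": {
                            "format": "csv",
                            "columns": ["key", "value"],
                        },
                        "pollPeriod": poll_period,
                    },
                    "firstCacheTimeout": 0,
                },
            }
        }
    }

spec = build_lookup_spec(
    "iana-ports", "file:/Users/egraham/Downloads/lookups/servports2.csv"
)
# json.dumps(spec, indent=2) can then be written out for step 4
```

Remember to bump version whenever the spec changes, as noted above.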

  4. Load the lookup JSON file into the Druid Coordinator.

    curl -H "Content-Type: application/json" \
      --data @serv-port3.json \
      http://<your_coordinator_ip>:8081/druid/coordinator/v1/lookups/config
    
  5. Check that your lookup table was loaded onto your Broker. You may have to wait for the poll period to elapse before the table appears on the Broker.

    curl -X GET http://<your_broker_ip>:8082/druid/listen/v1/lookups
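Steps 4 and 5 can also be scripted. The sketch below uses Python's urllib against the same two endpoints as the curl commands above; the helper names are mine, and the hosts and ports are the defaults from the curl examples, so replace them with your own.

```python
# Sketch: steps 4 and 5 from Python instead of curl, using only the
# Coordinator and Broker endpoints shown above. Hypothetical helpers.
import json
import urllib.request

def build_lookup_request(spec, coordinator="http://localhost:8081"):
    """Build the POST that pushes the lookup spec to the Coordinator."""
    return urllib.request.Request(
        f"{coordinator}/druid/coordinator/v1/lookups/config",
        data=json.dumps(spec).encode(),
        headers={"Content-Type": "application/json"},
    )

def push_lookup_config(spec, coordinator="http://localhost:8081"):
    """Step 4: send the config and return the HTTP status code."""
    with urllib.request.urlopen(build_lookup_request(spec, coordinator)) as resp:
        return resp.status

def fetch_broker_lookups(broker="http://localhost:8082"):
    """Step 5: list the lookups the Broker has loaded."""
    with urllib.request.urlopen(f"{broker}/druid/listen/v1/lookups") as resp:
        return json.load(resp)
```

Calling `push_lookup_config` with the spec from step 3 and then polling `fetch_broker_lookups` until the new name appears mirrors the manual curl workflow.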

At this point, your lookup table should be ready to use in Pivot. To use the lookup table, you can create a new dimension that uses Plywood to query the lookup and primary tables. The Plywood syntax looks like the following.

$port_dst.lookup('iana-ports')

port_dst is the dimension name in my primary table and “iana-ports” is the name I used to define my lookup table in step 3 above.

New dimension

Save your dimension. Now you can use this dimension to display the mapped value from the lookup table for each key in your primary table. In the example below, I created multiple associations: IP-to-hostname and port-to-name mappings.

Table

Sankey

Note: Keep in mind that if you want to track changes to a mapped value over time, lookup tables are not the way to do it; the join happens at query time, so every row reflects only the current mapping. A better approach is to merge the two tables during ingestion using Kafka or another stream-processing system.
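To make the ingest-time alternative concrete, here is a minimal sketch of enriching each flow record before it reaches Druid, so the mapping known at that moment is frozen into the stored row. `enrich_flow` is a hypothetical stream-processing step, not an Imply or Druid API, and the field names match the earlier examples.

```python
# Sketch of ingest-time enrichment: the hostname current at ingest is
# baked into each row, so later mapping changes do not rewrite history.
def enrich_flow(record, ip_to_host):
    """Attach the hostname known at ingest time to a flow record."""
    enriched = dict(record)
    enriched["hostname"] = ip_to_host.get(record["ip_dst"], "unknown")
    return enriched

mapping = {"192.168.1.29": "dns_server"}
flow = {"ip_dst": "192.168.1.29", "port_dst": 53}
stored_row = enrich_flow(flow, mapping)  # hostname is now part of the row
```

With a query-time lookup, every historical row would instead show whatever the mapping says today.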

In summary, lookup tables are a great way to provide additional visibility at query time. Combined with Imply's easy-to-use UI and Druid's fast query response times, lookup tables are a truly powerful feature. Although not perfect for every use case, they are a great way to provide additional visibility in many cases. Continue visiting our blog for future network flow related articles.
