ML-Powered Analytics

With Kinetica, your entire machine learning workflow can start and end in one place: the database. The Active Analytics Workbench (AAW) allows users to ingest data into Kinetica, create a model that sits on top of or uses that data, make inferences with the model, and then audit those inferences.

In this tutorial, we’re going to use the NYC taxi dataset to predict fare prices based on the distance traveled between pickup and dropoff points. We’ll use the AAW user interface to get our model up and running.

Importing a Model

A model is a mathematical or programmatic representation of a real-world process. AAW currently supports three types of models: TensorFlow, Blackbox, and RAPIDS.

  • TensorFlow models are written using the TensorFlow libraries and require you to define a featureset and training and test datasets.
  • Blackbox models can theoretically be written using any library or package, as long as the model meets AAW’s Blackbox model requirements.
  • RAPIDS models are written using NVIDIA’s suite of software libraries built on CUDA-X AI. Consult the RAPIDS developer’s guide for more information.

Let’s create two separate models in AAW: one for our on-demand deployments and one for our continuous deployments.

On-Demand

  1. Navigate to AAW (http://<aaw-host>:8070).
  2. Click Models + Analytics.
  3. Click + Add Model, then click Import Blackbox.
  4. Input Taxi Fare Predictor - On-Demand for the Model Name.
  5. Input Blackbox model for on-demand deployments for the Model Description.
  6. Input kinetica/kinetica-blackbox-quickstart:7.0.1 for the Docker Container URL.
  7. Input bb_module_default for the Module.
  8. Input predict_taxi_fare for the Function.
  9. For the Input Columns, click Add until there are four (4) columns.
  10. Define the following four (4) input columns:
    1. pickup_longitude: float
    2. pickup_latitude: float
    3. dropoff_longitude: float
    4. dropoff_latitude: float
  11. For the Output Columns, define the following one (1) output column:
    1. fare_amount: double
  12. Click Create.
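The Input Columns and Output Columns above define a contract between AAW and the predict_taxi_fare function in bb_module_default. The sketch below models that contract in Python. It assumes, hypothetically, that the function receives one record as a dict keyed by input column name and returns a dict keyed by output column name; the constants and fare logic are placeholders, not the model shipped in the kinetica/kinetica-blackbox-quickstart:7.0.1 image.

```python
# Hypothetical sketch of the column contract declared in AAW for the
# Taxi Fare Predictor models. The dict-in/dict-out calling convention and
# the placeholder fare logic are assumptions, not the quickstart image's code.

INPUT_COLUMNS = {
    "pickup_longitude": float,
    "pickup_latitude": float,
    "dropoff_longitude": float,
    "dropoff_latitude": float,
}
OUTPUT_COLUMNS = {"fare_amount": float}  # a Kinetica double fits a Python float


def validate_input(record: dict) -> None:
    """Raise ValueError if a record does not match the declared input columns."""
    missing = set(INPUT_COLUMNS) - set(record)
    if missing:
        raise ValueError(f"missing input columns: {sorted(missing)}")
    for name, expected in INPUT_COLUMNS.items():
        if not isinstance(record[name], expected):
            raise ValueError(f"{name} must be {expected.__name__}")


def predict_taxi_fare(record: dict) -> dict:
    """Hypothetical model function: validate inputs, return the output column."""
    validate_input(record)
    # Placeholder logic (assumption): a flat base fare plus a rate per degree
    # of coordinate change. A real model would replace this entirely.
    dx = abs(record["dropoff_longitude"] - record["pickup_longitude"])
    dy = abs(record["dropoff_latitude"] - record["pickup_latitude"])
    return {"fare_amount": 2.5 + 100.0 * (dx + dy)}
```

Validating against the declared columns first means a mismatch between the AAW definition and the incoming data fails loudly instead of silently producing a bad fare.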

Continuous

  1. Navigate to AAW (http://<aaw-host>:8070).
  2. Click Models + Analytics.
  3. Click + Add Model, then click Import Blackbox.
  4. Input Taxi Fare Predictor - Continuous for the Model Name.
  5. Input Blackbox model for continuous deployments for the Model Description.
  6. Input kinetica/kinetica-blackbox-quickstart:7.0.1 for the Docker Container URL.
  7. Input bb_module_default for the Module.
  8. Input predict_taxi_fare for the Function.
  9. For the Input Columns, click Add until there are four (4) columns.
  10. Define the following four (4) input columns:
    1. pickup_longitude: float
    2. pickup_latitude: float
    3. dropoff_longitude: float
    4. dropoff_latitude: float
  11. For the Output Columns, define the following one (1) output column:
    1. fare_amount: double
  12. Click Create.

Deploying a Model

To deploy a model is to make it active; once a model is deployed, inferences can be made against it. AAW currently supports three types of deployments: Batch, Continuous, and On-Demand.

  • Batch deployments make inferences on a batch of data at once.
  • Continuous deployments use table monitors to turn incoming streaming data into continuous inference output.
  • On-Demand deployments make inferences manually, as needed, on user-supplied input.
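The deployment modes in the list above differ mainly in how records reach the same model function. The sketch below contrasts On-Demand (one record in, one result out) with Batch (a whole set of records processed in one pass). The fake_model function is a hypothetical stand-in, and nothing here reflects how AAW implements deployments internally.

```python
from typing import Callable, Iterable, List

Record = dict
Model = Callable[[Record], Record]


def on_demand(model: Model, record: Record) -> Record:
    """On-Demand: a single user-supplied record is inferred immediately."""
    return model(record)


def batch(model: Model, records: Iterable[Record]) -> List[Record]:
    """Batch: every record in a fixed set is inferred in one pass."""
    return [model(r) for r in records]


def fake_model(record: Record) -> Record:
    # Hypothetical stand-in model: doubles a single numeric input.
    return {"y": 2 * record["x"]}


print(on_demand(fake_model, {"x": 1.5}))
print(batch(fake_model, [{"x": 1}, {"x": 2}]))
```

Both paths call the same model; only the shape of the input (one record versus many) changes.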

Let’s create on-demand and continuous deployments for our Blackbox model.

On-Demand

  1. Navigate to AAW (http://<aaw-host>:8070).
  2. Click Models + Analytics.
  3. Select the Taxi Fare Predictor - On-Demand model and click Deploy in the right-hand menu.
  4. Input fare_predict_deploy_on_demand for the Name.
  5. Input Taxi fare predictor model On-Demand deployment for the Description.
  6. The default deployment mode is On-Demand. Leave the # of Replicas set to 1.
  7. Click Deploy.

Continuous

  1. Navigate to AAW (http://<aaw-host>:8070).
  2. Click Models + Analytics.
  3. Select the Taxi Fare Predictor - Continuous model and click Deploy in the right-hand menu.
  4. Input fare_predict_deploy_continuous for the Name.
  5. Input Taxi fare predictor model Continuous deployment for the Description.
  6. Select Continuous for the deployment mode. Leave the # of Replicas set to 1.
  7. For the Source Table, select taxi_data.
  8. For the Output Table, input fare_prediction_continuous_inference.
  9. Click Deploy.
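A continuous deployment watches taxi_data through a table monitor and writes inference results to fare_prediction_continuous_inference. The sketch below imitates that flow in memory: Python lists stand in for the two tables and a row offset stands in for the table monitor. The class name, the offset mechanism, and the assumption that output rows echo the input columns are all hypothetical; this illustrates the idea, not Kinetica’s implementation.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class ContinuousDeployment:
    """In-memory imitation of a continuous deployment (illustrative only)."""

    def __init__(self, model: Callable[[Row], Row],
                 source: List[Row], output: List[Row]):
        self.model = model
        self.source = source   # stands in for the taxi_data table
        self.output = output   # stands in for fare_prediction_continuous_inference
        self.offset = 0        # number of source rows already processed

    def poll(self) -> int:
        """Infer every row inserted since the last poll; return how many."""
        new_rows = self.source[self.offset:]
        for row in new_rows:
            # Assumption: output rows echo the inputs alongside the prediction.
            self.output.append({**row, **self.model(row)})
        self.offset += len(new_rows)
        return len(new_rows)


source: List[Row] = []
output: List[Row] = []
# Hypothetical constant model, used only to show the data flow.
deployment = ContinuousDeployment(lambda r: {"fare_amount": 10.0}, source, output)
source.append({"pickup_longitude": -73.87544, "pickup_latitude": 40.77377})
print(deployment.poll())  # one new row was inferred
print(deployment.poll())  # nothing new since the last poll
```

Tracking an offset ensures each source row is inferred exactly once, which is the property you rely on when reading the output table.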

Making Inferences

Inferences are what prove the utility of a model. In the previous section, we created two deployments; the continuous one automatically makes inferences as new data arrives in taxi_data and writes them to the fare_prediction_continuous_inference table in Kinetica. Using the on-demand deployment, let’s infer the fare for a trip from LaGuardia Airport to a hotel in the middle of Manhattan.

  1. Navigate to AAW (http://<aaw-host>:8070).
  2. Click Deployments.
  3. Select the fare_predict_deploy_on_demand deployment and click Test Inference in the right-hand menu.
  4. In the Test Inference window, provide the following for the input columns:
    1. pickup_longitude: -73.87544
    2. pickup_latitude: 40.77377
    3. dropoff_longitude: -73.98488
    4. dropoff_latitude: 40.76851
  5. Click Run Inference. The inputs you provided are passed to the model, which infers the fare_amount and displays the result in the window.
  6. When you are finished, close the window.
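Because the model predicts fares from the distance between pickup and dropoff, it helps to know roughly how long this test trip is. The sketch below computes the great-circle (haversine) distance between the two test points, assuming a spherical Earth with a mean radius of 6,371 km; the model inside the quickstart image may measure distance differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Test inference inputs from the steps above (pickup first, then dropoff).
distance = haversine_km(-73.87544, 40.77377, -73.98488, 40.76851)
print(f"{distance:.1f} km")
```

For these inputs the straight-line distance is roughly 9.2 km; the driven route is longer, so treat this only as a sanity check on the scale of the predicted fare.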

On-demand deployments run indefinitely. Any time you want to make additional inferences against the on-demand deployment, you can repeat the steps above.

Don’t forget to take a look at the fare_prediction_continuous_inference table in GAdmin (http://<kinetica-host>:8080) to see the continuous inferencing at work!

Auditing a Model

AAW’s auditing functionality is arguably its most important feature: it lets you audit every inference made against your models. Each inference is tracked in detail, including the model type, the module and function, the Docker image used, the inferred value, and the source values.
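The audit fields described above map naturally onto a record type. The sketch below is a hypothetical representation for reasoning about what an audit entry captures; the class name and field names are assumptions, not AAW’s actual audit schema, and the inferred value is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class InferenceAudit:
    """Hypothetical audit entry mirroring the fields AAW is described as tracking."""
    model_type: str             # e.g. "Blackbox"
    module: str                 # e.g. "bb_module_default"
    function: str               # e.g. "predict_taxi_fare"
    docker_image: str           # container image the model ran in
    inferred: Dict[str, float]  # output column(s) and the inferred value(s)
    source: Dict[str, float] = field(default_factory=dict)  # input values used


audit = InferenceAudit(
    model_type="Blackbox",
    module="bb_module_default",
    function="predict_taxi_fare",
    docker_image="kinetica/kinetica-blackbox-quickstart:7.0.1",
    inferred={"fare_amount": 0.0},  # placeholder; the real value comes from AAW
    source={"pickup_longitude": -73.87544, "pickup_latitude": 40.77377,
            "dropoff_longitude": -73.98488, "dropoff_latitude": 40.76851},
)
```

Making the record frozen reflects the intent of an audit trail: once an inference is recorded, its details should not be altered.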

We’re going to audit the fare amount inferred by the on-demand test inference we made earlier.

  1. Navigate to AAW (http://<aaw-host>:8070).
  2. Click Audits.
  3. Input Fare Amount Audit for the Audit Name.
  4. Input Testing model fare amount inferences against historical data fare amounts for the Description.
  5. Input on-demand for the Keywords.
  6. Leave the Time Range blank and click Search. The Audit Results will be returned.
  7. Select Taxi Fare Predictor - On-Demand. The Audit Inferences will be displayed.
  8. Select the inference in the table.

The Audit Graph will be displayed, allowing you to see how the value was inferred.

At this point, you can compare the inferred value to the actual recorded value in the taxi_data table using the SQL tool in GAdmin:

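  1. Navigate to GAdmin (http://<kinetica-host>:8080/).
  2. Click Query > SQL.
  3. Run the following query, which looks up the recorded fare for trips that start at the test pickup coordinates: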
SELECT
    pickup_longitude,
    pickup_latitude,
    fare_amount
FROM taxi_data
WHERE
    pickup_longitude=-73.87544250488281 AND
    pickup_latitude=40.77376937866211;

Output:

+--------------------+-------------------+---------------+
|   pickup_longitude |   pickup_latitude |   fare_amount |
+--------------------+-------------------+---------------+
|         -73.875443 |         40.773769 |          33.0 |
+--------------------+-------------------+---------------+
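To put a number on the comparison, you can compute the absolute and percentage error between the inferred fare and the recorded fare_amount (33.0 in the output above). The helper below is a minimal sketch; the function name is hypothetical, and you supply the predicted value that AAW returned for your own test inference.

```python
from typing import Tuple


def fare_error(predicted: float, actual: float) -> Tuple[float, float]:
    """Return (absolute error, percentage error relative to the actual fare)."""
    if actual == 0:
        raise ValueError("actual fare must be non-zero for a percentage error")
    abs_err = abs(predicted - actual)
    return abs_err, 100.0 * abs_err / abs(actual)


ACTUAL_FARE = 33.0  # recorded fare_amount from taxi_data, per the query output above
```

For example, call fare_error(predicted, ACTUAL_FARE) with the value shown in the Test Inference window; the percentage error makes results comparable across trips with very different fares.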
