ML-Powered Analytics
With Kinetica, your entire machine learning workflow can start and end in one place: the database. The Active Analytics Workbench (AAW) allows users to ingest data into Kinetica, create a model that sits on top of or uses that data, make inferences against the model, and then audit those inferences.
Important
If you chose the single node installation path (the Kinetica Docker container) in Download and Install Kinetica, you will not have access to the Active Analytics Workbench.
In this tutorial, we’re going to use the NYC taxi dataset to predict fare prices based on the distance traveled between pickup and dropoff points. We’ll use the AAW user interface to get our model up and running.
Importing a Model
A model is a mathematical or programmatic representation of a real-world process. AAW currently supports three types of models: TensorFlow, Blackbox, and RAPIDS.
- TensorFlow models are written using the TensorFlow libraries and require you to define a featureset as well as training and test datasets.
- Blackbox models can theoretically be written using any library or package, as long as the model meets these requirements (see the sketch after this list):
  - it is in a Python file
  - it complies with the Kinetica Blackbox SDK
  - it is published to Docker
- RAPIDS models are written using NVIDIA’s suite of software libraries built on CUDA-X AI. Consult the RAPIDS developer’s guide for more information.
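To make the Blackbox requirements concrete, here is a minimal sketch of what such a module might look like. The module and function names match the values entered in the AAW UI later in this tutorial (bb_module_default and predict_taxi_fare), but the dict-in/dict-out convention is assumed from the Blackbox SDK, and the distance-based fare formula is a stand-in, not the model actually shipped in the quickstart image:

# bb_module_default.py -- a minimal sketch of a Blackbox module; the fare
# formula below is a stand-in assumption, not the quickstart image's model
import math

def predict_taxi_fare(in_map):
    """Receive one record as a dict of input-column values and return a
    dict of output-column values (the assumed Blackbox SDK convention)."""
    lon1 = float(in_map["pickup_longitude"])
    lat1 = float(in_map["pickup_latitude"])
    lon2 = float(in_map["dropoff_longitude"])
    lat2 = float(in_map["dropoff_latitude"])

    # Haversine great-circle distance between pickup and dropoff, in km
    rad = math.pi / 180.0
    a = (math.sin((lat2 - lat1) * rad / 2) ** 2
         + math.cos(lat1 * rad) * math.cos(lat2 * rad)
         * math.sin((lon2 - lon1) * rad / 2) ** 2)
    dist_km = 2 * 6371.0 * math.asin(math.sqrt(a))

    # Hypothetical flag drop plus per-km rate; a real model would be trained
    return {"fare_amount": 2.50 + 1.75 * dist_km}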
Let’s create two separate models in AAW: one for our on-demand deployments and one for our continuous deployments.
On-Demand
- Navigate to AAW (http://<aaw-host>:8070).
- Click Models + Analytics.
- Click + Add Model, then click New Blackbox.
- Under Create a Blackbox Model Manually, click Create.
- Input Taxi Fare Predictor - On-Demand for the Model Name.
- Input Blackbox model for on-demand deployments for the Model Description.
- Input kinetica/kinetica-blackbox-quickstart:7.1.0 for the Docker Container URL.
- Input bb_module_default for the Module.
- Input predict_taxi_fare for the Function.
- For the Input Columns, click Add until there are four (4) columns.
- Define the following four (4) input columns:
  - pickup_longitude (float)
  - pickup_latitude (float)
  - dropoff_longitude (float)
  - dropoff_latitude (float)
- For the Output Columns, define the following one (1) output column:
  - fare_amount (double)
- Click Create.
Continuous
- Navigate to AAW (http://<aaw-host>:8070).
- Click Models + Analytics.
- Click + Add Model, then click New Blackbox.
- Under Create a Blackbox Model Manually, click Create.
- Input Taxi Fare Predictor - Continuous for the Model Name.
- Input Blackbox model for continuous deployments for the Model Description.
- Input kinetica/kinetica-blackbox-quickstart:7.1.0 for the Docker Container URL.
- Input bb_module_default for the Module.
- Input predict_taxi_fare for the Function.
- For the Input Columns, click Add until there are four (4) columns.
- Define the following four (4) input columns:
  - pickup_longitude (float)
  - pickup_latitude (float)
  - dropoff_longitude (float)
  - dropoff_latitude (float)
- For the Output Columns, define the following one (1) output column:
  - fare_amount (double)
- Click Create.
Deploying a Model
To deploy a model is to make it active; once a model is deployed, inferences can be made against it. AAW currently supports three types of deployments: Batch, Continuous, and On-Demand.
- Batch deployments make inferences against a batch of data all at once.
- Continuous deployments use table monitors to run inferences continuously against incoming streaming data, writing the results to an output table.
- On-Demand deployments make inferences manually, as needed, using user-supplied input.
Let’s create on-demand and continuous deployments for our Blackbox model.
Important
Before the models are deployed, the taxi_data_streaming table must be in the database. Review the Importing Data tutorial for more information.
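If you want to confirm the table programmatically, a quick check from the Kinetica Python API (gpudb) might look like the following sketch; the host is a placeholder for your environment, and credentials may be required depending on your setup:

# Verify the source table exists before deploying (sketch; host is a placeholder)
import gpudb

db = gpudb.GPUdb(host="http://<kinetica-host>:9191")
print(db.has_table(table_name="taxi_data_streaming")["table_exists"])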
On-Demand
- Navigate to AAW (http://<aaw-host>:8070).
- Click Models + Analytics.
- Select the Taxi Fare Predictor - On-Demand model and click Deploy in the right-hand menu.
- Input fare_predict_deploy_on_demand for the Name.
- Input Taxi fare predictor model On-Demand deployment for the Description.
- For simplicity’s sake, select CPU for the Compute Target.
- Leave the deployment mode as On-Demand, leave the # of Replicas set to 1, and leave the Environment Variables blank.
- Click Deploy.
Continuous
- Navigate to AAW (http://<aaw-host>:8070).
- Click Models + Analytics.
- Select the Taxi Fare Predictor - Continuous model and click Deploy in the right-hand menu.
- Input fare_predict_deploy_continuous for the Name.
- Input Taxi fare predictor model Continuous deployment for the Description.
- For simplicity’s sake, select CPU for the Compute Target.
- Select Continuous for the deployment mode. Leave the # of Replicas set to 1 and leave the Environment Variables blank.
- For the Source Table, select taxi_data_streaming.
- For the Output Table, input fare_prediction_continuous_inference.
- Click Deploy.
Making Inferences
Inferences are what prove the utility of a model. In the previous section, we created two different deployments, one of which will automatically make inferences as data comes in and output those inferences to tables in Kinetica. Let’s infer the fare for a trip from JFK to a hotel in the middle of Manhattan.
- Navigate to AAW (http://<aaw-host>:8070).
- Click Deployments.
- Select the fare_predict_deploy_on_demand deployment and click Test Inference in the right-hand menu.
- In the Test Inference window, provide the following for the input columns:
  - pickup_longitude: -73.87544
  - pickup_latitude: 40.77377
  - dropoff_longitude: -73.98488
  - dropoff_latitude: 40.76851
- Click Run Inference. The inputs you provided will be passed to the model, which will infer the fare_amount and return the value to the window.
- When you are finished, close the window.
On-demand deployments run indefinitely. Any time you want to make additional inferences against the on-demand deployment, you can repeat the steps above.
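As a local sanity check, you can also run the same coordinates through the sketch module from earlier in this tutorial (a stand-in for the quickstart image’s actual model, so its output will not match the deployed model’s):

# Run the Test Inference coordinates through the local sketch module
# (bb_module_default.py from the sketch above, saved in the working directory)
from bb_module_default import predict_taxi_fare

record = {
    "pickup_longitude": -73.87544,
    "pickup_latitude": 40.77377,
    "dropoff_longitude": -73.98488,
    "dropoff_latitude": 40.76851,
}
print(predict_taxi_fare(record))  # e.g. {'fare_amount': ...}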
Don’t forget to take a look at the fare_prediction_continuous_inference table in GAdmin (http://<kinetica-host>:8080) to see the continuous inferencing at work!
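You can also peek at that table from a script. Below is a minimal sketch using the Kinetica Python API (gpudb); the host is a placeholder, and the execute_sql helper and its argument order (statement, offset, limit) are assumptions about your client version:

# Peek at the latest continuous inferences (sketch; host is a placeholder)
import gpudb

db = gpudb.GPUdb(host="http://<kinetica-host>:9191")
# Assumed helper: execute_sql(statement, offset, limit)
response = db.execute_sql("SELECT * FROM fare_prediction_continuous_inference LIMIT 10", 0, 10)
print(response)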
Auditing a Model
AAW’s auditing functionality is arguably its most important feature, allowing you to audit every inference made against your models. Each inference is tracked with detailed information, including the model type, the module and function, the Docker image used, and the inferenced value, as well as the source values.
We’re going to audit the fare amount from the manual inference we made earlier.
- Navigate to AAW (http://<aaw-host>:8070).
- Click Audits.
- Input Fare Amount Audit for the Audit Name.
- Input Testing model fare amount inferences against historical data fare amounts for the Description.
- Input on-demand for the Keywords.
- Leave the Time Range blank and click Search. The Audit Results will be returned.
- Select Taxi Fare Predictor - On-Demand. The Audit Inferences will be displayed.
- Select the inference in the table. The Audit Graph will be displayed, allowing you to see how the value was inferenced.
At this point, you can compare the inferenced value to the actual recorded value in the taxi_data_streaming table using the SQL tool in GAdmin:
- Navigate to GAdmin (http://<kinetica-host>:8080/).
- Click Query > SQL.
- Run the following query:

SELECT pickup_longitude, pickup_latitude, fare_amount
FROM taxi_data_streaming
WHERE pickup_longitude = -73.87544250488281
  AND pickup_latitude = 40.77376937866211;
Output:
+--------------------+-------------------+---------------+
| pickup_longitude   | pickup_latitude   | fare_amount   |
+--------------------+-------------------+---------------+
|         -73.875443 |         40.773769 |          33.0 |
+--------------------+-------------------+---------------+