Active Analytics Workbench (AAW) Overview

Kinetica provides the Active Analytics Workbench (AAW) with the goal of simplifying and accelerating data science and machine learning in a scalable fashion. With AAW, users can ingest data, train models, make inferences (answers/output from models), and even audit models with a few endpoints (or clicks via the UI). The AAW package can be automatically installed via KAgent and coexists with the database--meaning easy access to one's data and GPUs. AAW leverages Kubernetes to deploy, train, and test models.

Concepts

The AAW workflow is defined by four key concepts:

  • Data -- comprises ingests, datasets, and feature sets

    • Ingests represent an ingest tool (e.g., KIO, Kafka, etc.) that pulls data from a given source (Kinetica, PostgreSQL, S3, etc.) and puts it in a new collection/table inside Kinetica. Data can be pulled in batches or via continuous streaming.
    • Datasets represent a set (or sub-set) of column data from a source table in Kinetica. One or more columns can be filtered to create a dataset.
    • Feature sets represent a group of features, which are columns from a dataset that have either been transformed by a function (Log, Categorical, etc.) or manipulated by a Lambda function.
  • Models -- functions, statistical models, regressions, data models, and more that are deployed to enable inferencing capabilities. AAW deploys any number of replicas of a model, allowing for scalability and better resource management. There are three types of supported models:

    • TensorFlow models, which are written using the TensorFlow library. TensorFlow models require a template (a model created previously or a template provided by Kinetica), a feature set, a training dataset, a testing dataset, and training parameters/values.

    • RAPIDS models, which are written using the NVIDIA RAPIDS library. RAPIDS models require a template, a dataset, dataset features, dataset labels, and training percentages.

    • Blackbox models, which are models whose implementation details are abstracted away and housed in Docker containers. Input and output form the only available interface; Blackbox models also do not require a training dataset.

      Important

      Blackbox models rely on the Kinetica Blackbox SDK to fetch AAW-compatible output from a custom Blackbox. Kinetica can assist in Blackbox model container creation if necessary. Consult Import Blackbox Model for more information.

  • Deployments represent all models that have been deployed. Deployed models can have inference tests run manually, automatically, or in batches depending on the type of deployment. Currently there are three types of deployments:

    • On-Demand -- inference tests are run as necessary using user-provided input with results being returned based on the given input
    • Continuous -- inference tests run automatically against records being streamed into an input table; inference results are inserted into an output table
    • Batch -- inference tests are run against a batch of data in an existing table all at once
  • Audits represent the ability to audit a model deployment to ensure its training, testing, and inferencing have not been tampered with. Audits are created from keywords and a date range; once created, audits can drill into specific inferences from a deployment and filter inferences by input parameter, process status, and more.
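As a rough illustration of the Blackbox contract described above, the sketch below models inference as a pure input-to-output function. The function name, field names, and scoring logic are hypothetical and are not the actual Kinetica Blackbox SDK interface; they only show the shape of the contract.

```python
# Hypothetical sketch of a Blackbox-style inference contract: the container
# exposes only input -> output, with all implementation details hidden.
# Names and logic below are illustrative, not the Kinetica Blackbox SDK.

def blackbox_infer(record: dict) -> dict:
    """Accept one input record and return a result record.

    The internals (model framework, weights, preprocessing) are opaque
    to AAW; only this input/output contract is visible.
    """
    # Example internal logic: a trivial weighted-threshold "model".
    score = 0.8 * record.get("feature_a", 0.0) + 0.2 * record.get("feature_b", 0.0)
    return {"score": score, "label": "positive" if score >= 0.5 else "negative"}
```

Applying such a function to a single user-supplied record corresponds to an On-Demand deployment, while mapping it over every row of an existing table corresponds to a Batch deployment.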

Installation

AAW can be installed via KAgent or manually. We recommend installing AAW using KAgent, as KAgent can install everything AAW requires and can preconfigure everything for the best out-of-the-box experience.

KAgent AAW Installation

Consult Cluster for AAW package install instructions using KAgent. If Kubernetes is already installed, a local Kubernetes configuration file can be uploaded when setting up Kinetica / AAW using KAgent. If Kubernetes is not installed, a basic configuration file can be created and uploaded by KAgent after automatically installing Kubernetes. AAW will be enabled automatically if installed using KAgent. After the installation is finished, a kml service is available to manage the AAW user interface and API.

If Kinetica was installed using KAgent but AAW was not installed during the same installation, the easiest way to install AAW is by deleting the appropriate cluster via the KAgent cluster interface. Because deleting a cluster in KAgent does not uninstall Kinetica, adding AAW is as simple as adding the cluster again via KAgent while ensuring the AAW package is selected for install. Consult Kinetica Installation with KAgent (On Premise) for more information.

Manual AAW Install (No HTTPD/SSL)

If installing AAW without KAgent is required, it is possible to manually install the AAW package and enable it as long as Kinetica is already installed and running. These instructions assume the Kinetica cluster does not have HTTPD or SSL enabled/configured.

Important

AAW relies on Kubernetes for ingesting data, model deployment and inferencing, and more. Kubernetes must be installed prior to the AAW package installation.

  1. Ensure Kinetica is installed and running and text search capabilities are enabled. Consult Manual Kinetica Installation for more information on manual Kinetica installation and enabling text search.

  2. Deploy the AAW package using the standard procedures for a local package. We recommend deploying the package on the head node (if there are resources for it).

    • On RHEL

      sudo yum install ./kinetica-ml-<version>-<release>.<architecture>.rpm
      
    • On Debian/Ubuntu

      sudo apt install ./kinetica-ml-<version>-<release>.<architecture>.deb
      
  3. Copy the .kube/config file from the Kubernetes cluster (typically located at /etc/kubernetes/admin.conf) to the ~gpudb/.kube/ directory on the Kinetica head node:

    scp /etc/kubernetes/admin.conf <user>@<kinetica-head-node-ip>:~gpudb/.kube/config
    

    Note

    The server value in this file may need to be updated to reflect the IP address of the Kubernetes cluster.

  4. Change the .kube/config file's ownership to the gpudb user:

    chown gpudb:gpudb ~gpudb/.kube/config
    
  5. Open /opt/gpudb/kml/etc/kml.ini in an editor.

  6. Update the api_connection to reflect the correct IP address of the node hosting AAW. It must be an IP address and not a hostname:

    [api]
    api_connection=http://<aaw-node-ip>:9187
    
  7. Update the db_connection to reflect the correct IP address of the head node for Kinetica. It must be an IP address and not a hostname:

    [database]
    db_connection=http://<kinetica-head-node-ip>:9191
    
  8. Optionally, update the kube_config value to the directory location of the .kube/config file if it was not placed in ~gpudb/.kube/:

    kube_config=<custom-dir>/.kube/config
    
  9. Open /opt/gpudb/kml/etc/application.properties in an editor.

  10. Update the kinetica.api-url setting to reflect the correct IP address of the head node for Kinetica. It must be an IP address and not a hostname:

    kinetica.api-url=http://<kinetica-head-node-ip>:9191
    
  11. Update the kinetica.hostmanager-api-url setting to reflect the correct IP address of the head node for Kinetica. It must be an IP address and not a hostname:

    kinetica.hostmanager-api-url=http://<kinetica-head-node-ip>:9300
    
  12. Update the kinetica.kml-api-url setting to reflect the correct IP address of the node hosting AAW. It must be an IP address and not a hostname:

    kinetica.kml-api-url=http://<aaw-node-ip>:9187/kml
    
  13. Restart the kml service as root:

    service kml restart
    

Important

AAW will inherit existing login information from the cluster, so logging into the AAW user interface will use the same credentials as GAdmin.

Logging

All the main logs for the AAW service and API are located in /opt/gpudb/kml/logs. All logs for the AAW user interface are located in /opt/gpudb/kml/ui/logs.
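When troubleshooting, it often helps to start with the most recently written file in those directories. The sketch below is a generic helper for that (the function name is ours, and log file names vary by version):

```python
# Illustrative helper: find the most recently modified file in a log
# directory, e.g. /opt/gpudb/kml/logs or /opt/gpudb/kml/ui/logs.
from pathlib import Path
from typing import Optional

def newest_log(log_dir: str) -> Optional[Path]:
    """Return the most recently modified file in log_dir, or None if empty."""
    files = [p for p in Path(log_dir).iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)
```

For example, `newest_log("/opt/gpudb/kml/logs")` returns the AAW service/API log file written most recently.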