Installation

Kinetica installation and configuration instructions.

System Requirements

Operating system, hardware, and network requirements to run Kinetica.

Certified OS List

CPU Platform   Linux Distribution   Versions
x86            RHEL                 6.x, 7.x
x86            CentOS               6.x, 7.x
x86            Ubuntu               14.x LTS, 16.x LTS
x86            SUSE                 12, 12 SP1, 12 SP2
x86            Debian               8.x
ppc64le        RHEL                 7.2
ppc64le        CentOS               6.x, 7.x
ppc64le        Ubuntu               14.04 LTS, 16.x LTS

Minimum Operating System Requirements

Kinetica runs on the following 32-bit or 64-bit Linux-based operating systems.

OS                      Supported Versions
Amazon AMI              2012
CentOS                  6, 7
Fedora                  20+
RedHat                  6, 7
SUSE Linux Enterprise   12+
Ubuntu                  12+

Minimum Hardware Requirements

Component    Specification
CPU          Two-socket server with at least 8 cores; Intel x86-64, Power PC 8le, or ARM processor
GPU          Nvidia K40, K80, M60, P100, or V100
Memory       Minimum 8GB
Hard Drive   SSD or SATA 7200RPM hard drive with 4X memory capacity

GPU Driver Matrix

GPU           Driver                             CUDA
750ti         387.34 (anything lower than 390)   8
K20/K40/K80   387.34 (anything lower than 390)   8
M6/M60        387.34 (anything lower than 390)   8
P4/P40/P100   390.46 (or anything above 390)     9.1
V100          390.46 (or anything above 390)     9.1

Capacity Planning

It is recommended to provision swap space equal to 25-50% of a machine's available memory to avoid disk spilling and out-of-memory issues.

Check if there is active swap using the command:

free -h
              total        used        free      shared  buff/cache   available
Mem:            15G        2.0G        5.3G         32M        8.2G         13G
Swap:          6.5G          0B        6.5G
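As a rough aid for the 25-50% guideline, the following sketch computes the lower and upper swap targets from a memory size. The function name swap_range_kb is hypothetical, and reading MemTotal from /proc/meminfo is an assumption about how "available memory" should be measured; adjust it to your own sizing policy.

```shell
#!/bin/sh
# Print the 25% and 50% swap targets (in KiB) for a given memory size in KiB.
# swap_range_kb is an illustrative helper, not part of Kinetica.
swap_range_kb() {
    mem_kb=$1
    echo "$((mem_kb / 4)) $((mem_kb / 2))"
}

# Assumption: total memory (MemTotal) is used as the sizing basis.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "Recommended swap range (KiB): $(swap_range_kb "$total_kb")"
fi
```

Compare the printed range against the Swap line reported by free to decide whether more swap is needed.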

To create a new swap file:

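  1. As root, run the dd command with bs set to the block size in bytes (usually 1024) and count set to the number of blocks to write. With bs=1024, each block is 1 KiB, so count is the desired file size in kibibytes (for example, 4194304 for a 4 GiB file):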

    sudo dd if=/dev/zero of=</swapfile/path> bs=1024 count=<file-size>
    
  2. Make the file accessible only to root:

    sudo chmod 600 </swapfile/path>
    
  3. Mark the file as swap space:

    sudo mkswap </swapfile/path>
    
  4. Enable the swap file:

    sudo swapon </swapfile/path>
    
  5. Back up the /etc/fstab file and then add the swap file to it to make the swap file permanent:

    sudo cp /etc/fstab /etc/fstab.bak
    echo '</swapfile/path> none swap sw 0 0' | sudo tee -a /etc/fstab
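The dd arithmetic in the steps above is easy to get wrong, so here is a small dry-run sketch that prints the commands for a given path and size without executing them. The function name swapfile_commands, the /swapfile path, and the 4 GiB size are illustrative assumptions. With bs=1024 each block is 1 KiB, so count is the size in GiB multiplied by 1024 * 1024.

```shell
#!/bin/sh
# Dry run: print (do not execute) the swap-file commands for a path and size.
# With bs=1024 each dd block is 1 KiB, so count = GiB * 1024 * 1024.
swapfile_commands() {
    path=$1
    size_gib=$2
    count=$((size_gib * 1024 * 1024))
    echo "sudo dd if=/dev/zero of=$path bs=1024 count=$count"
    echo "sudo chmod 600 $path"
    echo "sudo mkswap $path"
    echo "sudo swapon $path"
}

# /swapfile and 4 GiB are example values only.
swapfile_commands /swapfile 4
```

Review the printed commands before running them by hand; the /etc/fstab step is left out on purpose so it is not applied by accident.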
    

Prepare Your Cluster

There are some steps that should be followed to set up your network and server configuration before installing Kinetica.

The first step is to collect the IP addresses of the server or servers that will be running Kinetica. If deploying to a cluster, one server must be designated as the head node. This server receives user requests and parcels them out to the worker nodes of the system. The head node of the cluster (or the only node in a single-node system) is also used to administer the system and hosts all services & applications; as such, it requires special handling during the installation process.

Networking Configuration

The Kinetica head node will require a number of ports to be open in order to communicate with its applications & services.

Any worker nodes will need ports opened to communicate with the head node and each other, though this set of ports will be smaller than that of the head node.

Default Ports

The default ports used for communication with Kinetica (and between servers, if operating in a cluster) follow. The Nodes column lists either Head, meaning the corresponding port only needs to be opened on the head node, or All, meaning the port needs to be opened on the head node & worker nodes.

Port Function Nodes Usage
2003 This port must be open to collect the runtime system statistics. Head Required Internally
4000+N For installations which have the external text search server enabled and communicating over TCP (rankN.text_index_address = tcp://…), there will be one instance of the text search server listening for each rank on every server in the cluster. Each of these daemons will be listening on a port starting at 4000 on each server and incrementing by one for each additional rank. All Optional Internally
5552 Host Manager status notification channel All Required Internally
5553 Host Manager message publishing channel All Required Internally
6555+N Provides distributed processing of communications between the network and different ranks used in Kinetica. There is one port for each rank running on each server, starting on each server at port 6555 and incrementing by one for each additional rank. All Required Internally
8080 The Tomcat listener for the Kinetica Administration Application (GAdmin) Head Optional Externally
8082 In installations where users need to be authenticated to access the database, a preconfigured HTTPd instance listens on this port, which will authenticate incoming HTTP requests before passing them along to Kinetica. When authorization is required, all requests to Kinetica should be sent here, rather than the standard 9191+ ports. All Optional Externally
8088 This is the port on which Kinetica Reveal is exposed. For installations which have this feature enabled, it should be exposed to users. Head Optional Externally
8181 This is the port used to host the system and process stats server Head Optional Externally
9001 Database trigger ZMQ publishing server port. Users of database triggers will need the ability to connect to this port to receive data generated via the trigger. Head Optional Externally
9002 Table monitor publishing server port. Users of database table monitors will need the ability to connect to this port to receive data generated via the table monitor. Head Optional Externally
9191+N The primary port(s) used for public and internal Kinetica communications. There is one port for each rank running on each server, starting on each server at port 9191 and incrementing by one for each additional rank. These should be exposed for any system using the Kinetica APIs without authorization and must be exposed between all servers in the cluster. For installations where users should be authenticated, these ports should NOT be exposed publicly, but still should be exposed between servers within the cluster. All Required Internally, Optional Externally
9292 Port on which the ODBC Server listens for connections Head Optional Externally
9300 Port used to query Host Manager for status All Required Internally
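For the per-rank entries in the table (4000+N, 6555+N, 9191+N), the concrete port list on a server depends on how many ranks run there. The sketch below expands a base port into that list; rank_ports is a hypothetical helper name, and the three-rank server is only an example.

```shell
#!/bin/sh
# Print the ports used by a per-rank service on one server:
# the base port for the first rank, incrementing by one per additional rank.
rank_ports() {
    base=$1
    ranks=$2
    n=0
    out=""
    while [ "$n" -lt "$ranks" ]; do
        out="$out${out:+ }$((base + n))"
        n=$((n + 1))
    done
    echo "$out"
}

# Example: a server running three ranks.
echo "API ports:        $(rank_ports 9191 3)"
echo "Rank comms ports: $(rank_ports 6555 3)"
```

The resulting lists can be used directly when writing firewall rules for each server.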

Port Usage Scenarios

Kinetica strongly recommends that proper firewalls be maintained and used to protect the database and the network at large. A full tutorial on how to properly set up a firewall is beyond the scope of this document, but the following are some best practices and starting points for more research.

All machines connected to the Internet at large should be protected from intrusion. As shown in the list above, there are no ports which are necessarily required to be accessible from outside of a trusted network, so we recommend only opening ports to the Internet and/or untrusted network(s) which are truly needed based on requirements.

There are some common scenarios which can act as guidelines on which ports should be available.

Connection to the Internet

If Kinetica is running on a server where it will be accessible to the Internet at large, it is our strong suggestion that security and authentication be used and ports 9191+N and 8080 are NOT exposed to the public, if possible. Those ports can potentially allow users to run commands anonymously and unless security is configured to prevent it, any users connecting to them will have full control of the database.

Dependence on Kinetica via the API

For applications in which requests are being made to Kinetica via client APIs that do not use authentication, the 9191+N ports should be made available to the relevant set of servers. For applications using authentication via the bundled version of httpd, port 8082 should be opened. It is possible to have both sets of ports open at the same time in cases where anonymous access is permitted; however, the security settings should be carefully configured in this case to ensure that anonymous users have the appropriate access limitations.

Additionally, if the API client is using table monitors or triggers, ports 9001 and/or 9002 should also be opened as needed.

Reveal

In cases where the Reveal GUI is required, port 8088 should be made available.

Administration

System administrators may wish to have access to the administrative web interface, in which case port 8080 should be opened, but carefully controlled.

Adjusting the Firewall

RHEL 6

RHEL 6 uses iptables by default to configure its firewall settings. These can be updated using the /etc/sysconfig/iptables file, or, if you have X Server running, there is also a GUI for editing the firewall that can be run using the command:

system-config-firewall

RHEL 7

RHEL 7 continues to use iptables under the hood, but the preferred way to manage the firewall is now the firewall-cmd command or the firewall-config GUI. For example, the following commands will open up port 8082 publicly:

firewall-cmd --zone=public --add-port=8082/tcp --permanent
firewall-cmd --reload

Ubuntu 12 & Debian 8.x (Jessie)

Ubuntu 12 and Debian 8.x use iptables by default to configure their firewall settings. Rules can be added using the iptables command:

sudo iptables -A INPUT -p tcp --dport 8181 -j ACCEPT
sudo iptables-save

Note that iptables-save only prints the active rules to standard output; redirect its output to a rules file if the rule must survive a reboot.

Ubuntu 14 & 16

Ubuntu 14 & 16 come with the ufw (Uncomplicated Firewall) command, which controls the firewall. For example:

sudo ufw allow 8181
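When several ports need to be opened, it can help to generate the commands first and review them. The following dry-run sketch prints the firewall-cmd or ufw invocations for a list of TCP ports without executing anything. The function name open_port_cmds is hypothetical, and the ports shown are examples only; choose the ports from the Default Ports table according to your scenario.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands that would open TCP ports.
# Usage: open_port_cmds <firewalld|ufw> <port>...
open_port_cmds() {
    tool=$1
    shift
    case $tool in
        firewalld)
            for port in "$@"; do
                echo "firewall-cmd --zone=public --add-port=$port/tcp --permanent"
            done
            echo "firewall-cmd --reload"
            ;;
        ufw)
            for port in "$@"; do
                echo "sudo ufw allow $port/tcp"
            done
            ;;
        *)
            echo "unknown firewall tool: $tool" >&2
            return 1
            ;;
    esac
}

# Example ports only: authenticated API access (8082) and Reveal (8088).
open_port_cmds firewalld 8082 8088
open_port_cmds ufw 8082 8088
```

Printing rather than executing keeps the sketch safe to run on any machine; pipe the output to a shell only after checking it.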

Node Configuration

Each server in the Kinetica cluster should be properly prepared before installing Kinetica.

Install Nvidia Drivers

If Nvidia GPUs are present in the target servers, but the drivers have not been installed yet, they should be installed now. See either Install Nvidia Drivers on RHEL or Install Nvidia Drivers on Debian/Ubuntu for details.

Install Kinetica

Installation of Kinetica involves the deployment of the installation package, and either a browser-based or console-driven initialization step. Afterwards, passwordless SSH should be configured for ease of management of the system.

The installation process also requires a license key. To receive a license key, contact support at support@kinetica.com.

Deploy

The Kinetica application needs to be deployed to all servers in the target cluster. Deploy the package using the standard procedures for a local package.

  • On RHEL:

    sudo yum install ./gpudb-<gpuhardware>-<licensetype>-<version>-<release>.<architecture>.rpm
    
  • On Debian/Ubuntu:

    sudo apt install ./gpudb-<gpuhardware>-<licensetype>-<version>-<release>.<architecture>.deb
    

This installs the package to the directory /opt/gpudb and creates a group named gpudb and two users (gpudb & gpudb_proc) whose home directory is /home/gpudb. SSH keys are also created to allow password-less SSH access between servers for the gpudb user when configured as a cluster. The installation also registers two services: gpudb & gpudb_host_manager.
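A quick way to confirm that deployment left these artifacts in place is to check for the directory, group, and users on each server. The sketch below is one possible check, not a Kinetica-provided tool; deploy_report is a hypothetical name, and the optional prefix argument exists only so the check can be pointed at a different directory.

```shell
#!/bin/sh
# Report whether the install directory, group, and users exist on this server.
# The prefix argument defaults to /opt/gpudb.
deploy_report() {
    prefix=${1:-/opt/gpudb}
    if [ -d "$prefix" ]; then
        echo "OK directory $prefix"
    else
        echo "MISSING directory $prefix"
    fi
    if getent group gpudb >/dev/null 2>&1; then
        echo "OK group gpudb"
    else
        echo "MISSING group gpudb"
    fi
    for user in gpudb gpudb_proc; do
        if id "$user" >/dev/null 2>&1; then
            echo "OK user $user"
        else
            echo "MISSING user $user"
        fi
    done
}

deploy_report
```

Any MISSING line suggests the package installation did not complete on that server.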

Configure

Once the application has been deployed, choose the configuration method:

Visual Initialization

The Visual Installer is run through the Kinetica Administration Application (GAdmin) and simplifies the installation of Kinetica across a cluster.

Browse to port 8080 on the head node, using its IP address or host name (localhost works only from the head node itself):

http://localhost:8080/

Once you've arrived at the login page, you'll need to change your password and initialize the system using the following steps:

  1. Log into the admin application

    1. Enter Username: admin
    2. Enter Password: admin
    3. Click Login
  2. If a license key has not already been configured, a Product Activation page will be displayed, where the license key is to be entered:

    1. Enter the license key under Enter License Key
    2. When complete, click Activate, then confirm the activation
  3. At the Setup Wizard page, configure the system basics:

    1. Enter the IP Address and number of GPUs (if any) for each server in the cluster
    2. Optionally, select the Public Head IP Address checkbox and update the address as necessary
    3. The license key under Configure License Key should already be populated
    4. When complete, click Save

    Important

    For additional configuration options, see the Configuration Reference.

  4. Start the system. This will start all Kinetica processes on the head node, and if in a clustered environment, the corresponding processes on the worker nodes.

    1. Click Admin on the left menu
    2. Click Start.
  5. Change the administration account's password from the default used to log in above.

Skip ahead to Passwordless SSH.

Console Initialization

System configuration is done primarily through the configuration file /opt/gpudb/core/etc/gpudb.conf, and while all nodes in a cluster have this file, only the copy on the head node needs to be modified.

Important

Only edit the /opt/gpudb/core/etc/gpudb.conf on the head node. Editing the file on worker nodes is not supported and may lead to unexpected results.

  1. Log in to the head node and open /opt/gpudb/core/etc/gpudb.conf in an editor.

  2. Specify the head node IP address, the total number of database ranks, and the distribution of ranks across hosts. In this example, there are two servers with three ranks on the first and two ranks on the second:

    number_of_ranks = 5
    
    rank0.host = 192.168.0.100
    rank1.host = 192.168.0.100
    rank2.host = 192.168.0.100
    rank3.host = 192.168.0.101
    rank4.host = 192.168.0.101
    
    head_ip_address = 192.168.0.100
    
  3. For CUDA builds, the GPUs need to be assigned to ranks. To display the installed GPUs and their status run:

    nvidia-smi
    

    If the program is not installed or doesn't run, see Install Nvidia Drivers.

    Once the number of GPUs on each server has been established, enter them into the configuration file by associated rank. In this example, there are two servers with a GPU assigned to each of two worker ranks per host; rank0 has no dedicated GPU and shares GPU 0 with a worker rank:

    rank0.gpu = 0 # This GPU can be shared with a worker rank, typically rank 1.
    
    rank1.taskcalc_gpu = 0
    rank2.taskcalc_gpu = 1
    rank3.taskcalc_gpu = 0 # On new host, restart at 0
    rank4.taskcalc_gpu = 1
    
  4. For non-CUDA builds, the NUMA nodes need to be assigned to ranks. To display the NUMA nodes, run:

    numactl -H
    

    Once the number of NUMA nodes on each server has been established, enter them into the configuration file by associated rank. In this example, there are two servers with a NUMA node assigned to each of two ranks per host (none for rank0):

    rank0.numa_node =        # Preferring a node for the head node HTTP server is often not necessary.
    
    rank1.base_numa_node = 0
    rank2.base_numa_node = 1
    rank3.base_numa_node = 0 # On new host, restart at 0
    rank4.base_numa_node = 1
    
    rank1.data_numa_node = 0
    rank2.data_numa_node = 1
    rank3.data_numa_node = 0 # On new host, restart at 0
    rank4.data_numa_node = 1
    
  5. Determine the directory in which database files will be stored. It should meet the following criteria:

    • Have available disk space of at least 4x memory
    • Be writable by the gpudb user
    • Consist of RAIDed SSDs
    • Not be part of a network share or NFS mount
  6. Enter the database file directory path into the configuration:

    persist_directory = /opt/gpudb/persist
    
  7. Set the license key:

    license_key = ...
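For larger clusters, the rank-to-host lines from step 2 can be generated rather than typed by hand. The sketch below prints them from HOST:RANKS pairs; gen_rank_hosts is a hypothetical helper, it assumes the first pair is the head node (which includes rank0), and it only prints text for review instead of editing gpudb.conf.

```shell
#!/bin/sh
# Print gpudb.conf rank-to-host lines from HOST:RANKS pairs.
# Assumption: the first pair is the head node and its count includes rank0.
gen_rank_hosts() {
    total=0
    for pair in "$@"; do
        total=$((total + ${pair##*:}))
    done
    echo "number_of_ranks = $total"
    rank=0
    for pair in "$@"; do
        host=${pair%%:*}
        count=${pair##*:}
        i=0
        while [ "$i" -lt "$count" ]; do
            echo "rank$rank.host = $host"
            rank=$((rank + 1))
            i=$((i + 1))
        done
    done
    echo "head_ip_address = ${1%%:*}"
}

# Same layout as the example in step 2: three ranks, then two ranks.
gen_rank_hosts 192.168.0.100:3 192.168.0.101:2
```

The output for this call matches the example configuration in step 2 and can be pasted into gpudb.conf on the head node after review.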
    

Important

For additional configuration options, see the Configuration Reference.

To bring up the system, start the gpudb service:

service gpudb start

This will start all Kinetica processes on the head node, and if in a clustered environment, processes on the worker nodes.

Passwordless SSH

If Kinetica is installed in a clustered environment, configuring passwordless SSH will make management considerably easier. Run the following command on the head node to set up passwordless SSH between the head node and the worker nodes for the gpudb users created during deployment:

sudo /opt/gpudb/core/bin/gpudb_hosts_ssh_copy_id.sh

Validate the Installation

To validate that Kinetica has been installed and started properly, you can perform the following tests.

Curl Test

To ensure that Kinetica has started (you may have to wait a moment while the system initializes), you can run curl on the head node to check if the server is responding and port is available with respect to any running firewalls:

$ curl localhost:9191
Kinetica is running!
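Because the system may take a moment to initialize, a single curl call can fail even when the startup is healthy. The following sketch wraps any command in a simple retry loop; the retry function name and the attempt and delay values in the usage comment are illustrative assumptions.

```shell
#!/bin/sh
# Run a command until it succeeds, up to a maximum number of attempts.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Usage on the head node (up to 30 attempts, 2 seconds apart):
#   retry 30 2 curl -sf localhost:9191
```

The -s and -f options make curl quiet and cause it to return a non-zero status on HTTP errors, which is what the loop relies on.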

API Test

You can also run a test to ensure that the API is responding properly. There is an admin simulator project in Python provided with the Python API, which pulls statistics from the Kinetica instance. Running this on the head node, you should see:

$ python /opt/gpudb/api/python/gpudb/gadmin_sim.py
**********************
Total tables:              0
Total top-level tables:    0
Total collections:         0
Total number of elements:  0
Total number of objects:   0

GAdmin Status Check

The administrative interface itself can be used to validate that the system is functioning properly. Simply log into GAdmin. Browse to Dashboard to view the status of the overall system and Ranks to view the status breakdown by rank.

Core Utilities

Kinetica contains many helpful core utilities and scripts that can be found in /opt/gpudb/core/bin/. Note that any of the gpudb_hosts_*.sh scripts will operate on the hosts specified in gpudb.conf and the number_of_ranks parameter as parsed by gpudb_hosts_addresses.sh.

Utility / Script Description
gpudb Run as gpudb user or root. The Kinetica system start/restart/stop/status script
gpudb_accounts.py Run using the gpudb_env.sh version of Python. Allows you to manage Kinetica users' account information
gpudb_cluster_cuda Server executable for CUDA clusters. Displays version and configuration information. This should only be run by the gpudb script (see above)
gpudb_cluster_intel Server executable for Intel clusters. Displays version and configuration information. This should only be run by the gpudb script (see above)
gpudb_conf_parser.py Run using the gpudb_env.sh version of Python. Utility for parsing .ini files for scripts
gpudb_env.sh Utility to run a program and its given arguments after setting the PATH, LD_LIBRARY_PATH, PYTHON_PATH, and others to the appropriate /opt/gpudb/ directories. Use this script to correctly set up the environment to run Kinetica's packaged Python version. You can also run source /opt/gpudb/core/bin/gpudb_env.sh to have the current environment updated
gpudb_host_manager The host daemon process that starts and manages any Kinetica processes
gpudb_host_setup.sh Run as root. This script will set the OS configuration to an optimal state for Kinetica
gpudb_hosts_addresses.sh Prints all the unique hostnames (or IPs) specified in gpudb.conf
gpudb_hosts_diff_file.sh Run as gpudb user or root. Utility to diff a given file from the current machine to the specified destination file on one or more hosts
gpudb_hosts_logfile_cleanup.sh Run as gpudb user or root. Script to delete old log files and optionally keep the last n logs
gpudb_hosts_persist_clear.sh Run as gpudb user or root. Script to clear the database persist files (location specified in gpudb.conf). Important: Only run this while the database is stopped

gpudb_hosts_persist_init_encryption.sh Run as gpudb user. Clear the persist directories (specified in gpudb.conf) and initialize them to be encrypted
gpudb_hosts_persist_mount_encryption.sh Run as gpudb user. Script to mount the already-initialized, encrypted persist directories (specified in gpudb.conf). If an encrypted persist directory is detected and the gpudb.conf parameter persist_encryption_pass_command is valid the gpudb script (see above) will automatically mount the persist directory using this command if it was not mounted already
gpudb_hosts_persist_umount_encryption.sh Run as gpudb user. Script to unmount the already-mounted, encrypted persist directories (specified in gpudb.conf). If the gpudb.conf parameter persist_encryption_pass_command is valid, the persist directories will be unmounted by the gpudb script (see above) when the database has stopped
gpudb_hosts_rsync_to.sh Run as gpudb user. Script to copy files from this server to the remote servers using rsync
gpudb_hosts_ssh_copy_id.sh Run as root. Script to distribute the gpudb user's public SSH keys to the other hosts defined in gpudb.conf to allow password-less SSH. This script should only be run from the head node. Important: This script should be re-run after changing the host configuration to redistribute the keys

gpudb_hosts_ssh_execute.sh Run as gpudb user or root. Script to execute a program with arguments on all hosts specified in gpudb.conf, e.g., ./gpudb_hosts_ssh_execute.sh "ps aux"
gpudb_keygen Program to generate and print a machine key. You can use the key to obtain a license from support@kinetica.com
gpudb_logger.sh Rolling logger utility to help manage the size and number of logs available
gpudb_machine_info.sh Script to print OS config information that affects performance as well as suggestions to improve performance
gpudb_nvidia_setup.sh Utility to configure the Nvidia GPU devices for best performance or restore defaults. Root permission is required to change values. Utility reports informational settings and permission errors when run as user
gpudb_open_files.sh Script to print the files currently open by the database
gpudb_sysinfo.sh More information when run as root. Script to print a variety of information about the system and hardware for debugging. You can also make a .tgz file of the output. Rerun this program as needed to keep records of the system. Use a visual diff program to compare two or more system catalogs
gpudb_udf_distribute_thirdparty.sh Utility to mirror the local /opt/gpudb/udf/thirdparty to remote hosts. Creates a dated backup on the remote host before copying
gpudb_useradd.sh Script to create the gpudb:gpudb and gpudb_proc:gpudb_proc user:groups and SSH id. This script can be rerun as needed to restore the user:groups and ssh config. Be sure to rerun (on the head node only) gpudb_hosts_ssh_copy_id.sh to redistribute the SSH keys if desired whenever the SSH keys are changed

Troubleshooting

Error Logging

The log file located at /opt/gpudb/core/logs/gpudb.log should be the first place to check for any system errors. Any issues which would prevent successful start-up of Kinetica will be logged as ERROR in the log. Consequently, running the following command will return enough information to provide a good starting point for further investigation:

grep ERROR /opt/gpudb/core/logs/gpudb.log | head -n 10
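When the log is long, it can also be useful to know how many ERROR lines there are before reading the first few. The sketch below builds on the grep command above; log_error_summary is a hypothetical helper, and it assumes only that error entries contain the literal text ERROR.

```shell
#!/bin/sh
# Print how many lines in a log file contain ERROR, then the first few of them.
# Usage: log_error_summary <logfile> [max-lines]
log_error_summary() {
    log=$1
    limit=${2:-10}
    # grep -c exits non-zero when there are no matches, so ignore its status.
    count=$(grep -c ERROR "$log" || true)
    echo "ERROR lines: $count"
    grep ERROR "$log" | head -n "$limit"
}

# Usage: log_error_summary /opt/gpudb/core/logs/gpudb.log
```

A large count with many distinct messages usually points to an earlier root error, so start from the first lines printed.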

Uninstallation

Should you need to uninstall Kinetica, you'll need to shut down the system, remove the package, and remove related files, directories, & user accounts.

  1. Stop the system

  2. Remove the package from your machine

    • On RHEL:

      sudo yum remove gpudb-<gpuhardware>-<licensetype>.<architecture>
      
    • On Debian-based:

      sudo dpkg -r gpudb-<gpuhardware>-<licensetype>.<architecture>
      
  3. Remove any user-defined persist directories (these directories are set in /opt/gpudb/core/etc/gpudb.conf)

  4. Clean-up all Kinetica artifacts (for both RHEL and Debian-based):

    sudo rm -rf /opt/gpudb
    
  5. Remove the gpudb & gpudb_proc users from the machine

    • On RHEL:

      sudo userdel -r gpudb
      sudo userdel -r gpudb_proc
      
    • On Debian-based:

      sudo deluser --remove-home gpudb
      sudo deluser --remove-home gpudb_proc
      
  6. Remove the gpudb group from the machine:

    sudo groupdel gpudb