Build Smarter Business Apps with a Modern GPU Database

Dipti Borkar
January 19, 2018

Infusing business analytics apps with AI isn’t easy. Today, the world of AI is largely disconnected from the world of business analytics. To converge the two, you need to master the entire AI pipeline, from data wrangling to modeling to operationalizing the models so they can be deployed within the application to derive deeper insights.

Why Use a GPU Database?

Why is there suddenly so much interest in AI, machine learning, and deep learning? It’s due to the union of big data, more affordable memory, and the growing popularity of GPUs.

GPUs are designed around thousands of small, efficient cores that are well suited to performing repeated, similar instructions in parallel, making them ideal for the compute-intensive workloads that large and streaming datasets require. A modern GPU database coupled with advanced in-database processing functions as a single, hardware-converged platform with a simplified software stack that can use both GPUs and CPUs for traditional analytics, big data, emerging AI/ML/deep learning, and real-time data ingestion. It delivers the ease of use, scale, and speed to deploy deep learning models and libraries such as TensorFlow, Caffe, and Torch pervasively across the enterprise, allowing you to converge AI with BI and deliver results more quickly.

Typical AI Process and Infrastructure

The typical AI process for deploying a smart business application involves capturing the data, identifying the algorithms, training the algorithms on the data, and deploying the products of that training to your applications. Often, 70-80% of the time is spent on preparing, massaging, and manipulating the data.
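The four stages above can be sketched in a few lines of plain Python. This is only a toy illustration, not any particular product's pipeline: the function names, the hard-coded records, and the choice of a one-variable least-squares fit are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical toy pipeline: every name and data value here is illustrative.

def capture():
    """Stage 1: capture raw records (hard-coded strings standing in for a feed)."""
    return [("10", "25"), ("20", "45"), ("", "60"), ("30", "65"), ("40", "85")]

def prepare(raw):
    """Stage 2: drop incomplete rows and convert the strings to numbers."""
    return [(float(x), float(y)) for x, y in raw if x and y]

def train(rows):
    """Stage 3: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in rows)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return slope, intercept

def deploy(model):
    """Stage 4: wrap the trained parameters in a function the app can call."""
    slope, intercept = model
    return lambda x: slope * x + intercept

rows = prepare(capture())
predict = deploy(train(rows))
print(len(rows), predict(50.0))
```

Even in this tiny sketch, the preparation stage has to deal with a malformed record before training can start, which hints at why data preparation tends to dominate the schedule in real projects.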

The downsides to these types of existing AI infrastructures are many:

High latency – long time to prepare data, train models, and operationalize models
Multiple systems are stitched together – databases, data science/analytics tools, visualizations, apps
Rigid – they can’t handle changing requirements or changing data, and they are not iterative or interactive
Model management is complex, given multiple versions and data sets
High complexity requires admin overhead, resources, and skills to stitch these components together
Difficult to repeat
Data scientists struggle to parallelize algorithms

With these types of infrastructures, model pre- and post-processing can take hours to days to complete, whereas with a GPU-accelerated in-memory database, processing ranges from milliseconds to minutes.

Kinetica: A More Ideal AI Process

At Kinetica, we improved the AI process by bringing data preparation, model training, and model operationalization into a single in-memory, GPU-accelerated database platform, featuring:

Advanced analytics
AI/ML/Deep Learning
In-memory SQL
Integration with CPUs/GPUs
Matrix and vector processing
A complete AI pipeline built into a single database, including data preparation, model training, and model operationalization
Pluggable UDF framework
TensorFlow is bundled with the database
Relational database with SQL-92, ODBC/JDBC, APIs
Ability to handle streams from Storm, Spark, Flume, or other tools without degrading analytics or the AI model pipelines
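To make the idea of operationalizing a model inside the database more concrete, here is a minimal sketch that registers a scoring function as a user-defined function and calls it from SQL. It uses Python's built-in sqlite3 module purely as a stand-in; it is not Kinetica's UDF API, and the `score` function, its coefficients, and the `orders` table are hypothetical.

```python
import sqlite3

def score(amount, prior_orders):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return round(0.01 * amount + 0.5 * prior_orders, 2)

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")

# Register the Python function so SQL queries can call score(a, b).
conn.create_function("score", 2, score)

conn.execute(
    "CREATE TABLE orders (id INTEGER, amount REAL, prior_orders INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 100.0, 2), (2, 250.0, 0), (3, 40.0, 5)],
)

# The model runs where the data lives; the application only issues SQL.
rows = conn.execute(
    "SELECT id, score(amount, prior_orders) AS s FROM orders ORDER BY s DESC"
).fetchall()

for order_id, s in rows:
    print(order_id, s)

conn.close()
```

The design point is that the application never exports data to a separate scoring service: the query and the model execute in the same place, which is the pattern a pluggable UDF framework is meant to enable.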

By operationalizing the AI processes, Kinetica’s GPU database platform takes most of the grunt work of building these systems out of data scientists’ hands, so they can concentrate on building algorithms that enable their organizations to take advantage of AI and machine learning in an order of magnitude less time and with far less infrastructure.

If you’d like to learn more, read the O’Reilly book Introduction to GPUs for Data Analytics, which provides an overview of how advances in high-performance computing technology are addressing current and future database and big data analytics challenges.