The AI Database – A Prerequisite to Operationalizing Machine and Deep Learning

Andrew Wooler
October 4, 2017

These notes are from our recent webinar, “The AI Database – A Prerequisite to Operationalizing Machine and Deep Learning,” with guest speaker Mike Gualtieri, VP & Principal Analyst at Forrester, and Kinetica’s Mate Radalj, VP of Advanced Technology Solutions.

The world of AI is changing life as we know it. From building smart cities to spotting the early signs of disease, AI is being used to transform pie-in-the-sky ideas into real results. Companies are not waiting for this technology to be perfected; in fact, the Deloitte Human Capital Trends 2017 shows that 38% of companies that participated in their research believe that robotics and automation will be “fully implemented” in their company within 5 years.

There are basically two types of AI in the world today: pure AI, which tries to mimic human intelligence in all aspects (think “The Matrix” or “Ex Machina”), and pragmatic AI, the technology that organizations are using today. Pragmatic AI is the technology behind powerful predictive models, computer vision, and natural language understanding, just a few of the capabilities that enterprises can apply to a broad range of use cases.

The good news is that you can probably implement AI in your organization faster than you think. Your data scientists and developers can learn new machine learning frameworks like TensorFlow on GPU systems, and can start infusing your existing applications with AI today. However, if you want to be able to support the full AI development lifecycle and be able to quickly create accurate models, you’ll need powerful and performant data management and processing capabilities—and that’s where GPU-accelerated databases come in.

The State of Artificial Intelligence

There are two types of AI: pure AI and pragmatic AI. Pure AI refers to technology that strives to mimic all aspects of human intelligence. On the other hand, pragmatic AI is the technology that organizations are using today. Pragmatic AI is narrow in scope, but it can sometimes exceed human intelligence. For example, Google famously built an AI system (AlphaGo) to beat the Go champion, and IBM built Watson to beat the Jeopardy champions.

AI is not one technology; it consists of one or more building block technologies. These building block technologies include machine learning, deep learning, natural language generation, and knowledge engineering. Machine learning involves algorithms that analyze data to find models: models that can predict outcomes or understand context with significant accuracy and that improve as more data becomes available. Deep learning can be thought of as a “branch” of machine learning; the difference is that a traditional machine learning model usually has to be told which features of the data matter for an accurate prediction (a person engineers them), whereas a deep learning model uses multi-layered neural networks to learn those features from the raw data on its own.

In the past, deep learning models were difficult to train, and training them required a lot of compute power. In 2012, two significant things happened: 1) scientists discovered shortcuts for training these models, and 2) they discovered that they could use GPUs to dramatically reduce the amount of time it takes to train a model.

GPUs have opened the door to new deep learning use cases. Deep learning can now provide uncanny accuracy on images, voice, and natural language. The math used to render graphics in real time maps nicely to the way deep learning models are trained. Applied to image processing, deep learning can be more accurate, faster, and less prone to mistakes. Examples include medical imaging, where deep learning can provide early detection of medical conditions; insurance, where insurers can automatically assess damage and costs and train models to take over the adjusting process; automotive manufacturing, where deep learning can detect road conditions and obstacles; and IoT, where sensor data can be used to predict machine failure in real time.

AI adoption is gaining momentum. According to the Forrester Data Global Business Technographics Data and Analytics Survey, there has been a dramatic increase in the percentage of organizations that plan to implement or expand artificial intelligence technologies within their companies. While there are many applications of AI, a popular application this past year has been the ability to liberate customer insights from the typical data silos. As AI technologies are assimilated into analytics applications, business users will benefit greatly from direct access to powerful insights that drive action.

Why Turn to AI?

Organizations are infusing machine learning and AI into their applications. They are turning to data scientists to explore data, formulate hypotheses, and use machine learning to find models that can be used to accomplish a wide variety of objectives, such as:

Detecting security intrusions
Image recognition, processing, and diagnostics
Resolving users’ technology problems
Automating production management work
Anticipating future customer purchases
Financial trading
Providing personalized medicine
Eliminating repetitive tasks

Challenges

The data science model-building lifecycle is iterative and continuous. The lifecycle can involve trying hundreds of different algorithms, adjusting a number of parameters, or taking a closer look at the actual data.
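To make the iteration concrete, here is a minimal sketch of one pass through that loop: train a model under several parameter settings and keep whichever scores best on held-out data. Everything in it is an assumption made for illustration (the synthetic data, the tiny linear model, and names such as train_linear and search); real projects would search over far more algorithms and parameters.

```python
import random

def make_data(n, seed=0):
    """Synthetic data: y = 3x + 2 plus a little noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def train_linear(xs, ys, learning_rate, epochs):
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

def mse(model, xs, ys):
    """Mean squared error of a (w, b) model on the given data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def search(train_set, valid_set, learning_rates, epoch_options):
    """Try every parameter combination and keep the best on validation data."""
    best = None
    for lr in learning_rates:
        for epochs in epoch_options:
            model = train_linear(*train_set, lr, epochs)
            score = mse(model, *valid_set)
            if best is None or score < best["score"]:
                best = {"lr": lr, "epochs": epochs, "score": score, "model": model}
    return best

train_set = make_data(200, seed=1)
valid_set = make_data(100, seed=2)
best = search(train_set, valid_set, [0.001, 0.01, 0.1], [10, 100, 500])
print(best)
```

Each combination requires a full training run, which is why the speed of data access and compute matters so much once the search grows to hundreds of candidates.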

All data originates in real time. Events in enterprise applications, such as transactions, occur in real time. Generally, however, traditional machine learning processes use that data much, much later. The problem is that AI models are perishable: they have to be retrained regularly, with new data introduced each time.
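One way to picture a perishable model is a loop that checks the model against the newest data and retrains when the error grows too large. The sketch below is only an illustration under simple assumptions: the "model" is just the mean of its training window, the readings are made up, and maintain_model, window, and tolerance are hypothetical names rather than anything from the webinar.

```python
def fit_mean(values):
    """A deliberately trivial 'model': predict the mean of the training data."""
    return sum(values) / len(values)

def mean_abs_error(prediction, values):
    """Average absolute gap between the prediction and the observed values."""
    return sum(abs(v - prediction) for v in values) / len(values)

def maintain_model(stream, window=5, tolerance=2.0):
    """Retrain whenever error on the latest window exceeds the tolerance.

    Returns the number of retrains and the final model.
    """
    model = fit_mean(stream[:window])
    retrains = 0
    for start in range(window, len(stream) - window + 1, window):
        recent = stream[start:start + window]
        if mean_abs_error(model, recent) > tolerance:
            model = fit_mean(recent)  # refresh the model on fresh data
            retrains += 1
    return retrains, model

# Hypothetical readings whose level shifts partway through.
stream = [10, 11, 9, 10, 10, 10, 9, 11, 10, 10,
          20, 21, 19, 20, 20, 20, 19, 21, 20, 20]
retrains, model = maintain_model(stream)
print(retrains, model)
```

The longer the delay between data arriving and the check running, the longer a stale model keeps making predictions.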

Legacy data integration technologies were not designed for AI. These technologies were designed for business intelligence reporting. The data lake approach is sound, but data lakes still need to better support the iterative data science process.

Deep learning frameworks require massive parallel processing in order to train models. Fortunately, GPUs can be used to parallelize that processing. GPUs can efficiently execute programmer-coded commands as well as the massively parallel training of deep neural networks.
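The reason this kind of training parallelizes well can be shown with a small sketch: the gradient over a batch is a sum of per-example terms, so the batch can be split into shards, each shard processed independently, and the partial results added together. The code below uses Python threads purely as a stand-in for parallel workers; it illustrates the structure, not GPU speed, and names such as shard_gradient and parallel_gradient are assumptions for this example.

```python
from concurrent.futures import ThreadPoolExecutor

def shard_gradient(args):
    """Sum of squared-error gradients for y = w*x over one shard of data."""
    w, shard = args
    return sum(2 * (w * x - y) * x for x, y in shard)

def parallel_gradient(w, data, workers=4):
    """Split the batch into shards, compute partial gradients, then combine."""
    shards = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(shard_gradient, [(w, s) for s in shards]))
    return sum(partials) / len(data)

def serial_gradient(w, data):
    """Reference result: the same gradient computed in a single pass."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)

data = [(float(x), 2.0 * x) for x in range(1, 101)]
g_parallel = parallel_gradient(0.5, data)
g_serial = serial_gradient(0.5, data)
print(g_parallel, g_serial)
```

Both calls produce the same gradient; on a GPU, a framework such as TensorFlow spreads this kind of independent arithmetic across thousands of cores.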

Enterprise data lakes are too slow for deep learning. Enterprise data lakes built upon Hadoop are essential to bring together the hundreds of sources of data that an enterprise may have and that may be relevant, but they’re too slow for deep learning.

The Emergence of the AI Database – Powered by GPUs

A new class of ‘AI database’ is emerging to solve many of these challenges. AI databases are designed to massively parallelize data and compute operations for deep learning model training and inferencing using GPUs. AI databases also bring along many of the benefits of a foundational data platform including:

An AI database can source data from multiple sources – dozens, hundreds, or thousands of internal and external sources of data
It can parallelize data and computational operations for deep learning. If data can’t be delivered to the computation quickly enough, latency is introduced into the process.
The AI database provides a strong security architecture so that confidential information can be protected.
AI-capable databases offer fault tolerance, which may be less important during training but is typically highly important during inferencing.

Bringing AI workloads onto a GPU-accelerated AI database produces some significant benefits for the organization. Until now, training data was typically moved to specialized GPU systems, and deploying models so that business users could take advantage of them was often riddled with complexity. But what if you could run AI and BI workloads on one platform and deliver faster and better analytics? The emergence of AI databases enables you to do both.

Kinetica features a user-defined functions (UDF) framework that makes these features of an AI database possible. With user-defined functions, custom code can be run directly on the data within the database. This code can take advantage of parallel compute on the GPU, and the database can also distribute compute over multiple machines. The UDF framework provides for unlimited extensibility and advanced operations on data, from simple regressions to deep neural networks. Algorithms can be written in languages familiar to data scientists, such as Python, and call out to deep learning libraries such as TensorFlow, Caffe, and Torch.
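As a rough picture of the idea of running code where the data lives, the sketch below applies a user-defined function to each shard of a table and then merges the per-shard results into a simple regression. This is not Kinetica's UDF API: run_udf is a hypothetical stand-in for the database, the shards are made-up rows, and threads stand in for the machines or GPUs that would do the work.

```python
from concurrent.futures import ThreadPoolExecutor

def regression_udf(rows):
    """User-defined function: partial sums for a least-squares line on one shard."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return n, sx, sy, sxx, sxy

def run_udf(shards, udf):
    """Hypothetical stand-in for a database running a UDF on every shard."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return list(pool.map(udf, shards))

def combine(partials):
    """Merge per-shard sums and solve for slope and intercept."""
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*partials))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Made-up (x, y) rows spread over three shards; the data follows y = 2x + 1.
shards = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]
slope, intercept = combine(run_udf(shards, regression_udf))
print(slope, intercept)
```

Only the small per-shard summaries travel between workers, which is the appeal of pushing the computation to the data instead of moving the data to a separate system.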

Explore even further by reading our blog posts: “Machine Learning and Predictive Analytics in Finance: Observations from the Field” and “How Does a GPU Database Play in Your Machine Learning Stack?”