Frequently Asked Questions
You have questions. We have answers! Here are the answers to some of the most frequently asked questions we receive about Kinetica's fast GPU-accelerated database.
- So, what exactly is Kinetica?
- OLAP? OLTP? Or Both?
- Is Kinetica scalable?
- What performance increase can I expect?
- Structured or unstructured data?
- What types of data can be stored?
- Where is the data stored?
- What operations are provided?
- How does Kinetica integrate with open source and commercial frameworks?
- How does Kinetica fit into a Lambda architecture?
- Does Kinetica replace Hadoop?
- Is special hardware needed?
- Can Kinetica be used for web apps?
- How do I get started?
So, what exactly is Kinetica?
Kinetica is a distributed, in-memory database accelerated by GPUs that can simultaneously ingest, analyze, and visualize streaming data for truly real-time actionable intelligence. Kinetica leverages the power of many-core devices (such as GPUs) to deliver results orders of magnitude faster than traditional databases on a fraction of the hardware.
OLAP? OLTP? Or Both?
Kinetica is a vectorized columnar database designed for analytics (OLAP) workloads. Kinetica was built from the ground up to leverage the parallel compute power of the GPU for fast response to analytic queries on large datasets. Kinetica stands out when used with streaming data, and large volumes of high-cardinality data. Kinetica is not typically used as a system of record, but is a great analytics complement to an OLTP system.
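To see why a columnar, vectorized layout favors analytic queries, here is a conceptual sketch in plain Python. This is an illustration of the general idea, not Kinetica's internals: a row store must walk whole records, while a column store scans only the columns a query touches, and those contiguous values are the uniform work that SIMD and GPU hardware parallelize well.

```python
# Conceptual sketch (not Kinetica internals): columnar vs. row layout.

# Row-oriented: one record per dict.
rows = [
    {"id": 1, "region": "east", "sales": 120.0},
    {"id": 2, "region": "west", "sales": 340.0},
    {"id": 3, "region": "east", "sales": 75.0},
]

# Column-oriented: one contiguous list per field.
columns = {
    "id": [1, 2, 3],
    "region": ["east", "west", "east"],
    "sales": [120.0, 340.0, 75.0],
}

def total_sales_row_store(region):
    # Touches every field of every record, though only two fields matter.
    return sum(r["sales"] for r in rows if r["region"] == region)

def total_sales_column_store(region):
    # Touches exactly two columns; the loop does uniform work over
    # contiguous data -- the shape many-core hardware accelerates.
    return sum(v for g, v in zip(columns["region"], columns["sales"])
               if g == region)

print(total_sales_row_store("east"))     # 195.0
print(total_sales_column_store("east"))  # 195.0
```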
Is Kinetica scalable?
Yes, Kinetica is designed to be highly and predictably scalable – both up and out.
Kinetica runs on industry-standard hardware equipped with GPUs. Kinetica scales horizontally by simply adding nodes to distribute the data. A cluster can be scaled up at any time to increase storage capacity and processing power, with near-linear processing improvements for most operations. Data can be sharded automatically, or the sharding can be specified and optimized by the user.
A typical cluster might consist of multiple identical nodes, each with a couple of GPUs and 1TB of RAM. The GPU delivers almost linear scalability, which makes real-time analytics on 10TB, or even 100TB, datasets viable and predictable.
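The distribution idea behind sharding can be sketched in a few lines. This is a toy, hypothetical version of hash-based sharding (the node names and shard key are invented for illustration); the point is that a stable hash of a shard key deterministically assigns each row to a node, so key lookups route to one node while scans fan out across all of them.

```python
# Minimal sketch of hash-based sharding across a hypothetical 3-node cluster.
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical cluster members

def shard_for(key: str) -> str:
    # Stable hash: the same key always lands on the same node, so
    # queries on the shard key can be routed to a single node while
    # full scans run on all nodes in parallel.
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Place a few hypothetical customer IDs.
placement = {cid: shard_for(cid) for cid in ["c-1001", "c-1002", "c-1003"]}
print(placement)
```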
What performance increase can I expect?
Harnessing the parallel processing power of many-core devices (such as GPUs), an end user of Kinetica can expect anywhere from 10x-100x faster performance (ingest, analytics, and visualization) compared to some of the most advanced in-memory databases.
Kinetica was recently tested against a leading state-of-the-art in-memory system by a major retailer. The existing system was not meeting SLAs as they attempted to correlate weather and social media signals to predict demands on inventory.
The Kinetica cluster, powered by GPUs, proved to be over 100 times faster on their top-10 hardest queries. Furthermore, the Kinetica cluster achieved these improvements on a cluster that was 1/10 the size. This was on real-world data and workloads: a star schema with fact tables of 150 billion rows joined to multiple dimension tables in classic distributed joins with GROUP BYs.
Similar gains hold against CPU-based in-memory systems in general. While placing data in memory reduces latency and is a critical component of building a fast database, memory alone is not enough. Once data is in memory, the bottleneck becomes the compute resources available to process it. The GPU removes that bottleneck.
Structured or unstructured data?
Kinetica functions much like a traditional structured relational database and requires data in a structured format. To get started, create a table, set a schema, and start inserting rows; then you're ready to do analytics. The GPU specifics are abstracted away from the DBA and the developer.
What types of data can be stored?
Kinetica organizes data in a manner similar to a standard relational database. Each database consists of tables, each defined by a schema. Data is strongly typed for each field in the schema and can be double, float, int, long, string, or bytes.
If you are using our native API, your interface to the system is that of an object-based datastore, with each object corresponding to a row in the table.
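As a hedged sketch of how such a strongly typed schema might be modeled client-side, the snippet below maps the field types listed above (double, float, int, long, string, bytes) to Python types and validates an object (row) field-for-field. This is illustrative Python only, not the actual Kinetica native API; the table name and fields are invented.

```python
# Hypothetical schema for a "taxi trips" table -- illustration only,
# not the real Kinetica API.
SCHEMA = [
    ("trip_id", "long"),
    ("vendor", "string"),
    ("fare", "double"),
]

# Map the schema's type names onto Python types.
PY_TYPES = {
    "double": float, "float": float,
    "int": int, "long": int,
    "string": str, "bytes": bytes,
}

def validate(row):
    # Each object corresponds to a row and must match the schema
    # field-for-field, since data is strongly typed per field.
    if len(row) != len(SCHEMA):
        raise ValueError("wrong number of fields")
    for value, (name, ftype) in zip(row, SCHEMA):
        if not isinstance(value, PY_TYPES[ftype]):
            raise TypeError(f"{name} must be {ftype}")
    return True

print(validate((42, "acme", 17.5)))  # True
```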
Where is the data stored?
Kinetica stores data in-memory, utilizing both system RAM and vRAM (the memory available on the GPU card itself).
The benefit of storing data in vRAM is extremely fast transfer to the GPU's compute cores. The downside is that vRAM is expensive and limited in capacity – currently 24GB on an NVIDIA K80.
For larger datasets, system RAM allows the database to scale to much larger volumes and scale out across many nodes of standard hardware. Data stored in main memory can be efficiently fed to the GPU, and this process is even more efficient with the NVLink architecture.
What operations are provided?
Kinetica provides the typical functionality you might expect of a relational database – create tables, add rows, read rows, delete rows, and so on. What really separates Kinetica is its specialized filtering and visualization functions. These functions can be performed through our native API and various language-specific connectors, or via SQL through ODBC/JDBC connectors.
The API documentation is the best way to see the full range of operations available.
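The SQL shape of those typical operations looks like the sketch below. In practice Kinetica is reached via its ODBC/JDBC connectors; here SQLite stands in purely so the example is self-contained and runnable, and the table and data are invented for illustration.

```python
# Typical relational operations -- create a table, insert rows, run an
# analytic query. SQLite is a stand-in for a real ODBC/JDBC connection.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table with a fixed schema.
cur.execute("CREATE TABLE taxi (trip_id INTEGER, vendor TEXT, fare REAL)")

# Insert rows.
cur.executemany(
    "INSERT INTO taxi VALUES (?, ?, ?)",
    [(1, "acme", 17.5), (2, "zenith", 9.0), (3, "acme", 22.0)],
)

# A filtering aggregate -- the kind of analytic query a columnar,
# GPU-backed engine parallelizes across its cores.
cur.execute(
    "SELECT vendor, SUM(fare) FROM taxi GROUP BY vendor ORDER BY vendor"
)
print(cur.fetchall())  # [('acme', 39.5), ('zenith', 9.0)]
```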
How does Kinetica integrate with open source and commercial frameworks?
How does Kinetica fit into a Lambda architecture?
Kinetica is well suited to act as the speed layer for hot real-time and historical data when sub-second latency is paramount. Open source and commercial connectors enable connectivity to a larger data lake. Kinetica provides more flexibility than alternative ‘speed layer’ solutions (HBase, Cassandra, and other NoSQL solutions).
Does Kinetica replace Hadoop?
If you have an existing Hadoop system, it's probably doing a good job at large-scale and cost-efficient data management and batch analytics. However, if you are struggling trying to do real-time analytics against the Hadoop cluster, Kinetica can function as a speed layer to take some of the pressure off your Hadoop cluster.
Is special hardware needed?
Kinetica is designed to take advantage of NVIDIA GPUs, which can be added to existing industry-standard hardware. Kinetica is certified on Cisco, Dell, IBM, and HPE hardware.
We've had customers use surplus or decommissioned boxes from their Hadoop clusters. Because Kinetica relies less on indexing, it often requires much less hardware than other in-memory analytics solutions.
Alternatively, leading cloud providers including AWS, Microsoft Azure, Google, and Nimbix have all recently released top-of-the-line GPU instances.
Can Kinetica be used for web apps?
How do I get started?
Here are a few ways to get up and running with Kinetica:
- Take Kinetica for a test-drive through the interactive demo, where you can explore a selection of datasets and experience the power of GPU-accelerated compute.
- The next step is to try it with your own data, schemas, and queries. Fill out the form at the bottom of this page and we'll set up a trial environment for you to experience it yourself.
- Kinetica also offers two easy programs to help you jumpstart installation and application development with Kinetica.
We welcome additional questions! To learn more, or if you'd like to get your hands on a trial please contact us.
Take the Kinetica Challenge!
Sometimes benchmarks and marketing copy can sound too good to be true. The best way to appreciate the possibilities that GPU acceleration brings to large-scale analytics is to try it with your own data, your own schemas and your own queries.
Contact us, and we'll set you up with a trial environment for you to experience it for yourself.