Architecture

Vectorization

The technology behind real-time analytics at scale

Kinetica is an analytics database designed from the ground up to leverage the parallel compute capabilities of GPUs and modern 'vectorized' CPUs. This brings a new level of brute-force compute power that opens the door to faster and more flexible querying across large and streaming datasets.

What is Vectorization?

The secret sauce behind breakthrough analytics lies in Kinetica's ability to utilize modern vectorized CPUs and GPUs.

Database sequential processing illustration

Most databases have evolved with the CPU

The CPU has been the core of the computer for decades. Database systems have evolved alongside it, using sequential processing to perform calculations.

Take this example of an array of numbers. To add five to each number and place the results into a new array, a CPU will rapidly work through the list, one element at a time.
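
As a minimal sketch of that sequential pattern, the loop below adds five to each number in turn. The function name and the sample values are illustrative assumptions, not taken from the original illustration.

```python
def add_five_sequential(values):
    """Add 5 to each number, one element at a time, into a new list."""
    result = []
    for v in values:            # one element handled per loop iteration
        result.append(v + 5)
    return result


numbers = [3, 8, 1, 12, 7]      # illustrative values (assumption)
print(add_five_sequential(numbers))  # [8, 13, 6, 17, 12]
```

The work grows one step per element: a list of a million numbers takes a million trips through the loop.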

But this sequential process has its limits.

Parallel GPU processing illustration

What if you could do 1,000 operations at once?

GPUs, which typically have thousands of cores, were designed to speed up the drawing of graphics on a screen. Instead of rendering a pixel at a time, a GPU can render many pixels in one go by applying the same instruction to all of them, a technique known as single instruction, multiple data (SIMD).

It turns out this same capability is well suited to performing repeated similar instructions on data in parallel. With Intel's Advanced Vector Extensions (AVX) making it into CPUs in the data center, the path is now wide open to leverage vectorized compute in the cloud for analytics workloads.
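
The difference can be modeled in a few lines of plain Python. This is only a conceptual sketch that counts steps, not real SIMD code: it assumes a vector width of eight values (eight 32-bit floats fit in one 256-bit AVX register), and the function names are hypothetical.

```python
LANES = 8  # eight 32-bit floats fit in one 256-bit AVX register


def add_five_scalar(values):
    """Scalar model: each step adds 5 to a single value."""
    out, steps = [], 0
    for v in values:
        out.append(v + 5.0)
        steps += 1
    return out, steps


def add_five_vector_model(values, lanes=LANES):
    """SIMD model: each step adds 5 to a whole group of `lanes` values."""
    out, steps = [], 0
    for start in range(0, len(values), lanes):
        group = values[start:start + lanes]
        out.extend(v + 5.0 for v in group)  # stands in for one vector add
        steps += 1
    return out, steps


data = [float(i) for i in range(1000)]
scalar_out, scalar_steps = add_five_scalar(data)
vector_out, vector_steps = add_five_vector_model(data)
print(scalar_out == vector_out)      # True: same results either way
print(scalar_steps, vector_steps)    # 1000 125
```

In this model the same answer arrives in one eighth of the steps. A GPU pushes the same idea much further by running many such groups across its cores at once.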

Kinetica vectorization architecture diagram

How does Kinetica harness vectorization?

Kinetica was designed from the ground up to leverage the vectorization capabilities of GPUs and modern CPUs. Analytical functions in Kinetica have all been written from scratch to take advantage of vectorization.

Vectorization unleashes significant performance improvements, particularly on spatial and temporal queries at scale. Aggregations, predicate joins, windowing functions, and graph solvers all operate far more efficiently.
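
To make the idea concrete, here is a hedged sketch of a time-range filter followed by an aggregation, written as two whole-column passes with no index. It is not Kinetica's implementation or API: the column names, values, and helper functions are hypothetical, and plain Python lists stand in for columns that a vectorized engine would process many rows per instruction.

```python
from itertools import compress

# Hypothetical columnar table: row i is index i of every column.
event_time = [100, 105, 130, 160, 161, 199, 240]   # seconds (illustrative)
amount = [5.0, 2.5, 4.0, 1.0, 3.0, 6.5, 2.0]       # illustrative values


def window_mask(times, start, end):
    """Evaluate the predicate start <= t < end for every row in one pass."""
    return [start <= t < end for t in times]


def masked_sum(values, mask):
    """Aggregate only the rows whose mask entry is True."""
    return sum(compress(values, mask))


mask = window_mask(event_time, 100, 200)
total = masked_sum(amount, mask)
print(sum(mask), total)   # 6 22.0
```

Both passes apply one simple operation uniformly to every row, which is the shape of work that maps well onto SIMD lanes and GPU cores.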

Vectorization Gives You Freedom

With so much raw compute power, you won't need to worry about indexing, partitioning or downsampling.

Simpler Data Structures

Brute-force vectorized compute means there is less need to think through schemas before data can be explored.

Low Latency

Simpler data structures mean less to index. Combined with Kinetica's lockless, distributed architecture, data is available for query immediately after it lands.

Linear Scale Out

With less to index, the database scales in proportion to the size of the data. This leads to a smaller and more predictable scale-out footprint.

Less Engineering

Spend less time engineering schemas, and more time using your data. Business analysts have more flexibility and freedom for ad-hoc data discovery projects.

Try Kinetica Now:

Kinetica Cloud is free for projects up to 10GB

White Paper

Vectorization: The New Era of Big Data Parallelism

Every five to ten years, an engineering breakthrough emerges that disrupts database software for the better. Vectorization is the newest breakthrough gaining momentum toward widespread adoption. Early adopters are using fully vectorized databases to foster new applications and reap lower costs.

Learn more about vectorization in this white paper.

Lower TCO

Large US Bank

Spark: 700 nodes
Kinetica: 16 nodes

Large US Retailer

Cassandra: 100 nodes
Kinetica: 8 nodes

Large Pharma

Hadoop: 88 nodes
Kinetica: 6 nodes
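
For a sense of scale, the node counts quoted above can be turned into reduction factors with simple arithmetic. This sketch uses only the figures listed here. The page does not state node sizes or hardware, so the ratios compare node counts only and are not a cost calculation.

```python
# Node counts as quoted above: (nodes before, Kinetica nodes after)
migrations = {
    "Large US Bank (Spark)": (700, 16),
    "Large US Retailer (Cassandra)": (100, 8),
    "Large Pharma (Hadoop)": (88, 6),
}


def reduction_factor(before, after):
    """How many times fewer nodes the later cluster uses."""
    return before / after


for name, (before, after) in migrations.items():
    factor = reduction_factor(before, after)
    print(f"{name}: {before} -> {after} nodes, about {factor:.1f}x fewer")
```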

Frequently asked questions

What does "vectorization" actually mean in Kinetica's engine?

Vectorization refers to running analytical operations as SIMD-style parallel kernels instead of one row at a time. Kinetica's analytical functions were written from scratch to exploit the thousands of cores in GPUs and Intel AVX vector extensions in modern CPUs.

How fast is Kinetica versus other analytic databases?

Kinetica benchmarks 8x faster than Databricks 9.1 LTS (Photon), 13x faster than ClickHouse 21 (independently benchmarked by Radiant Advisors), and 240x faster than PostGIS. The benchmark suite is published on GitHub.

What does vectorization let me skip in data engineering?

With brute-force vectorized compute, you don't need heavy indexing, partitioning, or downsampling before exploring data. That means simpler schemas, lower data latency after ingest, more linear scale-out, and less time engineering structures rather than analyzing data.

Is there a hardware footprint advantage from vectorization?

Yes. Customer migrations include a large US bank moving from 700 Spark nodes to 16 Kinetica nodes, a large US retailer from 100 Cassandra nodes to 8, and a large pharma from 88 Hadoop nodes to 6. Less indexing means the cluster scales more proportionally to data size.

Which operations get faster from Kinetica's vectorized kernels?

Aggregations, predicate joins, window functions, graph solvers, and geospatial rendering all run on vectorized kernels across thousands of cores simultaneously. The same kernels back the spatial and temporal query workloads at scale that motivated the engine's design.
