Graph queries that see your whole table.

Graph · Distributed · SQL · Hybrid

One grammar for geospatial, social, and property graphs. Topology stored in fixed memory. Traversals restricted by the same SQL you write on the rest of your data — no copy, no sync, no second engine.

Solve · Match · Query: all in standard SQL
The Category

Stop choosing between graph speed and analytical breadth — keep the graph beside your data, not on a copy.

[Diagram: YOUR DATA STACK (tables · columns, attribute data, analytics · ML) beside a separate GRAPH DB (topology copy, duplicated attrs, traversal only), joined by ETL · sync. Two systems · explicit data duplication.]
Neo4j · TigerGraph

Built for traversal — but it lives apart from your data.

Fastest at pure-graph workloads when the graph is static. The cost: graph topology is a copy, attributes are duplicated or streamed in via hooks, and joins back to relational data live in application code or batch jobs.

High traversal performance on static graphs
Explicit duplication; attributes must be synced
No native OLAP — analytics happen elsewhere
The Query Surface

Cypher you already know. SQL you already write. One planner.

Cypher inside · SQL outside · one query plan
Legend: Cypher · traversal | SQL · aggregate
-- OUTER · SQL
SELECT person, bank, SUM(amount), MAX(risk_score)
FROM GRAPH_TABLE(
  -- INNER · CYPHER · GQL-COMPLIANT
  GRAPH expero.banking_graph
  MATCH (a:bank)-[:performed]->
        (b:wire_message WHERE b.risk_score > 20)
        -[:is_for_transaction]->(c:banking_transaction)
  RETURN g.party_name AS person,
         a.bank_name AS bank,
         c.amount,
         b.risk_score
)
GROUP BY person, bank
ORDER BY SUM(amount) DESC;
One planner picks column scans, index lookups, or graph walks per clause.
Cypher · GQL
Multi-hop patterns, variable-length paths, label filters.
GRAPH_TABLE()
Lifts Cypher results into the relational world for GROUP BY and joins.
One plan
Graph traversal and OLAP aggregation share the same optimizer.
SQL · Cypher · aml_top_exposure.sql
-- Top-exposure wires from one bank.
-- Cypher does the traversal,
-- SQL does the aggregation —
-- in a single statement.

SELECT wire, risk,
       ROUND(SUM(amount), 0) AS total
FROM GRAPH_TABLE (
  GRAPH expero.banking_graph
  MATCH (a:bank WHERE
            a.bank_name = 'Harvey Group')
        -[:performed]->
        (b:wire_message WHERE
            b.risk_score > 20)
        -[:is_for_transaction]->
        (c:banking_transaction)
  RETURN b.NODE AS wire,
         b.risk_score AS risk,
         c.amount AS amount
)
GROUP BY wire, risk
ORDER BY total DESC
LIMIT 10;
↑ Your Neo4j Cypher ports over. Your SQL tools keep working.
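
As a rough mental model of the statement above, the following Python sketch first walks a tiny in-memory graph to collect matching paths (the role of the MATCH clause), then groups and ranks the resulting rows (the role of the outer SQL). The wire ids, risk scores, amounts, and both function names are invented for illustration, and nothing here reflects how the engine's planner actually executes the query.

```python
from collections import defaultdict

# Hypothetical toy data: bank -> wires it performed, wire -> transactions.
performed = {"Harvey Group": ["w1", "w2", "w3"], "Other Bank": ["w4"]}
risk_score = {"w1": 35, "w2": 10, "w3": 50, "w4": 90}
is_for_transaction = {"w1": ["t1", "t2"], "w2": ["t3"], "w3": ["t4"], "w4": ["t5"]}
amount = {"t1": 100.0, "t2": 250.0, "t3": 900.0, "t4": 75.0, "t5": 10.0}


def match_paths(bank_name, min_risk):
    """Traversal step: yield one (wire, risk, amount) row per matched path."""
    for wire in performed.get(bank_name, []):
        if risk_score[wire] <= min_risk:
            continue  # plays the role of: WHERE b.risk_score > 20
        for txn in is_for_transaction.get(wire, []):
            yield wire, risk_score[wire], amount[txn]


def top_exposure(rows, limit=10):
    """Aggregation step: GROUP BY wire, risk; ORDER BY total DESC; LIMIT."""
    totals = defaultdict(float)
    for wire, risk, amt in rows:
        totals[(wire, risk)] += amt
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return [(wire, risk, round(total)) for (wire, risk), total in ranked[:limit]]


result = top_exposure(match_paths("Harvey Group", min_risk=20))
```

In the sketch the two steps are separate functions connected by a row stream, which is the shape GRAPH_TABLE() gives the query: graph rows in, relational rows out.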
The Mechanism

A graph topology that doesn't move when your graph does.

CSR vs Double-Link Structure · Edge insert/delete
Legend: CSR · contiguous | DLS · linked

CSR · COMPRESSED SPARSE ROW
BEFORE INSERT: edges packed by source node
  e1 e2 e3 e4 e5 e6 e7 e8 e9 e10 e11 e12 (grouped as node 1, node 2, node 3, node 4)
INSERT edge on node 2 → all downstream cells must shift
AFTER INSERT: block rewrite required
  e1 e2 e3 e4 e5 e6 ★new e7 e8 e9 e10 e11 e12 (7 cells shifted in memory)
COST PER UPDATE: Memory shift proportional to graph size, O(n) per edge. Storage breaks under streaming load.

DLS · DOUBLE-LINK STRUCTURE
FIXED-SIZE LINKED CELLS: node groups via pointers
  e1 e2 e3 e4 e5 e6 (node 1, node 2, node 3) · e7 e8 e9 e10 e11 (node 4, node 5)
INSERT edge on node 2 → append cell, repoint two pointers
AFTER INSERT: only the link traversal changes
  e1 e2 e3 e4 e5 ★new (new cell linked into node 2)
COST PER UPDATE: Two pointer writes, O(1). No memory degradation under upserts. Storage holds shape under streaming load.

STORAGE: 6 × edge count · regardless of node-to-edge variance
Memory per edge

Fixed-size cells. Storage cost is exactly 6 × edge count — predictable from day one to day n.

O(1)
Edge upsert cost

Insert and delete are pointer operations. The graph holds its shape under continuous updates.

10 GB
Billion-node ceiling

Where conventional graph structures need 100 M GB to hold a fully-connected billion-node graph, DLS needs 10 GB. Real graphs are sparser — the same compression ratio holds.

Peer-reviewed
The DLS architecture, distributed graph servers, and the at-scale OLAP integration are documented in "A Fixed-Storage Distributed Graph Database Hybrid" — published research, available on arXiv.
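
The contrast between contiguous and linked edge storage described in this section can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: `CSRGraph` and `LinkedCellGraph` are hypothetical names, the linked variant is a plain singly linked list of fixed-size cells, and neither class reproduces the published DLS layout or its storage constant.

```python
class CSRGraph:
    """Edges packed contiguously by source node, plus an offsets array."""

    def __init__(self, num_nodes):
        # offsets[i]:offsets[i + 1] is the slice of targets owned by node i.
        self.offsets = [0] * (num_nodes + 1)
        self.targets = []

    def insert_edge(self, src, dst):
        pos = self.offsets[src + 1]
        shifted = len(self.targets) - pos  # cells that must move one slot right
        self.targets.insert(pos, dst)
        for i in range(src + 1, len(self.offsets)):
            self.offsets[i] += 1
        return shifted

    def neighbors(self, src):
        return self.targets[self.offsets[src]:self.offsets[src + 1]]


class LinkedCellGraph:
    """Fixed-size cells chained per node; an insert appends a cell and relinks."""

    def __init__(self, num_nodes):
        self.head = [-1] * num_nodes  # first cell index per node, -1 = none
        self.cell_dst = []            # cell -> target node
        self.cell_next = []           # cell -> next cell of the same node

    def insert_edge(self, src, dst):
        cell = len(self.cell_dst)
        self.cell_dst.append(dst)
        self.cell_next.append(self.head[src])  # pointer write 1
        self.head[src] = cell                  # pointer write 2
        return 0  # no existing cell moves

    def neighbors(self, src):
        out, cell = [], self.head[src]
        while cell != -1:
            out.append(self.cell_dst[cell])
            cell = self.cell_next[cell]
        return out


# Four nodes with three edges each, then one extra edge on the second node.
csr, linked = CSRGraph(4), LinkedCellGraph(4)
for src in range(4):
    for dst in range(3):
        csr.insert_edge(src, dst)
        linked.insert_edge(src, dst)
csr_shift = csr.insert_edge(1, 99)        # every downstream cell moves
linked_shift = linked.insert_edge(1, 99)  # nothing moves
```

The CSR insert touches every cell stored after the insertion point, so its cost grows with the number of edges downstream; the linked insert does the same fixed amount of work regardless of graph size.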
The Live Graph

O(1) upserts mean your graph never falls behind your data.

Source tables → graph topology · O(1) per change
Legend: insert · update · delete
[Diagram: STREAMS (Kafka · wires, CDC · accounts, tombstones) feed the SOURCE TABLES accounts and wire_transfers, with INSERT · UPDATE · DELETE arriving continuously. With add_table_monitor = 'true', the LIVE GRAPH keeps its DLS topology in sync at O(1) per change.]
CSR-BASED ENGINES: Batch reload window. Topology lags ingest. Streaming graphs require external rebuild jobs.
KINETICA · DLS: Graph queries while you ingest. No rebuild window. No stale topology.
Who this matters for — AML wire-tracing · real-time fraud rings · fleet & IoT telemetry · supply-chain visibility · social-network freshness.
SQL · live_fraud_graph.sql
-- The graph follows the tables.
-- The tables follow the stream.
-- No rebuild step in between.

CREATE OR REPLACE GRAPH fraud_live (
  NODES => INPUT_TABLES(
    SELECT * FROM accounts
  ),
  EDGES => INPUT_TABLES(
    SELECT * FROM wire_transfers
  ),
  OPTIONS => KV_PAIRS(
    add_table_monitor = 'true',
    save_persist      = 'true'
  )
);

-- accounts and wire_transfers
-- can now be streamed into.
-- The graph stays current.
↑ One option flag. The graph follows your CDC stream.
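
To make the idea of a graph that follows its source table concrete, here is a minimal Python sketch that applies insert, update, and delete events to an adjacency structure one at a time, with no rebuild step. The `LiveGraph` class, the event shape, and the account names are all hypothetical; how add_table_monitor is implemented inside the engine is not described on this page, so treat this purely as an illustration of incremental maintenance.

```python
from collections import defaultdict


class LiveGraph:
    """Adjacency kept in step with a table of wire rows, one event at a time."""

    def __init__(self):
        self.rows = {}                      # wire id -> (src, dst)
        self.out_edges = defaultdict(dict)  # src -> {wire id: dst}

    def apply(self, op, wire_id, src=None, dst=None):
        # Drop the previous version of this row, if any (dictionary operations only).
        if wire_id in self.rows:
            old_src, _ = self.rows.pop(wire_id)
            del self.out_edges[old_src][wire_id]
        # Inserts and updates then write the current version.
        if op in ("insert", "update"):
            self.rows[wire_id] = (src, dst)
            self.out_edges[src][wire_id] = dst

    def neighbors(self, src):
        return sorted(set(self.out_edges.get(src, {}).values()))


graph = LiveGraph()
events = [
    ("insert", "w1", "acct_a", "acct_b"),
    ("insert", "w2", "acct_a", "acct_c"),
    ("update", "w2", "acct_a", "acct_d"),  # the wire is re-pointed
    ("delete", "w1", None, None),          # tombstone
]
for op, wire_id, src, dst in events:
    graph.apply(op, wire_id, src, dst)
```

Each event costs a constant number of dictionary operations on average, so the structure can be queried between any two events without waiting for a batch reload.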
The Numbers

500K edges in 250 ms. 4.3 billion edges, queryable — production results on real hardware, dated and attributed.

Shortest path · five locations

Seattle road network

500K edges · multi-source/multi-target
~250 ms
graph: 500K edges
queries: 5 lon/lat pairs
solver: SHORTEST_PATH + MULTIPLE_ROUTING
weights: distance ÷ speed
Multi-billion edge graph · 3-hop traversal

Single node, huge memory

4.3 B edges · 2.8 B vertices · source-to-many
1.2 – 1.8 sec
memory: 1 TB RAM + 0.5 TB ZRAM
compression: 4× on ZRAM tier
build time: ~4.5 hours
query type: 3-hop, 1-to-many
Partition rebalance · published research

Balanced vs random partitioning

Distributed graph solve, same workload
100× faster
topology: interface-node duplicated
algorithm: partition rebalancing
workload: cross-partition shortest path
source: arXiv:2201.02136
Storage cost · what the graph holds at scale
[Chart: theoretical storage ceiling for a fully connected billion-node graph. CSR: 100 M GB. DLS: 10 GB. Real graphs are sparser; the same compression holds.]
~6%
Storage variance with arbitrary edge-degree skew
0
Memory degradation across continuous upserts
4.5h
To build a 4.3 B edge graph from cold
Sources: Seattle benchmark — graph engine production tests, 7.1.10.2 / 7.2.0.7 builds, Q3 2024.
4.3 B edge graph — POC build, single-node 1 TB RAM + 0.5 TB ZRAM, x4 ZRAM compression, source-to-many 3-hop query, 2024.
100× partition rebalance speedup — published in arXiv:2201.02136 (2022). Comprehensive multi-platform benchmark refresh in progress.
The Toolkit

Solve. Match. Query. Every operator your graph workload needs.

Solve · /solve/graph

Generic graph solvers — paths, ranks, centrality.

Network-agnostic algorithms that operate on any graph: shortest paths, travelling salesman, backhaul routing, PageRank, Markov chain probability, centrality measures, all-paths enumeration.

EXECUTE FUNCTION SOLVE_GRAPH(
  GRAPH       => 'seattle_roads',
  SOLVER_TYPE => 'SHORTEST_PATH',
  SOURCE_NODES => INPUT_TABLES(...)
);
shortest_path · multiple_routing (TSP) · backhaul_routing · page_rank · probability_rank · centrality · closeness · stats_all · inverse_shortest_path
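
For readers who want to see what a shortest-path solve with distance ÷ speed weights computes, here is a small Python sketch using textbook Dijkstra. The road segments, node names, and helper functions are hypothetical, and this is not the engine's solver; it only illustrates how travel-time weights can favor a longer but faster route.

```python
import heapq

# Hypothetical road segments: (from, to, distance in metres, speed in metres/second).
segments = [
    ("A", "B", 1000.0, 10.0),  # 100 s
    ("B", "D", 1000.0, 10.0),  # 100 s
    ("A", "C", 1500.0, 25.0),  # 60 s
    ("C", "D", 1500.0, 25.0),  # 60 s
]


def build_adjacency(segs):
    """Turn segments into node -> [(neighbor, travel time)] with weight = distance / speed."""
    adj = {}
    for src, dst, distance, speed in segs:
        adj.setdefault(src, []).append((dst, distance / speed))
        adj.setdefault(dst, [])
    return adj


def shortest_path(adj, source, target):
    """Dijkstra over non-negative weights; returns (path, total cost)."""
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in adj[node]:
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], best[target]


path, seconds = shortest_path(build_adjacency(segments), "A", "D")
```

The route through C covers 3,000 m against 2,000 m through B, yet it wins because the weight is travel time rather than distance.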
Match · /match/graph

Purpose-built matchers for real-world problems.

Higher-order solvers built on combinations of the generic primitives, for routing, fraud, scheduling, and supply-chain optimization. Four patented algorithms anchor this family.

EXECUTE FUNCTION MATCH_GRAPH(
  GRAPH        => 'logistics',
  SOLVE_METHOD => 'match_supply_demand',
  OPTIONS      => KV_PAIRS(...)
);
match_supply_demand (Patent) · map_matching HMM (Patent) · charging_stations (Patent) · pickup_dropoff · loops (fraud rings, Patent) · similarity (Jaccard) · clusters (Louvain · RSB) · pattern · batch_solves
Query · /query/graph

Label-aware traversals with OLAP filters.

Hop-based pattern queries with node and edge labels. Restrictions can reference any column on any table — including columns that aren't even part of the graph.

[Diagram: PERSON nodes linked by friend and family edges toward a CHESS node. RINGS = 3, friend edges only, plus a Person.age filter.]
GRAPH social
MATCH(a:PERSON)-[ab:Friend]->{3}(b:CHESS)
RETURN a.node as source, b.node as target
adjacency_solver · pattern_matching · hop-based query · label restrictions · target-node-label filtering · OLAP column expressions inside traversal
The Patents

Four patented solvers that run in the database — the work your competitors send out to OR-Tools, Gurobi, or a routing service.

Map matching · HMM · Patent

Snap GPS to road, at scale.

Adaptive-kernel Markov chain with range-tree closest-edge search. Turns raw GPS samples into validated road-network paths over any OSM-derived graph.

Fleet · telematics · ride-hailing · AIS vessel tracking · last-mile delivery
Supply-demand · MSDO · Patent

Mixed-integer optimization, in the database.

Multi-step, multi-modal, spec-matching dispatch via MILP. Air → sea → land with partial loading and capability constraints — no external OR solver.

Logistics · defense · CPG distribution · disaster relief · fleet dispatch
Eulerian loops · Patent

Find rings, not just paths.

Closed-cycle enumeration with unlimited hop counts. Surfaces money-movement loops where conventional path solvers only return open trails.

AML structuring · layering detection · round-trip transactions · circuit analysis
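
The difference between a path and a ring can be shown with a short Python sketch that enumerates closed cycles of any length in a directed transfer graph. This is a plain depth-first enumeration of simple cycles, used here as a stand-in; it is not the patented loops solver, and the account names and the `find_cycles` function are hypothetical.

```python
def find_cycles(edges):
    """Enumerate simple directed cycles of any length, each reported once."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        adj.setdefault(dst, [])
    # Root every cycle at its smallest node so it is not reported once per member.
    order = {node: i for i, node in enumerate(sorted(adj))}
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in adj[node]:
            if nxt == start:
                cycles.append(path + [start])  # the walk closed on itself
            elif nxt not in on_path and order[nxt] > order[start]:
                on_path.add(nxt)
                dfs(start, nxt, path + [nxt], on_path)
                on_path.remove(nxt)

    for start in sorted(adj):
        dfs(start, start, [start], {start})
    return cycles


# Hypothetical transfers: a three-account ring plus an open trail leaving it.
transfers = [
    ("acct_a", "acct_b"),
    ("acct_b", "acct_c"),
    ("acct_c", "acct_a"),
    ("acct_c", "acct_d"),
    ("acct_d", "acct_e"),
]
rings = find_cycles(transfers)
```

The open trail through acct_d and acct_e never returns to its origin, so only the three-account ring is reported.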
EV charging routing · Patent

Range-aware routing, thousands of stations.

Optimal path across an EV charging network with range constraints and penalty-tuned stops. Single solver across a continent of stations.

EV navigation · fleet electrification · charging-network planning · range optimization
The Hybrid

Restrict graph traversals with the same SQL you'd write on the table.

Graph traversal · OLAP filter applied during walk
Legend: visited · filter pass · filtered out

PERSON TABLE (not part of the graph)
name   age  status
Jane   34   active
Bill   52   active
Alex   29   active
Susan  41   churn
Tom    29   active

FILTER PREDICATE: age < 40 AND status = 'active'

TRAVERSAL · Jane → 3 hops on 'friend'
1. Engine walks the graph topology
2. For each candidate node →
3. OLAP evaluates predicate on Person table
4. Failed nodes pruned mid-traversal

RESULT
Jane → Bill ✗ (Bill 52 · age fail)
Jane → Alex ✓
Alex → Tom ✓
Susan filtered (churn)

No copy. No second engine. The filter is just SQL.
SQL · query_graph_with_olap_filter.sql
-- Three-hop friend traversal —
-- restricted by a column on a table
-- that isn't even in the graph.

EXECUTE FUNCTION QUERY_GRAPH(
  GRAPH   => 'social',
  QUERIES => INPUT_TABLES(
    (SELECT 'Jane' AS NODE_NAME),
    (SELECT 'CHESS' AS TARGET_NODE_LABEL),
    (SELECT 2 AS HOP_ID,
            'friend' AS HOP_EDGE_LABEL)
  ),
  RESTRICTIONS => INPUT_TABLES(
    -- this is just SQL.
    -- on a table not in the graph.
    SELECT
      name AS NODE_NAME,
      IF(age < 40 AND
         status = 'active', 0, 1)
        AS ONOFFCOMPARED
    FROM Person
  ),
  RINGS => 3
);
↑ Person table. Person.age. Person.status. None of which exist inside the graph.
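
The pruning behaviour in this section can be imitated in a few lines of Python: a breadth-first walk over friend edges that looks up each candidate in a separate table and drops it before expanding further. The Person rows and the three friend edges come from the diagram in this section; the function names are hypothetical, Susan is given no edge because the diagram does not make hers recoverable, and the sketch says nothing about how the engine evaluates restrictions internally.

```python
from collections import deque

# Attribute table that is separate from the graph topology.
person = {
    "Jane": {"age": 34, "status": "active"},
    "Bill": {"age": 52, "status": "active"},
    "Alex": {"age": 29, "status": "active"},
    "Susan": {"age": 41, "status": "churn"},
    "Tom": {"age": 29, "status": "active"},
}

# Topology only: 'friend' edges, with no attributes attached.
friend_edges = {"Jane": ["Bill", "Alex"], "Alex": ["Tom"]}


def passes_filter(name):
    """Same predicate as the SQL: age < 40 AND status = 'active'."""
    row = person[name]
    return row["age"] < 40 and row["status"] == "active"


def restricted_walk(start, max_hops):
    """Breadth-first walk that prunes failing nodes before expanding them."""
    visited = {start}
    accepted = []
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue
        for nxt in friend_edges.get(node, []):
            if nxt in visited:
                continue
            visited.add(nxt)
            if not passes_filter(nxt):
                continue  # pruned mid-traversal; never expanded
            accepted.append((node, nxt))
            frontier.append((nxt, hops + 1))
    return accepted


walk = restricted_walk("Jane", max_hops=3)
```

Bill fails the age test and is dropped as soon as he is reached, so nothing behind him would be explored; Alex passes, which is what lets the walk continue to Tom.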
AND BECAUSE EVERYTHING SHARES THE ENGINE

Vector embeddings can become graph edges.

Because vector, graph, and relational all live in the same engine, a column of embeddings is a valid edge weight. Compute L2 distance between every pair of rows, construct edges where the distance is small enough — and now you have a similarity graph you can traverse with labels and OLAP filters in the same query.

Vector → Graph · pipeline
Legend: relational · vector · graph

RELATIONS TABLE (attribute + embedding)
name  vec[6]
kaan  [0.7,0.0,...]
tan   [0.7,0.2,...]
jony  [0.3,0.2,...]
samy  [0.1,1.0,...]
rony  [0.4,0.6,...]

→ cross join of row pairs, L2_DISTANCE( v1, v2 ) AS WEIGHT
→ SIMILARITY GRAPH over kaan, tan, jony, samy, rony (edges = similar tastes)
One CREATE GRAPH statement: vector column → L2 distance → graph edges → traversable with labels and OLAP filters in the same query.
SQL · vector_to_graph.sql
-- vector(6) of movie preferences
CREATE OR REPLACE TABLE relations(
  name TEXT,
  movie_likes VECTOR(6)
);

-- edges from L2 distance
CREATE OR REPLACE GRAPH netflix (
  EDGES => INPUT_TABLES(
    SELECT
      t1.name AS NODE1_NAME,
      t2.name AS NODE2_NAME,
      'WATCHED' AS LABEL,
      L2_DISTANCE(
        t1.movie_likes,
        t2.movie_likes
      ) AS WEIGHT_VALUESPECIFIED
    FROM relations t1
    CROSS JOIN relations t2
    WHERE L2_DISTANCE(...) < 0.6
  )
);
↑ Vector + graph + relational. Same engine. Same query.
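
The vector-to-edge step can be sketched outside the database as follows. The Python below computes the L2 distance for every pair of rows and keeps the pairs under the 0.6 threshold used in the statement above. Only the first two components of each embedding are visible in the diagram, so the sketch uses two-dimensional vectors built from those values; that truncation, the `similarity_edges` name, and the choice to emit each unordered pair once (a CROSS JOIN would also yield both directions and self-pairs) are assumptions of this example.

```python
from itertools import combinations
from math import dist

# First two components of each embedding as shown in the diagram;
# the remaining four are elided in the source and deliberately left out.
movie_likes = {
    "kaan": (0.7, 0.0),
    "tan": (0.7, 0.2),
    "jony": (0.3, 0.2),
    "samy": (0.1, 1.0),
    "rony": (0.4, 0.6),
}


def similarity_edges(vectors, threshold):
    """Return (name1, name2, weight) for each pair closer than the threshold."""
    edges = []
    for (name1, v1), (name2, v2) in combinations(vectors.items(), 2):
        weight = dist(v1, v2)  # Euclidean (L2) distance
        if weight < threshold:
            edges.append((name1, name2, weight))
    return edges


edges = similarity_edges(movie_likes, threshold=0.6)
edge_pairs = {(a, b) for a, b, _ in edges}
```

With these truncated vectors, samy connects only to rony, while kaan, tan, and jony form a tightly linked group, which is the kind of structure a label-aware traversal could then explore.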
In Production

Three workloads, three architectures, one engine — each running today in a single SQL statement, no sidecar solver, no batch job.

Use case 01 · AML

Wire-to-address exposure trail.

expero.banking_graph · 17 node labels · 20+ edge labels
5 hops · one query
pattern: bank → wire → txn → account → address
layer: Cypher inside · SUM/COUNT outside
live: add_table_monitor on wires table
bonus: Eulerian loops detect layering rings
Use case 02 · Emergency response

Closest of 1,700+ fire stations.

osm_seattle · 1.5M-edge road graph · live OSM
1.7K origins → 1 disaster
solver: match_batch_solves
trick: inverse_solve = many-to-one Dijkstra
output: animated SVG of every candidate route
variant: match_isochrone for 2-min coverage
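
One common way to get many-to-one costs from a single Dijkstra run is to reverse every edge and solve outward from the destination. The Python sketch below shows that idea on a tiny directed road graph. Whether inverse_solve works this way internally is an assumption, and the station names, edge costs, and `distances_to_target` function are invented for illustration.

```python
import heapq


def distances_to_target(edges, target):
    """One Dijkstra run on the reversed graph gives each node's cost to target."""
    reverse_adj = {}
    for src, dst, cost in edges:
        reverse_adj.setdefault(dst, []).append((src, cost))
        reverse_adj.setdefault(src, [])
    best = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best[node]:
            continue  # stale heap entry
        for prev_node, weight in reverse_adj.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(prev_node, float("inf")):
                best[prev_node] = new_cost
                heapq.heappush(heap, (new_cost, prev_node))
    return best


# Hypothetical directed road graph: (from, to, travel cost).
roads = [
    ("station_1", "junction", 2.0),
    ("station_3", "junction", 0.5),
    ("junction", "incident", 3.0),
    ("station_2", "incident", 4.0),
    ("incident", "station_2", 1.0),  # one-way street leading away from the incident
]
cost_to_incident = distances_to_target(roads, "incident")
stations = ["station_1", "station_2", "station_3"]
closest = min(stations, key=lambda s: cost_to_incident.get(s, float("inf")))
```

A single solve from the incident replaces one solve per origin, which is what makes a many-origin, one-destination question cheap to answer.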
Use case 03 · Logistics

Two depots. Twelve trucks. Partial loads.

match_supply_demand · MILP · multi-stop dispatch
12 trucks · animated SVG
tie: SUPPLY_REGION_ID = depot
partial: drop part of load mid-route
multi-modal: AIR · SEA · LAND edge labels
scale: Brazil-wide variant with Voronoi pre-clustering
What this means
These aren't graph demos with logistics bolted on. They're logistics, fraud, and emergency-response workloads — running where your data already lives, with the same SQL surface your BI tools already speak.
— Live demos available in Kinetica Workbench · session files in kineticadb/graph
The Distribution

Two distribution modes. Choose for the workload, not the hype.

MODE A · REPLICATED

Full graph on every server.

The complete topology lives on each node. Every solver runs locally, end-to-end, with no inter-server coordination. Reads scale with node count; the limit is single-node memory.

[Diagram: SERVER 1, SERVER 2, and SERVER 3 each hold the same graph. Local solves · no inter-node traffic.]
FITS WHEN — small to medium graphs · low-latency reads · high-availability requirements · solver-heavy traffic · graphs that fit in a single node's memory budget.
MODE B · PARTITIONED

Sub-graphs across servers.

Topology is split — each server holds a partition with duplicated interface nodes for cross-partition traversals. Memory scales horizontally; cross-partition solves coordinate over the network.

[Diagram: PARTITION 1, PARTITION 2, and PARTITION 3 each hold a sub-graph. Duplicated interface nodes · cross-partition traffic.]
FITS WHEN — graphs that exceed single-node memory · partition-local solves dominate · embarrassingly parallel batch workloads · you've validated the partition algorithm balances your specific topology.
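
One way to gauge whether a partitioning suits a workload is to count the edges that cross partitions and list the nodes each partition would need a copy of. The Python sketch below does that for a toy graph. The partition assignment, the node names, and the rule for which side receives a replica are assumptions made for illustration, not the published partitioning algorithm.

```python
def interface_nodes(edges, partition_of):
    """Return the cut edges and, per partition, the remote nodes it must replicate."""
    cut_edges = [(s, d) for s, d in edges if partition_of[s] != partition_of[d]]
    replicas = {}
    for s, d in cut_edges:
        # Each side keeps a copy of the node on the far end of a cut edge.
        replicas.setdefault(partition_of[s], set()).add(d)
        replicas.setdefault(partition_of[d], set()).add(s)
    return cut_edges, replicas


# Hypothetical five-node chain split across three partitions.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
partition_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}

cut_edges, replicas = interface_nodes(edges, partition_of)
cut_fraction = len(cut_edges) / len(edges)
```

A high cut fraction suggests that many solves will cross partition boundaries, which is the situation the engineering note in this section warns about.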
Engineering note

Single node interacting with many OLAP nodes — that's a winner. Distributed graph servers can ping-pong on partitions where the workload crosses too many boundaries. Choose the mode that fits the workload, not the demo.

— Kinetica engineering · 4.3 B-edge POC · 7 nodes ran ~8 sec on a query that took 1.8 sec on one node with the right memory budget

One engine. Every graph.
The one your analytics already speak to.

Frequently asked questions

How is Kinetica's graph database different from Neo4j or TigerGraph?
Dedicated graph databases store topology in a system that's separate from your relational data — which means attributes get duplicated and joins back to tables happen in application code. Kinetica keeps topology and attributes in the same engine: NODE, EDGE, WEIGHTS, and RESTRICTIONS are annotated column references on existing tables, so a graph traversal can filter on any column without copying or syncing anything.
What is the Double-Link Structure (DLS) and why does it matter for streaming graphs?
Standard graph storage uses Compressed Sparse Row (CSR), where edges are packed contiguously by source node. An insert on a busy node forces every downstream cell to shift in memory — an O(n) cost per edge. DLS uses fixed-size linked cells: each edge costs two pointer writes, an O(1) operation. Storage is always 6 × edge count regardless of degree skew, and the graph holds its shape under continuous upserts. The architecture is documented in arXiv:2201.02136.
Can graph traversals filter on columns that aren't part of the graph?
Yes — and that's the point of the hybrid architecture. Pass a SQL SELECT as the RESTRICTIONS argument to QUERY_GRAPH and the predicate is evaluated by the distributed OLAP engine for every candidate node during the walk. Failed nodes are pruned mid-traversal. The filter can reference any column on any table in the database, including columns the graph has never seen.
What graph algorithms ship with Kinetica out of the box?
Three endpoint families, all callable from standard SQL: Solve (shortest_path, multiple_routing/TSP, backhaul_routing, page_rank, probability_rank, centrality, closeness, stats_all, inverse_shortest_path), Match (match_supply_demand, map_matching, charging_stations, pickup_dropoff, loops, similarity, clusters via Louvain/RSB, pattern, batch_solves — four of which are patented), and Query (label-aware hop-based traversals with OLAP restrictions).
Which graph algorithms are patented Kinetica research, and why does that matter?
Four algorithms in the Match family ship as patented Kinetica research — each tied to a workload where generic graph databases force you out to OR-Tools, Gurobi, or a separate routing service. Map matching (map_matching) snaps noisy GPS to road-network paths via an adaptive-kernel Markov chain with range-tree closest-edge search. MSDO (match_supply_demand) runs multi-step, multi-modal MILP dispatch — air → sea → land with partial loading and capability constraints — in-database, with no external solver. Eulerian loops (loops) enumerate closed cycles with unlimited hop counts, surfacing money-movement rings that conventional path solvers return only as open trails. EV charging routing (charging_stations) returns range-aware paths across thousands of stations as a single solve. Together they replace a downstream stack of specialized routing and optimization services.
How big a graph can Kinetica hold, and how fast can it traverse it?
On a 500K-edge Seattle road network, a multi-source SHORTEST_PATH over five lon/lat pairs returns in roughly 250 ms. On a single node with 1 TB RAM + 0.5 TB ZRAM, Kinetica has held a 4.3 B-edge / 2.8 B-vertex graph and run 3-hop source-to-many queries in 1.2–1.8 seconds. Build time for that graph from cold was ~4.5 hours.
When should I use replicated vs partitioned distribution?
Use replicated when the graph fits in single-node memory — every solver runs locally with no inter-server coordination, and reads scale with node count. Use partitioned when the graph exceeds single-node memory and the workload is dominated by partition-local solves; interface nodes are duplicated across partitions to bridge cross-partition traversals. In our 4.3 B-edge POC, the same query ran 1.8 sec on one node with the right memory budget and ~8 sec on seven partitioned nodes — choose the mode that fits the workload, not the demo.
