Build Agents with Real-time Context

Q: How is this different from a vector database like Pinecone or Milvus?

Pure vector databases solve one retrieval shape — approximate nearest-neighbor over embeddings. Real agent turns also need to filter by account/permission, join to live operational tables, traverse relationships, weight by recency, and apply geofences. Kinetica runs vector similarity as one of several operators in the same vectorized query plan — alongside SQL, graph, time-series, and spatial — so a hybrid retrieval is one statement, not five tool calls. Independent benchmarks measure 5–14× faster ANN than pure vector DBs on VectorDBBench.

Q: Do I need to maintain a separate embedding pipeline?

No. Embeddings are generated in-database as rows arrive — Kinetica invokes NVIDIA NIM-hosted embedding models from SQL on the same GPU fabric that serves queries. There is no nightly re-embed batch, no embedding store to keep in sync with operational data, and no round trip to an external model server per row. The agent retrieves against embeddings that reflect the last few seconds of writes.

Q: How does an agent connect to Kinetica?

Any way it wants. Kinetica is a native MCP server: Claude, Cursor, Copilot, Codex, and any MCP-capable agent discover its tables and tools through the standard protocol. It also speaks Postgres wire (psql, JDBC, ODBC, SQLAlchemy), exposes a REST API, ships Python and Java SDKs, integrates with LangChain and LangGraph as a first-class tool node, and supports NL2SQL for agents that prefer to write their own queries. The choice of agent framework never constrains the choice of database.

Q: What is hybrid retrieval, exactly?

Inside one SQL statement, the planner runs vector similarity (CAGRA / HNSW on GPU), filters by structured predicates, joins to live operational tables, traverses a graph, applies ASOF / WINDOW operators for recency, and evaluates ST_* spatial predicates — against the same tables, in the same scan, with one transactional view. In a federated stack, those steps become network hops, auth boundaries, and consistency windows; on Kinetica they're column types in the same engine.

Q: Will my coding agent know how to write Kinetica SQL?

Yes. We ship two open-source skill plugins. kinetica-execute teaches agents SQL analytics, geospatial, graph, time-series, security, and admin (with a dual-runtime CLI for running queries directly). kinetica-code teaches the Python SDK and embedded SQL. One install command, npx skills add kineticadb/agent-skills, activates them across Claude Code, Cursor, OpenAI Codex, Windsurf, Gemini CLI, GitHub Copilot, Roo Code, Cline, Aider, Continue, Amazon Q, and any agent that reads SKILL.md.

Q: Can I run Kinetica next to my existing data warehouse?

Yes. Kinetica ingests from Kafka, S3 / GCS data lakes, data warehouses, and operational databases via CDC. Most agentic deployments leave the warehouse where it is and use Kinetica as the real-time retrieval layer in front — the warehouse continues to serve BI; Kinetica serves the agent's live retrieval and analytics in one engine with sub-second response times on streaming data.

Agents · Vector RAG · SQL RAG · Graph · Spatial · MCP

Most agents reason over stale snapshots. Kinetica resolves vector, SQL, graph, and spatial in one GPU query plan — on live streams, across billions of rows.

Real-time · GPU-accelerated · converged · at scale

The category

Three ways teams build agent retrieval today. Two stitch separate systems together — one runs it all in a single query plan.

CLICKHOUSE · SINGLESTORE · DRUID

Fast OLAP. No vector, no graph, no spatial.

Built for sub-second analytics on structured tables. Excellent at filtering and aggregating, but the agent's retrieval surface — semantic search over documents, graph traversal, geospatial joins — lives in other systems. The agent has to choreograph all of them.

Fast structured queries on streaming data

No native vector search, graph, or geospatial

Agent stitches results across systems in tool-call code

The mechanism

Converged is easy to claim. GPU is why it's fast — and fresh.

Kinetica architecture for agentic retrieval & analytics

Ingest · Engine · Access

Hybrid retrieval, one SQL

A single SELECT can run ANN vector lookup, filter on tabular predicates, traverse a graph, and apply a geofence — all on the same scan, all on GPU.

Streaming-fresh embeddings

Embeddings are computed and indexed as data arrives. There is no nightly re-embed batch and no drift between operational facts and what the agent retrieves.

Speaks every agent surface

MCP for tool-using agents, NL2SQL for LLMs writing their own queries, LangChain for orchestration, Postgres wire for everything else.
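To make "hybrid retrieval" concrete, here is a minimal conceptual sketch in plain Python of what such a query computes: a structured filter, a geofence, and a vector-similarity ranking applied in one pass over the same rows. It is not Kinetica SQL or the Kinetica SDK. The `Row` fields, the bounding-box geofence, and the brute-force cosine scoring are all assumptions chosen for illustration; a real engine would use an ANN index rather than scoring every row.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical schema: one table holding structured, spatial, and vector columns.
    doc_id: str
    account: str
    lat: float
    lon: float
    embedding: tuple


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def in_box(row, box):
    """Geofence as a simple (min_lat, min_lon, max_lat, max_lon) bounding box."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= row.lat <= max_lat and min_lon <= row.lon <= max_lon


def hybrid_search(rows, query_vec, account, box, k):
    """Filter by account and geofence, then rank survivors by similarity."""
    scored = [
        (cosine(r.embedding, query_vec), r.doc_id)
        for r in rows
        if r.account == account and in_box(r, box)
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


ROWS = [
    Row("a", "acme", 40.0, -74.0, (1.0, 0.0)),
    Row("b", "acme", 40.1, -74.1, (0.6, 0.8)),
    Row("c", "other", 40.0, -74.0, (1.0, 0.0)),  # filtered out: wrong account
    Row("d", "acme", 10.0, 10.0, (1.0, 0.0)),    # filtered out: outside geofence
]

result = hybrid_search(ROWS, (1.0, 0.0), "acme", (39.0, -75.0, 41.0, -73.0), k=2)
print(result)
```

In a federated stack each of these three steps would typically live in a different system; the point of the sketch is only that they compose naturally when the columns sit in one table.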

The numbers

Independent benchmarks, same hardware, published. The engine that removes the network hops also wins on raw speed.

Structured retrieval

TPC-DS SF-200 · More is better

99 of 99

Can the engine run the full enterprise SQL suite an agent's structured queries resemble? JOIN-heavy, aggregation-heavy, at scale.

Kinetica: 98 / 99 run

ClickHouse: partial suite

Latency | Kinetica | Suite
< 1s | 22 queries | of 98
< 10s | 80 queries | of 98
Total | 98 run | of 99

Semantic retrieval

VectorDBBench · Less is better

5× faster

How fast can new embeddings be ingested and made queryable, so the agent's recall stays fresh inside the turn?

Kinetica: 5× ingest

Prior leader: 1× baseline

Stage | Kinetica | Vector DB
Index | GPU, live | batch
Fresh | immediate | re-index
No index | exact NN | n/a

In the loop

Converged execution · One plan

1 query plan

When a turn needs vector + filter + join + traverse together, how many systems and network hops does it cross?

Kinetica: 0 hops

Federated stack: 3–5 hops

Per turn | Kinetica | Federated
Engines | 1 | 3–5
Auth surfaces | 1 | 3–5
Consistency | 1 view | N windows

Methodology

Structured retrieval: independent Radiant Advisors analysis; TPC-DS SF-200; Kinetica 7.2 vs. ClickHouse 25.10 on identical hardware. Semantic retrieval: VectorDBBench, NVIDIA GTC 2024. GPU acceleration adds further headroom on both. "In the loop" counts systems crossed, not timings.


The loop

An agent makes six to twelve blocking retrievals per turn, each gating the next. It's a new kind of user — with a latency budget.
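Because each retrieval gates the next, the turn's latency budget is divided across them serially. The arithmetic can be sketched as below. The 2000 ms turn budget is a hypothetical figure chosen only to make the numbers concrete, and the sketch ignores model inference time.

```python
def per_retrieval_budget_ms(turn_budget_ms, retrievals):
    """Average time available to each blocking retrieval when they run in series."""
    if retrievals <= 0:
        raise ValueError("retrievals must be positive")
    return turn_budget_ms / retrievals


TURN_BUDGET_MS = 2000  # assumed budget, not a measured figure

budget_at_6 = per_retrieval_budget_ms(TURN_BUDGET_MS, 6)
budget_at_12 = per_retrieval_budget_ms(TURN_BUDGET_MS, 12)
print(round(budget_at_6, 1), round(budget_at_12, 1))
```

Under that assumed budget, doubling the retrieval count from six to twelve halves the time each one may take, which is why per-retrieval overhead matters more to an agent than to a human user issuing one query at a time.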

As Janakiram MSV observed in Forbes after OpenAI's Rockset acquisition: production AI didn't need another vector database — it needed real-time retrieval over operational data. Pure vector stores are a feature, not a product.

Agent loop · single turn · left to right

in-loop hop | federated cost | Kinetica · native

01 · Key-value

Working memory

Parallel KV lookups at 100k+ reads/sec against the same tables — no separate Redis.

02 · Vector

Semantic recall

CAGRA & HNSW on GPU. 5× faster ingest on VectorDBBench; embeddings stay fresh.

03 · SQL

Filter & join

All 99 TPC-DS queries where ClickHouse runs a partial suite — vector + filter + join in one statement.

04 · Graph

Traversal

Native graph over the same tables as your SQL. Solve, match, traverse in one query.

05 · Time-series

Recency & ASOF

Vectorized ASOF and WINDOW operators on GPU. Continuous views as ticks arrive.

06 · Spatial

Location & geofence

Native ST_* operators and in-database tile rendering, fused with vector and SQL.
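For readers unfamiliar with the ASOF operator named in step 05, here is a minimal sketch of a generic backward as-of join in plain Python: each left-hand row is matched with the most recent right-hand row whose timestamp is at or before it. This illustrates the general technique only. It makes no claim about Kinetica's exact ASOF semantics or syntax, and the `trades` and `quotes` data are made up.

```python
from bisect import bisect_right


def asof_join(left, right):
    """Backward as-of join.

    left:  list of (timestamp, payload)
    right: list of (timestamp, value), sorted by timestamp
    Returns (timestamp, payload, matched_value_or_None) for each left row.
    """
    right_ts = [ts for ts, _ in right]
    out = []
    for ts, payload in left:
        # Index of the first right-hand timestamp strictly greater than ts.
        i = bisect_right(right_ts, ts)
        match = right[i - 1][1] if i > 0 else None
        out.append((ts, payload, match))
    return out


quotes = [(1, 100.0), (5, 101.5), (9, 99.0)]           # illustrative data
trades = [(0, "t0"), (5, "t1"), (7, "t2"), (12, "t3")]  # illustrative data

joined = asof_join(trades, quotes)
print(joined)
```

The trade at timestamp 0 has no earlier quote and gets `None`; the trade at 7 picks up the quote from timestamp 5, the latest one not in its future.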

Federated · the hidden cost

Every modality boundary is a network hop, an auth surface, and a consistency window. Errors compound: a stale read at step 3 of a 12-step agent loop is confidently wrong by step 12. Strong consistency isn't a nice-to-have for agents — it's a correctness requirement.
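One simple way to reason about the compounding is to ask how likely it is that at least one step in the loop reads stale data. The sketch below assumes each step has the same independent probability of a stale read, which is a modelling simplification, and the 2% figure is hypothetical rather than a measurement.

```python
def p_any_stale(p_stale_per_step, steps):
    """Probability that at least one of `steps` independent reads is stale."""
    if not 0.0 <= p_stale_per_step <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - (1.0 - p_stale_per_step) ** steps


P_STALE = 0.02  # assumed per-step chance of a stale read

risk_1_step = p_any_stale(P_STALE, 1)
risk_12_steps = p_any_stale(P_STALE, 12)
print(round(risk_1_step, 4), round(risk_12_steps, 4))
```

Under these assumptions a 2% per-step risk grows to roughly 21.5% across a 12-step loop, and that is before counting the way a single stale read skews every step that depends on it.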

Unified · the architectural payoff

All six retrieval modes run as column types in the same engine. The agent issues one SQL statement; Kinetica fans out across vector, structured, graph, time-series, and spatial inside a single query plan. One auth surface. One transactional view. No drift.

The toolkit

However your agent talks, Kinetica answers on the same engine and the same tables — the framework never constrains the database.

Model Context Protocol · native

Kinetica is an MCP server.

Any MCP-capable agent — Claude, Cursor, Copilot, Codex — discovers Kinetica's tables, schemas, and tools and queries them through the standard protocol. No glue code, no custom adapter.

Hybrid retrieval (vector + filter + join + traverse) returns as a single tool result.

# point any MCP client at the server
connect mcp://kinetica.your-domain.com

# the agent now sees tables + tools
tools: query_sql, vector_search,
       graph_solve, st_filter
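Under the hood, MCP clients and servers exchange JSON-RPC 2.0 messages. The sketch below builds the two messages a client would send to list tools and then call one. The `tools/list` and `tools/call` method names come from the MCP specification and `query_sql` is the tool name shown above, but the `sql` argument name is an assumption: the real argument schema is whatever the server advertises in its `tools/list` response. A real session also begins with an `initialize` handshake, omitted here.

```python
import json


def jsonrpc_request(request_id, method, params=None):
    """Serialize a JSON-RPC 2.0 request as used by MCP."""
    msg = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)


# Ask the server which tools it exposes.
list_tools = jsonrpc_request(1, "tools/list")

# Invoke one tool by name. The "sql" argument key is hypothetical.
call_tool = jsonrpc_request(
    2,
    "tools/call",
    {"name": "query_sql", "arguments": {"sql": "SELECT 1"}},
)

print(list_tools)
print(call_tool)
```

In practice an MCP client library handles this framing, so agent authors rarely write these messages by hand.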

NVIDIA NIM · colocated

Embedding and inference next to the data.

Kinetica invokes NIM-hosted embedding and LLM models directly from SQL. Embeddings are generated as data arrives — no separate pipeline, no extra hop, no model server round-trip per row. The retrieval path and the generation path run on the same GPU fabric.

The skills

Your coding agent already knows how to use Kinetica.

Two skills. One install command. Eleven agent platforms.

kinetica-execute teaches agents SQL analytics, geospatial, graph, time-series, security, and admin — with a live dual-runtime CLI for running queries directly. kinetica-code teaches the Python SDK and embedded SQL for application developers. Both install in one command and activate based on what the agent is being asked to do.

Claude Code · Cursor · OpenAI Codex · Windsurf · Gemini CLI · GitHub Copilot · Roo Code · Cline · Aider · Continue · Amazon Q · and any agent that reads SKILL.md

$ npx skills add kineticadb/agent-skills

github.com/kineticadb/agent-skills

Toolbelt

Give your agents one endpoint to your data.

Skills teach an agent what to do. Toolbelt gives it a live connection to your data — an MCP server any agent can call.

One MCP server. Every tool your agent needs.

Toolbelt is an MCP server that connects AI agents to your data through a single endpoint. One command auto-detects your AI client, writes the MCP config, and mints credentials — then the agent discovers the tools it needs and gets to work.

Single MCP endpoint

One URL for any agent that speaks Model Context Protocol — Claude, Cursor, Windsurf, ChatGPT, Gemini, Codex CLI, and more.

Tools the agent picks from

SQL queries, vector search, knowledge-graph traversal, schema introspection, and ingestion-job inspection — all exposed as toolbelt_* MCP tools.

Namespace isolation

Assets, vectors, and graphs are scoped per user or team by namespace UUID, so every agent stays in its own lane.

Self-host or hosted

Run the entire stack on your own cluster with the Helm chart, or start on the hosted edition with no infrastructure to run.

$ npx @toolbeltai/cli

toolbelt.ai

One engine.
Every retrieval mode.

