Combining two or more tables using values from a related column is one of the most common tasks when analyzing data. This is typically done by matching the values in the related column(s).
But this runs into problems because real world data often does yield perfect matches.
This is particularly true when dealing with streaming data. For instance, imagine that you have two tables. A trades table showing the time at which different trades for a particular stock were executed and another with the quotes. Both these tables are recorded with a timestamp with millisecond intervals.
For every trade in the trades tables we would like to find a relevant quote from the quotes table i.e. the nearest quote. This requires us to combine the two tables on the timestamp column.
But we have a problem here. The time stamps for quotes and trades don’t always match. Out of the 7 rows in the trades table we have just two records with matching timestamps on the quotes table.
With an equality based join we miss out on a ton of information. There are values in the quotes table that are within a millisecond from records in the trades table. But we have simply skipped these because they weren’t exact matches, ending up with a pretty sparse output.
Perform inexact joins using ASOF
We can solve this problem using an ASOF join. The basic idea with an ASOF join is that it establishes an interval within which to look for matching values. And if there are multiple matches in the interval you can pick either the minimum or maximum value.
Keep it updated with a Materialized View
ASOF joins are computationally expensive to perform but Kinetica is an exceptionally fast database that can perform these joins and keep them updated in real time as new data streams into the input tables.
And all the heavy lifting is done by just a single materialized view query.
CREATE OR REPLACE MATERIALIZED VIEW asof_demo REFRESH ON CHANGE AS ( SELECT * FROM trades LEFT JOIN quotes ON ASOF ( trades.trade_ts, quotes.quote_ts, INTERVAL '0' SECOND, INTERVAL '1' MILLISECOND, MIN ) )
With just this one SQL query, Kinetica is now combining two tables with related columns that don’t match perfectly, and it is keeping this view updated in real time as new data streams in.
Watch the video 🙌
Try it out yourself
Try out our transit truck monitoring demo to see how easy it is to set up an ASOF join in Kinetica.
We’re a global team, and you can reach us on Slack with your questions and we will get back to you immediately.