Unlock Opportunities with Continuously Updated Materialized Views
Materialized views are a feature available with many databases to store the results of a frequently executed query so that the same view can be accessed repeatedly without forcing the database to re-execute the query each time. This allows for faster access to the data, as the results of the query do not need to be computed each time the view is queried
As the underlying data changes, materialized views need to be refreshed to keep the data in the view up to date. But this can be a resource intensive process. Many databases struggle to to keep materialized views up to date with frequently changing data, or for views from complex queries with multiple joins, aggregations, and subqueries.
For instance, consider a data stream of thousands of retail transactions every second. A traditional materialized view that is built on top of this transaction stream might only be updated once every day, or perhaps even as infrequently as once a week – if the query is really complex. As a result, you end up with upstream dashboards and analytical tools that are not at pace with the real-time data feed.
Kinetica’s powerful vectorized engine removes this constraint, allowing you to build and maintain complex materialized views on data that is changing continuously. This allows for high volumes of concurrent users accessing a materialized view will always see the most up-to-date version of the results.
What is a materialized view?
Let’s first understand what a view is. SQL views are virtual tables that are a result of stored queries. They are like a normal table in the database and contain rows and columns of data, but the data is not physically stored in the database. Instead, when a view is called, it executes the query and returns the result set.
A materialized view is similar to a regular view in that it is based on one or more underlying tables, but the data in the materialized view is actually stored on disk like a regular table. This data can then be queried directly, without having to re-execute the underlying query.
Traditional databases cannot update materialized views continuously
Materialized views are typically used to improve the performance of queries that involve aggregating large amounts of data or joining multiple tables. Since the results of the query are already stored (and indexed) in a table, it can be accessed directly, rather than being computed on the fly each time the query is run.
But there is a flip side to this performance gain, which is the overhead of maintaining and refreshing a materialized view. This overhead depends on the complexity of the query that is used to create the view and the frequency at which it needs to be refreshed. It can be quite costly to maintain a materialized view if the underlying data is constantly being changed or updated.
The primary cost associated with maintaining materialized views is query execution time. If the query used to define the view is complex or requires cross-table joins, this can take considerable time to run every time the view needs to be refreshed. Additionally, if the underlying data changes often or there are frequent updates, it will take longer for the view to refresh as it must account for all of these changes each time it updates.
How does Kinetica maintain continuously updated materialized views?
Kinetica leverages its built-in vectorized and distributed query processing capabilities to blow through computational barriers and provide continuously updated materialized views.
Vectorization, to put it simply, is the ability to execute each instruction – like, say, adding two numbers – on multiple pieces of data at the same time. This is thousands of times faster than the traditional sequential mode of execution, where each piece of data is operated on in sequence, one after the other.
Vectorized (Single Instruction Multiple Data)
These performance gains are substantiated by benchmarks that show that Kinetica is 8 X faster than Databricks 9.1 LTS (Photon), 50 X faster than Clickhouse and 240 X faster than PostGIS.
Kinetica has a vast library of in-database functions that are fully vectorized, and it has native connectors for streaming data sources like Kafka. This allows it to quickly process high frequency streaming data and run really complex vectorized queries that execute much faster than other databases. As a result, Kinetica can be used to set up always on analytical pipelines that keep materialized views continuously updated to always reflect the latest version of the data.
For instance, the query below, from one of our example workbooks, creates a materialized view that stores summary statistics about ship tracks in the Gulf of Mexico region. The underlying table for this query –
ship_tracks – receives a few thousand records per second from a Kafka topic containing information on new ship locations. Notice the
REFRESH ON CHANGE option, Kinetica is able to process all of this data and run complex geospatial queries to keep this view updated in real-time.
CREATE OR REPLACE MATERIALIZED VIEW track_summary REFRESH ON CHANGE AS SELECT TRACKID, COUNT(*) AS n_obs, ROUND(ST_TRACKLENGTH(Y, X,TIMESTAMP, 1) / 1000) AS track_length_km, ROUND(ST_TRACKDURATION(HOUR,TIMESTAMP)) as duration, ROUND((ST_TRACKLENGTH(x,y,TIMESTAMP,1) / 1000)/ST_TRACKDURATION(HOUR,TIMESTAMP)) as avg_speed, MIN(X) AS min_longitude, MAX(X) AS max_longitude, MIN(Y) AS min_latitude, MAX(Y) AS max_latitude FROM ship_tracks GROUP BY TRACKID ORDER BY track_length_km DESC;
This is simply not possible with other technologies.
You can try this out for free by running the entity tracking example in Kinetica’s workbench. Alternatively, you can access the workbook and the code from our examples repo on GitHub.
Use-Cases for Continuously Updated Materialized Views
Kinetica’s ability to maintain complex materialized views on top of streaming data sources, opens up new possibilities.
- Real-time analytics and reporting: Materialized views enable users to quickly query and analyze large datasets in real time.
- Eventing systems: Kinetica’s wide range of in-database functions can be coupled with continuously updated materialized views to create always on eventing systems that detect events as they unfold and trigger downstream alerts.
- Real-time dashboards for data exploration: Dashboards and charts will always reflect the most current data and calculations. These are extremely useful for industries such as finance where analysts need to observe and respond to events in real-time.
- Machine learning models: One of the challenges with operationalizing machine learning models is the inability to serve the most up-to-date versions of complex derived variables to these models for inferencing. Continuously updated materialized views solve this problem.
- Replace traditional workloads: The computational constraints imposed by traditional databases are not an issue with Kinetica. So less performant materialized views from these databases can be boosted both in terms of performance and in terms of scale by switching to Kinetica.
We’re a global team, and you can reach us on Slack with your questions and we will get back to you immediately.