Recently, a top 3 global retailer was able to reduce run times for a strategic analytic application from days on Alteryx to just 22 seconds using Kinetica.
The retailer sought to optimize supply chain logistics and operations, with the aim of identifying faster and more profitable delivery routing, and to identify optimal store locations. Kinetica was able to dramatically reduce the time taken to calculate most probable paths (MPP) and point-to-point (P2P) drive times at scale.
This example serves as a good case study for where Kinetica excels in comparison with other analytics applications, such as Alteryx, for workloads that require performance at scale with multi-step advanced analytics.
What is Alteryx? What is it good for?
Alteryx is a tool designed to make data analytics more accessible throughout organizations. Its platform provides building blocks to automate various elements of the analytics process. First released in 2006, Alteryx has gained popularity for data access, data cleansing, ETL, and other activities related to BI.
Alteryx is easy to use for a wide range of personas in an organization, offering both code-free and expert modes. Alteryx’s Analytics Process Automation (APA) provides an easy way to automate components of the end-to-end analytics process. Users with no coding experience can build automated data pipelines and save time on running repetitive reports. Alteryx provides an intuitive way to design custom workflows and obtain self-service analytics.
What are Alteryx limitations?
While Alteryx is useful for dealing with repetitive manual processes, it is not as powerful as a full-fledged analytic database. It’s meant for overnight batch reporting, not for when you need real-time answers or need to run a model many times over the course of a day.
Complex workflows on Alteryx will require significant hardware to run efficiently. This generally leads customers to spend a far higher amount on Alteryx than originally anticipated. Alteryx price points come in at the higher end of the market compared to other analytics tools.
Alteryx is not designed for today’s more advanced analytics use cases that require cutting-edge streaming or machine learning capabilities, or leveraging high-volume IoT data at scale. The Alteryx UI is outdated and does not provide much in the way of data visualization.
What does Kinetica offer?
Kinetica is a real-time analytic database with best-in-class location intelligence, powered by a cutting-edge vectorized architecture. With Kinetica, you can realize the full benefits of streaming data by fusing it with other data, providing full context and allowing you to use more history than with a streaming platform alone.
Gaining value from sensor data requires blending spatial, time series, and graph analytics. Kinetica has over 100 spatial functions including geo-joins, point in polygon, and map matching, and can create interactive visualizations with billions of geospatial data points. Kinetica combines this with a multitude of advanced time series functions and allows you to seamlessly use relational data in a native graph context for understanding relationships.
Kinetica customers obtain real-time geospatial analytics at huge scale. The U.S. Postal Service, for example, uses Kinetica to combine millions of streaming location events from vehicle transmitters with billions of historical events, all available for dynamic route optimization.
The Business Challenge
The real-estate division of this major US retailer used a geospatial platform to optimize their supply chain logistics and operations, with the aim of identifying faster and more profitable delivery routing, and to identify optimal store locations.
The problem they faced was how to calculate most probable paths and drive times at scale, and how to iterate faster in order to give the executive team the insights needed for tactical and strategic planning.
The customer was using an 80+ node Alteryx cluster to solve their most probable path calculations, but they were facing problems because the calculations were slow, and they were limited in the scale at which they could run the calculations. Some runs might take days, but there was an increasing expectation from the business to be able to run the calculations multiple times per day.
At this point Kinetica was invited to demonstrate our capabilities, and the customer set us the challenge of solving the most probable path by drive time, from one million possible start points to a specific endpoint within the Dallas Fort Worth metropolitan area.
The target time for this exercise was the 4 hour SLA that had previously been set for the Alteryx system. Based on their experience with Alteryx the customer stated that completing the task in under 8 hours would be viewed as a good result.
Proof of Technology (PoT) Architecture
The infrastructure used for the PoT was a relatively modest single node in GCP, with:
- 120 GB memory
- 32-core single-socket CPU
- 4 NVIDIA T4 GPUs
The Kinetica installation process was straightforward: it required only the NVIDIA GPU drivers and the Kinetica software, with no additional software or configuration.
The plan for the PoT involved loading the data for the DFW road network. The DFW road network data was supplied by our technology partner Here.com, the same company that provides road and traffic data for companies such as Tom Tom.
We created a second dataset containing a million start points in the DFW area with a common endpoint.
Kinetica Network Graph & Graph Solvers
Kinetica provides a generic and extensible design of networks that can be tailored or used for various real-life applications, such as transportation, utility, social, and geospatial.
Key features of Kinetica graph capabilities include:
- Zero config — all the integration is already built into the platform
- Distributed processing for more scalability and resilience
- Highly performant graph technology with a comprehensive and growing list of graph solvers that you can use out-of-the-box with no need for a host of data scientists to make it work.
Kinetica currently provides the following list of solvers, and this list continues to grow as customers find more and more geospatial and graph applications for Kinetica:
| SHORTEST_PATH | PAGE_RANK | PROBABILITY_RANK | CENTRALITY | CLOSENESS | MULTIPLE_ROUTING | INVERSE_SHORTEST_PATH | BACKHAUL_ROUTING | ALLPATHS | STATS_ALL | MARKOV_CHAIN | MATCH_SUPPLY_DEMAND | MATCH_LOOPS | MATCH_OD_PAIRS | MATCH_BATCH_SOLVES | ADJACENCY_SOLVER |
Proof of Technology Approach
The diagram below illustrates how we create and solve a geospatial graph in Kinetica.
- In Step 1 we load the geojson data provided by Here.com using Kinetica’s geojson loader.
- Once that data is in a table in Kinetica, we can enrich and augment the dataset, for example by setting the traffic speed of each road segment in Step 2. In a production scenario, traffic volumes and changes to the network itself would usually be updated using a real-time feed.
- In Step 3 we build a native graph in Kinetica based on the underlying database table that we created in Step 2.
- In Step 4 we apply Kinetica’s powerful graph solving capabilities to identify the most probable path and P2P drive times at scale, optionally directing the output to a database table which can be used for further analysis and visualisation.
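To make Steps 2 to 4 concrete, here is a minimal, self-contained Python sketch of the underlying idea. It is not Kinetica's implementation or API: the function names, the segment tuple layout, and the toy network are all assumptions made for illustration. It turns each segment's length and speed into a drive-time weight, builds a graph, and then runs a single Dijkstra search outward from the common endpoint over reversed edges, which yields the best drive time from every start point at once.

```python
import heapq
from collections import defaultdict


def travel_time_s(length_m, speed_kmh):
    """Drive time in seconds for one road segment (hypothetical helper)."""
    return length_m / (speed_kmh * 1000 / 3600)


def build_reverse_graph(segments):
    """Build reversed adjacency lists weighted by drive time.

    segments: iterable of (from_node, to_node, length_m, speed_kmh),
    each describing a one-way road segment (assumed layout).
    """
    rev = defaultdict(list)
    for u, v, length_m, speed_kmh in segments:
        rev[v].append((u, travel_time_s(length_m, speed_kmh)))
    return rev


def drive_times_to(destination, rev):
    """Dijkstra from the destination over reversed edges.

    Returns {node: best drive time in seconds from node to destination}.
    """
    best = {destination: 0.0}
    heap = [(0.0, destination)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, float("inf")):
            continue  # stale heap entry
        for prev, cost in rev.get(node, ()):
            cand = t + cost
            if cand < best.get(prev, float("inf")):
                best[prev] = cand
                heapq.heappush(heap, (cand, prev))
    return best


# Toy network (made-up values, not the DFW data).
segments = [
    ("A", "B", 1000, 36),  # 100 s
    ("B", "D", 2000, 72),  # 100 s
    ("A", "D", 5000, 72),  # 250 s
    ("C", "B", 500, 18),   # 100 s
    ("D", "C", 1000, 36),  # leaves D, so it cannot shorten any trip to D
]
times = drive_times_to("D", build_reverse_graph(segments))
```

Because every start point shares one endpoint, the cost of this approach is driven by the size of the road network rather than by the number of start points; each start point then only needs a lookup in `times`.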
A key differentiator that separates Kinetica from traditional analytics is that it is vectorized from the ground up and built to leverage both GPUs and AVX CPUs for extreme high-performance distributed graph, SQL, and analytic operations. Kinetica uses GPUs to visualize data at scale with GPU-accelerated rendering of maps and accompanying dashboards at interactive speed inside Reveal, Kinetica’s powerful but easy-to-use geospatial and data exploration and insight discovery tool.
Proof of Technology Results
Kinetica created the graph of the Dallas Fort Worth road network from the underlying database table in approximately 9 seconds.
- Creation of the Kinetica native graph using the Here.com table data
| Create network graph of the DFW metro area (~420K unique road segments) | < 10 secs |
- Solve times for the 1 million point-to-point drive-time exercise
| Activity | Solve | 1MM WKT Updates |
|---|---|---|
| Solve 1MM MPP P2P | < 30 secs | < 1,100 secs |
Kinetica solved the 1 million P2P drive times in approximately 22 seconds, beating the 4-hour SLA target by a wide margin. Generating the complex geospatial routing records for all 1 million origin/destination pairs took longer, at approximately 18 minutes. Note also that this test was conducted on a single-node cloud instance; Kinetica’s distributed, linearly scalable architecture means that you can increase performance and throughput predictably by scaling horizontally.
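For a rough sense of scale, the short Python calculation below compares the figures quoted in this article. The inputs are the approximate values stated above (22 seconds, 18 minutes, the 4-hour SLA, and the 8-hour "good result" threshold), so the ratios are indicative only; the variable names are purely illustrative.

```python
# Approximate figures taken from the article text.
SOLVE_S = 22                 # 1 million P2P drive-time solve
ROUTE_OUTPUT_S = 18 * 60     # writing the routing records for all pairs
SLA_S = 4 * 3600             # 4-hour SLA target
GOOD_RESULT_S = 8 * 3600     # 8-hour "good result" threshold

# How many times faster the solve was than each target.
speedup_vs_sla = SLA_S / SOLVE_S
speedup_vs_good = GOOD_RESULT_S / SOLVE_S

# Solve plus route output, as a share of the 4-hour SLA.
total_s = SOLVE_S + ROUTE_OUTPUT_S
share_of_sla = total_s / SLA_S

print(f"Solve vs 4-hour SLA: about {speedup_vs_sla:.0f}x faster")
print(f"Solve vs 8-hour threshold: about {speedup_vs_good:.0f}x faster")
print(f"Solve plus route output: {total_s} s ({share_of_sla:.1%} of the SLA)")
```

Even with the slower route-output step included, the end-to-end run stays well inside the 4-hour window on these figures.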
This is a typical demonstration of Kinetica’s powerful ability to integrate location analytics on large datasets, blend with other data at speed and at scale, and shows how Kinetica simplifies the effort needed to combine these analytic processes.
We’ve seen that organizations which have invested in their supply network management systems have benefitted from being able to react and re-plan in real-time, dynamically optimizing delivery routes to maintain their operational efficiency and customer satisfaction.
If you’d like to know more about our best-in-class geospatial and graph analytics capabilities, please try our developer edition for free and contact me directly at firstname.lastname@example.org.
Simon Ambridge is Sr. Solution Engineer at Kinetica.