Kinetica has a deep suite of geospatial functions which can be utilized directly through the APIs or SQL. Kinetica is also able to natively generate maps of that data.
In this tutorial, we’ll continue using the NYC Taxi dataset to ask which neighborhoods in New York City are likely to tip the most. We’ll use spatial functions to join our taxi data with neighborhood information, and then plot that data on a map using WMS (Web Mapping Service)
Let’s try and answer this question: If you were a taxicab driver, which neighborhoods should you go to pick up high-tipping passengers?
Getting Started with Spatial Questions
Note: In this editor, we’ll be creating new temporary tables. Because Reveal only permits read-only queries by default, we recommend you use the KiSQL editor in GAdmin, or alternatively another SQL tool.
Each entry in the NYC Taxi Dataset has the longitude and latitude of the pick-up points and drop-off points. Let’s take a quick look:
SELECT cast((tip_amount) as decimal(4,2)) AS tip, pickup_longitude, pickup_latitude FROM nyctaxi LIMIT 10;
+-----------+--------------------+-------------------+ | tip | pickup_longitude | pickup_latitude | +-----------+--------------------+-------------------+ | 0.0000 | -73.99353 | 40.74739 | | 1.0000 | -73.97957 | 40.784225 | | 10.0000 | 0.0 | 0.0 | | 1.0000 | -73.99609 | 40.72358 | | 1.9600 | -74.00848 | 40.720837 | | 8.3700 | -73.97677 | 40.75899 | | 2.0000 | -73.97718 | 40.76477 | | 0.0000 | -73.958984 | 40.77512 | | 1.0000 | -73.96953 | 40.78546 | | 1.3600 | -73.97295 | 40.75891 | +-----------+--------------------+-------------------+
As you can see, we have tip amounts for given points (and some noise in the data). Kinetica has a suite of geospatial SQL functions that allow us to ask questions of this geospatial data.
Q1 : What is the average tip amount near a given point?
Le Bernardin was voted one of New York’s fanciest restaurants by Time Out Magazine. It’s on 51st street, and the latitude and longitude is 40.761334, -73.981777.
Let’s see if the average tip for taxis that picked up passengers within 20 meters of these coordinates is anything to note.
Kinetica 6.1 has over 80 standardized geospatial functions. We’ll use three of these functions to help us:
- ST_MAKEPOINT – Creates a standardized point reference
- ST_BUFFER – Creates a shape based on a point and a radius
- ST_WITHIN – Evaluates whether a point/shape is within another shape
SELECT avg(tip_amount) FROM nyctaxi WHERE ST_WITHIN( ST_MAKEPOINT(pickup_longitude,pickup_latitude), ST_BUFFER(ST_MAKEPOINT(-73.981777,40.761334), 0.002) ) = 1;
We’re not going to get rich looking for rides here! – So where should we go?
Q2 : Which neighborhoods tip the most?
To answer this question, we’ll need to combine both our NYC Taxi dataset, and the NYC Neighborhood dataset loaded in the Importing Data tutorial. The Neighborhood data provides a list of spatial geometries for each neighborhood. If you have not yet imported this table, you should do that now.
Note: The nyc_neighborhood should have been loaded as a ‘replicated’ table. When working with your own data, you’ll want to learn more about how data distribution works in relation to JOIN executions in Kinetica (https://www.kinetica.com/docs/concepts/joins.html).
We can use the ST_Within() operator again to determine which NYC Taxi events fall within a respective NYC neighborhood boundary and run some statistics at the aggregated neighborhood level.
Example: Get a count of total trip events, as well as an average value of the tip_amount.
CREATE TABLE GeoEval AS ( SELECT NTAName as Neighborhood, COUNT(*) as Trip_Ct, AVG(tip_amount) as Tip_Avg FROM nyctaxi, nyc_neighborhood WHERE ST_WITHIN(ST_MAKEPOINT(pickup_longitude,pickup_latitude),geom) = 1 GROUP BY NTAName )
This spatial join queries fairly complex neighborhood geometries (some with up to 2000 vertices). It may run slow on smaller instances.
Let’s break this statement down, piece by piece:
- Opening statement creates a new table called GeoEval which will contain the output of our spatial join results.
ST_Within()operator returns a boolean value (does my geom1 fall within geom2)
WHEREclause performs a spatial join and aggregation, if and only if that condition is true.
- Lastly, the
GROUP BYis made on the NTAName column – which represents the name of the neighborhood.
Now, if we query the output, we can see the NYC Taxi boroughs with trip counts and the average tip amount. We’ll constrain the analysis to only consider neighborhoods with more than 10 trips associated with them.
SELECT * FROM GeoEval WHERE Trip_Ct > 10 ORDER BY Tip_Avg DESC
+----------------------------------------+-----------+----------------------+ | Neighborhood | Trip_Ct | Tip_Avg | +----------------------------------------+-----------+----------------------+ | South Jamaica | 40 | 5.971749949455261 | | South Ozone Park | 29 | 4.46758615559545 | | Airport | 18978 | 4.3181452217636185 | | Baisley Park | 73 | 4.041095900208982 | | Springfield Gardens South-Brookville | 34 | 3.5044117675108066 | ...
South Jamaica! Who would have guessed!
Visualizing Spatial Data on a Map
Now let’s generate a map that will contain our spatial join results.
We can enrich this data with the WKT geometry column from the original NYC Neighborhoods table.
CREATE TABLE GeoEvalVis AS ( SELECT g.Neighborhood, g.Trip_ct, g.Tip_Avg, n.geom FROM GeoEval g, nyc_neighborhood n WHERE g.Neighborhood = n.NTAName )
We can see our final desired output best in this screenshot:
To output a map, we’ll use the /wms RESTful endpoint. Web Mapping Service is a popular standard for publishing map layers online.
By configuring the querying string parameters as documented, we can render map visualizations of our resulting table using a class-break renderer, which will allow us to specify value ranges and styling options for each range.
Here’s an example that’s been URL encoded:
http://<< host >>:9191/wms ?SERVICE=WMS &REQUEST=GetMap &FORMAT=image/png &TRANSPARENT=TRUE &VERSION=1.1.1 &LAYERS=GeoEvalVis &WIDTH=1280 &HEIGHT=600 &SRS=EPSG:900913 &BBOX=-8281291.4017365845,4970683.803714636,-8182916.946333695,5003704.599933787 &DOSHAPES=true &SHAPELINEWIDTHS=0 &SHAPEFILLCOLORS=e1ffdb,b5f4a8,68e268,1f961f,045104 &STYLES=cb_raster &CB_ATTR=Tip_Avg &CB_VALS=0:1,1.01:2,2.01:3,3.01:4,4.01:11
Here’s the encoded URL call which you can paste into a browser (don’t forget to insert your host name for Kinetica).
The output is a color-coded PNG map layer showing average tips for each borough. If you know your NYC geography, you’ll be able to spot the pattern.
The /wms/ map service works with the OGC specification, enabling these overlays to be used with OGC map viewing tools (e.g., Esri, Mapbox, GeoServer, and many other applications and API frameworks).
Below is an example of this same map layer being overlaid on maps within Esri.
We can see South Jamaica and JFK Airport depicted in dark green on the map, representing the highest tips.
This tutorial has covered a lot of ground, but we hope it has given you a quick overview of what is possible with Kinetica’s spatial capabilities including map rendering. On GPU-accelerated servers, these types of large scale geospatial queries will run in a fraction of the time of traditional geospatial systems.