Visualizing Geospatial Queries

Kinetica has a deep suite of geospatial functions which can be utilized directly through the APIs or SQL. Kinetica is also able to natively generate maps of that data.

In this tutorial, we’ll continue using the NYC Taxi dataset to ask which neighborhoods in New York City are likely to tip the most. We’ll use spatial functions to join our taxi data with neighborhood information, and then plot that data on a map using WMS (Web Mapping Service)

Let’s try and answer this question: If you were a taxicab driver, which neighborhoods should you go to pick up high-tipping passengers?

Getting Started with Spatial Questions

Note: In this editor, we’ll be creating new temporary tables. Because Reveal only permits read-only queries by default, we recommend you use the KiSQL editor in GAdmin, or alternatively another SQL tool.

Each entry in the NYC Taxi Dataset has the longitude and latitude of the pick-up points and drop-off points. Let’s take a quick look:

SELECT
    cast((tip_amount) as  decimal(4,2)) AS tip,
    pickup_longitude,
    pickup_latitude
FROM nyctaxi
LIMIT 10;

Output

+-----------+--------------------+-------------------+
|       tip |   pickup_longitude |   pickup_latitude |
+-----------+--------------------+-------------------+
|    0.0000 |          -73.99353 |          40.74739 |
|    1.0000 |          -73.97957 |         40.784225 |
|   10.0000 |                0.0 |               0.0 |
|    1.0000 |          -73.99609 |          40.72358 |
|    1.9600 |          -74.00848 |         40.720837 |
|    8.3700 |          -73.97677 |          40.75899 |
|    2.0000 |          -73.97718 |          40.76477 |
|    0.0000 |         -73.958984 |          40.77512 |
|    1.0000 |          -73.96953 |          40.78546 |
|    1.3600 |          -73.97295 |          40.75891 |
+-----------+--------------------+-------------------+

As you can see, we have tip amounts for given points (and some noise in the data). Kinetica has a suite of geospatial SQL functions that allow us to ask questions of this geospatial data.

Q1 : What is the average tip amount near a given point?

Le Bernardin was voted one of New York’s fanciest restaurants by Time Out Magazine. It’s on 51st street, and the latitude and longitude is 40.761334, -73.981777.

Let’s see if the average tip for taxis that picked up passengers within 20 meters of these coordinates is anything to note.

Kinetica 6.1 has over 80 standardized geospatial functions. We’ll use three of these functions to help us:

  • ST_MAKEPOINT – Creates a standardized point reference
  • ST_BUFFER – Creates a shape based on a point and a radius
  • ST_WITHIN – Evaluates whether a point/shape is within another shape
SELECT  avg(tip_amount)
FROM    nyctaxi
WHERE   ST_WITHIN( 
           ST_MAKEPOINT(pickup_longitude,pickup_latitude), 
           ST_BUFFER(ST_MAKEPOINT(-73.981777,40.761334), 0.002)
        ) = 1;

Output:

$1.38

We’re not going to get rich looking for rides here! – So where should we go?

Q2 : Which neighborhoods tip the most?

To answer this question, we’ll need to combine both our NYC Taxi dataset, and the NYC Neighborhood dataset loaded in the Importing Data tutorial. The Neighborhood data provides a list of spatial geometries for each neighborhood. If you have not yet imported this table, you should do that now.

Note: The nyc_neighborhood should have been loaded as a ‘replicated’ table. When working with your own data, you’ll want to learn more about how data distribution works in relation to JOIN executions in Kinetica (https://www.kinetica.com/docs/concepts/joins.html).

We can use the ST_Within() operator again to determine which NYC Taxi events fall within a respective NYC neighborhood boundary and run some statistics at the aggregated neighborhood level.

Example: Get a count of total trip events, as well as an average value of the tip_amount.

CREATE TABLE GeoEval AS
(
   SELECT      NTAName as Neighborhood,
               COUNT(*) as Trip_Ct,
               AVG(tip_amount) as Tip_Avg
   FROM        nyctaxi, 
               nyc_neighborhood
   WHERE       ST_WITHIN(ST_MAKEPOINT(pickup_longitude,pickup_latitude),geom) = 1
   GROUP BY    NTAName
)

This spatial join queries fairly complex neighborhood geometries (some with up to 2000 vertices). It may run slow on smaller instances.

Let’s break this statement down, piece by piece:

  1. Opening statement creates a new table called GeoEval which will contain the output of our spatial join results.
  2. The ST_Within() operator returns a boolean value (does my geom1 fall within geom2)
  3. The WHERE clause performs a spatial join and aggregation, if and only if that condition is true.
  4. Lastly, the GROUP BY is made on the NTAName column – which represents the name of the neighborhood.

Now, if we query the output, we can see the NYC Taxi boroughs with trip counts and the average tip amount. We’ll constrain the analysis to only consider neighborhoods with more than 10 trips associated with them.

SELECT * FROM GeoEval 
WHERE Trip_Ct > 10 
ORDER BY  Tip_Avg DESC
+----------------------------------------+-----------+----------------------+
| Neighborhood                           |   Trip_Ct |              Tip_Avg |
+----------------------------------------+-----------+----------------------+
| South Jamaica                          |        40 |    5.971749949455261 |
| South Ozone Park                       |        29 |     4.46758615559545 |
| Airport                                |     18978 |   4.3181452217636185 |
| Baisley Park                           |        73 |    4.041095900208982 |
| Springfield Gardens South-Brookville   |        34 |   3.5044117675108066 |
...

South Jamaica! Who would have guessed!

Visualizing Spatial Data on a Map

Now let’s generate a map that will contain our spatial join results.

We can enrich this data with the WKT geometry column from the original NYC Neighborhoods table.

CREATE TABLE GeoEvalVis AS
(
    SELECT     g.Neighborhood,
               g.Trip_ct,
               g.Tip_Avg,
               n.geom
    FROM       GeoEval g,
               nyc_neighborhood n
    WHERE      g.Neighborhood = n.NTAName
)

We can see our final desired output best in this screenshot:

Screen Shot 2017-11-03 at 3.56.11 PM.png

To output a map, we’ll use the /wms RESTful endpoint. Web Mapping Service is a popular standard for publishing map layers online.

By configuring the querying string parameters as documented, we can render map visualizations of our resulting table using a class-break renderer, which will allow us to specify value ranges and styling options for each range.

Here’s an example that’s been URL encoded:

http://<< host >>:9191/wms
    ?SERVICE=WMS
    &REQUEST=GetMap
    &FORMAT=image/png
    &TRANSPARENT=TRUE
    &VERSION=1.1.1
    &LAYERS=GeoEvalVis
    &WIDTH=1280
    &HEIGHT=600
    &SRS=EPSG:900913
    &BBOX=-8281291.4017365845,4970683.803714636,-8182916.946333695,5003704.599933787
    &DOSHAPES=true
    &SHAPELINEWIDTHS=0
    &SHAPEFILLCOLORS=e1ffdb,b5f4a8,68e268,1f961f,045104
    &STYLES=cb_raster
    &CB_ATTR=Tip_Avg
    &CB_VALS=0:1,1.01:2,2.01:3,3.01:4,4.01:11

Here’s the encoded URL call which you can paste into a browser (don’t forget to insert your host name for Kinetica).

http://<<host>>:9191/wms?SERVICE=WMS&REQUEST=GetMap&FORMAT=image/png&TRANSPARENT=TRUE&VERSION=1.1.1&LAYERS=GeoEvalVis&WIDTH=1287&HEIGHT=432&SRS=EPSG:900913&BBOX=-8281291.4017365845,4970683.803714636,-8182916.946333695,5003704.599933787&DOSHAPES=true&SHAPELINEWIDTHS=0&SHAPEFILLCOLORS=e1ffdb%2Cb5f4a8%2C68e268%2C1f961f%2C045104&STYLES=cb_raster&CB_ATTR=Tip_Avg&CB_VALS=0%3A1%2C1.01%3A2%2C2.01%3A3%2C3.01%3A4%2C4.01%3A11

The output is a color-coded PNG map layer showing average tips for each borough. If you know your NYC geography, you’ll be able to spot the pattern.

Screen Shot 2017-11-02 at 3.29.34 PM.png

The /wms/ map service works with the OGC specification, enabling these overlays to be used with OGC map viewing tools (e.g., Esri, Mapbox, GeoServer, and many other applications and API frameworks).

Below is an example of this same map layer being overlaid on maps within Esri.

Screen Shot 2017-11-02 at 3.57.28 PM.png

We can see South Jamaica and JFK Airport depicted in dark green on the map, representing the highest tips.

Summary

This tutorial has covered a lot of ground, but we hope it has given you a quick overview of what is possible with Kinetica’s spatial capabilities including map rendering. On GPU-accelerated servers, these types of large scale geospatial queries will run in a fraction of the time of traditional geospatial systems.

Kinetica Trial Feedback