Location Analytics

Kinetica brings your business and geospatial data together under one roof, giving you interactive location analysis at unprecedented scale. Kinetica enables you to continuously ingest IOT data from the field and perform complex geospatial analysis using our GPU-accelerated library of spatial type (ST) functions. Kinetica’s native graph server can even deliver real-time route optimization using pre-packaged graph analytics, when combined with open source or premium 3rd party road network data sets. Finally, Kinetica’s unique server-side rendering capability ensures that you can visualize data at any scale interactively on a map.

In this tutorial, we’ll first provide an overview of the location analytics capability of Kinetica. Later, we’ll continue using the NYC Taxi dataset to ask various questions about top pickup spots, average fare across vendors, and more. We’ll use spatial functions to join our taxi data with neighborhood information, eventually plotting some data on a map. All the queries in this tutorial can be run using the SQL tool in GAdmin. To access the SQL tool in GAdmin:

  1. Navigate to GAdmin (http://<kinetica-host>:8080/).
  2. Click Query > SQL.

Location Analytics Capabilities Overview

Kinetica’s location analytics capabilities are grounded in three main concepts:

  • Geospatial Functions Kinetica has many standardized geospatial functions (ST functions) to perform location analysis. Several are used in the tutorial.
  • Web Map Service (WMS) Through the REST API, WMS generates geospatial imagery from data contained within your tables — on demand — and overlays it on a basemap. Several WMS styles are available, including Raster, Classbreak Raster, Contour Plot, Heatmap, and Labels.
  • Network Graph Solvers Using your relational data, Kinetica can construct native graph data structures to produce shortest path or other pre-packaged graph analytics (A.K.A. solvers). Applications are wide-reaching and include use cases in transportation, utilities, social networking, and geospatial intelligence. Fundamentally, graphs are a set of vertices connected by edges, where edge weights and restrictions can be applied dynamically to produce an ideal path through or quantify some relationship between vertices.

Leveraging Location Analytics

The sections below outline leveraging location- and non-location-based data to answer realistic questions for several different consumers; these sections will demonstrate the breadth of Kinetica’s location analytics offerings.

Q1: What were the 10 most frequent destination neighborhoods for JFK pickups, and on average, what did it cost to go there?

Joining the NTA table and the taxi data using the STXY_Intersects function, you can group by neighborhood boundary to determine the total number of trips to the neighborhood and average fare for a trip from JFK to that neighborhood.

SELECT TOP 10
    n_dropoff.NTAName AS "Neighborhood",
    COUNT(*) AS "Total_Trips",
    DECIMAL(AVG(fare_amount)) AS "Average_Fare"
FROM
    taxi_data t
    JOIN nyct2010 n_pickup
        ON STXY_Intersects(t.pickup_longitude, t.pickup_latitude, n_pickup.geom) = 1
        AND n_pickup.NTAName = 'Airport'
    JOIN nyct2010 n_dropoff
        ON STXY_Intersects(t.dropoff_longitude, t.dropoff_latitude, n_dropoff.geom) = 1
WHERE pickup_datetime < '2019-01-01'
GROUP BY 1
ORDER BY 2 DESC;

Output:

+----------------------------------------------+---------------+----------------+
| Neighborhood                                 |   Total_Trips |   Average_Fare |
+----------------------------------------------+---------------+----------------+
| Midtown-Midtown South                        |          3219 |        39.2489 |
| Turtle Bay-East Midtown                      |          1250 |        36.2404 |
| Hudson Yards-Chelsea-Flatiron-Union Square   |           757 |        39.4635 |
| Murray Hill-Kips Bay                         |           675 |        37.3777 |
| Battery Park City-Lower Manhattan            |           629 |        43.5058 |
| Upper West Side                              |           584 |        37.2917 |
| Clinton                                      |           580 |        41.7162 |
| West Village                                 |           493 |        40.4975 |
| Upper East Side-Carnegie Hill                |           459 |        35.0270 |
| SoHo-TriBeCa-Civic Center-Little Italy       |           455 |        41.9072 |
+----------------------------------------------+---------------+----------------+

Q2: For neighborhoods with more than 500 pickups, how many had more than half of their pickups at night and by what taxi vendor?

You can use neighborhood boundary data and pickup time to calculate in which NTA vendors were picking up passengers at night and display the results in a pivot table containing columns of percentages for all vendors and each individual vendor.

SELECT
    NTANAme AS "Neighborhood",
    all_npp AS "All",
    cmt_npp AS "CMT", nyc_npp AS "NYC", vts_npp AS "VTS", ycab_npp AS "YCAB"
FROM
(
    SELECT
        NTAName,
        IF (GROUPING(vendor_id) = 1, 'ALL', vendor_id) AS vendor_name,
        COUNT(*) AS total_pickups,
        DECIMAL(SUM(IF(HOUR(pickup_datetime) BETWEEN 5 AND 19, 0, 1)))
            / COUNT(*) * 100 AS night_pickup_percentage
    FROM
        taxi_data t
        JOIN nyct2010 n
            ON STXY_Intersects(t.pickup_longitude, t.pickup_latitude, n.geom) = 1
		WHERE pickup_datetime < '2019-01-01'
    GROUP BY
        NTAName,
        ROLLUP(vendor_id)
)
PIVOT
(
    MAX(total_pickups) AS tp,
    MAX(night_pickup_percentage) AS npp
    FOR vendor_name IN ('ALL', 'CMT', 'NYC', 'VTS', 'YCAB')
)
WHERE all_tp > 500 AND all_npp > 50
ORDER BY all_npp DESC;

Output:

+-------------------------+-----------+-----------+-----------+-----------+-----------+
| Neighborhood            |       All |       CMT |       NYC |       VTS |      YCAB |
+-------------------------+-----------+-----------+-----------+-----------+-----------+
| East Williamsburg       |   83.1400 |   84.2100 |   81.2500 |   85.2600 |   83.2300 |
| North Side-South Side   |   78.6400 |   80.7500 |   78.7300 |   82.0000 |   74.6600 |
| Park Slope-Gowanus      |   66.7700 |   72.0000 |   67.6800 |   71.2700 |   58.9600 |
| Chinatown               |   64.2100 |   59.1700 |   66.9800 |   64.1300 |   64.7400 |
| Fort Greene             |   60.6200 |   56.2500 |   63.2500 |   62.8000 |   58.1200 |
| East Village            |   56.2800 |   57.6000 |   56.1700 |   55.1900 |   56.7700 |
| Lower East Side         |   52.6500 |   53.7100 |   54.1300 |   50.3300 |   52.9400 |
+-------------------------+-----------+-----------+-----------+-----------+-----------+

Q3: How many pickups per hour were there at JFK International Airport?

Using the STXY_Intersects function to determine which pickup points were located in the JFK International Airport neighborhood boundary, it’s simple to calculate the number of pickups per hour there were at JFK for all cab types.

SELECT
    RPAD(LPAD(CHAR2(HOUR(t.pickup_datetime)), 2, '0'), 5, ':00') AS "Pickup_Hour",
    COUNT(*) AS "Total_Pickups"
FROM
    taxi_data t
    JOIN nyct2010 n
        ON STXY_Intersects(t.pickup_longitude, t.pickup_latitude, n.geom) = 1
WHERE
    NTAName = 'Airport' AND
		pickup_datetime < '2019-01-01'
GROUP BY
    HOUR(t.pickup_datetime)
ORDER BY
    1;

Output:

+---------------+-----------------+
| Pickup_Hour   |   Total_Pickups |
+---------------+-----------------+
| 00:00:00      |             662 |
| 01:00:00      |             270 |
| 02:00:00      |              91 |
| 03:00:00      |              40 |
| 04:00:00      |              24 |
| 05:00:00      |             180 |
| 06:00:00      |             344 |
| 07:00:00      |             472 |
| 08:00:00      |             617 |
| 09:00:00      |             652 |
| 10:00:00      |             746 |
| 11:00:00      |             816 |
| 12:00:00      |             832 |
| 13:00:00      |             905 |
| 14:00:00      |            1040 |
| 15:00:00      |            1156 |
| 16:00:00      |            1181 |
| 17:00:00      |            1249 |
| 18:00:00      |            1110 |
| 19:00:00      |            1134 |
| 20:00:00      |            1172 |
| 21:00:00      |            1284 |
| 22:00:00      |            1199 |
| 23:00:00      |            1102 |
+---------------+-----------------+

Visualization

One of Kinetica’s most powerful and accessible features is its ability to visualize your data in convenient dashboards or back your own custom applications. Kinetica’s main visualization tools are Reveal and the Web Mapping Service (WMS). Reveal is covered in-depth in Reveal Analytics section.

Web Map Service

Kinetica’s web map service implements the OGC specification, enabling overlays to be used with OGC/OGS map viewing tools like Esri, GeoServer, and Open Layers. Simply point your basemap provider to our /wms endpoint with your desired rendering parameters, and Kinetica will leverage the power of the GPU to generate geospatial imagery on demand. Kinetica produces transparent PNGs that are automatically overlaid on your desired basemap provider, and enable data visualization at a scale that traditional client-side tools cannot manage.

The WMS is capable of rendering geospatial data using a variety of styles:

  • Raster Renders every point or shape on the map.
  • Classbreak Raster Categorizes points and shapes by color using explicit values or ranges of values.
  • Heatmap Renders multi-colored gradients representing point density or the value of a given column.
  • Contour Renders digital elevation model (DEM) data on a map using a single or a gradient of colors. Labels may be optionally applied.
  • Labels Render labels on the map dynamically using column data and a variety of fonts.

If you are developing custom geospatial applications, Kinetica’s REST and JavaScript APIs make it easy to analyze your data from the browser. We also offer a Mapbox code-acceleration library called Kickbox.js that initializes connections between a basemap and Kinetica in fewer than 25 lines of code, and provides a simple and intuitive API to manage your Kinetica WMS layers in Mapbox.

To give you an idea of what to expect when using WMS, we’ve outlined another query below that utilizes the service.

Q4: Given my storefront’s location, how can I calculate the number of dropoffs that occurred within 150 meters?

Say you own a small business or restaurant in Chelsea that relies on nearby foot traffic for new customers. You can easily visualize the density of taxi dropoffs nearby with the help of the GEODIST function.

CREATE OR REPLACE TABLE store_front_dropoffs AS
SELECT *
FROM taxi_data
WHERE GEODIST(-74.00378, 40.743193, dropoff_longitude, dropoff_latitude) < 150;

We used this query to create a table so the results can be seen on a map. To output the dropoff points to a map, we’ll use the WMS functionality of GAdmin. Let’s validate this query:

  1. Navigate to GAdmin (http://<kinetica-host>:8080/).
  2. Click Data.
  3. Next to the store_front_dropoffs table, click Map. The output is a PNG map layer showing the dropoff points within 150 meters of your storefront.
  4. Zoom into the New York city part of the map.

Kinetica Trial Feedback