Skip to content
Kinetica As A Service - Now available on Azure. Learn More

Spatial Analytics: Optimizing Graphs on Geospatial Features

For Things That Matter: Enjoyable Scenic Drives

Modern mapping tools now give us options — avoiding tolls, avoiding highways, etc. How do they do it, and how could we build our own routing system that optimizes on the factors we deem important? Perhaps we want the most scenic route home, or the most well-lit?

Follow this demo to model road networks as graphs, and find the routes best suited for you.

The full notebook is here:

Table of Contents:

Problem Setup

We set this problem up as a graph optimization with edge weights that favor the traversal of streets with our attribute of choice. In this exercise we choose road beauty, since scenic drives are something most are familiar with. Later in the notebook, we explore the “Eye of the Beholder” concept — that beauty itself is of course subjective; and we propose an easy way for individuals to calculate beauty scores important to themselves personally.

Caveats & Simplifying Assumptions

  • We sample road images about 100ft apart, but the user is free to sample at any frequency they want, assuming they have inference compute capacity. For rural routes, 100ft is probably too dense. For urban routes, 100ft could be too little (rely on Mapillary for these, that will provide more density.)
  • We took photos with a variety of cameras — iPhones, GoPros, crowdsourced from Mapillary. This can affect scores since wide-angle cameras will capture more sky (and thus overweight sky scores). We demonstrate the concept, but the user is free to homogenize source images for even more consistent scoring.
  • We ignore direction of travel vs image heading, though this could have a slight impact, especially in urban environments and on mountainous roads (where one side is mostly sky and the other is mostly mountain)
  • We personally sampled a variety of environments — urban (Brooklyn NY; Arlington VA; Washington DC), suburban (Ithaca NY; UC Berkeley; Princeton NJ), and rural (Shanendoah Mountain Regions in VA). We also pulled multiple metro areas from Mapillary for greater scale.
  • Larger inference workloads were done on 4x NVIDIA v100 — but CPUs will also suffice

Getting Started

To get started immediately, you can start up the Kinetica Developer Edition easily run as a Docker Container, or by starting an instance using Kinetica Cloud

Then, import this repo from GitHub:

    git clone
    cd saferun

Set up python dependencies:

    pip install -r requirements.txt

Export your credentials as environment variables and the the Jupyter notebook will lead you from start to finish!

    export KINETICA_HOST=''
    export KINETICA_USER='kadmin'
    export KINETICA_PASS='xyz'
    jupyter notebook

In [1]:

import os
import sys
import json
import gpudb

We’ll be interacting with Kinetica along the way to load data, set up graph optimziations, and view results. Make sure to export these environment variables (or override them below): KINETICA_HOST, KINETICA_USER, KINETICA_PASS. All the code below is Python, but many are SQL commands excuted via Python directly against the database you connect to below. In [2]:

KINETICA_HOST = os.getenv('KINETICA_HOST', "http://localhost:9191")
KINETICA_USER = os.getenv('KINETICA_USER', "kadmin")
db = gpudb.GPUdb(host=KINETICA_HOST, username=KINETICA_USER, password=KINETICA_PASS)
exec_result = db.execute_sql_and_decode('show system properties')['status_info']['status']

We dont have to do this, but having all our assets under a single schema is useful to organize our work. It also allows us to quickly clear out project assets. In [3]:

db.create_schema("pretty_trip", options={'no_error_if_exists': "true"})


{'schema_name': 'pretty_trip',
 'info': {},
 'status_info': {'status': 'OK',
  'message': '',
  'data_type': 'create_schema_response',
  'response_time': 0.0137}}

Required Data

We will be using the Open Street Maps (OSM) dataset to drive our optimizations:

Lets start with creating the target tables and loading our CSVs into them: In [ ]:

exec_result = db.execute_sql("""
CREATE OR REPLACE TABLE "pretty_trip"."dc_osm"
   "osm_id" BIGINT NOT NULL,
   "fclass" VARCHAR (64, dict) NOT NULL,
   "name" VARCHAR (256),
   "ref" VARCHAR (128, dict),
   "oneway" VARCHAR (32) NOT NULL,
   "maxspeed" INTEGER NOT NULL,
   "layer" INTEGER NOT NULL,
   "bridge" VARCHAR (32) NOT NULL,
   "tunnel" VARCHAR (32) NOT NULL

In [ ]:

exec_result = db.execute_sql("""
CREATE OR REPLACE TABLE "pretty_trip"."location_beauty"
   "longitude" DOUBLE NOT NULL,
   "latitude" DOUBLE NOT NULL,
   "beauty" DOUBLE NOT NULL

Open Street Maps data needs to be converted to tabular form to load as a graph. You can do this wrangling yourself, but we’ve also done it for you…in case you wish to skip that step. We hosted the wrangled street maps data on S3. Below we regiser our S3 bucket as a data source on Kinetica and then load the CSVs into the table we just created above. In [ ]:

exec_result = db.execute_sql("""
CREATE OR REPLACE DATA SOURCE "pt-road-network-osm"
    ANONYMOUS = 'true',
    BUCKET NAME = 'kinetica-community',
    REGION = 'us-east-1'

In [ ]:

exec_result = db.execute_sql("""
LOAD DATA INTO "pretty_trip"."dc_osm"
FROM FILE PATHS 'pretty-trip/dc_nova__road_network.csv'
    DATA SOURCE = 'pt-road-network-osm'

The data just loaded is shown below. The key feature, of course, is the series of coordinate pairs forming the graph network.

Side by Side Comparison

However, the data is best visualized as a WMS:

Side by Side Comparison


Prep Streetlight Data

First we must convert the street light points into polygons inorder to create buffers around the lights, which represents their lightspan. For this project we have estimated 9 meters as our buffer radius, which is the average height of all of the streetlights in this data set.


First we will create the graph with weights. Our weights are distance based (not time based). In [ ]:

exec_result = db.create_graph(
    graph_name = "dc_osm_graph", 
    directed_graph = False,
    edges = [
    "pretty_trip.osm_small.osm_id AS EDGE_ID",
    "pretty_trip.osm_small.WKT AS EDGE_WKTLINE"
    weights = [
    "pretty_trip.osm_small.osm_id AS WEIGHTS_EDGE_ID",
    "ST_Length(pretty_trip.osm_small.WKT,1)/(ST_NPoints(pretty_trip.osm_small.WKT)-1) + ((1- pretty_trip.osm_small.light)*20) AS WEIGHTS_VALUESPECIFIED"
    options = {
    "merge_tolerance": "0.00001",
    "use_rtree": "false",
    "min_x": "-180",
    "max_x": "180",
    "min_y": "-90",
    "max_y": "90",
    "recreate": "true",
    "modify": "false",
    "export_create_results": "false",
    "enable_graph_draw": "true",
    "save_persist": "false",
    "sync_db": "false",
    "add_table_monitor": "false",
    "graph_table": "ki_home.dc_osm_graph_table",
    "add_turns": "false",
    "turn_angle": "60.0",
    "is_partitioned": "false"

For location images, you have several options for images:

  • Take your own photos and ingest them
  • Use the Mapillary ingestor (DockerHub: kinetica/ctnr-kml-byoc-sample-mapillary) and batch ingest for areas of choice
  • Use our dataset (if you want to quickly run thru this exercise)

And finally, for beauty scoring, you have several choices:

  • Load your own scores (from any source of choice!)
  • Use the ready-deployable AAW container for scoring:
    • DockerHub: CPU: kinetica/ctnr-kml-bbox-ml-deep-img-segment-cpu
    • DockerHub: GPU: kinetica/ctnr-kml-bbox-ml-deep-img-segment-gpu
  • Use our dataset of already-inferenced images (if you want to quickly run thru this exercise, see all_inferences_scored.json)

Below, we use container results In [ ]:

import json
from collections import OrderedDict

with open("all_inferences_scored.json", "r") as read_file:
    all_pts = json.load(read_file)
pts = [OrderedDict([('latitude', k['location']["gps_latitude_decimal"]), 
                    ('longitude', k['location']["gps_longitude_decimal"]),
                    ('beauty', k["score"])]) for k in all_pts]
db.insert_records(table_name="pretty_trip.location_beauty", data=pts)

In [ ]:

exec_result = db.solve_graph(
    graph_name = "dc_osm_graph",
    solver_type = "SHORTEST_PATH",
    source_nodes = [
    "{'POINT(-77.037124 38.926142)'} AS NODE_WKTPOINT"
    destination_nodes = [
    "{'POINT(-77.042686 38.922676)'} AS NODE_WKTPOINT"
    solution_table = "pretty_trip.scenic_path_solved",
    "export_solve_results": "false",
    "min_solution_radius": "0.0",
    "max_solution_radius": "0.0",
    "max_solution_targets": "0",
    "accurate_snaps": "true",
    "left_turn_penalty": "0.0",
    "right_turn_penalty": "0.0",
    "intersection_penalty": "0.0",
    "sharp_turn_penalty": "0.0",
    "output_edge_path": "false",
    "output_wkt_path": "true"


Below is a routing example from Southern Washington DC to Kinetica’s HQ in Northern Virginia. We also have a sample of images and scores along the path which contributed to the different routing results.

Here is a before and after of the generated path based on the wieghts.shortest.png


Taking it Further

Do you like winding roads? We can easily identify winding roads on images by road/lane tracking using CV techniques. You can try it on this dataset from Skyline Drive in Shanendoah Virginia:

But are inferencing images the best way to find winding roads? Image inference and CV is costly, and a better way may be to calculate angles on subsequent road segments. You can do this directly with the Open Street Maps dataset we used above.

A similar, but easier exercise would be identifying drives with rolling hills and steady up-and-down country roads. This is because elevation data is easily join-able. You can obtain elevation data from the US Geological Survey (USGS: Whatever you do not have, you can interpolate and generate isochrones (see: See if you can eyeball the results against this beautiful drive in Bedford Pennsylvania:

Other Applications

  • Autonomous Vehicles: Roads with fewer pedestrians
  • Government: Crowd-source route data on potholes
  • Government: Real-time crowd-source routes with blockages (fallen trees, debris, fallen branches)
  • Government: Municipality and State contributed data to optimize mapping routes for tourists
  • P&C Insurance: Smart-route commercial traffic to minimize risk (Opt in)
  • P&C Insurance: Score individual traffic for risk aversion based on routes, issue policy discounts (Opt in)
  • Rental Car Companies: Score individual rentals for safety scored discounts
  • Waze / Google Maps: Offer a more scenic route w/ similar ETA; offer construction avoidance
  • Zillow / Trulia / Redfin: Beyond zestimates, commute beauty scoring
  • Cyclers: Avoid steep grades; avoid roads with twigs/branches on path
  • Pedestrians: Avoid major road crossings; quiet routes; beautiful roads
  • Hikers: Chained trails with minimum/maximum difficulty grade
  • Hikers: Optimal loop size from trailhead
  • Bikers: Safe and scenic bike loops that avoids large roadways


About Us

This demo was put together with thanks to Scott Little, Julian Jenkins, Kaan Karamete, Pat Khunachak, and Chad Meley

Try it on Kinetica – an analytics database for fusing data across streams and data lakes to unlock value from spatial and temporal data at scale and speed. Learn about us here.

Contact Us


The software is licensed under the MIT license.

You've made it this far, share it on:

White Paper

Build Real-time Location Applications on Massive Datasets

Vectorization opens the door for fast analysis of large geospatial datasets
Download the white paper