Importing Data

In this tutorial, we’ll show you some of the ways to load data. The datasets being loaded will be used in future tutorials.

The three methods of importing you’ll learn how to use:

  • Drag and Drop
  • Advanced Import UI
  • Streaming ingest with the Active Analytics Workbench (AAW), for continuously updating data that feeds views and applications

Drag and Drop

Kinetica allows you to drag and drop CSV, ORC, Apache Parquet, or Zip files (containing Shapefiles) to import the data into Kinetica. Drag-and-drop importing will attempt to interpret the file’s first record as a header. The file’s name will be used as the table’s name in Kinetica. The Drag-and-Drop documentation covers additional details and limitations.

We’re going to import, via drag-and-drop, a modified version of the nyct2010.csv data file, which is the NYC Neighborhood Tabulation Areas (NTA) dataset. This file provides geospatial boundaries for neighborhoods in New York and has been modified to include a Kinetica-schema-style CSV header. The file can be downloaded using the following link: nyct2010.csv
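
If you’d like to confirm that the file carries the Kinetica-schema-style header before importing it, a short Python sketch like the one below (run after saving the file locally, as described next) prints the header row and the first data record:

  import csv

  # Print the Kinetica-schema-style header row and the first data record
  # of the locally saved nyct2010.csv file.
  with open("nyct2010.csv", newline="") as f:
      reader = csv.reader(f)
      header = next(reader)        # column definitions used by the import
      first_record = next(reader)  # first row of actual data
      print(header)
      print(first_record)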

Download and save the above CSV file locally to your disk. Now let’s import the data:

  1. Navigate to GAdmin (http://<kinetica-host>:8080/).
  2. Click Data > Import.
  3. Drag the nyct2010.csv file from a file explorer window into the drop area of GAdmin. You can also click Choose File in GAdmin to select the nyct2010.csv file from your disk.
  4. Click OK to acknowledge the pop-up windows confirming that Kinetica found a Kinetica schema header and that the import completed.

The data will begin importing. Once completed, click View Table to view the table or click Data > Table to see it in a listing with other tables & collections in the database.
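
If you have the Kinetica Python API (gpudb) installed, you can also sanity-check the import from code. The sketch below is only an example under a few assumptions: Kinetica 7.x, the default API port 9191, and a table named nyct2010.

  import gpudb

  # Connect to the Kinetica REST API (assumes the default port 9191).
  db = gpudb.GPUdb(host="http://<kinetica-host>:9191")

  # Attach to the table created by the drag-and-drop import and print
  # how many records it holds.
  nyct2010 = gpudb.GPUdbTable(name="nyct2010", db=db)
  print(nyct2010.size())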

Advanced Import UI

Kinetica’s Advanced Import UI lets you import data sets stored in external sources like Amazon S3. In GAdmin, navigate to Data > Import > Advanced Import. On this screen, you can select a source and destination for your data. On the left-hand form, set the fields as follows:

  • Datasource: AWS S3
  • Format: Parquet
  • AWS Access Key ID:
  • AWS Secret Access Key:
  • File Path: kinstart/taxi_data.parquet

On the right-hand form, set the fields as follows:

  • Datasource: Kinetica
  • Batch Size: 100000
  • Table: taxi_data_historical
  • HTTPS: (Default)
  • Connect Timeout (ms): 60000 (Default)
  • Network Timeout: 800 (Default)
  • Enable Multi-head: (Default)
  • Update Existing PK: (Default)
  • Replicated: (Default)
  • Driver Memory: 2GB (Default)
  • Executor Memory: 2GB (Default)
  • Off Heap Memory: 4GB (Default)

Press the Configure Columns button and wait for the modal to appear. It may take a minute, as Kinetica will attempt to infer the column types of the file. For the vendor_id column, click on the Subtype dropdown and select char4.
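
Kinetica infers these types from the Parquet file itself. If you’d like to preview the column names and types yourself and have a local copy of the file (the path below is just a placeholder), pyarrow can read the schema without loading any rows:

  import pyarrow.parquet as pq

  # Read only the schema (column names and types) from a local copy of the
  # taxi data file; no row data is loaded. The path is a placeholder.
  schema = pq.read_schema("taxi_data.parquet")
  for field in schema:
      print(field.name, field.type)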

Now press the Transfer Dataset button. It may take several minutes to ingest all 500k records into your system. When the transfer is complete, you should see the new table in the ki_home schema.

Streaming Ingest with AAW

Kinetica can ingest streaming data via Kafka using the Active Analytics Workbench (AAW). We’re going to begin streaming in a current-day NYC Taxi dataset, which updates with taxi cab transactions continuously, to supplement the historical NYC taxi dataset you just loaded from a Parquet file. A public Kafka broker is available to serve this data.

Before we can create the streaming ingest, we need to create a Kafka credential in AAW.

  1. Navigate to AAW (http://<kinetica-host>:8070/).
  2. Click Security > Credentials.
  3. Click Add New Credential.
  4. Select Kafka from the Credential Type drop-down menu.
  5. Input Quickstart Kafka Broker for the Name.
  6. Input Public Kafka broker for the Kinetica Quick Start for the Description.
  7. Input quickstart.kinetica.com:9092 for the Connection String.
  8. Input nyctaxi for the Topic.
  9. Click Create.

Now that a Kafka credential has been created, we can use it to begin streaming in data.
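
Optionally, before configuring the ingest in AAW, you can confirm that the public broker is reachable and that the nyctaxi topic is serving records. The sketch below uses the kafka-python package (an assumption; any Kafka client will do) to read a few messages:

  from kafka import KafkaConsumer

  # Read a handful of messages from the public quick start broker to confirm
  # the nyctaxi topic is live. Gives up after 10 seconds if nothing arrives.
  consumer = KafkaConsumer(
      "nyctaxi",
      bootstrap_servers="quickstart.kinetica.com:9092",
      auto_offset_reset="earliest",
      consumer_timeout_ms=10000,
  )
  for i, message in enumerate(consumer):
      print(message.value[:200])
      if i >= 4:
          break
  consumer.close()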

  1. From AAW (http://<kinetica-host>:8070/), click Data > Ingests.
  2. Click + Add New Ingest > New Streaming.
  3. Input NYC Taxi Streaming Ingest for the ingest Name.
  4. Input Streaming ingest of NYC Taxi transactional data from public Kafka broker into Kinetica for the Description and click Next.
  5. Click Search next to Credentials and select the Quickstart Kafka Broker you created in the previous steps. Then, click Select.
  6. Under Destination, input taxi_data_streaming for the Table name and click Next.
  7. Review the Summary and click Create.
  8. Once on the Ingest Details page, click Start to begin the data ingestion.
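
Once the ingest is running, one way to confirm that records are arriving is to poll the size of the taxi_data_streaming table, for example with the gpudb Python client shown earlier (again a sketch, assuming the default API port):

  import time

  import gpudb

  # Poll the streaming table's record count twice; the count should grow
  # while the Kafka ingest is running (assumes the default port 9191).
  db = gpudb.GPUdb(host="http://<kinetica-host>:9191")
  table = gpudb.GPUdbTable(name="taxi_data_streaming", db=db)
  print("records now:", table.size())
  time.sleep(30)
  print("records 30 seconds later:", table.size())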

You now have data streaming into your Kinetica streaming data warehouse! Continue on to learn how to query and visualize this data.
