By now you should have installed the product and have had the quick overview of the tools you can use to manage it and query data. In this tutorial, we’ll show you some of the ways to load data. The datasets being loaded will be used in future tutorials.
You'll learn how to use three import methods:
- Drag and Drop for quickly importing small sets of data
- Advanced Import with the Kinetica Input/Output (KIO) Tool for batch importing large sets of data with more control over the process
- Streaming ingest with the Active Analytics Workbench (AAW) for ingesting continuously updating data to feed views and applications
Drag and Drop
Kinetica allows you to drag and drop CSV, ORC, Apache Parquet, or Zip files (containing Shapefiles) to import the data into Kinetica. Drag-and-drop importing will attempt to interpret the file's first record as a header. The file's name will be used as the table's name in Kinetica. The Drag-and-Drop documentation covers additional details and limitations.
We're going to import, via drag-and-drop, a modified version of the nyct2010.csv data file, which is the NYC Neighborhood Tabulation Areas (NTA) dataset. This file provides geospatial boundaries for neighborhoods in New York and has been modified to include a Kinetica-schema-style CSV header. The file can be downloaded using the following link: nyct2010.csv
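Since drag-and-drop import treats the file's first record as the header, it can be worth confirming what that record contains before uploading. Below is a minimal sketch using Python's standard csv module; the file path is a placeholder for wherever you saved the download.

```python
import csv

def peek_header(path):
    """Return the first record of a CSV file, the record that
    drag-and-drop import will attempt to interpret as the header."""
    with open(path, newline="") as f:
        return next(csv.reader(f))

# Example with a locally saved copy of the file:
# peek_header("nyct2010.csv")
```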
Download and save the above CSV file locally to your disk. Now let’s import the data:
- Navigate to GAdmin.
- Click Data > Import.
- Drag the nyct2010.csv file from a file explorer window into the drop area of GAdmin. You can also click Choose File in GAdmin to select the nyct2010.csv file from your disk.
- Acknowledge the pop-up windows for Kinetica finding a Kinetica schema header and completing the import by clicking OK.
The data will begin importing. Once completed, click View Table to view the table or click Data > Table to see it in a listing with other tables & collections in the database.
Advanced Import with KIO
Kinetica also allows you to ingest data from a variety of sources, including Sybase IQ, Oracle, PostgreSQL, and AWS S3, using the KIO tool. This tutorial uses the GAdmin KIO user interface, also known as Advanced Import. Advanced Import is great for importing larger files and offers more control over the incoming schema. We're going to load a historical NYC taxi dataset using an Apache Parquet file in a public AWS S3 bucket. You can read more about the full public data set, which contains taxi trip information, at nyc.gov.
To import the NYC taxi dataset using Advanced Import, first select the source and destination:
- Navigate to GAdmin.
- Click Data > Import.
- Click Advanced Import.
In the Source section:
- Select AWS S3 for the Datasource.
- Select Parquet for the Format.
- Input the following bucket File Path for the NYC taxi dataset:
- In the Target section, input taxi_data for the Table name. Leave the Collection blank, and leave the Batch Size and Spark Options at their default values.
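The Batch Size option controls how many records are sent per insert operation. Conceptually, a bulk loader groups rows the way this generic sketch does (an illustration of the batching idea, not Kinetica's actual implementation):

```python
def batches(records, batch_size):
    """Yield successive fixed-size batches from a list of records,
    mirroring how a bulk loader groups rows before each insert."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]

# Ten records with a batch size of 4 are sent as three batches: 4, 4, 2.
sizes = [len(b) for b in batches(list(range(10)), 4)]
```

Larger batches mean fewer round trips per load, at the cost of more memory held per request.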
Next, configure the columns of the target table that will be created:
- Click Configure Columns. The data will be analyzed and the projected column names, types, & sizes will be displayed.
- Click the Subtype drop-down associated with the vendor_id column, and choose char4.
- Click the Subtype drop-down associated with the store_and_fwd_flag column, and choose char1.
- Click the Subtype drop-down associated with the payment_type column, and choose char16.
Lastly, click Transfer Dataset.
The Transfer Status window appears to show the data import's status. Once completed, click View Table to view the table.
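The char1/char4/char16 subtypes chosen above are fixed-width string columns; picking the smallest width that still fits the longest value in a column keeps storage compact. A hypothetical helper for making that choice (the power-of-two width list is an assumption extrapolated from the char1/char4/char16 subtypes used in the steps above):

```python
# Assumed set of fixed-width charN subtypes (power-of-two widths).
CHAR_WIDTHS = [1, 2, 4, 8, 16, 32, 64, 128, 256]

def pick_char_subtype(values):
    """Pick the smallest charN width that fits every value in a column."""
    longest = max(len(v) for v in values)
    for width in CHAR_WIDTHS:
        if longest <= width:
            return f"char{width}"
    return None  # too long for a fixed-width subtype

# store_and_fwd_flag holds single-character flags:
pick_char_subtype(["Y", "N"])      # → "char1"
# vendor_id values are short codes such as "CMT" or "VTS":
pick_char_subtype(["CMT", "VTS"])  # → "char4"
```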
Streaming Ingest with AAW
Kinetica can ingest streaming data via Kafka using the Active Analytics Workbench (AAW). We’re going to begin streaming in a current-day NYC Taxi dataset, which updates with taxi cab transactions continuously, to supplement the historical NYC taxi dataset you just loaded from a Parquet file. A public Kafka broker is available to serve this data.
Before we can create the streaming ingest, we first need to create a Kafka credential in AAW.
- Navigate to AAW (http://&lt;kinetica-host&gt;:8070/).
- Click Security > Credentials.
- Click Add New Credential.
- Select Kafka from the Credential Type drop-down menu.
- Input Quickstart Kafka Broker for the Name.
- Input Public Kafka broker for the Kinetica Quick Start for the Description.
- Input quickstart.kinetica.com:9092 for the Connection String.
- Input nyctaxi for the Topic.
- Click Create.
Now that a Kafka credential has been created, we can use it to begin streaming in data.
- From AAW (http://&lt;kinetica-host&gt;:8070/), click Data > Ingests.
- Click + Add New Ingest > New Streaming.
- Input NYC Taxi Streaming Ingest for the ingest Name.
- Input Streaming ingest of NYC Taxi transactional data from public Kafka broker into Kinetica for the Description and click Next.
- Click Search next to Credentials and select the Quickstart Kafka Broker you created in the previous steps. Then, click Select.
- Under Destination, input taxi_data for the Table name and click Next.
- Review the Summary and click Create.
- Once on the Ingest Details page, click Start to begin the data ingestion.
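Under the hood, the credential fields above are essentially consumer configuration. This sketch shows what a standalone consumer of the same topic might look like; the third-party kafka-python package is an assumption for illustration, and AAW manages its own consumers internally.

```python
def split_connection_string(conn):
    """Split a 'host:port' connection string into host and port."""
    host, _, port = conn.rpartition(":")
    return host, int(port)

def stream_taxi_records(conn="quickstart.kinetica.com:9092", topic="nyctaxi"):
    """Yield raw messages from the broker. Requires network access to the
    broker and `pip install kafka-python`; not executed here."""
    from kafka import KafkaConsumer
    host, port = split_connection_string(conn)
    consumer = KafkaConsumer(topic, bootstrap_servers=f"{host}:{port}")
    for message in consumer:
        yield message.value

host, port = split_connection_string("quickstart.kinetica.com:9092")
```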
Next Steps
To continue exploring ingestion, consider:
- Enabling Multihead Ingest for fast data ingestion and fast record retrieval
- Using the Spark Connector to enable quick ingestion of large data sets and to stream data out of Kinetica.