Data Sources

A data source is reference object for a data set that is external to the database. It consists of the location & connection information to that external source, but doesn't hold the names of any specific data sets/files within that source. A data source can make use of a credential object for storing remote authentication information.

A data source name must adhere to the standard naming criteria. Each data source exists within a schema and follows the standard name resolution rules for tables.

The following data source providers are supported:

Azure (Microsoft blob storage)
CData (CData Software source-specific JDBC driver; see driver list for the full list of supported JDBC drivers)
GCS (Google Cloud Storage)
HDFS (Apache Hadoop Distributed File System)
JDBC (Java Database Connectivity, using a user-supplied driver)
Kafka (streaming feed)
- Apache
- Confluent
S3 (Amazon S3 Bucket)

Note

The following default hosts are used for Azure & S3, but can be overridden in the location parameter:

Azure: <service_account_name>.blob.core.windows.net
S3: <region>.amazonaws.com

Data sources perform no function by themselves, but act as proxies for accessing external data when referenced in certain database operations. The following can make use of data sources:

External tables (see also the CREATE EXTERNAL TABLE command in SQL)
Insert records (from files) API calls (see also the LOAD DATA command in SQL)

Individual files within a data source need to be identified when the data source is referenced within these calls.

Note

The data source will be validated upon creation, by default, and will fail to be created if an authorized connection cannot be established.
CData data sources can use a JDBC credential for authentication.

Managing Data Sources

A data source can be managed using the following API endpoint calls. For managing data sources in SQL, see CREATE DATA SOURCE.

API Call	Description
/create/datasource	Creates a data source, given a location and connection information
/alter/datasource	Modifies the properties of a data source, validating the new connection
/drop/datasource	Removes the data source reference from the database; will not modify the external source data
/show/datasource	Outputs the data source properties; passwords are redacted
/grant/permission/datasource	Grants the permission for a user to connect to a data source
/revoke/permission/datasource	Revokes the permission for a user to connect to a data source

Creating a Data Source

To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica_ds, in the US East (N. Virginia) region, in Python:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = 'kin_ds',
    location = 's3',
    user_name = aws_id,
    password = aws_key,
    options = {
        's3_bucket_name': 'kinetica-ds',
        's3_region': 'us-east-1'
    }
)

Important

For Amazon S3 connections, the user_name & password parameters refer to the AWS Access ID & Key, respectively.

Provider-Specific Syntax

Several authentication schemes across multiple providers are supported.

Azure BLOB

Credential



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'azure_container_name': '<azure container name>'
    }
)

Public (No Auth)



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>'
    }
)

Password



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '<azure storage account key>',
    options = {
        'azure_container_name': '<azure container name>'
    }
)

SAS Token



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_sas_token': '<azure sas token>',
        'azure_container_name': '<azure container name>'
    }
)

Active Directory



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<ad client id>',
    password = '<ad client secret key>',
    options = {
        'azure_storage_account_name': '<azure storage account name>',
        'azure_container_name': '<azure container name>',
        'azure_tenant_id': '<azure tenant id>'
    }
)

CData

Credential



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = '<cdata jdbc url>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>'
    }
)

Password in URL



1
2
3
4
5
6
7


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<cdata jdbc url with username/password>',
    user_name = '',
    password = '',
    options = {}
)

Password as Parameter



1
2
3
4
5
6
7


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<cdata jdbc url>',
    user_name = '<jdbc username>',
    password = '<jdbc password>',
    options = {}
)

GCS

Credential



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)

Public (No Auth)



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)

User ID & Key



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '<gcs account id>',
    password = '<gcs account private key>',
    options = {
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)

JSON Key



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        'gcs_service_account_keys': '<gcs account json key text>',
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)

HDFS

Credential



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>'
    }
)

Password



1
2
3
4
5
6
7


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '<hdfs password>',
    options = {}
)

Kerberos Token



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_use_kerberos': 'true'
    }
)

Kerberos Keytab



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_kerberos_keytab': 'kifs://<keytab file/path>'
    }
)

Credential



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = '<jdbc url>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'jdbc_driver_class_name' = '<jdbc driver class full path>',
        'jdbc_driver_jar_path' = 'kifs://<jdbc driver jar path>'
    }
)

Password



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<jdbc url>',
    user_name = '<jdbc username>',
    password = '<jdbc password>',
    options = {
        'jdbc_driver_class_name' = '<jdbc driver class full path>',
        'jdbc_driver_jar_path' = 'kifs://<jdbc driver jar path>'
    }
)

Kafka (Apache)

Credential
Public (No Auth)

Credential



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<kafka credential name>',
        'kafka_topic_name': '<kafka topic name>'
    }
)

Public (No Auth)



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)

Kafka (Confluent)

Credential
Public (No Auth)

Credential



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<kafka credential name>',
        'kafka_topic_name': '<kafka topic name>'
    }
)

Public (No Auth)



1
2
3
4
5
6
7
8
9


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)

S3 (Amazon)

Credential



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)

Public (No Auth)



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '',
    password = '',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)

Access Key



 1
 2
 3
 4
 5
 6
 7
 8
 9
10


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)

IAM Role



 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>',
        's3_aws_role_arn': '<amazon resource name>'
    }
)

Limitations

Azure anonymous data sources are only supported when both the container and the contained objects allow anonymous access.
HDFS systems with wire encryption are not supported.
Kafka data sources require an associated credential object for authentication.