Data Sources

A data source is a reference object for a data set that is external to the database. It consists of the location & connection information for that external source, but doesn't hold the names of any specific data sets/files within that source. A data source can make use of a credential object for storing remote authentication information.

A data source name must adhere to the standard naming criteria. Each data source exists within a schema and follows the standard name resolution rules for tables.

The following data source providers are supported:

  • Azure (Microsoft blob storage)
  • CData (CData Software source-specific JDBC driver; see the driver list for the full set of supported JDBC drivers)
  • GCS (Google Cloud Storage)
  • HDFS (Apache Hadoop Distributed File System)
  • JDBC (Java Database Connectivity, using a user-supplied driver)
  • Kafka (streaming feed)
    • Apache
    • Confluent
  • S3 (Amazon S3 Bucket)

Note

The following default hosts are used for Azure & S3, but can be overridden in the location parameter (see the sketch after this note):

  • Azure: <service_account_name>.blob.core.windows.net
  • S3: <region>.amazonaws.com
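
For instance, a data source pointing at a specific S3 regional endpoint might name the host directly in the location; a minimal sketch, with a hypothetical region & endpoint:

h_db.create_datasource(
    name = '<data source name>',
    location = 's3://s3.us-west-2.amazonaws.com',  # hypothetical override of the default host
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': 'us-west-2'
    }
)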

Data sources perform no function by themselves, but act as proxies for accessing external data when referenced in certain database operations, such as loading data from files into tables.

Individual files within a data source need to be identified when the data source is referenced in those operations, as shown in the sketch below.
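
For example, a file residing in a data source might be loaded into an existing table by naming both the data source & the file in the load call; a minimal sketch, assuming a data source kin_ds & target table ki_home.product exist, and that the /insert/records/fromfiles endpoint accepts a datasource_name option:

h_db.insert_records_from_files(
    table_name = 'ki_home.product',           # hypothetical existing target table
    filepaths = ['products.csv'],             # file within the data source
    options = {'datasource_name': 'kin_ds'}   # hypothetical existing data source
)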

Note

  • By default, a data source is validated upon creation and will fail to be created if an authorized connection cannot be established (see the sketch following this note for bypassing validation).
  • CData data sources can use a JDBC credential for authentication.
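
If the remote source will be unreachable at creation time, the connection check can be skipped; a minimal sketch, assuming a skip_validation option is accepted by /create/datasource:

h_db.create_datasource(
    name = '<data source name>',
    location = 's3',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>',
        'skip_validation': 'true'    # assumed option; bypasses the connection check
    }
)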

Managing Data Sources

A data source can be managed using the following API endpoint calls; a usage sketch follows the table. For managing data sources in SQL, see CREATE DATA SOURCE.

API Call                       Description
/create/datasource             Creates a data source, given its location & connection information
/alter/datasource              Modifies the properties of a data source, validating the new connection
/drop/datasource               Removes the data source reference from the database; will not modify the external source data
/show/datasource               Outputs the data source properties; passwords are redacted
/grant/permission/datasource   Grants the permission for a user to connect to a data source
/revoke/permission/datasource  Revokes the permission for a user to connect to a data source
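
For instance, a data source's properties might be inspected & the reference later removed; a minimal sketch, assuming show_datasource & drop_datasource Python wrappers for the endpoints above and an existing data source kin_ds:

# Output the data source properties; passwords are redacted
print(h_db.show_datasource(name = 'kin_ds', options = {}))

# Remove the data source reference; the external data itself is untouched
h_db.drop_datasource(name = 'kin_ds', options = {})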

Creating a Data Source

To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica-ds, in the US East (N. Virginia) region, in Python:

h_db.create_datasource(
    name = 'kin_ds',
    location = 's3',
    user_name = aws_id,
    password = aws_key,
    options = {
        's3_bucket_name': 'kinetica-ds',
        's3_region': 'us-east-1'
    }
)

Important

For Amazon S3 connections, the user_name & password parameters refer to the AWS access key ID & secret access key, respectively.
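
Once created, access to the data source can be granted to other users; a minimal sketch, assuming a grant_permission_datasource Python wrapper for the endpoint above, a connect permission name, & a hypothetical user jsmith:

h_db.grant_permission_datasource(
    name = 'jsmith',            # hypothetical existing user
    permission = 'connect',     # assumed permission name for connecting
    datasource_name = 'kin_ds',
    options = {}
)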

Provider-Specific Syntax

Each provider supports one or more authentication schemes; the create_datasource call for each is shown below.

Azure BLOB

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'azure_container_name': '<azure container name>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>'
    }
)
Password
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '<azure storage account key>',
    options = {
        'azure_container_name': '<azure container name>'
    }
)
SAS Token
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_sas_token': '<azure sas token>',
        'azure_container_name': '<azure container name>'
    }
)
Active Directory
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'azure[://<host>]',
    user_name = '<ad client id>',
    password = '<ad client secret key>',
    options = {
        'azure_storage_account_name': '<azure storage account name>',
        'azure_container_name': '<azure container name>',
        'azure_tenant_id': '<azure tenant id>'
    }
)
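
Putting the Password scheme above into practice, a hypothetical Azure data source might look like the following (account, key, & container names are placeholders):

h_db.create_datasource(
    name = 'azure_ds',
    location = 'azure',
    user_name = 'examplestorageacct',         # Azure storage account name
    password = '<azure storage account key>',
    options = {
        'azure_container_name': 'examplecontainer'
    }
)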

CData

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = '<cdata jdbc url>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>'
    }
)
Password in URL
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<cdata jdbc url with username/password>',
    user_name = '',
    password = '',
    options = {}
)
Password as Parameter
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<cdata jdbc url>',
    user_name = '<jdbc username>',
    password = '<jdbc password>',
    options = {}
)
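
As a concrete illustration of the Password as Parameter scheme, a CData data source for a hypothetical PostgreSQL server could resemble the following (URL & credentials are placeholders; consult the CData documentation for the exact JDBC URL format of each driver):

h_db.create_datasource(
    name = 'cdata_pg_ds',
    location = 'jdbc:postgresql:Server=example.com;Port=5432;Database=example_db',
    user_name = 'pg_user',
    password = 'pg_pass',
    options = {}
)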

GCS

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
User ID & Key
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '<gcs account id>',
    password = '<gcs account private key>',
    options = {
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
JSON Key
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'gcs[://<host>]',
    user_name = '',
    password = '',
    options = {
        'gcs_service_account_keys': '<gcs account json key text>',
        ['gcs_project_id': '<gcs project id>',]
        'gcs_bucket_name': '<gcs bucket name>'
    }
)
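
As an example of the Public (No Auth) scheme, a data source over a publicly readable GCS bucket might look like the following (bucket name hypothetical):

h_db.create_datasource(
    name = 'gcs_ds',
    location = 'gcs',
    user_name = '',
    password = '',
    options = {
        'gcs_bucket_name': 'example-public-bucket'
    }
)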

HDFS

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>'
    }
)
Password
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '<hdfs password>',
    options = {}
)
Kerberos Token
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_use_kerberos': 'true'
    }
)
Kerberos Keytab
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_kerberos_keytab': 'kifs://<keytab file/path>'
    }
)
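
For instance, a Kerberos token connection to a hypothetical HDFS NameNode could resemble the following (host & user are placeholders; 8020 is a common NameNode port):

h_db.create_datasource(
    name = 'hdfs_ds',
    location = 'hdfs://namenode.example.com:8020',
    user_name = 'hdfs_user',
    password = '',
    options = {
        'hdfs_use_kerberos': 'true'
    }
)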

JDBC

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = '<jdbc url>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        'jdbc_driver_class_name': '<jdbc driver class full path>',
        'jdbc_driver_jar_path': 'kifs://<jdbc driver jar path>'
    }
)
Password
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = '<jdbc url>',
    user_name = '<jdbc username>',
    password = '<jdbc password>',
    options = {
        'jdbc_driver_class_name': '<jdbc driver class full path>',
        'jdbc_driver_jar_path': 'kifs://<jdbc driver jar path>'
    }
)
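
As a concrete illustration, a JDBC data source for a hypothetical PostgreSQL database using the stock PostgreSQL driver might resemble the following (URL, credentials, & KiFS path are placeholders):

h_db.create_datasource(
    name = 'jdbc_pg_ds',
    location = 'jdbc:postgresql://db.example.com:5432/example_db',
    user_name = 'pg_user',
    password = 'pg_pass',
    options = {
        'jdbc_driver_class_name': 'org.postgresql.Driver',
        'jdbc_driver_jar_path': 'kifs://drivers/postgresql.jar'
    }
)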

Kafka (Apache)

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<kafka credential name>',
        'kafka_topic_name': '<kafka topic name>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'kafka://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)
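
For example, a Kafka data source authenticating through a previously created credential might look like the following (credential, broker, & topic names are hypothetical; 9092 is the default Kafka broker port):

h_db.create_datasource(
    name = 'kafka_ds',
    location = 'kafka://broker.example.com:9092',
    user_name = '',
    password = '',
    options = {
        'credential': 'kafka_cred',
        'kafka_topic_name': 'example_topic'
    }
)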

Kafka (Confluent)

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<kafka credential name>',
        'kafka_topic_name': '<kafka topic name>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 'confluent://<host>:<port>',
    user_name = '',
    password = '',
    options = {
        'kafka_topic_name': '<kafka topic name>'
    }
)

S3 (Amazon)

Credential
h_db.create_datasource(
    name = '[<data source schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '',
    password = '',
    options = {
        'credential': '[<credential schema name>.]<credential name>',
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
Public (No Auth)
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '',
    password = '',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
Access Key
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
IAM Role
h_db.create_datasource(
    name = '[<schema name>.]<data source name>',
    location = 's3[://<host>]',
    user_name = '<aws access key id>',
    password = '<aws secret access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>',
        's3_aws_role_arn': '<amazon resource name>'
    }
)

Limitations

  • Azure anonymous data sources are only supported when both the container and the contained objects allow anonymous access.
  • HDFS systems with wire encryption are not supported.
  • Kafka data sources that require authentication must use an associated credential object; user name & password authentication is not supported.