A data source is a reference object for a data set that is external to the database. It consists of the location & connection information for that external source, but doesn't hold the names of any specific data sets/files within that source.
The following data source types are supported:

- Azure (Microsoft blob storage)
- HDFS (Apache Hadoop Distributed File System)
- S3 (Amazon S3 bucket)
Note

The following hosts are used for each of the data source providers:

Provider | Host |
---|---|
Azure | <service_account_name>.blob.core.windows.net |
HDFS | the host & port given in the location parameter |
S3 | <region>.amazonaws.com |
Data sources perform no function by themselves, but act as proxies for accessing external data when referenced in certain database operations. The following operations can make use of data sources:
Individual files within a data source need to be identified when the data source is referenced within these calls.
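As a hedged sketch of that pairing (the option names below are illustrative, not the documented request schema), an operation that reads from a data source effectively combines the data source name with the specific file to pull from it:

```python
# Illustrative sketch only: the option names below are assumptions, not the
# documented request schema. It shows the shape of the pairing between a
# registered data source and a specific file within it.
def build_load_options(datasource_name, file_path):
    """Build the option map a load operation would use to pull one file
    from an external data source."""
    return {
        'datasource_name': datasource_name,  # which registered data source to use
        'file_path': file_path,              # the specific file within that source
    }

opts = build_load_options('kin_ds', 'data/orders.csv')
```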
Note
By default, the data source will be validated upon creation, and creation will fail if an authorized connection cannot be established.
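The validate-on-create behavior can be pictured with a small stand-in. This stub is illustrative only; the real client raises its own exception type on failure:

```python
# Illustrative stub, not the real client: mimics the default validate-on-create
# behavior, refusing to create the data source when the connection check fails.
class DatasourceValidationError(Exception):
    pass

def create_datasource_stub(name, connection_ok):
    # Validate the connection first; fail creation if it cannot be established.
    if not connection_ok:
        raise DatasourceValidationError(
            "data source '%s' not created: authorized connection "
            "could not be established" % name)
    return name

created = create_datasource_stub('kin_ds', connection_ok=True)
```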
A data source can be managed using the following API endpoint calls. For managing data sources in SQL, see CREATE DATA SOURCE.
API Call | Description |
---|---|
/create/datasource | Creates a data source, given a location and connection information |
/alter/datasource | Modifies the properties of a data source, validating the new connection |
/drop/datasource | Removes the data source reference from the database; will not modify the external source data |
/show/datasource | Outputs the data source properties, for users with system_admin permission; users with connect permission will see only the names & providers for the data sources to which they have access |
/grant/permission/datasource | Grants the permission for a user to connect to a data source |
/revoke/permission/datasource | Revokes the permission for a user to connect to a data source |
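The management endpoints above typically follow a create → alter → drop lifecycle. A stub client (illustrative only; the real GPUdb client's method signatures may differ) makes the mapping from method call to endpoint concrete:

```python
# Illustrative stub: records which REST endpoint each management call maps to.
# Method names mirror the Python API style used elsewhere on this page; the
# real client's signatures may differ.
class StubClient:
    def __init__(self):
        self.calls = []

    def create_datasource(self, name, **kwargs):
        self.calls.append('/create/datasource')

    def alter_datasource(self, name, **kwargs):
        self.calls.append('/alter/datasource')

    def drop_datasource(self, name):
        self.calls.append('/drop/datasource')

db = StubClient()
db.create_datasource('kin_ds', location='s3')
db.alter_datasource('kin_ds', datasource_updates_map={'s3_region': 'us-west-2'})
db.drop_datasource('kin_ds')
```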
To create a data source, kin_ds, that connects to an Amazon S3 bucket, kinetica-ds, in the US East (N. Virginia) region, in Python:
```python
h_db.create_datasource(
    name = 'kin_ds',
    location = 's3',
    user_name = aws_id,
    password = aws_key,
    options = {
        's3_bucket_name': 'kinetica-ds',
        's3_region': 'us-east-1'
    }
)
```
Important

For Amazon S3 connections, the user_name & password parameters refer to the AWS Access ID & Key, respectively.
Several authentication schemes across multiple providers are supported; each of the following examples creates a data source using one such scheme.
Azure, using a storage account key:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure',
    user_name = '<azure storage account name>',
    password = '<azure storage account key>',
    options = {
        'azure_container_name': '<azure container name>'
    }
)
```
Azure, using a SAS token:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>',
        'azure_sas_token': '<azure sas token>'
    }
)
```
Azure, using an OAuth token:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure',
    user_name = '<azure storage account name>',
    password = '',
    options = {
        'azure_container_name': '<azure container name>',
        'azure_oauth_token': '<azure oauth token>'
    }
)
```
Azure, using Active Directory credentials:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'azure',
    user_name = '<ad client id>',
    password = '<ad client secret key>',
    options = {
        'azure_storage_account_name': '<azure storage account name>',
        'azure_container_name': '<azure container name>',
        'azure_tenant_id': '<azure tenant id>'
    }
)
```
HDFS, using a password:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '<hdfs password>',
    options = {}
)
```
HDFS, using Kerberos token-based authentication:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_use_kerberos': 'true'
    }
)
```
HDFS, using a Kerberos keytab:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 'hdfs://<host>:<port>',
    user_name = '<hdfs username>',
    password = '',
    options = {
        'hdfs_kerberos_keytab': '<keytab file/path>'
    }
)
```
Amazon S3, using an access key:

```python
h_db.create_datasource(
    name = '<data source name>',
    location = 's3',
    user_name = '<aws access id>',
    password = '<aws access key>',
    options = {
        's3_bucket_name': '<aws s3 bucket name>',
        's3_region': '<aws s3 region>'
    }
)
```
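To keep the scheme-specific differences above in one place, a small helper can assemble the create_datasource keyword arguments per scheme. This is a sketch assuming the option keys shown in the examples; only a few schemes are covered for brevity:

```python
# Hedged helper: assembles create_datasource() keyword arguments for a few of
# the authentication schemes shown above. Option keys are copied from the
# examples; everything else here is illustrative.
def datasource_args(scheme, **creds):
    if scheme == 's3':
        return {
            'location': 's3',
            'user_name': creds['access_id'],
            'password': creds['access_key'],
            'options': {
                's3_bucket_name': creds['bucket'],
                's3_region': creds['region'],
            },
        }
    if scheme == 'azure_sas':
        # SAS-token authentication leaves the password blank
        return {
            'location': 'azure',
            'user_name': creds['account'],
            'password': '',
            'options': {
                'azure_container_name': creds['container'],
                'azure_sas_token': creds['sas_token'],
            },
        }
    if scheme == 'hdfs_kerberos':
        # Token-based Kerberos also leaves the password blank
        return {
            'location': 'hdfs://%s:%s' % (creds['host'], creds['port']),
            'user_name': creds['user'],
            'password': '',
            'options': {'hdfs_use_kerberos': 'true'},
        }
    raise ValueError('unsupported scheme: %s' % scheme)

args = datasource_args('s3', access_id='<aws access id>',
                       access_key='<aws access key>',
                       bucket='kinetica-ds', region='us-east-1')
# h_db.create_datasource(name='kin_ds', **args)
```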