An external table is a database object whose source data is located in one or more files external to the database. The source data can be located either on the cluster, accessible to the database; or remotely, accessible via a data source. External tables are created via the /create/table/external native API call. For details on interfacing with external tables from SQL, see CREATE EXTERNAL TABLE. Using external tables in an environment with ring resiliency enabled has additional considerations.
There are two types of external table, distinguished by the scheme each uses to pull data from the external source:
There are two source data file formats supported for external tables:
Regardless of the format selected, one or more source data fields can be used in the creation of the external table. Date/time fields can have their source formats specified.
An external table can be assigned many of the features of standard tables, some of which include:
External table names and column names must adhere to the supported naming criteria, and the name resolution follows that of tables.
If an external table is to use a data source, then data source connect privilege is required for the following actions:
Materialized external tables have three ingest modes available:
Depending on the location of the source data files, one of the following schemes for distributing the extraction and loading of data can be assigned:
Materialized external tables can be directed to refresh their data when the database starts up. Depending on the amount of data and the transfer, parse, & load time, it may be beneficial to load all data at startup, or delay the refresh until a later time. If the data is not refreshed, it will be the same as it was before startup.
An error mode can be assigned to an external table, instructing it on how to handle source data field errors in parsing & loading:
To create an external table with the following features, using a local file as the source of data:
ext_employee
in the example
schemadata/employee.parquet
, relative to the configured
external files directoryid
columnh_db.create_table_external(
table_name = 'example.ext_employee',
filepaths = 'data/employee.parquet',
options = {
'file_type': 'parquet',
'primary_keys': 'id'
}
)
To create an external table with the following features, using a data source as the source of data:
ext_product
in the example
schemaproduct_ds
products.csv
h_db.create_table_external(
table_name = 'example.ext_product',
filepaths = 'products.csv',
options = {
'datasource_name': 'product_ds',
'refresh_method': 'on_start'
}
)