Class GPUdbTable

class gpudb.GPUdbTable(_type=None, name=None, options=None, db=None, read_only_table_count=None, delete_temporary_views=True, temporary_view_names=None, create_views=True, use_multihead_io=False, use_multihead_ingest=False, multihead_ingest_batch_size=10000, flush_multi_head_ingest_per_insertion=False)[source]

Bases: object

Parameters

_type (RecordType or GPUdbRecordType or list of lists of str) –

Either a GPUdbRecordType or RecordType object which represents a type for the table, or a nested list of lists, where each internal list has the format of:

# Just the name and type
[ "name", "type (double, int etc.)" ]

# Name, type, and one column property
[ "name", "type (double, int etc.)", "nullable" ]

# Name, type, and multiple column properties
[ "name", "string", "char4", "nullable" ]

Pass None for collections. If creating a GPUdbTable object for a pre-existing table, then also pass None.

If no table with the given name exists, then the given type will be created in GPUdb before creating the table.

Default is None.

name (str) –
The name for the table. if none provided, then a random name will be generated using random_name().
options (GPUdbTableOptions or dict) –
A GPUdbTableOptions object or a dict containing options for the table creation.
db (GPUdb) –
A GPUdb object that allows the user to connect to the GPUdb server.
read_only_table_count (int) –
For known read-only tables, provide the number of records in it. Integer. Must provide the name of the table.
delete_temporary_views (bool) –
If true, then in terminal queries–queries that can not be chained–delete the temporary views upon completion. Defaults to True.
create_views (bool) –
Indicates whether or not to create views for this table.
temporary_view_names (list) –
Optional list of temporary view names (that ought to be deleted upon terminal queries)
use_multihead_io (bool) –

Indicates whether or not to use multi-head input and output (meaning ingestion and lookup). Default is False. Note that multi-head ingestion is more computation intensive for sharded tables, and it it probably advisable only if there is a heavy ingestion load. Choose carefully.

Please see documentation of parameters multihead_ingest_batch_size and flush_multi_head_ingest_per_insertion for controlling the multi-head ingestion related behavior.

use_multihead_ingest (bool) –
Indicates whether or not to use multi-head ingestion, if available upon insertion. Note that multi-head ingestion is more computation intensive for sharded tables, and it it probably advisable only if there is a heavy ingestion load. Default is False. Will be deprecated in version 7.0.
multihead_ingest_batch_size (int) –
Used only in conjunction with use_multihead_ingest; ignored otherwise. Sets the batch size to be used for the ingestor. Must be greater than zero. Default is 10,000. The multi-head ingestor flushes the inserted records every multihead_ingest_batch_size automatically, unless flush_multi_head_ingest_automatically is False. Any remaining records would have to be manually flushed using flush_data_to_server() by the user, or will be automatically flushed per insert_records() if flush_multi_head_ingest_automatically is True.
flush_multi_head_ingest_per_insertion (bool) –
Used only in conjunction with use_multihead_ingest; ignored otherwise. If True, flushes the multi-head ingestor in every insert_records() call. Otherwise, the multi-head ingestor flushes the data to the server when a worker queue reaches multihead_ingest_batch_size in size, and any remaining records will have to be manually flushed using flush_data_to_server(). Default False.

Returns

A GPUdbTable object.
static random_name()[source]

Returns a randomly generated uuid-based name

static prefix_name(val)[source]

Returns a random name with the specified prefix

size()[source]

Return the table’s size/length/count.

set_logger_level(log_level)[source]

Set the log level for the GPUdbTable class and any multi-head i/o related classes it uses.

Parameters

log_level (int, long, or str) –
A valid log level for the logging module
is_read_only

Is the table read-only, or can we modify it?

count

Return the table’s size/length/count.

is_collection

Returns True if the table is a collection; False otherwise.

collection_name

Returns the name of the collection this table is a member of; None if this table does not belong to any collection.

is_replicated()[source]

Returns True if the table is replicated.

get_table_type()[source]

Return the table’s (record) type (the GPUdbRecordType object, not the c-extension RecordType).

alias(alias)[source]

Create an alias string for this table.

Parameters

alias (str) –
A string that contains the alias.

Returns

A string with the format “this-table-name as alias”.
create_view(view_name, count=None)[source]

Given a view name and a related response, create a new GPUdbTable object which is a read-only table with the intermediate tables automatically updated.

Returns

A GPUdbTable object
cleanup()[source]

Clear/drop all intermediate tables if settings allow it.

Returns

self for enabling chaining method invocations.
exists(options={})[source]

Checks for the existence of a table with the given name.

Returns

A boolean flag indicating whether the table currently –

exists in the database.

flush_data_to_server()[source]

If multi-head ingestion is enabled, then flush all records in the ingestors’ worker queues so that they actually get inserted to the server database.

insert_records(*args, **kwargs)[source]

Insert one or more records.

Parameters

args –

Values for all columns of a single record or multiple records. For a single record, use either of the following syntaxes:

insert_records( 1, 2, 3 )
insert_records( [1, 2, 3] )

For multiple records, use either of the following syntaxes:

insert_records( [ [1, 2, 3], [4, 5, 6] ] )
insert_records(   [1, 2, 3], [4, 5, 6]   )

Also, the user can use keyword arguments to pass in values:

# For a record type with two integers named 'a' and 'b':
insert_records( {"a":  1, "b":  1},
                {"a": 42, "b": 32} )

# Also can use a list to pass the dicts
insert_records( [ {"a":  1, "b":  1},
                  {"a": 42, "b": 32} ] )

Additionally, the user may provide options for the insertion operation. For example:

insert_records( [1, 2, 3], [4, 5, 6],
                options = {"return_record_ids": "true"} )
kwargs –

Values for all columns for a single record. Mutually exclusive with args (i.e. cannot provide both) when it only contains data.

May contain an ‘options’ keyword arg which will be passed to the database for the insertion operation.

Returns

A GPUdbTable object with the the insert_records() –

response fields converted to attributes and stored within.

insert_records_random(count=None, options={})[source]

Generates a specified number of random records and adds them to the given table. There is an optional parameter that allows the user to customize the ranges of the column values. It also allows the user to specify linear profiles for some or all columns in which case linear values are generated rather than random ones. Only individual tables are supported for this operation.

This operation is synchronous, meaning that a response will not be returned until all random records are fully available.

Parameters

count (long) –
Number of records to generate.
options (dict of dicts of floats) –

Optional parameter to pass in specifications for the randomness of the values. This map is different from the options parameter of most other endpoints in that it is a map of string to map of string to doubles, while most others are maps of string to string. In this map, the top level keys represent which column’s parameters are being specified, while the internal keys represents which parameter is being specified. These parameters take on different meanings depending on the type of the column. Below follows a more detailed description of the map: Default value is an empty dict ( {} ). Allowed keys are:

  • seed – If provided, the internal random number generator will be initialized with the given value. The minimum is 0. This allows for the same set of random numbers to be generated across invocation of this endpoint in case the user wants to repeat the test. Since input parameter options, is a map of maps, we need an internal map to provide the seed value. For example, to pass 100 as the seed value through this parameter, you need something equivalent to: ‘options’ = {‘seed’: { ‘value’: 100 } } Allowed keys are:
    • value – Pass the seed value here.
  • all – This key indicates that the specifications relayed in the internal map are to be applied to all columns of the records. Allowed keys are:
    • min – For numerical columns, the minimum of the generated values is set to this value. Default is -99999. For point, shape, and track semantic types, min for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are -180.0 and -90.0. For the ‘TIMESTAMP’ column, the default minimum corresponds to Jan 1, 2010. For string columns, the minimum length of the randomly generated strings is set to this value (default is 0). If both minimum and maximum are provided, minimum must be less than or equal to max. Value needs to be within [0, 200]. If the min is outside the accepted ranges for strings columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
    • max – For numerical columns, the maximum of the generated values is set to this value. Default is 99999. For point, shape, and track semantic types, max for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are 180.0 and 90.0. For string columns, the maximum length of the randomly generated strings is set to this value (default is 200). If both minimum and maximum are provided, max must be greater than or equal to min. Value needs to be within [0, 200]. If the max is outside the accepted ranges for strings columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
    • interval – If specified, generate values for all columns evenly spaced with the given interval value. If a max value is specified for a given column the data is randomly generated between min and max and decimated down to the interval. If no max is provided the data is linerally generated starting at the minimum value (instead of generating random data). For non-decimated string-type columns the interval value is ignored. Instead the values are generated following the pattern: ‘attrname_creationIndex#’, i.e. the column name suffixed with an underscore and a running counter (starting at 0). For string types with limited size (eg char4) the prefix is dropped. No nulls will be generated for nullable columns.
    • null_percentage – If specified, then generate the given percentage of the count as nulls for all nullable columns. This option will be ignored for non-nullable columns. The value must be within the range [0, 1.0]. The default value is 5% (0.05).
    • cardinality – If specified, limit the randomly generated values to a fixed set. Not allowed on a column with interval specified, and is not applicable to WKT or Track-specific columns. The value must be greater than 0. This option is disabled by default.
  • attr_name – Set the following parameters for the column specified by the key. This overrides any parameter set by all. Allowed keys are:
    • min – For numerical columns, the minimum of the generated values is set to this value. Default is -99999. For point, shape, and track semantic types, min for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are -180.0 and -90.0. For the ‘TIMESTAMP’ column, the default minimum corresponds to Jan 1, 2010. For string columns, the minimum length of the randomly generated strings is set to this value (default is 0). If both minimum and maximum are provided, minimum must be less than or equal to max. Value needs to be within [0, 200]. If the min is outside the accepted ranges for strings columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
    • max – For numerical columns, the maximum of the generated values is set to this value. Default is 99999. For point, shape, and track semantic types, max for numeric ‘x’ and ‘y’ columns needs to be within [-180, 180] and [-90, 90], respectively. The default minimum possible values for these columns in such cases are 180.0 and 90.0. For string columns, the maximum length of the randomly generated strings is set to this value (default is 200). If both minimum and maximum are provided, max must be greater than or equal to min. Value needs to be within [0, 200]. If the max is outside the accepted ranges for strings columns and ‘x’ and ‘y’ columns for point/shape/track types, then those parameters will not be set; however, an error will not be thrown in such a case. It is the responsibility of the user to use the all parameter judiciously.
    • interval – If specified, generate values for all columns evenly spaced with the given interval value. If a max value is specified for a given column the data is randomly generated between min and max and decimated down to the interval. If no max is provided the data is linerally generated starting at the minimum value (instead of generating random data). For non-decimated string-type columns the interval value is ignored. Instead the values are generated following the pattern: ‘attrname_creationIndex#’, i.e. the column name suffixed with an underscore and a running counter (starting at 0). For string types with limited size (eg char4) the prefix is dropped. No nulls will be generated for nullable columns.
    • null_percentage – If specified and if this column is nullable, then generate the given percentage of the count as nulls. This option will result in an error if the column is not nullable. The value must be within the range [0, 1.0]. The default value is 5% (0.05).
    • cardinality – If specified, limit the randomly generated values to a fixed set. Not allowed on a column with interval specified, and is not applicable to WKT or Track-specific columns. The value must be greater than 0. This option is disabled by default.
  • track_length – This key-map pair is only valid for track type data sets (an error is thrown otherwise). No nulls would be generated for nullable columns. Allowed keys are:
    • min – Minimum possible length for generated series; default is 100 records per series. Must be an integral value within the range [1, 500]. If both min and max are specified, min must be less than or equal to max.
    • max – Maximum possible length for generated series; default is 500 records per series. Must be an integral value within the range [1, 500]. If both min and max are specified, max must be greater than or equal to min.

Returns

A GPUdbTable object with the the insert_records() response fields –

converted to attributes (and stored within) –

entries

table_name (str) –
Value of input parameter table_name.
count (long) –
Value of input parameter count.
get_records_by_key(key_values, expression='', options=None)[source]

Fetches the record(s) from the appropriate worker rank directly (or, if multi-head record retrieval is not set up, then from the head node) that map to the given shard key.

Parameters

key_values (list or dict) –
Values for the sharding columns of the record to fetch either in a list (then it is assumed to be in the order of the sharding keys in the record type) or a dict. Must not have any missing sharding/primary column value or any extra column values.
expression (str) –
Optional parameter. If given, it is passed to /get/records as a filter expression.
options (dict of str to str or None) –
Any /get/records options to be passed onto the GPUdb server. Optional parameter.

Returns

The decoded records.
get_records(offset=0, limit=10000, encoding='binary', options={}, force_primitive_return_types=True)[source]

Retrieves records from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables, views, or on homogeneous collections (collections containing tables of all the same type). Records can be returned encoded as binary or json.

This operation supports paging through the data via the input parameter offset and input parameter limit parameters. Note that when paging through a table, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified) the records retrieved may differ between calls based on the updates applied.

Decodes and returns the fetched records.

Parameters

offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (long) –
A positive integer indicating the maximum number of results to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is 10000.
encoding (str) –

Specifies the encoding for returned records. Default value is ‘binary’. Allowed values are:

  • binary
  • json

The default value is ‘binary’.

options (dict of str) –

Default value is an empty dict ( {} ). Allowed keys are:

  • expression – Optional filter expression to apply to the table.

  • fast_index_lookup – Indicates if indexes should be used to perform the lookup for a given expression if possible. Only applicable if there is no sorting, the expression contains only equivalence comparisons based on existing tables indexes and the range of requested values is from [0 to END_OF_SET]. The default value is true.

  • sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).

  • sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:

    • ascending
    • descending

    The default value is ‘ascending’.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

Returns

A list of Record objects containg the record values.
get_records_by_column(column_names, offset=0, limit=10000, encoding='binary', options={}, print_data=False, force_primitive_return_types=True, get_column_major=True)[source]

For a given table, retrieves the values of the given columns within a given range. It returns maps of column name to the vector of values for each supported data type (double, float, long, int and string). This operation supports pagination feature, i.e. values that are retrieved are those associated with the indices between the start (offset) and end value (offset + limit) parameters (inclusive). If there are num_points values in the table then each of the indices between 0 and num_points-1 retrieves a unique value.

Note that when using the pagination feature, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified) the records or values retrieved may differ between calls (discontiguous or overlap) based on the type of the update.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

column_names (list of str) –
The list of column values to retrieve.
offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (long) –
A positive integer indicating the maximum number of results to be returned (if not provided the default is 10000), or END_OF_SET (-9999) to indicate that the maximum number of results allowed by the server should be returned.
encoding (str) –

Specifies the encoding for returned records; either ‘binary’ or ‘json’. Default value is ‘binary’. Allowed values are:

  • binary
  • json

The default value is ‘binary’.

options (dict of str) –

Default value is an empty dict ( {} ). Allowed keys are:

  • expression – Optional filter expression to apply to the table.

  • sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).

  • sort_order – String indicating how the returned values should be sorted - ascending or descending. Default is ‘ascending’. If sort_order is provided, sort_by has to be provided. Allowed values are:

    • ascending
    • descending

    The default value is ‘ascending’.

  • order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name.

print_data (bool) –
If True, print the fetched data to the console in a tabular format if the data is being returned in the column-major format. Default is False.
force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Decodes the fetched records and saves them in the response class in an attribute called data.

Returns

A dict of column name to column values for column-major data, or –

a list of Record objects for row-major data.

get_records_by_series(world_table_name=None, offset=0, limit=250, encoding='binary', options={}, force_primitive_return_types=True)[source]

Retrieves the complete series/track records from the given input parameter world_table_name based on the partial track information contained in the input parameter table_name.

This operation supports paging through the data via the input parameter offset and input parameter limit parameters.

In contrast to get_records() this returns records grouped by series/track. So if input parameter offset is 0 and input parameter limit is 5 this operation would return the first 5 series/tracks in input parameter table_name. Each series/track will be returned sorted by their TIMESTAMP column.

Parameters

world_table_name (str) –
Name of the table containing the complete series/track information to be returned for the tracks present in the input parameter table_name. Typically this is used when retrieving series/tracks from a view (which contains partial series/tracks) but the user wants to retrieve the entire original series/tracks. Can be blank.
offset (int) –
A positive integer indicating the number of initial series/tracks to skip (useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (int) –
A positive integer indicating the maximum number of series/tracks to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is 250.
encoding (str) –

Specifies the encoding for returned records; either ‘binary’ or ‘json’. Default value is ‘binary’. Allowed values are:

  • binary
  • json

The default value is ‘binary’.

options (dict of str) –
Optional parameters. Default value is an empty dict ( {} ).
force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

Returns

A list of list of Record objects containing the record values. –

Each external record corresponds to a single track (or series)

get_records_from_collection(offset=0, limit=10000, encoding='binary', options={}, force_primitive_return_types=True)[source]

Retrieves records from a collection. The operation can optionally return the record IDs which can be used in certain queries such as delete_records().

This operation supports paging through the data via the input parameter offset and input parameter limit parameters.

Note that when using the Java API, it is not possible to retrieve records from join tables using this operation.

Parameters

offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (long) –
A positive integer indicating the maximum number of results to be returned, or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is 10000.
encoding (str) –

Specifies the encoding for returned records; either ‘binary’ or ‘json’. Default value is ‘binary’. Allowed values are:

  • binary
  • json

The default value is ‘binary’.

options (dict of str) –

Default value is an empty dict ( {} ). Allowed keys are:

  • return_record_ids – If ‘true’ then return the internal record ID along with each returned record. Default is ‘false’. Allowed values are:

    • true
    • false

    The default value is ‘false’.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

Returns

A list of Record objects containing the record values.
get_geo_json(offset=0, limit=10000, options={}, force_primitive_return_types=True)[source]

Retrieves records as a GeoJSON from a given table, optionally filtered by an expression and/or sorted by a column. This operation can be performed on tables, views, or on homogeneous collections (collections containing tables of all the same type). Records can be returned encoded as binary or json.

This operation supports paging through the data via the input parameter offset and input parameter limit parameters. Note that when paging through a table, if the table (or the underlying table in case of a view) is updated (records are inserted, deleted or modified) the records retrieved may differ between calls based on the updates applied.

Decodes and returns the fetched records.

Parameters

offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). Default value is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (long) –
A positive integer indicating the maximum number of results to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. Default value is 10000.
encoding (str) –

Specifies the encoding for returned records. Default value is ‘binary’. Allowed values are:

  • binary
  • json

The default value is ‘binary’.

options (dict of str) –

Default value is an empty dict ( {} ). Allowed keys are:

  • expression – Optional filter expression to apply to the table.

  • fast_index_lookup – Indicates if indexes should be used to perform the lookup for a given expression if possible. Only applicable if there is no sorting, the expression contains only equivalence comparisons based on existing tables indexes and the range of requested values is from [0 to END_OF_SET]. The default value is true.

  • sort_by – Optional column that the data should be sorted by. Empty by default (i.e. no sorting is applied).

  • sort_order – String indicating how the returned values should be sorted - ascending or descending. If sort_order is provided, sort_by has to be provided. Allowed values are:

    • ascending
    • descending

    The default value is ‘ascending’.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.

Returns

A GeoJSON object (a dict)
static create_join_table(db, join_table_name=None, table_names=None, column_names=None, expressions=[], options={})[source]

Creates a table that is the result of a SQL JOIN.

For join details and examples see: Joins. For limitations, see Join Limitations and Cautions.

Parameters

join_table_name (str) –
Name of the join table to be created. Has the same naming restrictions as tables.
table_names (list of str) –
The list of table names composing the join. Corresponds to a SQL statement FROM clause. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
column_names (list of str) –
List of member table columns or column expressions to be included in the join. Columns can be prefixed with ‘table_id.column_name’, where ‘table_id’ is the table name or alias. Columns can be aliased via the syntax ‘column_name as alias’. Wild cards ‘*’ can be used to include all columns across member tables or ‘table_id.*’ for all of a single table’s columns. Columns and column expressions composing the join must be uniquely named or aliased–therefore, the ‘*’ wild card cannot be used if column names aren’t unique across all tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
expressions (list of str) –
An optional list of expressions to combine and filter the joined tables. Corresponds to a SQL statement WHERE clause. For details see: expressions. The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the join. If the collection provided is non-existent, the collection will be automatically created. If empty, then the join will be at the top level. The default value is ‘’.

  • max_query_dimensions – Obsolete in GPUdb v7.0

  • optimize_lookups – Use more memory to speed up the joining of tables. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • ttl – Sets the TTL of the join table specified in input parameter join_table_name.

  • view_id – view this projection is part of. The default value is ‘’.

  • no_count – return a count of 0 for the join table for logging and for show_table. optimization needed for large overlapped equi-join stencils. The default value is ‘false’.

  • chunk_size – Maximum size of a joined-chunk for this table. Defaults to the gpudb.conf file chunk size

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
static create_union(db, table_name=None, table_names=None, input_column_names=None, output_column_names=None, options={})[source]

Merges data from one or more tables with comparable data types into a new table.

The following merges are supported:

UNION (DISTINCT/ALL) - For data set union details and examples, see Union. For limitations, see Union Limitations and Cautions.

INTERSECT (DISTINCT/ALL) - For data set intersection details and examples, see Intersect. For limitations, see Intersect Limitations.

EXCEPT (DISTINCT/ALL) - For data set subtraction details and examples, see Except. For limitations, see Except Limitations.

MERGE VIEWS - For a given set of filtered views on a single table, creates a single filtered view containing all of the unique records across all of the given filtered data sets.

Non-charN ‘string’ and ‘bytes’ column types cannot be merged, nor can columns marked as store-only.

Parameters

table_name (str) –
Name of the table to be created. Has the same naming restrictions as tables.
table_names (list of str) –
The list of table names to merge. Must contain the names of one or more existing tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
input_column_names (list of lists of str) –
The list of columns from each of the corresponding input tables. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
output_column_names (list of str) –
The list of names of the columns to be stored in the output table. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the output table. If the collection provided is non-existent, the collection will be automatically created. If empty, the output table will be a top-level table. The default value is ‘’.

  • materialize_on_gpu – If true, then the columns of the output table will be cached on the GPU. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • mode – If merge_views, then this operation will merge the provided views. All input parameter table_names must be views from the same underlying base table. Allowed values are:

    • union_all – Retains all rows from the specified tables.
    • union – Retains all unique rows from the specified tables (synonym for union_distinct).
    • union_distinct – Retains all unique rows from the specified tables.
    • except – Retains all unique rows from the first table that do not appear in the second table (only works on 2 tables).
    • except_all – Retains all rows(including duplicates) from the first table that do not appear in the second table (only works on 2 tables).
    • intersect – Retains all unique rows that appear in both of the specified tables (only works on 2 tables).
    • intersect_all – Retains all rows(including duplicates) that appear in both of the specified tables (only works on 2 tables).
    • merge_views – Merge two or more views (or views of views) of the same base data set into a new view. If this mode is selected input parameter input_column_names AND input parameter output_column_names must be empty. The resulting view would match the results of a SQL OR operation, e.g., if filter 1 creates a view using the expression ‘x = 20’ and filter 2 creates a view using the expression ‘x <= 10’, then the merge views operation creates a new view using the expression ‘x = 20 OR x <= 10’.

    The default value is ‘union_all’.

  • chunk_size – Indicates the chunk size to be used for this table.

  • create_indexes – Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in input parameter output_column_names.

  • ttl – Sets the TTL of the table specified in input parameter table_name.

  • persist – If true, then the table specified in input parameter table_name will be persisted and will not expire unless a ttl is specified. If false, then the table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • view_id – view the output table will be a part of. The default value is ‘’.

  • force_replicated – If true, then the table specified in input parameter table_name will be replicated even if the source tables are not. Allowed values are:

    • true
    • false

    The default value is ‘false’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
static merge_records(db, table_name=None, source_table_names=None, field_maps=None, options={})[source]

Create a new empty result table (specified by input parameter table_name), and insert all records from source tables (specified by input parameter source_table_names) based on the field mapping information (specified by input parameter field_maps).

For merge records details and examples, see Merge Records. For limitations, see Merge Records Limitations and Cautions.

The field map (specified by input parameter field_maps) holds the user-specified maps of target table column names to source table columns. The array of input parameter field_maps must match one-to-one with the input parameter source_table_names, e.g., there’s a map present in input parameter field_maps for each table listed in input parameter source_table_names.

Parameters

table_name (str) –
The new result table name for the records to be merged. Must NOT be an existing table.
source_table_names (list of str) –
The list of source table names to get the records from. Must be existing table names. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
field_maps (list of dicts of str to str) –
Contains a list of source/target column mappings, one mapping for each source table listed in input parameter source_table_names being merged into the target table specified by input parameter table_name. Each mapping contains the target column names (as keys) that the data in the mapped source columns or column expressions (as values) will be merged into. All of the source columns being merged into a given target column must match in type, as that type will determine the type of the new target column. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created merged table specified by input parameter table_name. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created merged table will be a top-level table.

  • is_replicated – Indicates the distribution scheme for the data of the merged table specified in input parameter table_name. If true, the table will be replicated. If false, the table will be randomly sharded. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • ttl – Sets the TTL of the merged table specified in input parameter table_name.

  • persist – If true, then the table specified in input parameter table_name will be persisted and will not expire unless a ttl is specified. If false, then the table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

    • true
    • false

    The default value is ‘true’.

  • chunk_size – Indicates the chunk size to be used for the merged table specified in input parameter table_name.

  • view_id – view this result table is part of. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_convex_hull(x_column_name=None, y_column_name=None, options={})[source]

Calculates and returns the convex hull for the values in a table specified by input parameter table_name.

Parameters

x_column_name (str) –
Name of the column containing the x coordinates of the points for the operation being performed.
y_column_name (str) –
Name of the column containing the y coordinates of the points for the operation being performed.
options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

x_vector (list of floats) –
Array of x coordinates of the resulting convex set.
y_vector (list of floats) –
Array of y coordinates of the resulting convex set.
count (int) –
Count of the number of points in the convex set.

is_valid (bool)

info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_group_by(column_names=None, offset=None, limit=1000, encoding='binary', options={}, force_primitive_return_types=True, get_column_major=True)[source]

Calculates unique combinations (groups) of values for the given columns in a given table/view/collection and computes aggregates on each unique combination. This is somewhat analogous to an SQL-style SELECT...GROUP BY.

For aggregation details and examples, see Aggregation. For limitations, see Aggregation Limitations.

Any column(s) can be grouped on, and all column types except unrestricted-length strings may be used for computing applicable aggregates; columns marked as store-only are unable to be used in grouping or aggregation.

The results can be paged via the input parameter offset and input parameter limit parameters. For example, to get 10 groups with the largest counts the inputs would be: limit=10, options={“sort_order”:”descending”, “sort_by”:”value”}.

Input parameter options can be used to customize behavior of this call e.g. filtering or sorting the results.

To group by columns ‘x’ and ‘y’ and compute the number of objects within each group, use: column_names=[‘x’,’y’,’count(*)’].

To also compute the sum of ‘z’ over each group, use: column_names=[‘x’,’y’,’count(*)’,’sum(z)’].

Available aggregation functions are: count(*), sum, min, max, avg, mean, stddev, stddev_pop, stddev_samp, var, var_pop, var_samp, arg_min, arg_max and count_distinct.

Available grouping functions are Rollup, Cube, and Grouping Sets

This service also provides support for Pivot operations.

Filtering on aggregates is supported via expressions using aggregation functions supplied to having.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name–no results are returned in the response. Both the table name and resulting column names must adhere to standard naming conventions; column/aggregation expressions will need to be aliased. If the source table’s shard key is used as the grouping column(s) and all result records are selected (input parameter offset is 0 and input parameter limit is -9999), the result table will be sharded, in all other cases it will be replicated. Sorting will properly function only if the result table is replicated or if there is only one processing node and should not be relied upon in other cases. Not available when any of the values of input parameter column_names is an unrestricted-length string.

Parameters

column_names (list of str) –
List of one or more column names, expressions, and aggregate expressions.
offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (long) –
A positive integer indicating the maximum number of results to be returned Or END_OF_SET (-9999) to indicate that the max number of results should be returned. The default value is 1000.
encoding (str) –

Specifies the encoding for returned records. Allowed values are:

  • binary – Indicates that the returned records should be binary encoded.
  • json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table. Additionally this option is invalid if input parameter table_name is a collection.

  • expression – Filter expression to apply to the table prior to computing the aggregate group by.

  • having – Filter expression to apply to the aggregated results.

  • sort_order – String indicating how the returned values should be sorted - ascending or descending. Allowed values are:

    • ascending – Indicates that the returned values should be sorted in ascending order.
    • descending – Indicates that the returned values should be sorted in descending order.

    The default value is ‘ascending’.

  • sort_by – String determining how the results are sorted. Allowed values are:

    • key – Indicates that the returned values should be sorted by key, which corresponds to the grouping columns. If you have multiple grouping columns (and are sorting by key), it will first sort the first grouping column, then the second grouping column, etc.
    • value – Indicates that the returned values should be sorted by value, which corresponds to the aggregates. If you have multiple aggregates (and are sorting by value), it will first sort by the first aggregate, then the second aggregate, etc.

    The default value is ‘value’.

  • result_table – The name of the table used to store the results. Has the same naming restrictions as tables. Column names (group-by and aggregate fields) need to be given aliases e.g. [“FChar256 as fchar256”, “sum(FDouble) as sfd”]. If present, no results are returned in the response. This option is not available if one of the grouping attributes is an unrestricted string (i.e.; not charN) type.

  • result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • result_table_generate_pk – If ‘true’ then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • ttl – Sets the TTL of the table specified in result_table.

  • chunk_size – Indicates the chunk size to be used for the result table. Must be used in combination with the result_table option.

  • create_indexes – Comma-separated list of columns on which to create indexes on the result table. Must be used in combination with the result_table option.

  • view_id – view this result table is part of. The default value is ‘’.

  • materialize_on_gpu – If true then the columns of the groupby result table will be cached on the GPU. Must be used in combination with the result_table option. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • pivot – pivot column

  • pivot_values – The value list provided will become the column headers in the output. Should be the values from the pivot_column.

  • grouping_sets – Customize the grouping attribute sets to compute the aggregates. These sets can include ROLLUP or CUBE operartors. The attribute sets should be enclosed in paranthesis and can include composite attributes. All attributes specified in the grouping sets must present in the groupby attributes.

  • rollup – This option is used to specify the multilevel aggregates.

  • cube – This option is used to specify the multidimensional aggregates.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A read-only GPUdbTable object if input options has “result_table”; –

otherwise the response from the server, which is a dict containing –

the following entries–

response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
total_number_of_records (long) –
Total/Filtered number of records.
has_more_records (bool) –
Too many records. Returned a partial set.
info (dict of str to str) –
Additional information.
records (list of Record) –
A list of Record objects which contain the decoded records.
data (list of Record) –
A list of Record objects which contain the decoded records.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_histogram(column_name=None, start=None, end=None, interval=None, options={})[source]

Performs a histogram calculation given a table, a column, and an interval function. The input parameter interval is used to produce bins of that size and the result, computed over the records falling within each bin, is returned. For each bin, the start value is inclusive, but the end value is exclusive–except for the very last bin for which the end value is also inclusive. The value returned for each bin is the number of records in it, except when a column name is provided as a value_column. In this latter case the sum of the values corresponding to the value_column is used as the result instead. The total number of bins requested cannot exceed 10,000.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service a request that specifies a value_column option.

Parameters

column_name (str) –
Name of a column or an expression of one or more column names over which the histogram will be calculated.
start (float) –
Lower end value of the histogram interval, inclusive.
end (float) –
Upper end value of the histogram interval, inclusive.
interval (float) –
The size of each bin within the start and end parameters.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • value_column – The name of the column to use when calculating the bin values (values are summed). The column must be a numerical type (int, double, long, float).

Returns

The response from the server which is a dict containing the –

following entries–

counts (list of floats) –
The array of calculated values that represents the histogram data points.
start (float) –
Value of input parameter start.
end (float) –
Value of input parameter end.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_k_means(column_names=None, k=None, tolerance=None, options={})[source]

This endpoint runs the k-means algorithm - a heuristic algorithm that attempts to do k-means clustering. An ideal k-means clustering algorithm selects k points such that the sum of the mean squared distances of each member of the set to the nearest of the k points is minimized. The k-means algorithm however does not necessarily produce such an ideal cluster. It begins with a randomly selected set of k points and then refines the location of the points iteratively and settles to a local minimum. Various parameters and options are provided to control the heuristic search.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Parameters

column_names (list of str) –
List of column names on which the operation would be performed. If n columns are provided then each of the k result points will have n dimensions corresponding to the n columns. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
k (int) –
The number of mean points to be determined by the algorithm.
tolerance (float) –
Stop iterating when the distances between successive points is less than the given tolerance.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • whiten – When set to 1 each of the columns is first normalized by its stdv - default is not to whiten.
  • max_iters – Number of times to try to hit the tolerance limit before giving up - default is 10.
  • num_tries – Number of times to run the k-means algorithm with a different randomly selected starting points - helps avoid local minimum. Default is 1.

Returns

The response from the server which is a dict containing the –

following entries–

means (list of lists of floats) –
The k-mean values found.
counts (list of longs) –
The number of elements in the cluster closest the corresponding k-means values.
rms_dists (list of floats) –
The root mean squared distance of the elements in the cluster for each of the k-means values.
count (long) –
The total count of all the clusters - will be the size of the input table.
rms_dist (float) –
The sum of all the rms_dists - the value the k-means algorithm is attempting to minimize.
tolerance (float) –
The distance between the last two iterations of the algorithm before it quit.
num_iters (int) –
The number of iterations the algorithm executed before it quit.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_min_max(column_name=None, options={})[source]

Calculates and returns the minimum and maximum values of a particular column in a table.

Parameters

column_name (str) –
Name of a column or an expression of one or more column on which the min-max will be calculated.
options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

min (float) –
Minimum value of the input parameter column_name.
max (float) –
Maximum value of the input parameter column_name.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_min_max_geometry(column_name=None, options={})[source]

Calculates and returns the minimum and maximum x- and y-coordinates of a particular geospatial geometry column in a table.

Parameters

column_name (str) –
Name of a geospatial geometry column on which the min-max will be calculated.
options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

min_x (float) –
Minimum x-coordinate value of the input parameter column_name.
max_x (float) –
Maximum x-coordinate value of the input parameter column_name.
min_y (float) –
Minimum y-coordinate value of the input parameter column_name.
max_y (float) –
Maximum y-coordinate value of the input parameter column_name.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_statistics(column_name=None, stats=None, options={})[source]

Calculates the requested statistics of the given column(s) in a given table.

The available statistics are count (number of total objects), mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, weighted_average, cardinality (unique count), estimated_cardinality, percentile and percentile_rank.

Estimated cardinality is calculated by using the hyperloglog approximation technique.

Percentiles and percentile ranks are approximate and are calculated using the t-digest algorithm. They must include the desired percentile/percentile_rank. To compute multiple percentiles each value must be specified separately (i.e. ‘percentile(75.0),percentile(99.0),percentile_rank(1234.56),percentile_rank(-5)’).

A second, comma-separated value can be added to the percentile statistic to calculate percentile resolution, e.g., a 50th percentile with 200 resolution would be ‘percentile(50,200)’.

The weighted average statistic requires a weight_column_name to be specified in input parameter options. The weighted average is then defined as the sum of the products of input parameter column_name times the weight_column_name values divided by the sum of the weight_column_name values.

Additional columns can be used in the calculation of statistics via the additional_column_names option. Values in these columns will be included in the overall aggregate calculation–individual aggregates will not be calculated per additional column. For instance, requesting the count & mean of input parameter column_name x and additional_column_names y & z, where x holds the numbers 1-10, y holds 11-20, and z holds 21-30, would return the total number of x, y, & z values (30), and the single average value across all x, y, & z values (15.5).

The response includes a list of key/value pairs of each statistic requested and its corresponding value.

Parameters

column_name (str) –
Name of the primary column for which the statistics are to be calculated.
stats (str) –

Comma separated list of the statistics to calculate, e.g. “sum,mean”. Allowed values are:

  • count – Number of objects (independent of the given column(s)).
  • mean – Arithmetic mean (average), equivalent to sum/count.
  • stdv – Sample standard deviation (denominator is count-1).
  • variance – Unbiased sample variance (denominator is count-1).
  • skew – Skewness (third standardized moment).
  • kurtosis – Kurtosis (fourth standardized moment).
  • sum – Sum of all values in the column(s).
  • min – Minimum value of the column(s).
  • max – Maximum value of the column(s).
  • weighted_average – Weighted arithmetic mean (using the option weight_column_name as the weighting column).
  • cardinality – Number of unique values in the column(s).
  • estimated_cardinality – Estimate (via hyperloglog technique) of the number of unique values in the column(s).
  • percentile – Estimate (via t-digest) of the given percentile of the column(s) (percentile(50.0) will be an approximation of the median). Add a second, comma-separated value to calculate percentile resolution, e.g., ‘percentile(75,150)’
  • percentile_rank – Estimate (via t-digest) of the percentile rank of the given value in the column(s) (if the given value is the median of the column(s), percentile_rank(<median>) will return approximately 50.0).
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • additional_column_names – A list of comma separated column names over which statistics can be accumulated along with the primary column. All columns listed and input parameter column_name must be of the same type. Must not include the column specified in input parameter column_name and no column can be listed twice.
  • weight_column_name – Name of column used as weighting attribute for the weighted average statistic.

Returns

The response from the server which is a dict containing the –

following entries–

stats (dict of str to floats) –
(statistic name, double value) pairs of the requested statistics, including the total count by default.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_statistics_by_range(select_expression='', column_name=None, value_column_name=None, stats=None, start=None, end=None, interval=None, options={})[source]

Divides the given set into bins and calculates statistics of the values of a value-column in each bin. The bins are based on the values of a given binning-column. The statistics that may be requested are mean, stdv (standard deviation), variance, skew, kurtosis, sum, min, max, first, last and weighted average. In addition to the requested statistics the count of total samples in each bin is returned. This counts vector is just the histogram of the column used to divide the set members into bins. The weighted average statistic requires a weight_column to be specified in input parameter options. The weighted average is then defined as the sum of the products of the value column times the weight column divided by the sum of the weight column.

There are two methods for binning the set members. In the first, which can be used for numeric valued binning-columns, a min, max and interval are specified. The number of bins, nbins, is the integer upper bound of (max-min)/interval. Values that fall in the range [min+n*interval,min+(n+1)*interval) are placed in the nth bin where n ranges from 0..nbin-2. The final bin is [min+(nbin-1)*interval,max]. In the second method, input parameter options bin_values specifies a list of binning column values. Binning-columns whose value matches the nth member of the bin_values list are placed in the nth bin. When a list is provided the binning-column must be of type string or int.

NOTE: The Kinetica instance being accessed must be running a CUDA (GPU-based) build to service this request.

Parameters

select_expression (str) –
For a non-empty expression statistics are calculated for those records for which the expression is true. The default value is ‘’.
column_name (str) –
Name of the binning-column used to divide the set samples into bins.
value_column_name (str) –
Name of the value-column for which statistics are to be computed.
stats (str) –
A string of comma separated list of the statistics to calculate, e.g. ‘sum,mean’. Available statistics: mean, stdv (standard deviation), variance, skew, kurtosis, sum.
start (float) –
The lower bound of the binning-column.
end (float) –
The upper bound of the binning-column.
interval (float) –
The interval of a bin. Set members fall into bin i if the binning-column falls in the range [start+interval*i, start+interval*(i+1)).
options (dict of str to str) –

Map of optional parameters:. The default value is an empty dict ( {} ). Allowed keys are:

  • additional_column_names – A list of comma separated value-column names over which statistics can be accumulated along with the primary value_column.
  • bin_values – A list of comma separated binning-column values. Values that match the nth bin_values value are placed in the nth bin.
  • weight_column_name – Name of the column used as weighting column for the weighted_average statistic.
  • order_column_name – Name of the column used for candlestick charting techniques.

Returns

The response from the server which is a dict containing the –

following entries–

stats (dict of str to lists of floats) –
A map with a key for each statistic in the stats input parameter having a value that is a vector of the corresponding value-column bin statistics. In a addition the key count has a value that is a histogram of the binning-column.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_unique(column_name=None, offset=None, limit=10000, encoding='binary', options={}, force_primitive_return_types=True, get_column_major=True)[source]

Returns all the unique values from a particular column (specified by input parameter column_name) of a particular table or collection (specified by input parameter table_name). If input parameter column_name is a numeric column the values will be in output parameter binary_encoded_response. Otherwise if input parameter column_name is a string column the values will be in output parameter json_encoded_response. The results can be paged via the input parameter offset and input parameter limit parameters.

Columns marked as store-only are unable to be used with this function.

To get the first 10 unique values sorted in descending order input parameter options would be:

{"limit":"10","sort_order":"descending"}.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

If a result_table name is specified in the input parameter options, the results are stored in a new table with that name–no results are returned in the response. Both the table name and resulting column name must adhere to standard naming conventions; any column expression will need to be aliased. If the source table’s shard key is used as the input parameter column_name, the result table will be sharded, in all other cases it will be replicated. Sorting will properly function only if the result table is replicated or if there is only one processing node and should not be relied upon in other cases. Not available if input parameter table_name is a collection or when the value of input parameter column_name is an unrestricted-length string.

Parameters

column_name (str) –
Name of the column or an expression containing one or more column names on which the unique function would be applied.
offset (long) –
A positive integer indicating the number of initial results to skip (this can be useful for paging through the results). The minimum allowed value is 0. The maximum allowed value is MAX_INT.
limit (long) –
A positive integer indicating the maximum number of results to be returned. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. The default value is 10000.
encoding (str) –

Specifies the encoding for returned records. Allowed values are:

  • binary – Indicates that the returned records should be binary encoded.
  • json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table. Additionally this option is invalid if input parameter table_name is a collection.

  • expression – Optional filter expression to apply to the table.

  • sort_order – String indicating how the returned values should be sorted. Allowed values are:

    • ascending
    • descending

    The default value is ‘ascending’.

  • result_table – The name of the table used to store the results. If present, no results are returned in the response. Has the same naming restrictions as tables. Not available if input parameter table_name is a collection or when input parameter column_name is an unrestricted-length string.

  • result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • result_table_generate_pk – If ‘true’ then set a primary key for the result table. Must be used in combination with the result_table option. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • ttl – Sets the TTL of the table specified in result_table.

  • chunk_size – Indicates the chunk size to be used for the result table. Must be used in combination with the result_table option.

  • view_id – view this result table is part of. The default value is ‘’.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A read-only GPUdbTable object if input options has “result_table”; –

otherwise the response from the server, which is a dict containing –

the following entries–

table_name (str) –
The same table name as was passed in the parameter list.
response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
has_more_records (bool) –
Too many records. Returned a partial set.
info (dict of str to str) –
Additional information.
records (list of Record) –
A list of Record objects which contain the decoded records.
data (list of Record) –
A list of Record objects which contain the decoded records.

Raises

GPUdbException – –
Upon an error from the server.
aggregate_unpivot(column_names=None, variable_column_name='', value_column_name='', pivoted_columns=None, encoding='binary', options={}, force_primitive_return_types=True, get_column_major=True)[source]

Rotate the column values into rows values.

For unpivot details and examples, see Unpivot. For limitations, see Unpivot Limitations.

Unpivot is used to normalize tables that are built for cross tabular reporting purposes. The unpivot operator rotates the column values for all the pivoted columns. A variable column, value column and all columns from the source table except the unpivot columns are projected into the result table. The variable column and value columns in the result table indicate the pivoted column name and values respectively.

The response is returned as a dynamic schema. For details see: dynamic schemas documentation.

Parameters

column_names (list of str) –
List of column names or expressions. A wildcard ‘*’ can be used to include all the non-pivoted columns from the source table.
variable_column_name (str) –
Specifies the variable/parameter column name. The default value is ‘’.
value_column_name (str) –
Specifies the value column name. The default value is ‘’.
pivoted_columns (list of str) –
List of one or more values typically the column names of the input table. All the columns in the source table must have the same data type.
encoding (str) –

Specifies the encoding for returned records. Allowed values are:

  • binary – Indicates that the returned records should be binary encoded.
  • json – Indicates that the returned records should be json encoded.

The default value is ‘binary’.

options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the table specified in result_table. If the collection provided is non-existent, the collection will be automatically created. If empty, then the table will be a top-level table.

  • result_table – The name of the table used to store the results. Has the same naming restrictions as tables. If present, no results are returned in the response.

  • result_table_persist – If true, then the result table specified in result_table will be persisted and will not expire unless a ttl is specified. If false, then the result table will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • expression – Filter expression to apply to the table prior to unpivot processing.

  • order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input table. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

  • chunk_size – Indicates the chunk size to be used for the result table. Must be used in combination with the result_table option.

  • limit – The number of records to keep. The default value is ‘’.

  • ttl – Sets the TTL of the table specified in result_table.

  • view_id – view this result table is part of. The default value is ‘’.

  • materialize_on_gpu – If true then the output columns will be cached on the GPU. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • create_indexes – Comma-separated list of columns on which to create indexes on the table specified in result_table. The columns specified must be present in output column names. If any alias is given for any column name, the alias must be used, rather than the original column name.

  • result_table_force_replicated – Force the result table to be replicated (ignores any sharding). Must be used in combination with the result_table option. Allowed values are:

    • true
    • false

    The default value is ‘false’.

force_primitive_return_types (bool) –
If True, then OrderedDict objects will be returned, where string sub-type columns will have their values converted back to strings; for example, the Python datetime structs, used for datetime type columns would have their values returned as strings. If False, then Record objects will be returned, which for string sub-types, will return native or custom structs; no conversion to string takes place. String conversions, when returning OrderedDicts, incur a speed penalty, and it is strongly recommended to use the Record object option instead. If True, but none of the returned columns require a conversion, then the original Record objects will be returned. Default value is True.
get_column_major (bool) –
Indicates if the decoded records will be transposed to be column-major or returned as is (row-major). Default value is True.

Returns

A read-only GPUdbTable object if input options has “result_table”; –

otherwise the response from the server, which is a dict containing –

the following entries–

table_name (str) –
Typically shows the result-table name if provided in the request (Ignore otherwise).
response_schema_str (str) –
Avro schema of output parameter binary_encoded_response or output parameter json_encoded_response.
total_number_of_records (long) –
Total/Filtered number of records.
has_more_records (bool) –
Too many records. Returned a partial set.
info (dict of str to str) –
Additional information.
records (list of Record) –
A list of Record objects which contain the decoded records.
data (list of Record) –
A list of Record objects which contain the decoded records.

Raises

GPUdbException – –
Upon an error from the server.
alter_table(action=None, value=None, options={})[source]

Apply various modifications to a table, view, or collection. The available modifications include the following:

Manage a table’s columns–a column can be added, removed, or have its type and properties modified, including whether it is compressed or not.

Create or delete an index on a particular column. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.

Create or delete a foreign key on a particular column.

Manage a range-partitioned or a manual list-partitioned table’s partitions.

Set (or reset) the tier strategy of a table or view.

Refresh and manage the refresh mode of a materialized view.

Set the time-to-live (TTL). This can be applied to tables, views, or collections. When applied to collections, every contained table & view that is not protected will have its TTL set to the given value.

Set the global access mode (i.e. locking) for a table. This setting trumps any role-based access controls that may be in place; e.g., a user with write access to a table marked read-only will not be able to insert records into it. The mode can be set to read-only, write-only, read/write, and no access.

Change the protection mode to prevent or allow automatic expiration. This can be applied to tables, views, and collections.

Parameters

action (str) –

Modification operation to be applied Allowed values are:

  • allow_homogeneous_tables – No longer supported; action will be ignored.
  • create_index – Creates an index on the column name specified in input parameter value. If this column is already indexed, an error will be returned.
  • delete_index – Deletes an existing index on the column name specified in input parameter value. If this column does not have indexing turned on, an error will be returned.
  • move_to_collection – Moves a table or view into a collection named input parameter value. If the collection provided is non-existent, the collection will be automatically created. If input parameter value is empty, then the table or view will be top-level.
  • protected – Sets whether the given input parameter table_name should be protected or not. The input parameter value must be either ‘true’ or ‘false’.
  • rename_table – Renames a table, view or collection to input parameter value. Has the same naming restrictions as tables.
  • ttl – Sets the time-to-live in minutes of the table, view, or collection specified in input parameter table_name.
  • add_column – Adds the column specified in input parameter value to the table specified in input parameter table_name. Use column_type and column_properties in input parameter options to set the column’s type and properties, respectively.
  • change_column – Changes type and properties of the column specified in input parameter value. Use column_type and column_properties in input parameter options to set the column’s type and properties, respectively. Note that primary key and/or shard key columns cannot be changed. All unchanging column properties must be listed for the change to take place, e.g., to add dictionary encoding to an existing ‘char4’ column, both ‘char4’ and ‘dict’ must be specified in the input parameter options map.
  • set_column_compression – Modifies the compression setting on the column specified in input parameter value to the compression type specified in compression_type.
  • delete_column – Deletes the column specified in input parameter value from the table specified in input parameter table_name.
  • create_foreign_key – Creates a foreign key specified in input parameter value using the format ‘(source_column_name [, ...]) references target_table_name(primary_key_column_name [, ...]) [as foreign_key_name]’.
  • delete_foreign_key – Deletes a foreign key. The input parameter value should be the foreign_key_name specified when creating the key or the complete string used to define it.
  • add_partition – Adds the partition specified in input parameter value, to either a range-partitioned or manual list-partitioned table.
  • remove_partition – Removes the partition specified in input parameter value (and relocates all of its data to the default partition) from either a range-partitioned or manual list-partitioned table.
  • delete_partition – Deletes the partition specified in input parameter value (and all of its data) from either a range-partitioned or manual list-partitioned table.
  • set_global_access_mode – Sets the global access mode (i.e. locking) for the table specified in input parameter table_name. Specify the access mode in input parameter value. Valid modes are ‘no_access’, ‘read_only’, ‘write_only’ and ‘read_write’.
  • refresh – Replays all the table creation commands required to create this materialized view.
  • set_refresh_method – Sets the method by which this materialized view is refreshed to the method specified in input parameter value - one of ‘manual’, ‘periodic’, ‘on_change’.
  • set_refresh_start_time – Sets the time to start periodic refreshes of this materialized view to the datetime string specified in input parameter value with format ‘YYYY-MM-DD HH:MM:SS’. Subsequent refreshes occur at the specified time + N * the refresh period.
  • set_refresh_period – Sets the time interval in seconds at which to refresh this materialized view to the value specified in input parameter value. Also, sets the refresh method to periodic if not already set.
  • remove_text_search_attributes – Removes text search attribute from all columns.
  • set_strategy_definition – Sets the tier strategy for the table and its columns to the one specified in input parameter value, replacing the existing tier strategy in its entirety. See tier strategy usage for format and tier strategy examples for examples.
value (str) –
The value of the modification, depending on input parameter action. For example, if input parameter action is add_column, this would be the column name; while the column’s definition would be covered by the column_type, column_properties, column_default_value, and add_column_expression in input parameter options. If input parameter action is ttl, it would be the number of minutes for the new TTL. If input parameter action is refresh, this field would be blank.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • column_default_value – When adding a column, set a default value for existing records. For nullable columns, the default value will be null, regardless of data type.

  • column_properties – When adding or changing a column, set the column properties (strings, separated by a comma: data, store_only, text_search, char8, int8 etc).

  • column_type – When adding or changing a column, set the column type (strings, separated by a comma: int, double, string, null etc).

  • compression_type – When setting column compression (set_column_compression for input parameter action), compression type to use: none (to use no compression) or a valid compression type. Allowed values are:

    • none
    • snappy
    • lz4
    • lz4hc

    The default value is ‘snappy’.

  • copy_values_from_column – Deprecated. Please use add_column_expression instead.

  • rename_column – When changing a column, specify new column name.

  • validate_change_column – When changing a column, validate the change before applying it. If true, then validate all values. A value too large (or too long) for the new type will prevent any change. If false, then when a value is too large or long, it will be truncated. Allowed values are:

    • true – true
    • false – false

    The default value is ‘true’.

  • update_last_access_time – Indicates whether the time-to-live (TTL) expiration countdown timer should be reset to the table’s TTL. Allowed values are:

    • true – Reset the expiration countdown timer to the table’s configured TTL.
    • false – Don’t reset the timer; expiration countdown will continue from where it is, as if the table had not been accessed.

    The default value is ‘true’.

  • add_column_expression – When adding a column, an optional expression to use for the new column’s values. Any valid expression may be used, including one containing references to existing columns in the same table.

  • strategy_definition – Optional parameter for specifying the tier strategy for the table and its columns when input parameter action is set_strategy_definition, replacing the existing tier strategy in its entirety. See tier strategy usage for format and tier strategy examples for examples. This option will be ignored if input parameter value is also specified.

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str) –
Table on which the operation was performed.
action (str) –
Modification operation that was performed.
value (str) –
The value of the modification that was performed.
type_id (str) –
return the type_id (when changing a table, a new type may be created)
type_definition (str) –
return the type_definition (when changing a table, a new type may be created)
properties (dict of str to lists of str) –
return the type properties (when changing a table, a new type may be created)
label (str) –
return the type label (when changing a table, a new type may be created)
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
alter_table_columns(column_alterations=None, options=None)[source]

Apply various modifications to columns in a table, view. The available modifications include the following:

Create or delete an index on a particular column. This can speed up certain operations when using expressions containing equality or relational operators on indexed columns. This only applies to tables.

Manage a table’s columns–a column can be added, removed, or have its type and properties modified.

Set or unset compression for a column.

Parameters

column_alterations (list of dicts of str to str) –

list of alter table add/delete/change column requests - all for the same table.

each request is a map that includes

‘column_name’, ‘action’ and the options specific for the action,

note that the same options as in alter table

requests but in the same map as the column name and the action. For example: [{‘column_name’:’col_1’,’action’:’change_column’,’rename_column’:’col_2’},

{‘column_name’:’col_1’,’action’:’add_column’,
‘type’:’int’,’default_value’:‘1’}
] The user can provide a single element

(which will be automatically promoted to a list internally) or a list.

options (dict of str to str) –
Optional parameters.

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str) –
Table on which the operation was performed.
type_id (str) –
return the type_id (when changing a table, a new type may be created)
type_definition (str) –
return the type_definition (when changing a table, a new type may be created)
properties (dict of str to lists of str) –
return the type properties (when changing a table, a new type may be created)
label (str) –
return the type label (when changing a table, a new type may be created)
column_alterations (list of dicts of str to str) –

list of alter table add/delete/change column requests - all for the same table.

each request is a map that includes

‘column_name’, ‘action’ and the options specific for the action,

note that the same options as in alter table

requests but in the same map as the column name and the action. For example: [{‘column_name’:’col_1’,’action’:’change_column’,’rename_column’:’col_2’},

{‘column_name’:’col_1’,’action’:’add_column’,
‘type’:’int’,’default_value’:‘1’}
]
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
append_records(source_table_name=None, field_map=None, options={})[source]

Append (or insert) all records from a source table (specified by input parameter source_table_name) to a particular target table (specified by input parameter table_name). The field map (specified by input parameter field_map) holds the user specified map of target table column names with their mapped source column names.

Parameters

source_table_name (str) –
The source table name to get records from. Must be an existing table name.
field_map (dict of str to str) –
Contains the mapping of column names from the target table (specified by input parameter table_name) as the keys, and corresponding column names or expressions (e.g., ‘col_name+1’) from the source table (specified by input parameter source_table_name). Must be existing column names in source table and target table, and their types must be matched. For details on using expressions, see Expressions.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • offset – A positive integer indicating the number of initial results to skip from input parameter source_table_name. Default is 0. The minimum allowed value is 0. The maximum allowed value is MAX_INT. The default value is ‘0’.

  • limit – A positive integer indicating the maximum number of results to be returned from input parameter source_table_name. Or END_OF_SET (-9999) to indicate that the max number of results should be returned. The default value is ‘-9999’.

  • expression – Optional filter expression to apply to the input parameter source_table_name. The default value is ‘’.

  • order_by – Comma-separated list of the columns to be sorted by from source table (specified by input parameter source_table_name), e.g., ‘timestamp asc, x desc’. The order_by columns do not have to be present in input parameter field_map. The default value is ‘’.

  • update_on_existing_pk – Specifies the record collision policy for inserting the source table records (specified by input parameter source_table_name) into the target table (specified by input parameter table_name) table with a primary key. If set to true, any existing target table record with primary key values that match those of a source table record being inserted will be replaced by that new record. If set to false, any existing target table record with primary key values that match those of a source table record being inserted will remain unchanged and the new record discarded. If the specified table does not have a primary key, then this option is ignored. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • truncate_strings – If set to true, it allows inserting longer strings into smaller charN string columns by truncating the longer strings to fit. Allowed values are:

    • true
    • false

    The default value is ‘false’.

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str)

info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
clear_statistics(column_name='', options={})[source]

Clears statistics (cardinality, mean value, etc.) for a column in a specified table.

Parameters

column_name (str) –
Name of the column in input parameter table_name for which to clear statistics. The column must be from an existing table. An empty string clears statistics for all columns in the table. The default value is ‘’.
options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str) –
Value of input parameter table_name.
column_name (str) –
Value of input parameter column_name.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
clear(authorization='', options={})[source]

Clears (drops) one or all tables in the database cluster. The operation is synchronous meaning that the table will be cleared before the function returns. The response payload returns the status of the operation along with the name of the table that was cleared.

Parameters

authorization (str) –
No longer used. User can pass an empty string. The default value is ‘’.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • no_error_if_not_exists – If true and if the table specified in input parameter table_name does not exist no error is returned. If false and if the table specified in input parameter table_name does not exist then an error is returned. Allowed values are:

    • true
    • false

    The default value is ‘false’.

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str) –
Value of input parameter table_name for a given table, or ‘ALL CLEARED’ in case of clearing all tables.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
collect_statistics(column_names=None, options={})[source]

Collect statistics for a column(s) in a specified table.

Parameters

column_names (list of str) –
List of one or more column names in input parameter table_name for which to collect statistics (cardinality, mean value, etc.). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str) –
Value of input parameter table_name.
column_names (list of str) –
Value of input parameter column_names.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
create_projection(column_names=None, options={}, projection_name=None)[source]

Creates a new projection of an existing table. A projection represents a subset of the columns (potentially including derived columns) of a table.

For projection details and examples, see Projections. For limitations, see Projection Limitations and Cautions.

Window functions, which can perform operations like moving averages, are available through this endpoint as well as get_records_by_column().

A projection can be created with a different shard key than the source table. By specifying shard_key, the projection will be sharded according to the specified columns, regardless of how the source table is sharded. The source table can even be unsharded or replicated.

If input parameter table_name is empty, selection is performed against a single-row virtual table. This can be useful in executing temporal (NOW()), identity (USER()), or constant-based functions (GEODIST(-77.11, 38.88, -71.06, 42.36)).

Parameters

column_names (list of str) –
List of columns from input parameter table_name to be included in the projection. Can include derived columns. Can be specified as aliased via the syntax ‘column_name as alias’. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection to which the projection is to be assigned as a child. If the collection provided is non-existent, the collection will be automatically created. If empty, then the projection will be at the top level. The default value is ‘’.

  • expression – An optional filter expression to be applied to the source table prior to the projection. The default value is ‘’.

  • is_replicated – If true then the projection will be replicated even if the source table is not. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • limit – The number of records to keep. The default value is ‘’.

  • order_by – Comma-separated list of the columns to be sorted by; e.g. ‘timestamp asc, x desc’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

  • materialize_on_gpu – If true then the columns of the projection will be cached on the GPU. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • chunk_size – Indicates the chunk size to be used for this table.

  • create_indexes – Comma-separated list of columns on which to create indexes on the output table. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name.

  • ttl – Sets the TTL of the projection specified in input parameter projection_name.

  • shard_key – Comma-separated list of the columns to be sharded on; e.g. ‘column1, column2’. The columns specified must be present in input parameter column_names. If any alias is given for any column name, the alias must be used, rather than the original column name. The default value is ‘’.

  • persist – If true, then the projection specified in input parameter projection_name will be persisted and will not expire unless a ttl is specified. If false, then the projection will be an in-memory table and will expire unless a ttl is specified otherwise. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • preserve_dict_encoding – If true, then columns that were dict encoded in the source table will be dict encoded in the projection table. Allowed values are:

    • true
    • false

    The default value is ‘true’.

  • view_id – view this projection is part of. The default value is ‘’.

projection_name (str) –
Name of the projection to be created. Has the same naming restrictions as tables.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
create_table_monitor(options={})[source]

Creates a monitor that watches for table modification events such as insert, update or delete on a particular table (identified by input parameter table_name) and forwards event notifications to subscribers via ZMQ. After this call completes, subscribe to the returned output parameter topic_id on the ZMQ table monitor port (default 9002). Each time a modification operation on the table completes, a multipart message is published for that topic; the first part contains only the topic ID, and each subsequent part contains one binary-encoded Avro object that corresponds to the event and can be decoded using output parameter type_schema. The monitor will continue to run (regardless of whether or not there are any subscribers) until deactivated with clear_table_monitor().

Parameters

options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • event

Returns

The response from the server which is a dict containing the –

following entries–

topic_id (str) –
The ZMQ topic ID to subscribe to for inserted records.
table_name (str) –
Value of input parameter table_name.
type_schema (str) –
JSON Avro schema of the table, for use in decoding published records.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
delete_records(expressions=None, options={})[source]

Deletes record(s) matching the provided criteria from the given table. The record selection criteria can either be one or more input parameter expressions (matching multiple records), a single record identified by record_id options, or all records when using delete_all_records. Note that the three selection criteria are mutually exclusive. This operation cannot be run on a collection or a view. The operation is synchronous meaning that a response will not be available until the request is completely processed and all the matching records are deleted.

Parameters

expressions (list of str) –
A list of the actual predicates, one for each select; format should follow the guidelines provided here. Specifying one or more input parameter expressions is mutually exclusive to specifying record_id in the input parameter options. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • global_expression – An optional global expression to reduce the search space of the input parameter expressions. The default value is ‘’.

  • record_id – A record ID identifying a single record, obtained at the time of insertion of the record or by calling get_records_from_collection() with the return_record_ids option. This option cannot be used to delete records from replicated tables.

  • delete_all_records – If set to true, all records in the table will be deleted. If set to false, then the option is effectively ignored. Allowed values are:

    • true
    • false

    The default value is ‘false’.

Returns

The response from the server which is a dict containing the –

following entries–

count_deleted (long) –
Total number of records deleted across all expressions.
counts_deleted (list of longs) –
Total number of records deleted per expression.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
filter(expression=None, options={}, view_name='')[source]

Filters data based on the specified expression. The results are stored in a result set with the given input parameter view_name.

For details see Expressions.

The response message contains the number of points for which the expression evaluated to be true, which is equivalent to the size of the result view.

Parameters

expression (str) –
The select expression to filter the specified table. For details see Expressions.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
  • view_id – view this filtered-view is part of. The default value is ‘’.
  • ttl – Sets the TTL of the view specified in input parameter view_name.
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_area(x_column_name=None, x_vector=None, y_column_name=None, y_vector=None, options={}, view_name='')[source]

Calculates which objects from a table are within a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name input parameter view_name passed in as part of the input.

Parameters

x_column_name (str) –
Name of the column containing the x values to be filtered.
x_vector (list of floats) –
List of x coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
y_column_name (str) –
Name of the column containing the y values to be filtered.
y_vector (list of floats) –
List of y coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_area_geometry(column_name=None, x_vector=None, y_vector=None, options={}, view_name='')[source]

Calculates which geospatial geometry objects from a table intersect a named area of interest (NAI/polygon). The operation is synchronous, meaning that a response will not be returned until all the matching objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input NAI restriction specification is created with the name input parameter view_name passed in as part of the input.

Parameters

column_name (str) –
Name of the geospatial geometry column to be filtered.
x_vector (list of floats) –
List of x coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
y_vector (list of floats) –
List of y coordinates of the vertices of the polygon representing the area to be filtered. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Must not be an already existing collection, table or view. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_box(x_column_name=None, min_x=None, max_x=None, y_column_name=None, min_y=None, max_y=None, options={}, view_name='')[source]

Calculates how many objects within the given table lie in a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when a input parameter view_name is passed in as part of the input payload.

Parameters

x_column_name (str) –
Name of the column on which to perform the bounding box query. Must be a valid numeric column.
min_x (float) –
Lower bound for the column chosen by input parameter x_column_name. Must be less than or equal to input parameter max_x.
max_x (float) –
Upper bound for input parameter x_column_name. Must be greater than or equal to input parameter min_x.
y_column_name (str) –
Name of a column on which to perform the bounding box query. Must be a valid numeric column.
min_y (float) –
Lower bound for input parameter y_column_name. Must be less than or equal to input parameter max_y.
max_y (float) –
Upper bound for input parameter y_column_name. Must be greater than or equal to input parameter min_y.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
Optional name of the result view that will be created containing the results of the query. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_box_geometry(column_name=None, min_x=None, max_x=None, min_y=None, max_y=None, options={}, view_name='')[source]

Calculates which geospatial geometry objects from a table intersect a rectangular box. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set which satisfies the input NAI restriction specification is also created when a input parameter view_name is passed in as part of the input payload.

Parameters

column_name (str) –
Name of the geospatial geometry column to be filtered.
min_x (float) –
Lower bound for the x-coordinate of the rectangular box. Must be less than or equal to input parameter max_x.
max_x (float) –
Upper bound for the x-coordinate of the rectangular box. Must be greater than or equal to input parameter min_x.
min_y (float) –
Lower bound for the y-coordinate of the rectangular box. Must be less than or equal to input parameter max_y.
max_y (float) –
Upper bound for the y-coordinate of the rectangular box. Must be greater than or equal to input parameter min_y.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
Optional name of the result view that will be created containing the results of the query. Must not be an already existing collection, table or view. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_geometry(column_name=None, input_wkt='', operation=None, options={}, view_name='')[source]

Applies a geometry filter against a geospatial geometry column in a given table, collection or view. The filtering geometry is provided by input parameter input_wkt.

Parameters

column_name (str) –
Name of the column to be used in the filter. Must be a geospatial geometry column.
input_wkt (str) –
A geometry in WKT format that will be used to filter the objects in input parameter table_name. The default value is ‘’.
operation (str) –

The geometric filtering operation to perform Allowed values are:

  • contains – Matches records that contain the given WKT in input parameter input_wkt, i.e. the given WKT is within the bounds of a record’s geometry.
  • crosses – Matches records that cross the given WKT.
  • disjoint – Matches records that are disjoint from the given WKT.
  • equals – Matches records that are the same as the given WKT.
  • intersects – Matches records that intersect the given WKT.
  • overlaps – Matches records that overlap the given WKT.
  • touches – Matches records that touch the given WKT.
  • within – Matches records that are within the given WKT.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_list(column_values_map=None, options={}, view_name='')[source]

Calculates which records from a table have values in the given list for the corresponding column. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input filter specification is also created if a input parameter view_name is passed in as part of the request.

For example, if a type definition has the columns ‘x’ and ‘y’, then a filter by list query with the column map {“x”:[“10.1”, “2.3”], “y”:[“0.0”, “-31.5”, “42.0”]} will return the count of all data points whose x and y values match both in the respective x- and y-lists, e.g., “x = 10.1 and y = 0.0”, “x = 2.3 and y = -31.5”, etc. However, a record with “x = 10.1 and y = -31.5” or “x = 2.3 and y = 0.0” would not be returned because the values in the given lists do not correspond.

Parameters

column_values_map (dict of str to lists of str) –
List of values for the corresponding column in the table
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.

  • filter_mode – String indicating the filter mode, either ‘in_list’ or ‘not_in_list’. Allowed values are:

    • in_list – The filter will match all items that are in the provided list(s).
    • not_in_list – The filter will match all items that are not in the provided list(s).

    The default value is ‘in_list’.

view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_radius(x_column_name=None, x_center=None, y_column_name=None, y_center=None, radius=None, options={}, view_name='')[source]

Calculates which objects from a table lie within a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if a input parameter view_name is passed in as part of the request.

For track data, all track points that lie within the circle plus one point on either side of the circle (if the track goes beyond the circle) will be included in the result.

Parameters

x_column_name (str) –
Name of the column to be used for the x-coordinate (the longitude) of the center.
x_center (float) –
Value of the longitude of the center. Must be within [-180.0, 180.0]. The minimum allowed value is -180. The maximum allowed value is 180.
y_column_name (str) –
Name of the column to be used for the y-coordinate-the latitude-of the center.
y_center (float) –
Value of the latitude of the center. Must be within [-90.0, 90.0]. The minimum allowed value is -90. The maximum allowed value is 90.
radius (float) –
The radius of the circle within which the search will be performed. Must be a non-zero positive value. It is in meters; so, for example, a value of ‘42000’ means 42 km. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_radius_geometry(column_name=None, x_center=None, y_center=None, radius=None, options={}, view_name='')[source]

Calculates which geospatial geometry objects from a table intersect a circle with the given radius and center point (i.e. circular NAI). The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new resultant set (view) which satisfies the input circular NAI restriction specification is also created if a input parameter view_name is passed in as part of the request.

Parameters

column_name (str) –
Name of the geospatial geometry column to be filtered.
x_center (float) –
Value of the longitude of the center. Must be within [-180.0, 180.0]. The minimum allowed value is -180. The maximum allowed value is 180.
y_center (float) –
Value of the latitude of the center. Must be within [-90.0, 90.0]. The minimum allowed value is -90. The maximum allowed value is 90.
radius (float) –
The radius of the circle within which the search will be performed. Must be a non-zero positive value. It is in meters; so, for example, a value of ‘42000’ means 42 km. The minimum allowed value is 0. The maximum allowed value is MAX_INT.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Must not be an already existing collection, table or view. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_range(column_name=None, lower_bound=None, upper_bound=None, options={}, view_name='')[source]

Calculates which objects from a table have a column that is within the given bounds. An object from the table identified by input parameter table_name is added to the view input parameter view_name if its column is within [input parameter lower_bound, input parameter upper_bound] (inclusive). The operation is synchronous. The response provides a count of the number of objects which passed the bound filter. Although this functionality can also be accomplished with the standard filter function, it is more efficient.

For track objects, the count reflects how many points fall within the given bounds (which may not include all the track points of any given track).

Parameters

column_name (str) –
Name of a column on which the operation would be applied.
lower_bound (float) –
Value of the lower bound (inclusive).
upper_bound (float) –
Value of the upper bound (inclusive).
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_series(track_id=None, target_track_ids=None, options={}, view_name='')[source]

Filters objects matching all points of the given track (works only on track type data). It allows users to specify a particular track to find all other points in the table that fall within specified ranges-spatial and temporal-of all points of the given track. Additionally, the user can specify another track to see if the two intersect (or go close to each other within the specified ranges). The user also has the flexibility of using different metrics for the spatial distance calculation: Euclidean (flat geometry) or Great Circle (spherical geometry to approximate the Earth’s surface distances). The filtered points are stored in a newly created result set. The return value of the function is the number of points in the resultant set (view).

This operation is synchronous, meaning that a response will not be returned until all the objects are fully available.

Parameters

track_id (str) –
The ID of the track which will act as the filtering points. Must be an existing track within the given table.
target_track_ids (list of str) –
Up to one track ID to intersect with the “filter” track. If any provided, it must be an valid track ID within the given set. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
  • spatial_radius – A positive number passed as a string representing the radius of the search area centered around each track point’s geospatial coordinates. The value is interpreted in meters. Required parameter.
  • time_radius – A positive number passed as a string representing the maximum allowable time difference between the timestamps of a filtered object and the given track’s points. The value is interpreted in seconds. Required parameter.
  • spatial_distance_metric – A string representing the coordinate system to use for the spatial search criteria. Acceptable values are ‘euclidean’ and ‘great_circle’. Optional parameter; default is ‘euclidean’. Allowed values are:
    • euclidean
    • great_circle
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_string(expression=None, mode=None, column_names=None, options={}, view_name='')[source]

Calculates which objects from a table, collection, or view match a string expression for the given string columns. The options ‘case_sensitive’ can be used to modify the behavior for all modes except ‘search’. For ‘search’ mode details and limitations, see Full Text Search.

Parameters

expression (str) –
The expression with which to filter the table.
mode (str) –

The string filtering mode to apply. See below for details. Allowed values are:

  • search – Full text search query with wildcards and boolean operators. Note that for this mode, no column can be specified in input parameter column_names; all string columns of the table that have text search enabled will be searched.
  • equals – Exact whole-string match (accelerated).
  • contains – Partial substring match (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
  • starts_with – Strings that start with the given expression (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
  • regex – Full regular expression search (not accelerated). If the column is a string type (non-charN) and the number of records is too large, it will return 0.
column_names (list of str) –
List of columns on which to apply the filter. Ignored for ‘search’ mode. The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.

  • case_sensitive – If ‘false’ then string filtering will ignore case. Does not apply to ‘search’ mode. Allowed values are:

    • true
    • false

    The default value is ‘true’.

view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_table(column_name=None, source_table_name=None, source_table_column_name=None, options={}, view_name='')[source]

Filters objects in one table based on objects in another table. The user must specify matching column types from the two tables (i.e. the target table from which objects will be filtered and the source table based on which the filter will be created); the column names need not be the same. If a input parameter view_name is specified, then the filtered objects will then be put in a newly created view. The operation is synchronous, meaning that a response will not be returned until all objects are fully available in the result view. The return value contains the count (i.e. the size) of the resulting view.

Parameters

column_name (str) –
Name of the column by whose value the data will be filtered from the table designated by input parameter table_name.
source_table_name (str) –
Name of the table whose data will be compared against in the table called input parameter table_name. Must be an existing table.
source_table_column_name (str) –
Name of the column in the input parameter source_table_name whose values will be used as the filter for table input parameter table_name. Must be a geospatial geometry column if in ‘spatial’ mode; otherwise, Must match the type of the input parameter column_name.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.

  • filter_mode – String indicating the filter mode, either in_table or not_in_table. Allowed values are:

    • in_table
    • not_in_table

    The default value is ‘in_table’.

  • mode – Mode - should be either spatial or normal. Allowed values are:

    • normal
    • spatial

    The default value is ‘normal’.

  • buffer – Buffer size, in meters. Only relevant for spatial mode. The default value is ‘0’.

  • buffer_method – Method used to buffer polygons. Only relevant for spatial mode. Allowed values are:

    • geos – Use geos 1 edge per corner algorithm

    The default value is ‘normal’.

  • max_partition_size – Maximum number of points in a partition. Only relevant for spatial mode. The default value is ‘0’.

  • max_partition_score – Maximum number of points * edges in a partition. Only relevant for spatial mode. The default value is ‘8000000’.

  • x_column_name – Name of column containing x value of point being filtered in spatial mode. The default value is ‘x’.

  • y_column_name – Name of column containing y value of point being filtered in spatial mode. The default value is ‘y’.

view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
filter_by_value(is_string=None, value=0, value_str='', column_name=None, options={}, view_name='')[source]

Calculates which objects from a table has a particular value for a particular column. The input parameters provide a way to specify either a String or a Double valued column and a desired value for the column on which the filter is performed. The operation is synchronous, meaning that a response will not be returned until all the objects are fully available. The response payload provides the count of the resulting set. A new result view which satisfies the input filter restriction specification is also created with a view name passed in as part of the input payload. Although this functionality can also be accomplished with the standard filter function, it is more efficient.

Parameters

is_string (bool) –
Indicates whether the value being searched for is string or numeric.
value (float) –
The value to search for. The default value is 0.
value_str (str) –
The string value to search for. The default value is ‘’.
column_name (str) –
Name of a column on which the filter by value would be applied.
options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • collection_name – Name of a collection which is to contain the newly created view. If the collection provided is non-existent, the collection will be automatically created. If empty, then the newly created view will be top-level.
view_name (str) –
If provided, then this will be the name of the view containing the results. Has the same naming restrictions as tables. The default value is ‘’.

Returns

A read-only GPUdbTable object.

Raises

GPUdbException – –
Upon an error from the server.
lock_table(lock_type='status', options={})[source]

Manages global access to a table’s data. By default a table has a input parameter lock_type of read_write, indicating all operations are permitted. A user may request a read_only or a write_only lock, after which only read or write operations, respectively, are permitted on the table until the lock is removed. When input parameter lock_type is no_access then no operations are permitted on the table. The lock status can be queried by setting input parameter lock_type to status.

Parameters

lock_type (str) –

The type of lock being applied to the table. Setting it to status will return the current lock status of the table without changing it. Allowed values are:

  • status – Show locked status
  • no_access – Allow no read/write operations
  • read_only – Allow only read operations
  • write_only – Allow only write operations
  • read_write – Allow all read/write operations

The default value is ‘status’.

options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

lock_type (str) –
Returns the lock state of the table.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
show_table(options={})[source]

Retrieves detailed information about a table, view, or collection, specified in input parameter table_name. If the supplied input parameter table_name is a collection, the call can return information about either the collection itself or the tables and views it contains. If input parameter table_name is empty, information about all collections and top-level tables and views can be returned.

If the option get_sizes is set to true, then the sizes (objects and elements) of each table are returned (in output parameter sizes and output parameter full_sizes), along with the total number of objects in the requested table (in output parameter total_size and output parameter total_full_size).

For a collection, setting the show_children option to false returns only information about the collection itself; setting show_children to true returns a list of tables and views contained in the collection, along with their corresponding detail.

To retrieve a list of every table, view, and collection in the database, set input parameter table_name to ‘*’ and show_children to true.

Parameters

options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • force_synchronous – If true then the table sizes will wait for read lock before returning. Allowed values are:

    • true
    • false

    The default value is ‘true’.

  • get_sizes – If true then the table sizes will be returned; blank, otherwise. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • show_children – If input parameter table_name is a collection, then true will return information about the children of the collection, and false will return information about the collection itself. If input parameter table_name is a table or view, show_children must be false. If input parameter table_name is empty, then show_children must be true. Allowed values are:

    • true
    • false

    The default value is ‘true’.

  • no_error_if_not_exists – If false will return an error if the provided input parameter table_name does not exist. If true then it will return an empty result. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • get_column_info – If true then column info (memory usage, etc) will be returned. Allowed values are:

    • true
    • false

    The default value is ‘false’.

Returns

The response from the server which is a dict containing the –

following entries–

table_name (str) –
Value of input parameter table_name.
table_names (list of str) –
If input parameter table_name is a table or view, then the single element of the array is input parameter table_name. If input parameter table_name is a collection and show_children is set to true, then this array is populated with the names of all tables and views contained by the given collection; if show_children is false then this array will only include the collection name itself. If input parameter table_name is an empty string, then the array contains the names of all collections and top-level tables.
table_descriptions (list of lists of str) –

List of descriptions for the respective tables in output parameter table_names. Allowed values are:

  • COLLECTION
  • VIEW
  • REPLICATED
  • JOIN
  • RESULT_TABLE
type_ids (list of str) –
Type ids of the respective tables in output parameter table_names.
type_schemas (list of str) –
Type schemas of the respective tables in output parameter table_names.
type_labels (list of str) –
Type labels of the respective tables in output parameter table_names.
properties (list of dicts of str to lists of str) –
Property maps of the respective tables in output parameter table_names.
additional_info (list of dicts of str to str) –

Additional information about the respective tables in output parameter table_names. Allowed values are:

  • @INNER_STRUCTURE
sizes (list of longs) –
Empty array if the get_sizes option is false. Otherwise, sizes of the respective tables represented in output parameter table_names. For all but track data types, this is simply the number of total objects in a table. For track types, since each track semantically contains many individual objects, the output parameter sizes are the counts of conceptual tracks (each of which may be associated with multiple objects).
full_sizes (list of longs) –
Empty array if the get_sizes option is false. Otherwise, number of total objects in the respective tables represented in output parameter table_names. For all but track data types, this is the same as output parameter sizes. For track types, since each track semantically contains many individual objects, output parameter full_sizes is the count of total objects.
join_sizes (list of floats) –
Empty array if the get_sizes option is false. Otherwise, number of unfiltered objects in the cross product of the sub-tables in the joined-tables represented in output parameter table_names. For simple tables, this number will be the same as output parameter sizes. For join-tables this value gives the number of joined-table rows that must be processed by any aggregate functions operating on the table.
total_size (long) –
-1 if the get_sizes option is false. Otherwise, the sum of the elements of output parameter sizes.
total_full_size (long) –
-1 if the get_sizes option is false. The sum of the elements of output parameter full_sizes.
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
update_records(expressions=None, new_values_maps=None, records_to_insert=[], records_to_insert_str=[], record_encoding='binary', options={})[source]

Runs multiple predicate-based updates in a single call. With the list of given expressions, any matching record’s column values will be updated as provided in input parameter new_values_maps. There is also an optional ‘upsert’ capability where if a particular predicate doesn’t match any existing record, then a new record can be inserted.

Note that this operation can only be run on an original table and not on a collection or a result view.

This operation can update primary key values. By default only ‘pure primary key’ predicates are allowed when updating primary key values. If the primary key for a table is the column ‘attr1’, then the operation will only accept predicates of the form: “attr1 == ‘foo’” if the attr1 column is being updated. For a composite primary key (e.g. columns ‘attr1’ and ‘attr2’) then this operation will only accept predicates of the form: “(attr1 == ‘foo’) and (attr2 == ‘bar’)”. Meaning, all primary key columns must appear in an equality predicate in the expressions. Furthermore each ‘pure primary key’ predicate must be unique within a given request. These restrictions can be removed by utilizing some available options through input parameter options.

Parameters

expressions (list of str) –
A list of the actual predicates, one for each update; format should follow the guidelines here.
new_values_maps (list of dicts of str to str and/or None) –
List of new values for the matching records. Each element is a map with (key, value) pairs where the keys are the names of the columns whose values are to be updated; the values are the new values. The number of elements in the list should match the length of input parameter expressions.
records_to_insert (list of str) –
An optional list of new binary-avro encoded records to insert, one for each update. If one of input parameter expressions does not yield a matching record to be updated, then the corresponding element from this list will be added to the table. The default value is an empty list ( [] ).
records_to_insert_str (list of str) –
An optional list of new json-avro encoded objects to insert, one for each update, to be added to the set if the particular update did not affect any objects. The default value is an empty list ( [] ).
record_encoding (str) –

Identifies which of input parameter records_to_insert and input parameter records_to_insert_str should be used. Allowed values are:

  • binary
  • json

The default value is ‘binary’.

options (dict of str to str) –

Optional parameters. The default value is an empty dict ( {} ). Allowed keys are:

  • global_expression – An optional global expression to reduce the search space of the predicates listed in input parameter expressions. The default value is ‘’.

  • bypass_safety_checks – When set to true, all predicates are available for primary key updates. Keep in mind that it is possible to destroy data in this case, since a single predicate may match multiple objects (potentially all of records of a table), and then updating all of those records to have the same primary key will, due to the primary key uniqueness constraints, effectively delete all but one of those updated records. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • update_on_existing_pk – Can be used to customize behavior when the updated primary key value already exists as described in insert_records(). Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • update_partition – Force qualifying records to be deleted and reinserted so their partition membership will be reevaluated. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • truncate_strings – If set to true, any strings which are too long for their charN string fields will be truncated to fit. The default value is false. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • use_expressions_in_new_values_maps – When set to true, all new values in input parameter new_values_maps are considered as expression values. When set to false, all new values in input parameter new_values_maps are considered as constants. NOTE: When true, string constants will need to be quoted to avoid being evaluated as expressions. Allowed values are:

    • true
    • false

    The default value is ‘false’.

  • record_id – ID of a single record to be updated (returned in the call to insert_records() or get_records_from_collection()).

Returns

The response from the server which is a dict containing the –

following entries–

count_updated (long) –
Total number of records updated.
counts_updated (list of longs) –
Total number of records updated per predicate in input parameter expressions.
count_inserted (long) –
Total number of records inserted (due to expressions not matching any existing records).
counts_inserted (list of longs) –
Total number of records inserted per predicate in input parameter expressions (will be either 0 or 1 for each expression).
info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.
update_records_by_series(world_table_name=None, view_name='', reserved=[], options={})[source]

Updates the view specified by input parameter table_name to include full series (track) information from the input parameter world_table_name for the series (tracks) present in the input parameter view_name.

Parameters

world_table_name (str) –
Name of the table containing the complete series (track) information.
view_name (str) –
name of the view containing the series (tracks) which have to be updated. The default value is ‘’.
reserved (list of str) –
The default value is an empty list ( [] ). The user can provide a single element (which will be automatically promoted to a list internally) or a list.
options (dict of str to str) –
Optional parameters. The default value is an empty dict ( {} ).

Returns

The response from the server which is a dict containing the –

following entries–

count (int)

info (dict of str to str) –
Additional information.

Raises

GPUdbException – –
Upon an error from the server.