Compression

Compression can be applied to an individual column of any data type to reduce its memory footprint. By default, columns are stored uncompressed in memory. Once compression is applied, a column stays compressed until its data is used. When data is retrieved, a temporary uncompressed copy is made and then discarded once processing finishes. When new data is added or existing data is updated, the affected data segment is uncompressed, modified, and immediately recompressed. Compression persists through database restarts.
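
The read and update behavior described above can be pictured with a small toy model. This is only a conceptual sketch, not the database's implementation: the CompressedSegment class is hypothetical, and Python's standard zlib module stands in for the actual compression algorithms.

```python
import zlib


class CompressedSegment:
    """Toy model of a compressed data segment (illustration only).

    Values are assumed to be strings that contain no newline characters.
    """

    def __init__(self, values):
        # Keep only the compressed form in memory.
        self._blob = zlib.compress("\n".join(values).encode("utf-8"))

    def read(self):
        # Retrieval: build a temporary uncompressed copy; the stored
        # blob is left untouched and the copy is discarded by the caller.
        return zlib.decompress(self._blob).decode("utf-8").split("\n")

    def update(self, index, new_value):
        # Modification: uncompress, modify, then recompress immediately.
        values = self.read()
        values[index] = new_value
        self._blob = zlib.compress("\n".join(values).encode("utf-8"))

    def stored_size(self):
        # Number of bytes held in compressed form.
        return len(self._blob)


segment = CompressedSegment(["Smith", "Jones", "Smith", "Nguyen"] * 1000)
segment.update(1, "Garcia")
print(segment.read()[:3])     # ['Smith', 'Garcia', 'Smith']
print(segment.stored_size())  # far smaller than the raw text
```

The extra work in update() compared with read() mirrors why inserts and updates on a compressed column cost more than retrievals.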

No functionality is disabled on compressed columns, but insert and update operations will be slower. Data retrieval may also be slower, but should still perform well overall.

A column can be compressed via the /alter/table endpoint. You can also set column compression in GAdmin.

The compression setting determines which compression algorithm, if any, is used. The algorithms differ in compression ratio (uncompressed size divided by compressed size) and in the speed at which data is compressed and uncompressed. Four compression settings are available:

  • none (the default)
  • snappy -- high compression/decompression speed (minimum 250-500 MB/s per core), large compression ratio (~2.091)
  • lz4 -- high compression speed (minimum 400 MB/s per core) and higher decompression speed, large compression ratio (~2.101)
  • lz4hc -- slower compression speed but higher decompression speed than lz4, higher compression ratio than lz4 (~2.720)
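
To make the ratios concrete, the following sketch estimates a column's compressed size from the approximate ratios listed above. The helper name estimate_compressed_bytes is hypothetical, and the ratios are only the rough figures quoted here; the ratio actually achieved will depend on the data in the column.

```python
# Approximate compression ratios quoted in the list above
# (uncompressed size / compressed size). Real ratios depend on the data.
APPROX_RATIO = {
    "none": 1.0,
    "snappy": 2.091,
    "lz4": 2.101,
    "lz4hc": 2.720,
}


def estimate_compressed_bytes(uncompressed_bytes, setting):
    """Estimate a column's in-memory size under a compression setting."""
    if setting not in APPROX_RATIO:
        raise ValueError(f"unknown compression setting: {setting!r}")
    return uncompressed_bytes / APPROX_RATIO[setting]


one_gib = 1024 ** 3
for setting in APPROX_RATIO:
    size_mib = estimate_compressed_bytes(one_gib, setting) / 1024 ** 2
    print(f"{setting:>6}: ~{size_mib:.0f} MiB")
```

Under these assumptions, a 1 GiB column would occupy roughly 490 MiB with snappy and roughly 376 MiB with lz4hc.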

For example, to apply Snappy compression to a column in Python:

# Set Snappy compression on the last_name column of the employee table
h_db.alter_table(
    table_name = "example.employee",
    action = "set_column_compression",
    value = "last_name",
    options = {"compression_type":"snappy"}
)

To turn off column compression in Python:

# Remove compression from the last_name column of the employee table
h_db.alter_table(
    table_name = "example.employee",
    action = "set_column_compression",
    value = "last_name",
    options = {"compression_type":"none"}
)
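
When several columns need the same setting, the call shown above can be wrapped in a small helper. This is a minimal sketch: the function names set_column_compression and compress_columns are hypothetical, and db is assumed to be a connected handle exposing alter_table() with the same arguments as the h_db examples above.

```python
VALID_COMPRESSION_TYPES = ("none", "snappy", "lz4", "lz4hc")


def set_column_compression(db, table_name, column_name, compression_type):
    """Apply one of the four compression settings to a single column.

    `db` is assumed to behave like the h_db handle in the examples above.
    """
    if compression_type not in VALID_COMPRESSION_TYPES:
        raise ValueError(
            f"compression_type must be one of {VALID_COMPRESSION_TYPES}, "
            f"got {compression_type!r}"
        )
    # Same call shape as the documented examples.
    return db.alter_table(
        table_name=table_name,
        action="set_column_compression",
        value=column_name,
        options={"compression_type": compression_type},
    )


def compress_columns(db, table_name, column_names, compression_type):
    """Apply the same compression setting to each listed column in turn."""
    for column_name in column_names:
        set_column_compression(db, table_name, column_name, compression_type)


# Example usage (first_name is a hypothetical column):
# compress_columns(h_db, "example.employee", ["first_name", "last_name"], "lz4")
```

Validating the setting before the call catches a misspelled algorithm name locally rather than relying on the server to reject it.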

Limitations and Cautions

Columns with any of the following characteristics are not eligible for compression: