Destination Create Table Clause

Overview of clauses that can be used in the Destination Create Table Clause field while adding tables to the destination (warehouse) through Data Imports in Sprinkle.

The destination-create table clause in Sprinkle enables you to provide additional clauses for warehouse-create table queries such as clustering, partitioning, and more depending on the warehouse. It is useful for optimizing DML (Data Manipulation Language) statements to ensure optimal query performance.

Features like partitioning, Clustering, and more can be used by providing these clauses in the Destination Create Table Clause. Let's understand each in detail.

➡️ Clustering a table refers to sorting data based on the columns provided by the user which is highly beneficial in boosting query performance.

Clustering can be used when:

  1. If your query has multiple filtering columns and they often filter on some particular columns.

  2. If the filtered columns have too many unique values then clustering should be preferred.

  3. It is hard to decipher query cost estimation while clustering so if you don’t have any cost considerations clustering is the best option.

➡️ Partitioning is dividing a table into different segments that help in easy management and efficient querying of your data.

Partitioning can be used when :

  1. When the user needs to manage data in a particular partition.

  2. If a user wants to know the query cost before the query is run then in that case partition can be used.

➡️ There are more such clauses that can be useful and give you granular control while creating tables in the warehouse. Those are explained below in context to the type of data warehouse.

BIGQUERY

Syntax

PARTITION BY partition_expression
CLUSTER BY clustering_column_list

[PARTITION BY partition_expression]

Partition by is an optional clause that helps to create a partition by the column name that is mentioned by the user. It has many partition expressions to know more about them click here

[CLUSTER BY clustering_column_list]

Cluster by is also an optional clause that creates clusters on the basis of the cluster columns specified. To know more about the clustering column list, click here.

To know more about how to create a table in BigQuery, click here

ATHENA

The destination create table clause is not supported in Athena.

SNOWFLAKE

Syntax

CLUSTER BY ( <expr> [ , <expr> , ... ] ) 

[ CLUSTER BY ( <expr> [ , <expr> , ... ] ) ]

In cluster by one or more columns can be added as the clustering key on the basis of which clustering needs to be done.

To know more about how to create a table in Snowflake, click here

REDSHIFT

Syntax

DISTSTYLE { AUTO | EVEN | KEY | ALL } 
DISTKEY ( column_name )
[COMPOUND | INTERLEAVED ] SORTKEY ( column_name [,...]) |  [ SORTKEY AUTO ]
ENCODE AUTO

[ DISTSTYLE { AUTO | EVEN | KEY | ALL } ]

Diststyle specifies the data distribution style for the complete table and helps in the arrangement of data to locate it easily. The default distribution style is Auto but according to requirements, users can use other distribution styles as well.

There are four distribution styles in redshift:

  1. Auto: It is the default distribution style selected by redshift that primarily depends on the size of the incoming data.

  2. Even: In this, data is evenly distributed by a leading node to other nodes and is the simplest distribution style.

  3. Key: This distribution style is selected when the table undergoes many joining operations or complex querying. It helps in ensuring optimal query performance.

  4. All: In this, a copy of every table is distributed among all the nodes, so this distribution style is generally preferred for tables small in size to save storage space on the redshift cluster.

[ DISTKEY ( column_name ) ]

In distkey users needs to specify the name of the column on the basis of which distribution style is to be applied.

[ [COMPOUND | INTERLEAVED ] SORTKEY

In this users need to specify one or more columns that are to be selected as the sortkey. The sortkey determines the sorting order of data once it is loaded. Additionally, there are two sort styles offered by redshift for more information click here

[ ENCODE AUTO ]

Encode auto is the default encoding type on any column that is used to boost query performance in amazon redshift. For more information on encoding click here

To know more about how to create a table in Redshift, click here

Last updated