πŸ”°SQL Transform

Use SQL to build advanced tables for seamless data modeling and analysis

Overview

SQL Transform is a powerful⚑feature on Sprinkle that enables you to create advanced tables from the input source tables using SQL queries.

Data from various sources is imported into Sprinkle as tables. These tables can then be joined to create SQL Transform tables (tables created in SQL Transform), from which the models can be created, or these tables can directly be used to create reports for analysis.

Using SQL Transform, you can join various tables in the warehouse together, cast and change data types of columns, and create new measures as required.

Watch Video πŸ“Ί

Feature Walkthrough 🚢

Creating a SQL Transform

From the left navigation panel, click on SQL Transform to go to the transforms listing page.

The listing page displays all the transforms created by you or that have been shared with you.

The listing page has various premade filters and a search bar, making it super easy for you to find your files.

To create a new SQL transform, click on the "+ New Transform" button.

SQL Transform UI

SQL transform has mainly 4 sections:

1. Script editor

  • You can write and edit your SQL scripts here.πŸ’»'

  • You can use SQL commands to join tables to create a new table, cast the data types of the columns, etc.

  • All SQL queries are supported here.

  • Click 'Save and run' once you are done editing the script.

  • To view query results, you can select the part of the query and then click 'Run selection'. The results will appear in a new tab next to the Jobs section.

Tip: Hit ctrl+space for auto-suggestions. This will help you write queries faster.

You can also use the following:

  • Copy script feature to quickly copy the entire script.

  • Full-screen feature for an expanded workspace. This will collapse the side bar and give you more space to work with.

2. Schema Browser

  • Enables browsing through the schemas and tables in the warehouse.

  • Use the search bar to easily search for the required tables.

  • Click on the table name to view column names and data types.

  • Double-tapping on a table or column name adds it to the query editor for efficient query writing.

3. Variables

Variables come in handy if you are using the same value at multiple places in the SQL Transform script.

  • Under the local variables section, you can create and set variables to their initial values. Once a variable is created, you can use it in the script.

  • Local variables are defined and used within the current scope. Use them in the format {{variableName}} in the script.

  • You can also use variables from the global variables section. Global variables are defined by admins. You can use them in the format {{Globals.variableName}} in the script.

4. Jobs

The job section can be found below the script area.

This displays all the jobs that were run for the SQL transform and related information.

Jobs are run when you click Save and Run on the top right corner.

More Options

Additional features can be found under more options:

  • Auto-Run: Enable the Auto-Run toggle to run the SQL Transform periodically.

  • Comments: Facilitates discussion and collaboration among team members.

  • Status: Assign a status to the transform for easy identification on the listing page (WIP, verified, has errors, or is deprecated).

  • Clone: SQL transforms can be cloned (or copied) and used to build upon existing transforms.

  • Share: Share SQL transforms with users and user groups such as admin, developer, analyst, viewer, etc.

  • Activity: View user activity on the SQL transform using this feature.

  • Settings: Manage Folder, Owner, Run Settings, Variables, Location, Managed Data Loading, Incremental Processing

Settings

From the More Options tab, click on Settings to add advanced settings for your SQL Transform.

There are 3 tabs in the settings: General, Cleanup, and Data Loading.

General

This section enables you to edit the folder and owner of the SQL transform.

πŸ“ Folder: You can choose any folder that you have access to from the drop-down menu. (By default, the transform is created in the SharedSpaces folder.)

πŸ‘€Owner: Set or change the owner from the available drop-down list.

Cleanup

🧹Cleanup is useful when creating external tables.

  • Specify the location directly in the field when creating external tables.

  • The contents of the directory are deleted before executing the SQL Transform script, and the data is recreated at the location afterwards.

Data Loading

Managed Data Loading

Your SQL transform may sometimes have a DROP TABLE command in the script based on your requirements. But if there are reports that are based on the table that are getting dropped, it may end up failing. While dropping temporary tables is not a problem, dropping tables that power reports can become a problem.

Here's the solution! πŸŽ‰βœ¨

Allow Sprinkle to manage the loading of the data onto the final table while you drop and load the staging table.

A Staging table is a table where data is stored temporarily πŸ’‘

How Does it Work? πŸ€“

  • Create a staging table for temporary data storage.

  • Modify your script to load data into the staging table instead of the final table.

  • Then save the script.

  • Click on Settings and browse to the Data Loading tab.

  • Click on the 'Enable managed data loading' checkbox.

  • Specify the staging table and final table in the fields.

  • Save the settings.

Under the hood, Sprinkle takes care of loading data into the final table. In the end, the final table will look exactly like the staging table.

Incremental Processing

Incremental processing helps reduce the pipeline run time ⏳by reducing the data crunched.

This builds on the concept of managed data loading by adding a few more abstractions.

You still have to provide staging and final table names. Just like above, the final table is a view over a sprinkle-managed table. The sprinkle-managed table is not visible to you, but the view is.

When data gets loaded into the final table, instead of replacing the complete data, if you want to insert only new or changed data, then you can enable incremental processing.

For incremental processing, a time column with data in the format of yyyy-mm-dd must be present in your final table.

Sprinkle will internally maintain the checkpoint of the time column for the last successful run.

In the SQL Transform script, you can get the value of the checkpoint using the ${sprinkle_latest_checkpoint} variable.

Last updated