MySQL DB
Guide to integrating your MySQL database with Sprinkle
Data Import Concepts
Before setting up the data import, learn about data import concepts here
Step-by-Step Guide
STEP-1: Allow MySQL to accept connections from Sprinkle
Network Connectivity:
If the MySQL server is on a public network, accessible over a public IP, allow inbound connections on the MySQL port (default is 3306) from the Sprinkle IPs (34.93.254.126, 34.93.106.136)
If the MySQL server is on a private network, configure an SSH Tunnel in Advanced Settings.
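For a publicly reachable server, MySQL must also listen on a non-loopback address. A minimal sketch of the relevant server settings (the file path and values below are illustrative and vary by distribution):

```ini
# /etc/mysql/my.cnf (location varies by distribution)
[mysqld]
# Listen on all interfaces so the Sprinkle IPs can reach the server;
# restrict access to those IPs at the firewall level.
bind-address = 0.0.0.0
port         = 3306
```

Restart the MySQL service after changing these settings for them to take effect.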
Create a read-only user, giving it any name, such as "sprinkle"
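The read-only user can be created with statements along these lines (the username, host pattern, and password are placeholders; grant on specific databases instead of *.* if you prefer):

```sql
-- Create the user Sprinkle will connect as (placeholder password)
CREATE USER 'sprinkle'@'%' IDENTIFIED BY 'strong_password_here';

-- Read-only access to the data you want to ingest
GRANT SELECT ON *.* TO 'sprinkle'@'%';

-- Only needed if you plan to enable Change Tracking (binlog-based ingestion)
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'sprinkle'@'%';

FLUSH PRIVILEGES;
```

You can verify the grants afterwards with SHOW GRANTS FOR 'sprinkle'@'%';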
STEP-2: Configure MySQL Connection
To learn about Connection, refer here
Log into the Sprinkle application
Navigate to Ingest -> Connections Tab -> Setup Connections ->
Select MySQL database
Provide all the mandatory details
Distinct Name: Name to identify this connection
Host: Provide the IP address or hostname.
Port: Provide the Port number.
Database: Provide a database name, if any; it must be an existing database.
Username
Password
Advanced Settings: If Yes:
Driver Version: Select the driver version.
Timezone: The selected timezone will be used to fetch Datetime/Timestamp column values.
Connection Properties: You can provide optional connection properties, e.g. key1=value1&key2=value2
Change Tracking: If enabled, the MySQL binlog is used to fetch changed/new rows. Once selected, this setting cannot be changed later. For more details see this.
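Change Tracking relies on the server's binary log being enabled in row format. A minimal sketch of the server settings it typically requires (values are illustrative):

```ini
[mysqld]
# Unique ID among servers/replicas reading the binlog
server-id     = 1
# Enable the binary log and use row-based events
log_bin       = mysql-bin
binlog_format = ROW
# Keep enough binlog history for ingestion runs to catch up (7 days here)
binlog_expire_logs_seconds = 604800
```

Size the binlog retention so it comfortably covers the longest gap between ingestion runs.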
Connect via SSH Host: If Yes:
SSH Host: IP address or hostname of the SSH server.
SSH Public Key: Add this public key to the ~/.ssh/authorized_keys file on the SSH host machine.
SSH Login Username
Test Connection
Create
STEP-3: Configure MySQL data import
To learn about data imports, refer here
Navigate to Datasources -> Datasources Tab -> Add ->
Select MySQL database
Provide the name -> Create
Connection Tab:
From the drop-down, select the name of the connection created in STEP-2
Update
STEP-4: Create a Dataset
Datasets Tab: To learn about Dataset, refer here. Add a Dataset for each table that you want to replicate, providing the following details
Database Name (Required): Database name to select the table from
Table Type (Required):
Table: Select Table if you want to replicate the table
Source Table (Required): Name of the MySQL table
Exclude Columns (Optional): List of columns to exclude
Mask Columns (Optional): List of columns to be masked using SHA256
Mode (Required):
Complete: Ingest full data from the source table in every ingestion job run. Choose this option if your table size is small (<1 million rows) and you want to ingest it infrequently (a few times a day)
Incremental: Ingest only the changed or inserted rows in every ingestion job run. Choose this option if your table size is large and you want to ingest in real-time mode. This requires the table to have a time column and a unique key.
Remove Duplicate Rows: Duplicate rows are removed by keeping the latest row for each unique key value
Unique Keys: List of columns forming the unique id
Fetch With Delay (in minutes): Use this to apply a delay while downloading records. When a non-zero value is specified, only records inserted/updated on the source more than the given number of minutes ago are ingested, instead of data up to the current time. Useful when recent data is not fully reflected on the source due to heavy load. It can be left at 0 minutes if not needed.
Time Column: Rows with a newer value in this column are ingested in each run
Destination Table Partition: Creates partitions of the destination table according to the specified column (e.g. Date). The value is NO by default. If YES, the destination table will be partitioned; otherwise a normal materialized table is created. This feature is only applicable in BigQuery.
Partition Column: Enter the partition column name. A partition will be created on the destination table using the partition column value. The selected column's values should remain static even if other fields are updated. It should be of Date type or easily convertible to Date format. If no column is selected, the partition is created based on the job run time.
Partition Date Format: Enter the date format of the partition column.
Periodic Full Fetch: If Yes, the full table is fetched from the source periodically. Useful if data gets deleted from the source and you want to keep the warehouse in sync with the source. If No, only incremental data is pulled in every run.
To Know more about Ingestion Modes, refer here
Filter clause: Filter the data while ingesting. The ingestion runs select * from table_name where CONDITION; provide only the CONDITION here.
Query: Select Query if you want to insert the output of a MySQL query into the data warehouse
Query(SQL) (Required): Query that will be executed on MySQL in each ingestion job run. If the schema is not the default one, provide a fully qualified name (mention schema_name in the query), e.g. select * from schema_name.table_name;
Destination Schema (Required): Data warehouse schema where the table will be ingested into
Destination Table name (Required): The table name to be created in the warehouse. If not given, Sprinkle creates it as ds_<datasourcename>_<tablename>
Destination Create Table Clause: Provide additional clauses for the warehouse CREATE TABLE query, such as clustering and partitioning, useful for optimizing DML statements. Learn more on how to use this field.
Create
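Conceptually, an incremental run with a time column and a unique key behaves like the following query, with deduplication applied afterward (the table and column names are illustrative):

```sql
-- Hypothetical table "orders": fetch only rows changed since the last
-- run's high-water mark, shifted back by a "Fetch With Delay" of 10 minutes.
SELECT *
FROM   orders
WHERE  updated_at >  '2024-01-01 00:00:00'        -- last ingested time value
  AND  updated_at <= NOW() - INTERVAL 10 MINUTE;  -- fetch-with-delay window
-- Duplicates across runs are then removed by keeping the latest row
-- per unique key (e.g. order_id).
```

This is only a sketch of the behaviour; Sprinkle manages the high-water mark and deduplication internally.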
STEP-5: Run and schedule Ingestion
In the Ingestion Jobs tab:
Trigger the job using the Run button
To schedule, enable Auto-Run. Change the frequency if needed