Oracle DB

A guide to integrating your Oracle database with Sprinkle

Data Import Concepts

Before setting up the data import, learn about data import concepts here.

Step by Step Guide

STEP-1: Allow Oracle DB to accept connections from Sprinkle

  1. Network Connectivity:

    • If the Oracle server is on a public network, accessible over a public IP, allow inbound connections on the Oracle port (default 1521) from the Sprinkle IPs (34.93.254.126, 34.93.106.136)

    • If the Oracle server is on a private network, configure an SSH tunnel in Advanced Settings.

  2. Create a read-only user, with any name such as "sprinkle" (a sample setup script follows this list)

    • Privileges required by the read-only user in Oracle DB:

      • ANALYZE ANY

      • ANALYZE ANY DICTIONARY

      • SELECT ANY TABLE

      • SELECT ANY DICTIONARY
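
A minimal sketch of the user setup, to be run as a DBA account. The password is a placeholder, and the CREATE SESSION grant is an assumption (it is what lets the user log in); the remaining grants are the four privileges listed above.

    -- Hypothetical setup script; replace the password placeholder.
    CREATE USER sprinkle IDENTIFIED BY StrongPassword123;
    -- CREATE SESSION is assumed so Sprinkle can log in as this user.
    GRANT CREATE SESSION TO sprinkle;
    GRANT ANALYZE ANY TO sprinkle;
    GRANT ANALYZE ANY DICTIONARY TO sprinkle;
    GRANT SELECT ANY TABLE TO sprinkle;
    GRANT SELECT ANY DICTIONARY TO sprinkle;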

STEP-2: Configure Oracle DB Connection

To learn about Connections, refer here

  • Log into Sprinkle application

  • Navigate to Ingest -> Connections Tab -> Setup Connection ->

  • Select Oracle database

  • Provide all the mandatory details

    • Distinct Name: Name to identify this connection

    • Host: Provide the IP address or hostname.

    • Port: Provide the Port number.

    • Service Name: Provide the service name. You can find it using the query shown after this list (SELECT name FROM v$services).

    • Username

    • Password

    • Advanced Settings: If Yes:

      • Timezone: The timezone selected will be used to fetch Datetime/Timestamp column values.

      • Connect via SSH Host: If Yes:

        • SSH Host: IP address or hostname of the SSH server.

        • SSH Public Key: Add this public key to the ~/.ssh/authorized_keys file on the SSH host machine.

        • SSH Login Username

  • Test Connection

  • Create
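
To fill in the Service Name field above, run the following query against the source database; the v$services view is readable with the SELECT ANY DICTIONARY privilege granted in STEP-1.

    SELECT name FROM v$services;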

STEP-3: Configure Oracle DB data import

To learn about data imports, refer here

  • Navigate to Ingest -> Data Imports Tab -> Setup Sources ->

  • Select Oracle database

  • Provide the name -> Create

  • Connection Tab:

    • From the drop-down, select the name of the connection created in STEP-2

    • Update

STEP-4: Create Dataset

Datasets Tab: To learn about Datasets, refer here.

Add a Dataset for each table that you want to replicate, providing the following details:

  • Database Name (Required): Database name to select the table from

  • Schema Name (Required): Schema name that is present in the database

  • Table Type (Required):

    • Table: Select Table if you want to replicate the table

      • Source Table (Required): Name of the Oracle table

      • Exclude Columns (Optional): List of columns to exclude

      • Mask Columns (Optional): List of columns to be masked using SHA256

      • Mode (Required):

        • Complete: Ingest full data from the source table in every ingestion job run. Choose this option if your table size is small (<1 million rows) and you want to ingest it infrequently (a few times a day)

        • Incremental: Ingest only the changed or inserted rows in every ingestion job run. Choose this option if your table size is large and you want to ingest in near-real-time. This requires a time column and a unique key to be present in the table (see the conceptual sketch after this list).

          • Remove Duplicate Rows: Duplicate rows are removed by keeping the latest row for each unique key

            • Unique Keys: List of columns forming the unique id

          • Time Column: Rows with a newer value in this column will be ingested in each run

          • Destination Table Partition: Creates partitions of the destination table according to the specified column (e.g. a Date column). The value is NO by default. If YES, the destination table will be partitioned; otherwise a normal materialized table will be created. This feature is only applicable in BigQuery.

            • Partition Column: Enter the partition column name. Partitions will be created on the destination table using this column's values. The selected column's values should remain static even if other fields are updated. The column should be of Date type or easily convertible into a Date format. If no column is selected, partitions will be created based on the job run time.

            • Partition Date Format: Enter the date format of the partition column.

          • Periodic Full Fetch: If YES, the full table will be fetched from the source periodically. This is useful if data gets deleted at the source and you want to keep the destination in sync. If NO, only incremental data will be pulled in every run.

          To know more about ingestion modes, refer here

      • Filter clause: Filters the data while ingesting. The clause is applied as select * from table_name where CONDITION; provide only the CONDITION here (see the example after this list).

    • Query: Select Query if you want to ingest the output of an Oracle query into the data warehouse

      • Query (SQL) (Required): Query that will be executed on Oracle in each ingestion job run. If the schema is not the default one, provide fully qualified names (mention schema_name in the query), e.g. select * from schema_name.table_name;

  • Destination Schema (Required): Data warehouse schema that the table will be ingested into

  • Destination Table Name (Required): The table name to be created in the warehouse. If not given, Sprinkle will create it as ds_<datasourcename>_<tablename>

  • Destination Create Table Clause: Provide additional clauses for the warehouse create-table queries, such as clustering and partitioning, useful for optimizing DML statements. Learn more on how to use this field.

  • Create
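
A conceptual sketch of what the Incremental mode fetches in each run. The table orders, the time column updated_at, the unique key order_id, and the :last_ingested_time bind variable are all hypothetical; Sprinkle's actual generated SQL may differ.

    -- Fetch only rows changed or inserted since the previous run's high-water mark.
    SELECT *
    FROM orders
    WHERE updated_at > :last_ingested_time;
    -- With Remove Duplicate Rows enabled, only the latest row (by updated_at)
    -- is kept for each order_id across runs.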
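
An example of the Filter clause field, using a hypothetical orders table and order_date column. Entering the condition order_date >= DATE '2024-01-01' makes each ingestion run effectively execute:

    SELECT * FROM orders WHERE order_date >= DATE '2024-01-01';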

STEP-5: Run and schedule Ingestion

In the Ingestion Jobs tab:

  • Trigger the job using the Run button

  • To schedule, enable Auto-Run and change the frequency if needed
