Azure Cosmos DB

Guide to integrate your Azure Cosmos DB with Sprinkle

Datasource Concepts

Before setting up the datasource, learn about datasource concepts here.

Step-by-Step Guide

Step 1: Configure the Cosmos DB Connection

To learn about Connection, refer here.

  • Log into the Sprinkle application.

  • Navigate to Datasources -> Connections Tab -> New Connection

  • Select CosmosDB

  • Provide all the mandatory details

    • Name: Name to identify this connection

    • Account Endpoint: Provide the URL in the following format:

    https://xxxxxxxxxx.documents.azure.com:443/

    • Master Key

  • Test Connection

  • Create
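Before pasting the Account Endpoint into the connection form, it can help to check that it matches the format shown above. The following is a minimal sketch using only the Python standard library; the function name `is_valid_cosmos_endpoint` is hypothetical, and the check simply mirrors the documented format (HTTPS, a `documents.azure.com` host, explicit port 443). Whether Sprinkle accepts other forms is not stated here.

```python
from urllib.parse import urlparse

COSMOS_HOST_SUFFIX = ".documents.azure.com"


def is_valid_cosmos_endpoint(url: str) -> bool:
    """Return True if url looks like https://<account>.documents.azure.com:443/."""
    try:
        parsed = urlparse(url)
        port = parsed.port  # raises ValueError if the port is not numeric
    except ValueError:
        return False
    host = parsed.hostname or ""
    return (
        parsed.scheme == "https"
        and host.endswith(COSMOS_HOST_SUFFIX)
        and len(host) > len(COSMOS_HOST_SUFFIX)  # account name must be present
        and port == 443  # assumption: port is given explicitly, as in the format above
        and parsed.path in ("", "/")
    )


if __name__ == "__main__":
    # "myaccount" is a placeholder account name, not a real endpoint.
    print(is_valid_cosmos_endpoint("https://myaccount.documents.azure.com:443/"))
```

A check like this only catches formatting mistakes; the Test Connection button remains the way to confirm that the endpoint and Master Key actually work.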

Step 2: Configure the Cosmos DB Datasource

To learn about datasource, refer here.

  • Navigate to Datasources -> Datasources Tab -> Add

  • Select CosmosDB

  • Provide the name -> Create

  • Connection Tab:

    • From the drop-down, select the name of the connection created in Step 1

    • Update

Step 3: Create Dataset

Datasets Tab: To learn about Dataset, refer here.

Add a Dataset for each collection that you want to replicate, providing the following details:

  • Database Id (Required)

  • Collection Id (Required)

  • Ingestion Mode: (Required)

    • Complete: Ingests the full data from the source table in every ingestion job run. Choose this option if your table is small (<1 million rows) and you want to ingest it infrequently (a few times a day).

    • Incremental: Ingests only the changed or inserted rows in every ingestion job run. Choose this option if your table is large and you want to ingest in real time. Requires a unique key.

      • Unique key (Required)

      To know more about Ingestion Modes, refer here.

  • Automatic Schema (Required):

    • Yes: Schema is automatically discovered by Sprinkle (Recommended)

    • No: A Hive schema must be provided. The format for the Hive schema is: Col1 datatype, Col2 datatype, Col3 datatype. Datatypes should be warehouse specific.

  • Date Type: Ingestion starts from this start date or number of days. In Incremental mode, only the first run pulls from this date; later runs pull only changed or new rows.

    • Start Date: Provide in the format YYYY-MM-DD

    • No of days

  • Destination Schema (Required): Data warehouse schema into which the table will be ingested

  • Destination Table Name (Required): Name of the table to be created in the warehouse. If not given, Sprinkle creates one named like ds_<datasourcename>_<tablename>

  • Destination Create Table Clause: Provide additional clauses for the warehouse create-table query, such as clustering and partitioning, which are useful for optimizing DML statements. Learn more about how to use this field.

  • Create
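The difference between the two ingestion modes can be illustrated with a small, self-contained sketch. This is a conceptual illustration only, not Sprinkle's implementation: the function names `ingest_complete` and `ingest_incremental`, the in-memory row lists, and the `id` field are all assumptions made for the example. It shows why Incremental mode needs a unique key: changed rows replace existing rows with the same key, and rows with new keys are appended.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def ingest_complete(source_rows: Iterable[Row]) -> List[Row]:
    """Complete mode: rebuild the destination from the full source on every run."""
    return [dict(row) for row in source_rows]


def ingest_incremental(
    destination: List[Row], changed_rows: Iterable[Row], unique_key: str
) -> List[Row]:
    """Incremental mode: upsert only changed or new rows, matched on the unique key."""
    merged = {row[unique_key]: row for row in destination}
    for row in changed_rows:
        # An existing key is overwritten (update); an unseen key is added (insert).
        merged[row[unique_key]] = dict(row)
    return list(merged.values())


# First run loads everything; later runs apply only the changes.
destination = ingest_complete(
    [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
)
destination = ingest_incremental(
    destination,
    [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}],
    unique_key="id",
)
print(destination)
```

Without a unique key, the updated row for `id` 2 could not be matched to the existing one and would end up duplicated, which is why the Unique key field is required in Incremental mode.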

Step 4: Run and Schedule Ingestion

In the Ingestion Jobs tab:

  • Trigger the job using the Run button

  • To schedule the job, enable Auto-Run and change the frequency if needed
