Azure Cosmos DB
Guide to integrate your Azure Cosmos DB with Sprinkle
Datasource Concepts
Before setting up the datasource, learn about datasource concepts here.
Step-by-Step Guide
Step 1: Configure the Cosmos DB Connection
To learn about Connection, refer here.
Log into the Sprinkle application.
Navigate to Datasources -> Connections Tab -> New Connection
Select CosmosDB
Provide all the mandatory details
Name: Name to identify this connection
Account Endpoint: Provide the URL in the following format:
https://xxxxxxxxxx.documents.azure.com:443/
Master Key
Test Connection
Create
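The Account Endpoint format above can be checked before it is pasted into the connection form. The following is a minimal sketch, not part of Sprinkle: the function name `is_valid_cosmos_endpoint` is hypothetical, and the rules it applies (HTTPS scheme, a host under `documents.azure.com`, port 443) are inferred only from the example URL in this guide.

```python
from urllib.parse import urlparse

# Suffix taken from the example URL in this guide.
COSMOS_HOST_SUFFIX = ".documents.azure.com"


def is_valid_cosmos_endpoint(url: str) -> bool:
    """Return True if url looks like https://<account>.documents.azure.com:443/.

    Hypothetical helper: Sprinkle's own validation may be stricter or looser.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        port = parsed.port  # raises ValueError for a malformed port
    except ValueError:
        return False
    return (
        parsed.scheme == "https"
        and host.endswith(COSMOS_HOST_SUFFIX)
        and len(host) > len(COSMOS_HOST_SUFFIX)  # account name must be present
        and port == 443
        and parsed.path in ("", "/")  # assumption: nothing after the port
    )


if __name__ == "__main__":
    print(is_valid_cosmos_endpoint("https://xxxxxxxxxx.documents.azure.com:443/"))
```

A check like this only catches formatting mistakes; Test Connection remains the way to confirm that the endpoint and Master Key actually work.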
Step 2: Configure Cosmos DB datasource
To learn about datasources, refer here.
Navigate to Datasources -> Datasources Tab -> Add
Select CosmosDB
Provide the name -> Create
Connection Tab:
From the drop-down, select the name of the connection created in Step 1
Update
Step 3: Create Dataset
Datasets Tab: To learn about Dataset, refer here.
Add a dataset for each collection that you want to replicate, providing the following details:
Database Id (Required)
Collection Id (Required)
Ingestion Mode: (Required)
Complete: Ingest the full data from the source table in every ingestion job run. Choose this option if your table is small (<1 million rows) and you want to ingest it infrequently (a few times a day).
Incremental: Ingest only the changed or inserted rows in every ingestion job run. Choose this option if your table is large and you want to ingest in real-time mode. Requires a unique key.
Unique Key (Required)
To learn more about ingestion modes, refer here.
Automatic Schema (Required):
Yes: Schema is automatically discovered by Sprinkle (Recommended)
No: Provide a Hive schema in the format: Col1 datatype, Col2 datatype, Col3 datatype. Datatypes should be warehouse specific.
Date Type: Ingestion starts from this start date or number of days. In Incremental mode, only the first run pulls from this date; subsequent runs pull only changed or new rows.
Start Date: Provide in the format YYYY-MM-DD
No. of Days
Destination Schema (Required): Data warehouse schema into which the table will be ingested
Destination Table Name (Required): The name of the table to be created in the warehouse. If not given, Sprinkle creates it as ds_<datasourcename>_<tablename>
Destination Create Table Clause: Provide additional clauses to warehouse-create table queries such as clustering, partitioning, and more, useful for optimizing DML statements. Learn more on how to use this field.
Create
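Two of the fields above follow a fixed text pattern: the manual Hive schema (`Col1 datatype, Col2 datatype, ...`) and the default destination table name (`ds_<datasourcename>_<tablename>`). The sketch below shows how such strings could be assembled before entering them in the form. Both function names are hypothetical, the column names and datatypes in the usage example are made up for illustration, and treating the collection as the `<tablename>` part is an assumption.

```python
from typing import Iterable, Tuple


def hive_schema(columns: Iterable[Tuple[str, str]]) -> str:
    """Join (column, datatype) pairs into 'Col1 datatype, Col2 datatype'.

    Datatypes are passed through unchanged; they must be valid for
    your warehouse, as the guide notes.
    """
    return ", ".join(f"{name} {dtype}" for name, dtype in columns)


def default_table_name(datasource_name: str, table_name: str) -> str:
    """Build the documented fallback name ds_<datasourcename>_<tablename>.

    Assumption: any case or character normalization Sprinkle applies
    is not reproduced here.
    """
    return f"ds_{datasource_name}_{table_name}"


if __name__ == "__main__":
    # Illustrative values only.
    print(hive_schema([("id", "string"), ("amount", "double")]))
    print(default_table_name("cosmos_orders", "orders"))
```

Generating the schema string from a list of pairs helps avoid a missing comma or datatype when a collection has many columns.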
Step 4: Run and Schedule Ingestion
In the Ingestion Jobs tab:
Trigger the job using the Run button
To schedule, enable Auto-Run. Change the frequency if needed