# Azure Cosmos DB

## Pipeline Concepts

Before setting up the Pipeline, learn about Pipeline concepts [here](/product/ingesting-your-data/pipelines.md).

## Step-by-Step Guide

### Step 1: Configure the Cosmos DB Connection

To learn about Connections, see [here](/product/ingesting-your-data/pipelines.md).

* Log into the Sprinkle application.
* Navigate to Ingest -> Connections Tab -> New Connection
* Select CosmosDB
* Provide all the mandatory details:
  * *Name*: A name to identify this connection
  * *Account Endpoint*: Provide the URL in the following format: `https://xxxxxxxxxx.documents.azure.com:443/`
  * *Master Key*
* Test Connection
* Create
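
Before pasting the Account Endpoint into the connection form, it can help to check that it matches the format shown above. The following is a minimal sketch, not part of Sprinkle: the function name `looks_like_cosmos_endpoint` is hypothetical, and the rules (HTTPS scheme, a host ending in `.documents.azure.com`, explicit port 443) are assumptions derived only from the example URL in this guide.

```python
from urllib.parse import urlparse

# Host suffix taken from the example endpoint in this guide (assumption).
COSMOS_HOST_SUFFIX = ".documents.azure.com"


def looks_like_cosmos_endpoint(url: str) -> bool:
    """Return True if url has the shape https://<account>.documents.azure.com:443/."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        port = parsed.port  # raises ValueError if the port is not a valid number
    except ValueError:
        return False
    return (
        parsed.scheme == "https"
        and host.endswith(COSMOS_HOST_SUFFIX)
        and len(host) > len(COSMOS_HOST_SUFFIX)  # an account name must be present
        and port == 443
        and parsed.path in ("", "/")
    )


if __name__ == "__main__":
    # "myaccount" is a placeholder account name.
    print(looks_like_cosmos_endpoint("https://myaccount.documents.azure.com:443/"))
```

A check like this only validates the shape of the URL. The Test Connection button remains the way to confirm that the endpoint and Master Key actually work.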

### Step 2: Configure Cosmos DB Pipeline

To learn about Pipelines, see [here](/product/ingesting-your-data/pipelines.md).

* Navigate to Ingest -> Pipeline Tab -> Add
* Select CosmosDB
* Provide the name -> Create
* **Connection Tab**:
  * From the drop-down, select the name of the connection created in Step 1
  * Update

### Step 3: Create Dataset

**Datasets Tab**: To learn about Datasets, see [here](/product/ingesting-your-data/pipelines.md).

Add a Dataset for each collection that you want to replicate, providing the following details:

* *Database Id* (Required)
* *Collection Id* (Required)
* *Ingestion Mode* (Required):
  * *Complete*: Ingests the full data from the source table in every ingestion job run. Choose this option if your table is small (<1 million rows) and you want to ingest it infrequently (a few times a day).
  * *Incremental*: Ingests only the changed or inserted rows in every ingestion job run. Choose this option if your table is large and you want to ingest in real time. Requires a Unique Key.

    * *Unique Key* (Required)

    To know more about Ingestion Modes, refer [here](/product/ingesting-your-data/pipelines/databases/features/ingestion-modes.md).
* *Automatic Schema* (Required):
  * *Yes*: The schema is automatically discovered by Sprinkle (Recommended)
  * *No*: A Hive schema must be provided.\
    The format for the Hive schema is: `Col1 datatype, Col2 datatype, Col3 datatype`\
    Datatypes should be warehouse specific.
* *Date Type*: Ingestion runs from this start date or number of days. In Incremental mode, only the first run pulls from this date; further runs pull only changed or new rows.
  * *Start Date*: Provide in the format YYYY-MM-DD
  * *No of days*
* *Destination Schema* (Required): The data warehouse schema into which the table will be ingested
* *Destination Table name* (Required): The name of the table to be created in the warehouse. If not given, Sprinkle will create a name like ds\_\<Pipelinename>\_\<tablename>
* *Destination Create Table Clause*: Provide additional clauses for the warehouse's create-table queries, such as clustering and partitioning, which are useful for optimizing DML statements. [Learn more](/product/ingesting-your-data/pipelines/databases/features/destination-create-table-clause.md) on how to use this field.
* Create
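
When Automatic Schema is set to No, the Hive schema string has to follow the `Col1 datatype, Col2 datatype` format described above. The sketch below shows one way to assemble that string, along with the default destination table name pattern. Both helper names (`hive_schema`, `default_table_name`) are hypothetical and not part of Sprinkle, and the column names and datatypes are placeholders; use the datatypes your warehouse actually supports.

```python
def hive_schema(columns: dict[str, str]) -> str:
    """Join column names and datatypes into 'Col1 datatype, Col2 datatype'."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns.items())


def default_table_name(pipeline_name: str, table_name: str) -> str:
    """Mirror the ds_<Pipelinename>_<tablename> pattern described in this guide."""
    return f"ds_{pipeline_name}_{table_name}"


if __name__ == "__main__":
    # Placeholder columns and datatypes (assumptions, warehouse specific).
    print(hive_schema({"id": "string", "amount": "double", "created_at": "timestamp"}))
    print(default_table_name("cosmos_orders", "orders"))
```

The resulting schema string can be pasted into the Hive schema field as-is.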

### Step 4: Run and Schedule Ingestion

In the **Ingestion Jobs** tab:

* Trigger the Job using the Run button
* To schedule, enable Auto-Run. Change the frequency if needed

