# SFTP

## Pipeline Concepts

Before setting up the Pipeline, learn about Pipeline concepts [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines)

## Step by Step Guide

### STEP-1: Configure SFTP Connection

To learn about Connection, refer [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines)

* Log into Sprinkle application
* Navigate to Ingest -> Connections Tab -> New Connection
* Select SFTP
* Provide all the mandatory details
  * *Name*: Name to identify this connection
  * *SSH Host*: IP address or hostname of the SSH server.
  * *SSH Port*: Port of the SSH server. Default is 22.
  * *SSH Login Username*: SFTP username
  * Choose an Authentication Mode (one of the options below) and enter the required information:
    * SSH Public Key
    * Password
* Test Connection
* Create
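Before testing the connection in Sprinkle, it can help to confirm that the SSH host and port are reachable from your network. A minimal sketch using only the Python standard library (the hostname below is a placeholder, not a real server):

```python
import socket

def ssh_port_reachable(host: str, port: int = 22, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder host; substitute your own SSH server:
# ssh_port_reachable("sftp.example.com", 22)
```

If this returns False, check firewall rules and the SSH Port value before retrying Test Connection.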

### STEP-2: Configure SFTP Pipeline

To learn about Pipeline, refer [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines)

* Navigate to Ingest -> Pipeline Tab -> Add
* Select SFTP
* Provide the name -> Create
* **Connection Tab**:
  * From the drop-down, select the name of the connection created in STEP-1
  * Update

### STEP-3: Create Dataset

**Datasets Tab**: To learn about Dataset, refer [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines). Add a Dataset for each **directory** that you want to replicate, providing the following details:

* *File Type*: Select the File Format
  * JSON
  * CSV
    * Select Delimiter - Comma, Tab, Pipe, Dash, Other Character
* *Directory Path* (Required): Provide the full path of the directory on the SFTP server
* *Ingestion Mode* (Required):
  * *Complete*: The entire folder is downloaded and ingested on every ingestion job run
  * *Incremental*: Only new files are ingested on every ingestion job run. Use this option if the folder is very large and new files arrive continuously
    * *Remove Duplicate Rows*:
      * *Unique Key*: Unique key from the table, used to deduplicate data across multiple ingestions
      * *Time Column Name*: Column used to order data when deduplicating
    * *Max Job Runtime*: Maximum time in minutes for which data should be downloaded in a single run. The ingestion job runs for at most the specified minutes and then updates the checkpoint; the next run continues from the checkpoint.
* *Destination Schema* (Required): Data warehouse schema into which the table will be ingested
* *Destination Table Name* (Required): Table name suffix used to create the table in the warehouse
* *Destination Create Table Clause*: Provide additional clauses to warehouse-create table queries such as clustering, partitioning, and more, useful for optimizing DML statements. [Learn more](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines/databases/features/destination-create-table-clause) on how to use this field.
* Create
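The Remove Duplicate Rows options above can be pictured as follows: rows sharing the same Unique Key value are collapsed into one, using the Time Column to decide ordering. This is an illustrative sketch of that idea (keeping the latest row per key), not Sprinkle's actual implementation; the column names are hypothetical:

```python
from typing import Any

def dedup_rows(rows: list[dict[str, Any]], unique_key: str, time_column: str) -> list[dict[str, Any]]:
    """Keep one row per unique_key value: the row with the largest time_column value."""
    latest: dict[Any, dict[str, Any]] = {}
    for row in rows:
        key = row[unique_key]
        if key not in latest or row[time_column] > latest[key][time_column]:
            latest[key] = row
    return list(latest.values())

# Hypothetical rows ingested across two incremental runs:
rows = [
    {"order_id": 1, "updated_at": "2024-01-01", "status": "new"},
    {"order_id": 1, "updated_at": "2024-01-03", "status": "shipped"},
    {"order_id": 2, "updated_at": "2024-01-02", "status": "new"},
]
deduped = dedup_rows(rows, unique_key="order_id", time_column="updated_at")
```

Here `order_id` plays the role of the Unique Key and `updated_at` the Time Column Name.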

### STEP-4: Run and schedule Ingestion

In the **Ingestion Jobs** tab:

* Trigger the job using the Run button
* To schedule, enable Auto-Run and change the frequency if needed
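Conceptually, each scheduled incremental run ingests only files that appeared after the last saved checkpoint, then advances the checkpoint for the next run. A rough sketch of that bookkeeping (file names and modification times below are made up, and this is not Sprinkle's actual checkpoint format):

```python
def select_new_files(files: dict[str, float], checkpoint: float) -> tuple[list[str], float]:
    """Return file names with modification time strictly after checkpoint,
    plus the advanced checkpoint (latest mtime seen, or the old value if none)."""
    new_files = sorted(name for name, mtime in files.items() if mtime > checkpoint)
    new_checkpoint = max([checkpoint] + [m for m in files.values() if m > checkpoint])
    return new_files, new_checkpoint

# Hypothetical directory listing: name -> modification time
listing = {"a.csv": 100.0, "b.csv": 205.0, "c.csv": 310.0}
new_files, ckpt = select_new_files(listing, checkpoint=200.0)
# new_files == ["b.csv", "c.csv"], ckpt == 310.0
```

On the following run, passing `ckpt` back in would select nothing until new files arrive.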
