Guide to integrating your files using FTP

Datasource Concepts

Before setting up the datasource, learn about datasource concepts here

Step by Step Guide

STEP-1: Configure FTP/FTPs Connection

To learn about Connection, refer here

  • Log into the Sprinkle application

  • Navigate to Datasources -> Connections Tab -> New Connection ->

  • Select FTP or FTPs

  • Provide all the mandatory details

    • Name: Name to identify this connection

    • Host: FTP Hostname

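    • Port: FTP port. Default is 22. Note that standard FTP and explicit FTPS servers usually listen on port 21 (port 22 is normally used by SSH/SFTP).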

    • User: FTP username

    • Password: FTP password

  • Test Connection

  • Create
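If Test Connection fails, it can help to check the same host, port, and credentials from your own machine. The sketch below uses Python's standard `ftplib` module; it is not how Sprinkle tests connections, and the function name `check_ftp_connection` is hypothetical. `use_tls=True` assumes explicit FTPS (AUTH TLS); servers that only speak implicit FTPS (typically on port 990) will not work with `ftplib.FTP_TLS`.

```python
import ftplib


def check_ftp_connection(host: str, port: int, user: str, password: str,
                         use_tls: bool = False, timeout: float = 10.0) -> bool:
    """Return True if we can connect and log in to the FTP/FTPS server.

    Hypothetical helper for troubleshooting; not part of Sprinkle.
    """
    # FTP_TLS performs explicit FTPS (AUTH TLS on the control connection);
    # plain FTP sends the credentials unencrypted.
    client_cls = ftplib.FTP_TLS if use_tls else ftplib.FTP
    try:
        with client_cls(timeout=timeout) as ftp:
            ftp.connect(host, port)
            # For FTP_TLS, login() secures the control channel before sending credentials.
            ftp.login(user, password)
            if use_tls:
                ftp.prot_p()  # also encrypt the data channel
            ftp.voidcmd("NOOP")  # simple round trip to confirm the session works
        return True
    except ftplib.all_errors:
        # Covers refused/timed-out sockets, TLS failures, and FTP error replies.
        return False
```

For example, `check_ftp_connection("ftp.example.com", 21, "sprinkle_user", "secret", use_tls=True)` (hypothetical host and credentials) returns `True` only if the login succeeds.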

STEP-2: Configure FTP/FTPs datasource

To learn about datasource, refer here

  • Navigate to Datasources -> Datasources Tab -> Add ->

  • Select FTP or FTPs

  • Provide the name -> Create

  • Connection Tab:

    • From the drop-down, select the name of the connection created in STEP-1

    • Update

STEP-3: Create Dataset

Datasets Tab: To learn about Dataset, refer here. Add a dataset for each directory that you want to replicate, providing the following details:

  • Table Name (Required): Table name suffix that will be used to create the table in the warehouse

  • Directory Path (Required): Provide the full path of the directory on the FTP server

  • Ingestion Mode (Required):

    • Complete: The full folder is downloaded and ingested in every ingestion job run

    • Incremental: Only new files are ingested in every ingestion job run. Use this option if your folder is very large and new files arrive continuously

      • Remove Duplicate Rows:

        • Unique Key: Unique key of the table, used to deduplicate data across multiple ingestions

        • Time Column Name: Column used to order the data when deduplicating

      • Max Job Runtime: Maximum time, in minutes, for which data should be downloaded. The ingestion job runs for at most the specified number of minutes and then updates the checkpoint; the next run continues from that checkpoint.

  • File Type: Select the file format

    • JSON

    • CSV

      • Select Delimiter: Comma, Tab, Pipe, Dash, or Other Character

    • Parquet

    • ORC

  • Destination Schema (Required): Data warehouse schema into which the table will be ingested

  • Warehouse Table name (Optional): If not given, Sprinkle creates the table with the name ds_<datasourcename>_<tablename>

  • Destination Create Table Clause: Provide additional clauses for the warehouse CREATE TABLE query, such as clustering and partitioning, which are useful for optimizing DML statements. Learn more about how to use this field.

  • Create
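The Incremental mode and Max Job Runtime options above describe a checkpointed ingestion loop. The following Python sketch illustrates that idea under stated assumptions: it is not Sprinkle's implementation, it assumes new files can be recognised by their modification time (with the file name as a tie-breaker), and `Checkpoint`, `run_incremental`, and the `ingest` callback are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Checkpoint:
    # Hypothetical checkpoint: the last file ingested, identified by (mtime, name).
    last_mtime: float = 0.0
    last_name: str = ""


def run_incremental(listing: Iterable[Tuple[str, float]], checkpoint: Checkpoint,
                    max_runtime_min: float, ingest: Callable[[str], None],
                    clock: Callable[[], float] = time.monotonic) -> Checkpoint:
    """Ingest files newer than the checkpoint until the time budget runs out."""
    deadline = clock() + max_runtime_min * 60
    last_key = (checkpoint.last_mtime, checkpoint.last_name)
    # Only files after the checkpoint, oldest first, so the checkpoint only moves forward.
    pending = sorted((f for f in listing if (f[1], f[0]) > last_key),
                     key=lambda f: (f[1], f[0]))
    for name, mtime in pending:
        if clock() >= deadline:
            break  # budget used up; the next run resumes from the returned checkpoint
        ingest(name)
        checkpoint = Checkpoint(mtime, name)  # advance after each completed file
    return checkpoint
```

Because files are processed in (mtime, name) order and the checkpoint advances after each file, stopping at the time limit never skips a file. Under this assumption, however, a file that appears later with an older modification time would be missed.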
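Remove Duplicate Rows keeps one row per Unique Key, using Time Column Name to decide which version wins. A minimal Python sketch of that rule, assuming the most recent value of the time column should win (as in a SQL `ROW_NUMBER() OVER (PARTITION BY key ORDER BY time DESC) = 1` filter); `dedupe_latest` is a hypothetical helper, not part of Sprinkle.

```python
from typing import Any, Dict, Iterable, List


def dedupe_latest(rows: Iterable[Dict[str, Any]], unique_key: str,
                  time_column: str) -> List[Dict[str, Any]]:
    """Keep one row per unique key: the one with the greatest time value (assumed rule)."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for row in rows:
        key = row[unique_key]
        current = latest.get(key)
        # On equal time values, the row seen later replaces the earlier one.
        if current is None or row[time_column] >= current[time_column]:
            latest[key] = row
    return list(latest.values())
```

For example, given two rows with `id` 1 and `updated_at` values "2024-01-01" and "2024-01-03", only the "2024-01-03" row is kept. ISO-formatted timestamps compare correctly as strings; other formats should be parsed first.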

STEP-4: Run and schedule Ingestion

In the Ingestion Jobs tab:

  • Trigger the Job, using Run button

  • To schedule, enable Auto-Run. Change the frequency if needed.
