# AWS S3 External

S3 External is a Pipeline connection that automatically creates an external table in Athena or Redshift Spectrum by inferring the schema of the data. The data is not loaded into the warehouse; instead, it is read from the source location itself when queries are run on the data warehouse.

## Pipeline Concepts

Before setting up the Pipeline, learn about Pipeline concepts [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines).

## Step by Step Guide

### STEP-1: Configure Connection

To learn about Connection, refer [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines)

* Log into Sprinkle application
* Navigate to Ingest -> Connections Tab -> New Connection
* Select S3 External
* Provide all the mandatory details
  * *Name*: Name to identify this connection
  * *Access Key*: Account -> My security credentials -> Access keys -> Create new access key -> Download key file -> Show access key. To know more, [click here](https://amzn.to/2CS9OcK)
  * *Secret Key*: Account -> My security credentials -> Access keys -> Create new access key -> Download key file -> Show secret key. To know more, [click here](https://amzn.to/2CS9OcK)
  * *Region*: Region in which the storage bucket was created, for example ap-south-1
  * *Bucket Name*: Name of the S3 bucket to read from
* Test Connection
* Create
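If a Test Connection attempt fails, it can help to sanity-check the credential and region values locally before re-entering them. A minimal sketch using only the standard library (the key shown is AWS's published documentation placeholder, not a real credential, and the region pattern covers common commercial regions only):

```python
import re

def looks_like_access_key(key: str) -> bool:
    """AWS access key IDs are 20 uppercase alphanumeric characters,
    typically starting with 'AKIA' for long-term IAM user keys."""
    return bool(re.fullmatch(r"[A-Z0-9]{20}", key))

def looks_like_region(region: str) -> bool:
    """Common commercial regions follow the <area>-<direction>-<number>
    pattern, for example ap-south-1 or us-east-2."""
    return bool(re.fullmatch(r"[a-z]{2}-[a-z]+-\d", region))

# Placeholder values for illustration only
print(looks_like_access_key("AKIAIOSFODNN7EXAMPLE"))  # True
print(looks_like_region("ap-south-1"))                # True
print(looks_like_region("mumbai"))                    # False
```

A format check like this only catches copy-paste mistakes; whether the key pair actually has read access to the bucket is what Test Connection verifies.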

### STEP-2: Configure Pipeline

To learn about Pipeline, refer [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines)

* Navigate to Ingest -> Pipeline Tab -> Add
* Select S3 External
* Provide the name -> Create
* **Connection Tab**:
  * From the drop-down, select the name of the connection created in STEP-1
  * Update

### STEP-3: Create Dataset

**Datasets Tab**: To learn about Datasets, refer [here](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines). Add a Dataset for each **folder** that you want to replicate, providing the following details:

* *File Type*: Select the File Format
  * JSON
  * CSV
    * Select Delimiter - Comma, Tab, Pipe, Dash, Other Character
  * Parquet
  * ORC
* *Compression Type* (Required): Select from none, bzip2, gzip, snappy
* *Directory Path* (Required): Provide the full path, for example `s3a://test-sprinkle-a/s3Ingest/s3Ingest13`
* *Flatten Level* (Required): Select One Level or Multi Level. With One Level, flattening is not applied to complex types; they are stored as strings. With Multi Level, complex types are flattened recursively until every field is a simple type.
* *Destination Schema* (Required): Data warehouse schema where the table will be created
* *Destination Table Name* (Required): Name of the table to be created on the warehouse. If not given, Sprinkle will generate one like ds\_\<Pipelinename>\_\<tablename>
* *Destination Create Table Clause*: Provide additional clauses to warehouse-create table queries such as clustering, partitioning, and more, useful for optimizing DML statements. [Learn more](https://docs.sprinkledata.com/product/ingesting-your-data/pipelines/databases/features/destination-create-table-clause) on how to use this field.
* Create
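The Directory Path value is an s3a URI that resolves to a bucket and a key prefix. A small sketch using the example path from above (standard library only):

```python
from urllib.parse import urlparse

path = "s3a://test-sprinkle-a/s3Ingest/s3Ingest13"
parsed = urlparse(path)

bucket = parsed.netloc            # the S3 bucket: 'test-sprinkle-a'
prefix = parsed.path.lstrip("/")  # the folder prefix: 's3Ingest/s3Ingest13'
print(bucket, prefix)
```

The bucket portion should match the Bucket Name given in the connection, and the prefix is the folder whose files will back the external table.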
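The difference between the two Flatten Level options can be illustrated on a hypothetical nested JSON record (the helper function and field names below are ours, for illustration only, not Sprinkle internals):

```python
import json

record = {"id": 1, "user": {"name": "asha", "address": {"city": "Pune"}}}

# One Level: complex types are not flattened; they are stored as strings.
one_level = {
    k: (json.dumps(v) if isinstance(v, dict) else v)
    for k, v in record.items()
}

# Multi Level: flattening is applied recursively until every value
# is a simple type.
def flatten(obj, prefix=""):
    out = {}
    for k, v in obj.items():
        if isinstance(v, dict):
            out.update(flatten(v, prefix + k + "_"))
        else:
            out[prefix + k] = v
    return out

multi_level = flatten(record)
print(one_level)   # 'user' column holds a JSON string
print(multi_level) # separate columns: id, user_name, user_address_city
```

With One Level, querying inside the `user` column requires JSON functions in the warehouse; with Multi Level, nested fields become ordinary columns.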

### STEP-4: Run and schedule Ingestion

In the **Ingestion Jobs** tab:

* Trigger the job using the Run button
* To schedule, enable Auto-Run. Change the frequency if needed
