AWS Athena

Guide to integrate your Athena with Sprinkle

This page covers the details about integrating Athena with Sprinkle.

When setting up Athena connection, Sprinkle additionally requires a S3 bucket. This guide covers the role of all the components and steps to setup.

  • Integrating Athena: All analytical data is stored and queried from Athena warehouse

  • Create S3 Bucket: Sprinkle stores all intermediate data and report caches in this bucket

Step by Step Guides

Integrating Athena

STEP-1: Configure Athena output location

Create the S3 bucket in the same region and then configure in the AWS Athena Portal

STEP-2: Create Access and Secret key

  • Create an user, say "sprinkle-athena"

  • Set permissions

    • Select -> Attach policy directly

    • Select AmazonAthenaFullAccess and AmazonS3FullAccess

  • Create access key

    • In Security credential, create access ky

    • Select usecase: Other

    • Generate and copy access and secret key

STEP-3: Configure Athena Connection on Sprinkle

In the Connect Warehouse form

  • Provide all the mandatory details

    • Distinct Name: Name to identify this connection

    • Region: Select the AWS region where your Athena instance was created (e.g., Asia Pacific

    • Port: Provide the port number. Default is set at 443.

    • API Key (created in STEP-2) : You can create or view this in the IAM console under "Security credentials".

    • Secret Key (created in STEP-2)

    • S3 Output Location: (location configured in STEP-1) This is the path to the Amazon S3 location where you want to store your query results. The path should be prefixed with s3:// and match the one configured in your Athena setup.

    • Schema: The name of the database schema to use by default when a schema is not explicitly specified in your query.

    • Catalog (Optional): Provide the Catalog name if your Athena setup uses one.

    • Workgroup (Optional): Specify the workgroup in which you want to execute your queries.

  • Test Connection: Tests the connection endpoints provided.

  • Create: Saves the endpoints provided and creates the connection in Sprinkle.

Create S3 Bucket

S3 bucket is required to store intermediate data and report outputs. Refer here for creating an S3 bucket and configuring it in Sprinkle

(OPTIONAL) Athena Output Purging: Setup

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In the Buckets list, choose the name of the bucket that you want to create a lifecycle rule for.

  3. Choose the Management tab, and select Create lifecycle rule.

  4. In the Lifecycle rule name, enter a name for your rule. The name must be unique within the bucket.

  5. Choose the scope of the lifecycle rule as Limit the scope of this rule using one or more filters. To limit the scope by prefix, in Prefix, enter the prefix. (Folder name which we specified in Athena Query result location)

  6. Under Lifecycle rule actions, choose the following actions that this lifecycle rule will perform:

    1. Expire current versions of objects

    2. Permanently delete previous versions of objects

  7. To expire current versions of objects, under Expire current versions of objects, in the Number of days after object creation, enter the number of days as 1.

  8. To permanently delete previous versions of objects, under Permanently delete previous versions of objects, in the Number of days after objects become previous versions, enter the number of days as 1.

  9. Verify the rule once again and choose Create rule.

  10. The created rule will appear in the Life cycle rules.

(OPTIONAL) Database Level Access in Athena: Setup

The following procedure can help to create Athena dB and provide access to the particular dB in Sprinkle. It will also help to share Athena across different organizations without sharing the same credentials.

Steps to be followed

  1. Create a new database in Athena

  2. Create IAM Policy

  3. Create IAM User

  4. Create Access key

  5. Update Sprinkle driver

Create Athena Database

  1. Log in to the AWS console and go to Athena.

  2. Run the create database command in the query editor by updating dB name. CREATE DATABASE <DB_NAME>

Create IAM Policy

  1. Log in to the AWS console, go to IAM and select Policy from the left pane.

  2. Click on Creare policy and go to the JSON tab.

  3. Paste the following policy in the editor with updating <REGION>, <DB_NAME>, <ACCOUNT_NO> and S3 <BUCKET_NAME>.

Access Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartition",
                "glue:CreateTable",
                "glue:GetTables",
                "glue:GetPartitions",
                "glue:CreateDatabase",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:CreatePartition",
                "glue:DeletePartition",
                "glue:UpdatePartition",
                "glue:GetTable",
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:GetQueryResultsStream"
            ],
            "Resource": [
              "arn:aws:glue:<REGION>:<ACCOUNT_NO>:catalog",
              "arn:aws:glue:<REGION>:<ACCOUNT_NO>:database/<DB_NAME>",
              "arn:aws:glue:<REGION>:<ACCOUNT_NO>:table/<DB_NAME>/*",
              "arn:aws:athena:<REGION>:<ACCOUNT_NO>:workgroup/primary"
            ]
        },
         {
             "Effect": "Allow",
             "Action": [
                 "s3:PutObject",
                 "s3:GetObject",
                 "s3:ListBucketMultipartUploads",
                 "s3:ListBucket",
                 "s3:DeleteObject",
                 "s3:GetBucketLocation",
                 "s3:ListMultipartUploadParts"
             ],
             "Resource": [
                 "arn:aws:s3:::<BUCKET_NAME>/*",
                 "arn:aws:s3:::<BUCKET_NAME>"
             ]
         }
    ]
}

Create IAM user

  1. Log in to the AWS console, go to IAM and select User from the left pane.

  2. Click on Add User, provide the User name and select access type as Programmatic Access.

  3. Attach the policy created for Athena access from Attach existing policy directly.

Create Access key

  1. Go to the IAM users and select the newly created user.

  2. Goto Security Credentials tab and click on create access key.

  3. This will create the access key. Download the CSV file and click on show secret to copy the secret.

Last updated