AWS Athena

Guide to integrate your Athena with Sprinkle

This page covers the details about integrating Athena with Sprinkle.

When setting up Athena connection, Sprinkle additionally requires a S3 bucket. This guide covers the role of all the components and steps to setup.

  • Integrating Athena: All analytical data is stored and queried from Athena warehouse

  • Create S3 Bucket: Sprinkle stores all intermediate data and report caches in this bucket

Step by Step Guides

Integrating Athena

STEP-1: Configure Athena output location

Configure in the AWS Athena Portal

STEP-2: Configure Athena Connection

  • Log into Sprinkle application

  • Navigate to Admin -> Warehouse

  • Select Athena

  • Provide all the mandatory details

    • Distinct Name: Name to identify this connection

    • Host : Provide IP address or Host name.

    • Port : Provide the Port number

    • Catalog : Provide Catalog name if there is any, it should be an existing catalog.

    • API Key

    • Secret Key

    • Advance Settings : Below fields are shown if it is marked as Yes

    • Work Group: Work group in which queries will be executed. See more about workgroups

    • S3 Output Location: location configured in STEP-1

    • Schema

  • Test Connection

  • Create

Create S3 Bucket

S3 bucket is required to store intermediate data and report outputs. Refer here for creating an S3 bucket and configuring it in Sprinkle

Athena Output Purging: Setup

  1. Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/.

  2. In the Buckets list, choose the name of the bucket that you want to create a lifecycle rule for.

  3. Choose the Management tab, and select Create lifecycle rule.

  4. In the Lifecycle rule name, enter a name for your rule. The name must be unique within the bucket.

  5. Choose the scope of the lifecycle rule as Limit the scope of this rule using one or more filters. To limit the scope by prefix, in Prefix, enter the prefix. (Folder name which we specified in Athena Query result location)

  6. Under Lifecycle rule actions, choose the following actions that this lifecycle rule will perform:

    1. Expire current versions of objects

    2. Permanently delete previous versions of objects

  7. To expire current versions of objects, under Expire current versions of objects, in the Number of days after object creation, enter the number of days as 1.

  8. To permanently delete previous versions of objects, under Permanently delete previous versions of objects, in the Number of days after objects become previous versions, enter the number of days as 1.

  9. Verify the rule once again and choose Create rule.

  10. The created rule will appear in the Life cycle rules.

Database Level Access in Athena: Setup

The following procedure can help to create Athena dB and provide access to the particular dB in Sprinkle. It will also help to share Athena across different organizations without sharing the same credentials.

Steps to be followed

  1. Create a new database in Athena

  2. Create IAM Policy

  3. Create IAM User

  4. Create Access key

  5. Update Sprinkle driver

Create Athena Database

  1. Log in to the AWS console and go to Athena.

  2. Run the create database command in the query editor by updating dB name. CREATE DATABASE <DB_NAME>

Create IAM Policy

  1. Log in to the AWS console, go to IAM and select Policy from the left pane.

  2. Click on Creare policy and go to the JSON tab.

  3. Paste the following policy in the editor with updating region, database name, account number and S3 bucket name.

Access Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:GetPartition",
                "glue:CreateTable",
                "glue:GetTables",
                "glue:GetPartitions",
                "glue:CreateDatabase",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:CreatePartition",
                "glue:DeletePartition",
                "glue:UpdatePartition",
                "glue:GetTable",
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults",
                "athena:GetQueryResultsStream"
            ],
            "Resource": [
              "arn:aws:glue:<REGION>:<ACCOUNT_NO>:catalog",
              "arn:aws:glue:<REGION>:<ACCOUNT_NO>:database/<DB_NAME>",
              "arn:aws:glue:<REGION>:<ACCOUNT_NO>:table/<DB_NAME>/*",
              "arn:aws:athena:<REGION>:<ACCOUNT_NO>:workgroup/primary"
            ]
        },
         {
             "Effect": "Allow",
             "Action": [
                 "s3:PutObject",
                 "s3:GetObject",
                 "s3:ListBucketMultipartUploads",
                 "s3:ListBucket",
                 "s3:DeleteObject",
                 "s3:GetBucketLocation",
                 "s3:ListMultipartUploadParts"
             ],
             "Resource": [
                 "arn:aws:s3:::<BUCKET_NAME>/*",
                 "arn:aws:s3:::<BUCKET_NAME>"
             ]
         }
    ]
}

Create IAM user

  1. Log in to the AWS console, go to IAM and select User from the left pane.

  2. Click on Add User, provide the User name and select access type as Programmatic Access.

  3. Attach the policy created for Athena access from Attach existing policy directly.

Create Access key

  1. Go to the IAM users and select the newly created user.

  2. Goto Security Credentials tab and click on create access key.

  3. This will create the access key. Download the CSV file and click on show secret to copy the secret.

Last updated