Guide to Creating an EKS Cluster and Connecting It to Sprinkle

Sprinkle requires an EKS cluster for data ingestion and processing, so that all your data is processed locally within your AWS VPC.

This document describes how to create the EKS Kubernetes cluster and configure it in Sprinkle. You can also add a new node group to an existing EKS cluster, and Sprinkle will launch pods only in the node groups you specify. Refer to the AWS documentation to learn more about Amazon EKS. This guide assumes that you have already created a VPC in AWS; refer to the AWS User Guide for VPC setup and how a VPC works.

Follow the steps below to create and configure the EKS cluster.

STEP-1: Create EKS Cluster

  1. Open the Amazon EKS console and make sure that the Region selected in the top right of the console is the same Region as your data center. If it is not, open the drop-down next to the Region name and select the appropriate Region.

  2. Select Create cluster. On the Configure cluster page, enter a name for your cluster, such as sprinkle-cluster, and select an existing Cluster Service Role. You can also create a new Cluster Service Role by following the AWS documentation.

  3. Keep the Kubernetes Version and the remaining settings at their default values and select Next.

  4. Networking

    • Select your VPC and all the Private Subnets

    • Security group: Select the default security group of the VPC

    • Cluster Endpoint Access: Public and Private

    • Advanced Settings: Whitelist the Sprinkle IPs, provide CIDR block as

  5. On the Configure logging page, select Next.

  6. On the Review and create page, select Create. To the right of the cluster's name, the cluster status is shown as Creating for several minutes until the cluster provisioning process completes. Don't continue to the next step until the status is Active.
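
If you prefer the command line, the console steps above can be approximated with the AWS CLI. The following is a minimal sketch, not an official Sprinkle script: the function names are hypothetical, and the cluster name, role ARN, subnet IDs, security group ID and allowed CIDR are placeholders you must supply yourself (the Sprinkle IPs are not reproduced here).

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, all arguments are your own values.

# Build the shorthand value for --resources-vpc-config.
# $1: comma-separated private subnet IDs
# $2: security group ID
# $3: CIDR block(s) allowed to reach the public endpoint (the Sprinkle IPs)
build_vpc_config() {
  printf 'subnetIds=%s,securityGroupIds=%s,endpointPublicAccess=true,endpointPrivateAccess=true,publicAccessCidrs=%s' \
    "$1" "$2" "$3"
}

# Create the cluster and block until its status is ACTIVE.
# $1: cluster name, $2: cluster service role ARN, $3: region, $4: VPC config string
create_cluster() {
  aws eks create-cluster --region "$3" --name "$1" --role-arn "$2" \
    --resources-vpc-config "$4"
  aws eks wait cluster-active --region "$3" --name "$1"
}
```

A call might look like `create_cluster sprinkle-cluster <role_arn> <region> "$(build_vpc_config <subnet_ids> <sg_id> <sprinkle_cidr>)"`. The `wait` call mirrors the instruction not to continue until the cluster is Active.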

STEP-2: Create EKS Node Group

Create an IAM role for Nodegroup

  • Open the IAM console

  • Choose Roles -> Create Role

  • Select EC2 as Service and EC2 as a use case.

  • Choose Next: Permissions

  • Select the following Policies: AmazonEKSWorkerNodePolicy, AmazonEC2ContainerRegistryReadOnly, AmazonEKS_CNI_Policy

  • Choose Next: Tags and provide tags as key-value pairs, if any.

  • Choose Next: Review and provide a Role Name and Role Description.

  • Choose Create Role
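
The same role can be created from the command line. This is a hedged sketch under the assumption that the AWS CLI is configured with IAM permissions; the function names and the role name argument are illustrative, while the three policy names are the ones listed above.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Trust policy that lets EC2 instances (the worker nodes) assume the role.
node_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# AWS managed policies required by the node group role.
NODE_POLICIES="AmazonEKSWorkerNodePolicy AmazonEC2ContainerRegistryReadOnly AmazonEKS_CNI_Policy"

# Turn an AWS managed policy name into its ARN.
policy_arn() {
  printf 'arn:aws:iam::aws:policy/%s' "$1"
}

# Create the role and attach each policy. $1: role name of your choice.
create_node_role() {
  role_name="$1"
  aws iam create-role --role-name "$role_name" \
    --assume-role-policy-document "$(node_trust_policy)"
  for policy in $NODE_POLICIES; do
    aws iam attach-role-policy --role-name "$role_name" \
      --policy-arn "$(policy_arn "$policy")"
  done
}
```

Running `create_node_role <role_name>` performs the console steps in one go; tags and the role description are omitted here for brevity.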

Create EC2 Managed Node Group

  • Open the Amazon EKS console.

  • Choose the name of the cluster that you created in the previous section, such as sprinkle-cluster.

  • Select the Configuration tab.

  • On the Configuration tab, select the Compute tab, and then choose Add Node Group.

  • On the Configure node group page, provide a unique Name for the node group and select the IAM role created in the previous step. Accept the remaining default values, and then choose Next.

  • On the Set compute and scaling configuration page, use the following values:

    • AMI Type is Amazon Linux 2 (AL2_x86_64)

    • Capacity Type is Spot

    • Instance Types is t3a.medium (Minimum required CPU and Memory)

    • Disk Size is 20 GB

    • Node group scaling: set the minimum, maximum and desired size to 2 (recommended).

  • Networking

    • Option A: Public subnets: the subnets must auto-assign public IP addresses

    • Option B: Private subnets: requires NAT to be configured, with a corresponding entry in the route tables of the private subnets. (Option B is required if your data sources need IP whitelisting and the cluster uses the Autoscaler, because a fixed public IP, that of the NAT, is needed.)

  • On the Review and create page, review your managed node group configuration and choose Create.

  • After several minutes, the Status in the Node Group configuration section will change from Creating to Active. Don't continue to the next step until the status is Active.

  • Verify

    • In the left pane, select Clusters, and then in the list of Clusters, select the name of the cluster that you created.

    • On the Overview tab, you see the list of Nodes that were deployed for the cluster. You can select the name of a node to see more information about it.

    • On the Workloads tab of the cluster, you see a list of the workloads that are deployed by default to an Amazon EKS cluster. You can select the name of a workload to see more information about it.

    • On the Add-ons tab under the Configuration tab, you can see the status of the VPC CNI add-on. It should be Active for the cluster and node group to be healthy.
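
The node group settings above can also be expressed with the AWS CLI. The sketch below is an illustration under stated assumptions, not the only way to do it: the function names are hypothetical, the cluster name, node group name, role ARN and subnet IDs are your own values, and the add-on check assumes the VPC CNI is installed as an EKS add-on named vpc-cni.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Build the shorthand value for --scaling-config. $1: min, $2: max, $3: desired
scaling_config() {
  printf 'minSize=%s,maxSize=%s,desiredSize=%s' "$1" "$2" "$3"
}

# Create a managed node group with the values from this guide.
# $1: cluster name, $2: node group name, $3: node role ARN, remaining args: subnet IDs
create_node_group() {
  cluster="$1"; nodegroup="$2"; role_arn="$3"
  shift 3
  aws eks create-nodegroup \
    --cluster-name "$cluster" --nodegroup-name "$nodegroup" \
    --node-role "$role_arn" --subnets "$@" \
    --ami-type AL2_x86_64 --capacity-type SPOT \
    --instance-types t3a.medium --disk-size 20 \
    --scaling-config "$(scaling_config 2 2 2)"
  # Block until the node group status is ACTIVE.
  aws eks wait nodegroup-active --cluster-name "$cluster" --nodegroup-name "$nodegroup"
}

# Rough equivalent of the Verify steps. $1: cluster name
verify_node_group() {
  kubectl get nodes
  aws eks describe-addon --cluster-name "$1" --addon-name vpc-cni \
    --query 'addon.status' --output text
}
```

Note that `kubectl get nodes` only works once a kubeconfig for the cluster exists, which is set up in STEP-3.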

STEP-3: Generate user token

Sprinkle authenticates to the EKS cluster using a Kubernetes user token. Follow the steps below to generate the user token:

  • Install kubectl and the AWS CLI

  • Configure the AWS CLI with your credentials:

aws configure

  • Generate the ~/.kube/config file:

aws eks --region <region> update-kubeconfig --name <cluster_name>

  • To verify the setup, run kubectl to fetch the running nodes:

kubectl get nodes

  • Create the namespace:

kubectl create namespace sprinkle

  • Create an admin user in Kubernetes. Create the file service-account-create.yml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sprinkle-admin-user
  namespace: sprinkle

kubectl apply -f service-account-create.yml

  • Create a ClusterRoleBinding. Create the file role-binding.yml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sprinkle-admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: sprinkle-admin-user
  namespace: sprinkle

kubectl apply -f role-binding.yml

  • To create a long-lived API token for the ServiceAccount, create a Secret file sprinkle-admin-secret.yml with the special annotation kubernetes.io/service-account.name:

apiVersion: v1
kind: Secret
metadata:
  name: sprinkle-admin-secret
  namespace: sprinkle
  annotations:
    kubernetes.io/service-account.name: sprinkle-admin-user
type: kubernetes.io/service-account-token

kubectl apply -f sprinkle-admin-secret.yml

  • The following command prints the user token. Note down the generated token:

kubectl describe secrets/sprinkle-admin-secret -n sprinkle
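
If you only want the token value, without the rest of the describe output, one option is to read it from the Secret directly. This is a small sketch, assuming the Secret sprinkle-admin-secret exists in the sprinkle namespace as created above; the function names are hypothetical. Kubernetes stores the token base64-encoded in the Secret data, so it is decoded before printing.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Decode base64 from stdin.
decode_b64() {
  base64 --decode
}

# Print only the decoded token of the Secret created above.
print_sprinkle_token() {
  kubectl get secret sprinkle-admin-secret -n sprinkle \
    -o jsonpath='{.data.token}' | decode_b64
}
```

Run `print_sprinkle_token` and copy the output. Treat the value as a credential: it grants cluster-admin access through the ClusterRoleBinding.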

STEP-4: Configure EKS connection

  • Log into the Sprinkle application

  • Navigate to Admin -> Drivers -> Create Compute

  • Select EKS

  • Provide all the mandatory details

    • Distinct Name: Any name to identify the connection

    • Cluster URL: Provide the URL of the EKS cluster you created, in the format https://<ENDPOINT>

    • Cluster CA Certificate: Provide the CA certificate of the EKS cluster

    • Is Certificate Encoded: No

    • User Token: Paste the user token generated in STEP-3 above

    • Deploy namespace: sprinkle

    • Notebook: Yes or No. Enable this if you want to use a Notebook for transforms.

    • Notebook Docker Container URL: URL of the Jupyter Notebook Docker container, available on Docker Hub. See available images here

    • Notebook Idle Timeout: Time in minutes

    • Advanced Settings: Yes

    • Node group labels: Comma-separated key-value pairs, for example key1: val1, key2: val2, key3: val3. If labels are configured, all pods are launched only on the node groups that have the same labels.

    • CPU and VM size for Ingestion: Number of CPUs and size of each VM.

    • CPU and VM size for Notebook: Number of CPUs and size of each VM. If there is more than one entry, separate the entries with commas.

    • Supported Kernels: Names of the supported kernels. If there is more than one kernel, separate the names with commas.

  • Test Connection

  • Create
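
The Cluster URL and Cluster CA Certificate values can be looked up with the AWS CLI instead of the console. The sketch below uses hypothetical function names and assumes your cluster name and Region as arguments. EKS returns the certificate base64-encoded; which form Sprinkle expects for a given Is Certificate Encoded setting is not covered by this sketch, so check it against your setup.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Read one field of the cluster description. $1: cluster name, $2: region, $3: JMESPath query
cluster_field() {
  aws eks describe-cluster --region "$2" --name "$1" --query "$3" --output text
}

# API server endpoint, already in the form https://<ENDPOINT>.
cluster_url() {
  cluster_field "$1" "$2" 'cluster.endpoint'
}

# Cluster CA certificate as returned by EKS (base64-encoded).
cluster_ca() {
  cluster_field "$1" "$2" 'cluster.certificateAuthority.data'
}

# Sanity check that a value has the https://<ENDPOINT> shape before pasting it.
is_https_url() {
  case "$1" in
    https://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `url="$(cluster_url sprinkle-cluster <region>)"` followed by `is_https_url "$url"` confirms the value has the expected format before you enter it in the form.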

(Optional) Enable Cluster Autoscaler

If the Autoscaler is enabled, the nodes scale up and down based on the load. Refer to the detailed Cluster Autoscaler documentation to enable it.

In STEP-2 above, make sure to set the min, max, and desired size of the nodes. Set the max and desired size to the same value so that all the nodes are used during heavy load.
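
For an existing node group, the sizes can be adjusted from the command line. This is a minimal sketch of the sizing rule described above (desired size equal to max size); the function names are hypothetical and the cluster and node group names are your own values. It does not install the Cluster Autoscaler itself.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Build a scaling config where the desired size equals the max size.
# $1: min size, $2: max size. Fails if min is greater than max.
autoscaler_scaling_config() {
  min="$1"; max="$2"
  if [ "$min" -gt "$max" ]; then
    echo "min size must not exceed max size" >&2
    return 1
  fi
  printf 'minSize=%s,maxSize=%s,desiredSize=%s' "$min" "$max" "$max"
}

# Apply the sizes to a node group.
# $1: cluster name, $2: node group name, $3: min size, $4: max size
update_node_group_scaling() {
  config="$(autoscaler_scaling_config "$3" "$4")" || return 1
  aws eks update-nodegroup-config --cluster-name "$1" --nodegroup-name "$2" \
    --scaling-config "$config"
}
```

For instance, `update_node_group_scaling sprinkle-cluster <nodegroup_name> 2 5` requests a minimum of 2 nodes and sets both max and desired to 5.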

(Optional) Sending logs to CloudWatch

Follow the steps below to send logs from your containers to Amazon CloudWatch Logs. Refer to the AWS documentation for more details.

Log Setup

You can use Fluent Bit or Fluentd to send logs from your containers to Amazon CloudWatch Logs. The steps below use Fluent Bit; refer to the AWS documentation for more information on both.

  1. Add the CloudWatchLogsFullAccess policy to the node group IAM role.

  2. If you don't already have a namespace called amazon-cloudwatch, create one by entering the following command: kubectl apply -f

  3. Run the following commands to create a ConfigMap named fluent-bit-cluster-info with the cluster name and the Region to send logs to. Replace cluster-name and cluster-region with your cluster's name and Region.

ClusterName=cluster-name
RegionName=cluster-region
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off' || FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
kubectl create configmap fluent-bit-cluster-info \
--from-literal=cluster.name=${ClusterName} \
--from-literal=http.server=${FluentBitHttpServer} \
--from-literal=http.port=${FluentBitHttpPort} \
--from-literal=read.head=${FluentBitReadFromHead} \
--from-literal=read.tail=${FluentBitReadFromTail} \
--from-literal=logs.region=${RegionName} -n amazon-cloudwatch

  4. Download and deploy the Fluent Bit daemonset to the cluster by running one of the following commands.

    a) If you want the Fluent Bit optimized configuration, run this command:

kubectl apply -f

    b) If you want the Fluent Bit configuration that is more similar to Fluentd, run this command:

kubectl apply -f

  5. Validate the deployment by entering the following command. Each node should have one pod named fluent-bit-*.

kubectl get pods -n amazon-cloudwatch

  6. Add a retention policy of 1 week (7 days): select the log group and, under Actions, set the retention to 1 week.
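
Two parts of the steps above lend themselves to a short script: how the Fluent Bit settings in step 3 are derived, and how to apply the 7-day retention from step 6 without the console. The sketch below uses hypothetical function names, and the log group names assume the default Container Insights Fluent Bit configuration (application, host and dataplane groups under /aws/containerinsights/<cluster_name>); adjust them if your setup differs.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# read.tail is the opposite of read.head. $1: 'On' or 'Off'
read_from_tail() {
  if [ "$1" = 'On' ]; then echo 'Off'; else echo 'On'; fi
}

# The HTTP server is enabled only when a port is set. $1: port, may be empty
http_server() {
  if [ -z "$1" ]; then echo 'Off'; else echo 'On'; fi
}

# Log groups assumed to be created by the default Fluent Bit setup. $1: cluster name
log_groups() {
  for suffix in application host dataplane; do
    printf '/aws/containerinsights/%s/%s\n' "$1" "$suffix"
  done
}

# Set a 7-day retention on each log group. $1: cluster name, $2: region
set_log_retention() {
  log_groups "$1" | while read -r group; do
    aws logs put-retention-policy --region "$2" \
      --log-group-name "$group" --retention-in-days 7
  done
}
```

With the values from step 3 (port 2020, read from head Off), `http_server 2020` gives On and `read_from_tail Off` gives On, which matches what the inline `[[ ... ]]` expressions compute. A log group must already exist before `set_log_retention` can update it.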

Disable CloudWatch logs

kubectl delete -f

