Google GKE

Guide to creating a GKE cluster and connecting it to Sprinkle

Sprinkle requires a GKE cluster for data ingestion and processing. The cluster lets all your data be processed locally, within your private Google Cloud network.

Follow the steps below to create and configure the GKE cluster:

STEP-1: Create GKE Cluster

Create a GKE Standard cluster with the default settings, except for the following configuration:

Networking

  • Select Public cluster

  • Enable control plane authorized networks: Provide any name (for example, Sprinkle network) and enter the CIDR range 34.93.254.126/32 to whitelist the Sprinkle IP

  • Default Node Pool: 2 nodes

  • Node Type: n1-standard-1
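The same configuration can also be expressed as a gcloud command. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated. The helper name gke_create_cmd and the example values sprinkle-gke, my-project, and asia-south1-a are hypothetical placeholders. The helper only prints the command so it can be reviewed before running. No private-node flag is passed, so the cluster should remain public; verify this against your gcloud version.

```shell
#!/usr/bin/env bash
# Sketch: build the gcloud command for the cluster described above.
# gke_create_cmd is a hypothetical helper; it prints the command instead of
# running it, so nothing is created until you execute the output yourself.
gke_create_cmd() {
  local cluster_name="$1" project_id="$2" zone="$3"
  echo gcloud container clusters create "$cluster_name" \
    --project "$project_id" \
    --zone "$zone" \
    --num-nodes 2 \
    --machine-type n1-standard-1 \
    --enable-master-authorized-networks \
    --master-authorized-networks 34.93.254.126/32
}

# Example values are placeholders; replace them with your own.
gke_create_cmd sprinkle-gke my-project asia-south1-a
```

Printing first keeps the step safe to try: copy the printed line into a terminal once the project, zone, and cluster name look right.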

STEP-2: Generate user token

Sprinkle authenticates to the GKE cluster using a Kubernetes user token. Follow the steps below to generate the user token:

  • Install kubectl and gcloud CLI

  • Generate ~/.kube/config file:

gcloud container clusters get-credentials <cluster_name> --project <project_id> --zone <zone>
  • To verify the setup, run the following kubectl command to list the cluster nodes:

kubectl get nodes
  • Create namespace

kubectl create namespace sprinkle
  • Create an admin user in Kubernetes: Create a file named service-account-create.yml:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: sprinkle-admin-user
  namespace: kube-system

kubectl apply -f service-account-create.yml
  • Create a ClusterRoleBinding: Create a file named role-binding.yml:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sprinkle-admin-user
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: sprinkle-admin-user
  namespace: kube-system

kubectl apply -f role-binding.yml
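  • Print the user token with the following command and note down the generated token. The service account was created in the kube-system namespace, so the secret is looked up there:

kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep sprinkle-admin-user | awk '{print $1}')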
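On Kubernetes 1.24 and later, token Secrets are no longer created automatically for service accounts, so the command above may print nothing. A minimal sketch of one workaround follows, assuming the service account from the earlier step lives in kube-system. The Secret name sprinkle-admin-user-token and the helper token_secret_manifest are hypothetical, not part of Sprinkle.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Secret manifest that asks Kubernetes to issue a
# long-lived token for the sprinkle-admin-user service account.
token_secret_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: sprinkle-admin-user-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: sprinkle-admin-user
type: kubernetes.io/service-account-token
EOF
}

# Usage (requires access to the cluster, so left commented out here):
#   token_secret_manifest | kubectl apply -f -
#   kubectl -n kube-system get secret sprinkle-admin-user-token \
#     -o jsonpath='{.data.token}' | base64 -d
```

The token stored in the Secret is base64-encoded, which is why the usage lines decode it before it is pasted into Sprinkle.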

STEP-3: Configure GKE connection

  • Log in to the Sprinkle application

  • Navigate to Admin -> Drivers -> Create Compute

  • Select GKE

  • Provide all the mandatory details

    • Distinct Name: Any name to identify the connection

    • Cluster Url: URL of the GKE cluster created above, in the format https://<ENDPOINT>

    • Cluster CA Certificate: CA certificate of the GKE cluster

    • Is Certificate Encoded: No

    • User Token: Paste the User Token generated in STEP-2 above

    • Deploy namespace: sprinkle

    • Supported Kernels: Names of the supported kernels. Separate multiple kernels with commas.

    • CPU and VM size for Notebook: Number of CPUs and size of each VM. Separate multiple entries with commas.

    • CPU and VM size for Ingestion: Number of CPUs and size of each VM.

    • Node group labels: Comma-separated key-value pairs, for example key1: val1, key2: val2, key3: val3. If a label is configured, all pods are launched only on the node group that has the same label.

    • Advance Settings: Yes

    • Notebook Idle Timeout: Time in minutes

    • Notebook Docker Container url: URL of the Jupyter Notebook Docker container, available on Docker Hub. See available images here

    • Notebook: Yes or No. Select Yes to enable Notebook if you want to use it for transforms.

  • Test Connection

  • Create
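The Cluster Url and Cluster CA Certificate values can be read from the cluster with gcloud. The following is a minimal sketch, assuming gcloud is authenticated for the project. The helper names cluster_url and decode_ca_cert are hypothetical. gcloud returns the CA certificate base64-encoded, so it is decoded here to match Is Certificate Encoded: No.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a cluster endpoint (IP or host) into the URL form
# that the Cluster Url field expects.
cluster_url() {
  echo "https://$1"
}

# Hypothetical helper: decode a base64-encoded certificate read from stdin.
# Some older macOS versions use "base64 -D" instead of "base64 -d".
decode_ca_cert() {
  base64 -d
}

# Usage (requires gcloud and a real cluster, so left commented out here):
#   endpoint="$(gcloud container clusters describe <cluster_name> \
#     --project <project_id> --zone <zone> --format='value(endpoint)')"
#   cluster_url "$endpoint"
#   gcloud container clusters describe <cluster_name> \
#     --project <project_id> --zone <zone> \
#     --format='value(masterAuth.clusterCaCertificate)' | decode_ca_cert
```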

Checking Sprinkle Job Logs

  • Open the Google Cloud console: https://console.cloud.google.com/

  • Search for "Logging".

  • In the left panel, click Logs Explorer.

  • From the Query panel

    • Click on Resource -> Select Kubernetes Container -> Choose cluster name -> Choose Namespace

    • In the Search container name box, enter the Sprinkle job id. (To find the Sprinkle job id, click the "Show details" link for that job in the Sprinkle UI.)

  • Run Query.

  • Check the logs.
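The Logs Explorer steps above correspond roughly to a text filter, which can also be passed to gcloud logging read. The following is a minimal sketch, assuming the job id appears in the container name as the steps describe. The helper name sprinkle_log_filter and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a Cloud Logging filter for one Sprinkle job.
# Arguments: cluster name, namespace, Sprinkle job id.
sprinkle_log_filter() {
  local cluster="$1" namespace="$2" job_id="$3"
  # ":" is the substring ("has") operator, so the container name only needs
  # to contain the job id rather than match it exactly.
  printf 'resource.type="k8s_container" AND resource.labels.cluster_name="%s" AND resource.labels.namespace_name="%s" AND resource.labels.container_name:"%s"\n' \
    "$cluster" "$namespace" "$job_id"
}

# Example values are placeholders; replace them with your own.
sprinkle_log_filter sprinkle-gke sprinkle 12345

# Usage (requires gcloud and project access, so left commented out here):
#   gcloud logging read "$(sprinkle_log_filter <cluster_name> sprinkle <job_id>)" \
#     --project <project_id> --limit 50
```

The printed filter can also be pasted directly into the Logs Explorer query box.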
