Databricks

Guide to integrating Databricks with Sprinkle

This page covers the details about integrating Databricks with Sprinkle.

When setting up a Databricks connection, Sprinkle additionally requires a Cloud bucket. This guide covers the role of each component and the steps to set them up.

  • Integrating Databricks: All analytical data is stored in and queried from the Databricks warehouse

  • Cloud Bucket: Sprinkle stores all intermediate data and report caches in this bucket

Step-by-Step Guide

Integrating Databricks

STEP-1: Allow Databricks to accept connections from Sprinkle

Allow inbound connections on the Databricks JDBC port (default is 443) from the Sprinkle IPs 34.93.254.126 and 34.93.106.136.
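
As a quick sanity check, the Python snippet below probes the port from wherever you run it; the hostname is a placeholder for your Databricks workspace host, and Sprinkle's IPs must still be allowed in your network rules regardless of the result.

import socket

# Placeholder; replace with your Databricks workspace hostname
host = "<your-workspace>.cloud.databricks.com"
port = 443

# Attempt a TCP connection to confirm the JDBC port is open
try:
    with socket.create_connection((host, port), timeout=5):
        print("Port %d on %s is reachable" % (port, host))
except OSError as err:
    print("Connection failed:", err)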

STEP-2: Configure Databricks Connection

  • Log into Sprinkle application

  • Navigate to Admin -> Warehouse -> New Warehouse Connection

  • Select Databricks

  • Provide all the mandatory details

    • Distinct Name: A name to identify this connection

    • Host: The Databricks workspace hostname or IP address

    • Port: The JDBC port number (default is 443)

    • Database: Optional; if provided, it must be an existing database

    • Username: The ID you use to log into Databricks

    • Password: A personal access token; you can generate one from the User Settings page in your Databricks workspace

    • Storage Mount Name: The storage mount that Databricks will use; see the Creating Storage Mount section below for details

  • Test Connection

  • Create

Creating Storage Mount

Go to the Databricks home page, click the Create button on the right side, and select Notebook. Attach the notebook to the cluster you want to configure with Sprinkle and select Python as the default language.

Run the Python code for your cloud to create the mount. Sprinkle currently supports Databricks on Azure and AWS.

Azure Blob

Refer to https://docs.databricks.com/data/data-sources/azure/azure-storage.html

# Mount the Azure Blob Storage container at /mnt/<mount-name>
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-key>"})
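
To confirm the mount succeeded, list its contents, as the S3 example below also does:

display(dbutils.fs.ls("/mnt/<mount-name>"))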

S3

Refer to https://docs.databricks.com/data/data-sources/aws/amazon-s3.html

# AWS credentials with read/write access to the bucket
AccessKey = "<Access_Key>"
SecretKey = "<Secret_Key>"
# URL-encode any "/" in the secret key so it can be embedded in the URI
SecretKey = SecretKey.replace("/", "%2F")
aws_bucket_name = "<Bucket_Name>"
mount_name = "<mount_name>"
# Mount the S3 bucket, then list its contents to verify the mount
dbutils.fs.mount("s3a://%s:%s@%s" % (AccessKey, SecretKey, aws_bucket_name), "/mnt/%s" % mount_name)
display(dbutils.fs.ls("/mnt/%s" % mount_name))

Note:

  1. The Cloud bucket configured in Sprinkle and the storage mount on Databricks must point to the same bucket.

  2. Give the Storage Mount a unique name that does not collide with existing mounts (if the mount path is /mnt/sprinkle, enter just sprinkle); the snippet after these notes lists current mounts.

  3. Set the property "spark.databricks.delta.alterTable.rename.enabledOnAWS" to true in Databricks, as shown in the snippet after these notes.
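
The notebook snippet below is a minimal sketch for notes 2 and 3: it lists the mounts that already exist and sets the rename property for the current session (adding the property to the cluster's Spark config instead makes it permanent).

# List existing mount points to avoid a name collision (note 2)
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)

# Enable Delta table renames on AWS for this session (note 3);
# add it to the cluster's Spark config to apply it permanently
spark.conf.set("spark.databricks.delta.alterTable.rename.enabledOnAWS", "true")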

Create a Cloud Bucket

The Cloud bucket can be created depending on your Databricks Cloud. Sprinkle supports creating the bucket in AWS or Azure. Refer to the respective documents for creating and configuring the Cloud Bucket.
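
As an illustration only, an S3 bucket could be created with boto3 as sketched below; the bucket name and region are placeholders to replace with your own (on Azure, the equivalent step creates a storage account and container instead).

import boto3

# Placeholder name and region; replace with your own
bucket_name = "my-sprinkle-bucket"
region = "ap-south-1"

s3 = boto3.client("s3", region_name=region)
# Outside us-east-1, S3 requires an explicit LocationConstraint
s3.create_bucket(
    Bucket=bucket_name,
    CreateBucketConfiguration={"LocationConstraint": region},
)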
