# Databricks

This page covers the details about integrating Databricks with Sprinkle.

When setting up a Databricks connection, Sprinkle additionally requires a cloud bucket. This guide covers the role of each component and the steps to set them up.

* [Integrating Databricks](#integrating-databricks): All analytical data is stored in and queried from the Databricks warehouse
* [Cloud Bucket](#create-a-cloud-bucket): Sprinkle stores all intermediate data and report caches in this bucket

## Step by Step Guide

### Integrating Databricks

#### **STEP-1: Allow Databricks to accept connection from Sprinkle**

Allow inbound connections on the Databricks JDBC port (443 by default) from the Sprinkle IPs (34.93.254.126 and 34.93.106.136).
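Once the rule is in place, reachability of the JDBC port can be sanity-checked from a machine in the allowed range. A minimal sketch in plain Python (the hostname shown is a hypothetical placeholder; substitute your workspace's Server Hostname):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical workspace hostname -- replace with your own Server Hostname:
# can_reach("adb-1234567890123456.7.azuredatabricks.net", 443)
```

If this returns `False`, re-check the firewall rule before continuing with the connection setup.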

#### STEP-2: Configure Databricks Connection on <mark style="color:purple;">Sprinkle</mark>

{% tabs %}
{% tab title="Cluster" %}
To get the connection details for a Databricks [cluster](https://docs.databricks.com/en/compute/configure.html), do the following:

1. Log in to your Databricks workspace.
2. In the sidebar, click **Compute**.
3. In the list of available clusters, click the target cluster’s name.
4. On the **Configuration** tab, expand **Advanced options**.
5. Click the **JDBC/ODBC** tab.
6. Copy the connection details that you need, such as **Server Hostname**, **Port**, and **HTTP Path**.
{% endtab %}

{% tab title="Warehouse" %}
To get the connection details for a Databricks SQL [warehouse](https://docs.databricks.com/en/compute/sql-warehouse/create.html), do the following:

1. Log in to your Databricks workspace.
2. In the sidebar, click **SQL > SQL Warehouses**.
3. In the list of available warehouses, click the target warehouse’s name.
4. On the **Connection Details** tab, copy the connection details that you need, such as **Server hostname**, **Port**, and **HTTP path**.
{% endtab %}
{% endtabs %}

* Log in to the Sprinkle application
* Navigate to **Admin -> Warehouse -> New Warehouse Connection**
* Select **Databricks**
* In the <mark style="color:purple;">**Connect Warehouse**</mark> form, provide all the mandatory details:
  * *Distinct Name*: A name to identify this connection
  * *Host*: The IP address or hostname of your Databricks instance
  * *Port*: The port number of your Databricks instance
  * *Database*: The name of the database within Databricks you want to connect to, if applicable. This should be an existing database
  * *HTTP Path*: The HTTP path component of your Databricks connection URL. This path identifies the specific Databricks compute you are connecting to
  * *Username*: The username (ID) you use to log in to Databricks
  * *Password*: A personal access token. To generate one, see [here](https://docs.databricks.com/dev-tools/api/latest/authentication.html#generate-a-personal-access-token)
  * *Storage Mount Name*: The storage mount that Databricks will use. See [Creating Storage Mount](#creating-storage-mount) for details
* Click **Test Connection**
* Click **Create**
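For reference, the fields in this form correspond to the pieces of a Databricks JDBC URL. The sketch below assembles one for token-based authentication (the host and HTTP path values are hypothetical placeholders, and exact driver properties can vary by driver version); the personal access token itself would be passed separately as the `PWD` driver property:

```python
def databricks_jdbc_url(host: str, port: int, http_path: str) -> str:
    """Assemble a Databricks JDBC URL from the connection form fields.

    Uses the personal-access-token scheme (UID=token); the token itself
    is supplied separately as the PWD driver property.
    """
    return (
        f"jdbc:databricks://{host}:{port}/default;"
        f"transportMode=http;ssl=1;AuthMech=3;"
        f"httpPath={http_path};UID=token"
    )

url = databricks_jdbc_url(
    "adb-1234567890123456.7.azuredatabricks.net",  # hypothetical Server Hostname
    443,
    "/sql/1.0/warehouses/abc123",                  # hypothetical HTTP Path
)
```

This is only to illustrate how the form fields relate to each other; Sprinkle builds the connection string from the form values for you.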

### Creating Storage Mount

Go to the Databricks home page, click the **Create** button on the right side, and select **Notebook**. Choose the cluster you want to configure with Sprinkle and select Python as the default language.

Depending on your cloud, create the mount by running the Python code below in the notebook. Sprinkle currently supports Databricks on the Azure and AWS clouds.

#### Azure Blob

Refer to <https://docs.databricks.com/data/data-sources/azure/azure-storage.html>

```python
dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage key>"})
```

#### S3

Refer to <https://docs.databricks.com/data/data-sources/aws/amazon-s3.html>

```python
AccessKey = "<Access_Key>"
SecretKey = "<Secret_Key>"
# Slashes in the secret key must be percent-encoded before use in the mount URI
SecretKey = SecretKey.replace("/", "%2F")
aws_bucket_name = "<Bucket_Name>"
mount_name = "<mount_name>"
dbutils.fs.mount("s3a://%s:%s@%s" % (AccessKey, SecretKey, aws_bucket_name), "/mnt/%s" % mount_name)
display(dbutils.fs.ls("/mnt/%s" % mount_name))
```
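One caveat in the snippet above: `replace("/", "%2F")` escapes only slashes. If your secret key contains other URI-reserved characters (such as `+`), a full percent-encoding with the standard library is a safer substitute (a hedged sketch, not Sprinkle-specific):

```python
from urllib.parse import quote

def encode_secret_key(secret_key: str) -> str:
    """Percent-encode every reserved character so the key is safe in a URI."""
    return quote(secret_key, safe="")
```

For example, `encode_secret_key("abc/def+ghi")` yields `abc%2Fdef%2Bghi`, which can then be used in place of the `SecretKey.replace(...)` line.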

**Note:**

1. The storage configured in Sprinkle and the storage mount on Databricks should point to the same bucket.
2. Give the storage mount a unique name that does not collide with existing mounts. (If the mount path is **/mnt/sprinkle**, just enter **sprinkle**.)
3. Set the property *spark.databricks.delta.alterTable.rename.enabledOnAWS* to **true** in Databricks.
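The Delta property from the note above can be set at the cluster level in the Spark config (on the cluster's **Configuration** tab, under **Advanced options > Spark**) as a plain key-value pair:

```
spark.databricks.delta.alterTable.rename.enabledOnAWS true
```

Restart the cluster after changing the Spark config for the property to take effect.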

### Create a Cloud Bucket

A cloud bucket can be created depending on your Databricks cloud. Sprinkle supports creating the bucket in AWS or Azure. Refer to the respective document for creating and configuring the cloud bucket.

* [Create S3 Bucket](https://docs.sprinkledata.com/product/integrating-your-data/aws-redshift#step-1-create-a-s3-bucket)
* [Create Azure Storage Container](https://docs.sprinkledata.com/product/integrating-your-data/azure-synapse#create-azure-storage-container)
