Prerequisites

This page lists every user-supplied input that the four-step Phase 1 platform flow consumes. Walk through it once before starting setup. The step pages (Bundle deploy, Secrets bootstrap, Platform bootstrap job) reference the values captured here verbatim.

The redesign drops terraform apply as the magic step. Cloud and source-system infrastructure (VPC, EKS, S3, ECR, the connector source systems themselves) is no longer provisioned by this repository. Users bring their own cloud backbone and stand up source systems either manually or via the optional runtimes for each connector under src/connectors/<source>/runtime/. The four-step Phase 1 touches only Databricks objects and the user-supplied AWS resources that Unity Catalog needs to read scanner artifacts.

Accounts to create

1. AWS account with sufficient privilege to create the VPC, EKS cluster, S3 bucket, ECR repository, and IAM roles listed under "AWS backbone" below.
   Capture: AWS account ID, plus IAM credentials with the requisite permissions.

2. Databricks workspace on AWS with a Unity Catalog metastore attached. Workspace creation is a generic Databricks setup step (manual via the account console or automated via the official aws-workspace-basic Terraform module); the user picks a path. After the workspace is up, create a personal access token under User Settings > Developer > Access Tokens and note the SQL warehouse ID for the platform bootstrap job (Admin Settings > SQL Warehouses; see the CLI sketch after this list).
   Capture: DATABRICKS_HOST (workspace URL), DATABRICKS_TOKEN (PAT), warehouse_id.

3. An account on each source system for the connectors the user plans to install. Each connector page under Install connectors lists the specific account, URL, and credential it requires (e.g. a GitHub organization plus PAT for the github connector, a ServiceNow tenant plus service account credentials for the servicenow connector).
   Capture: tokens and URLs for each connector (recorded in Secrets bootstrap when running load-secrets.sh for each connector).
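
With the PAT in hand, the warehouse ID can be looked up from the CLI instead of the UI. A minimal sketch, assuming the Databricks CLI from Local tooling below is installed and DATABRICKS_HOST and DATABRICKS_TOKEN are exported:

# Lists every SQL warehouse with its ID; that ID is the warehouse_id to capture.
databricks warehouses list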

Creating the Databricks workspace

Workspace provisioning is the same for every Databricks deployment, so this guide does not reproduce it. Pick whichever path suits you:

  • Manual (account console): follow the Databricks guide Create a classic workspace. ~1 hour end-to-end including the cross-account IAM role and root S3 bucket.
  • Automated (Terraform): copy the official aws-workspace-basic module from databricks/terraform-databricks-examples. Requires an account admin OAuth M2M service principal.

Either path produces the workspace URL and PAT captured under account 2 above.
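
Whichever path you take, a quick sanity check that the captured values authenticate (a minimal sketch; the host and token values are placeholders):

export DATABRICKS_HOST="https://<workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."

# Prints the authenticated principal; an authentication error here means the URL or PAT is wrong.
databricks current-user me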

AWS backbone the user brings

The platform DAB itself does not provision AWS infrastructure. The user stands up the following resources out-of-band before running Bundle deploy:

• VPC with public and private subnets in one AWS region.
  Why: hosts the EKS cluster the optional connector runtimes deploy into; also lets Databricks reach source systems running on EKS over private networking when the workspace is configured for VPC peering.
  Captured as: VPC ID and subnet IDs, recorded for the connector runtimes only. The platform DAB does not read these values.

• EKS cluster with a managed node group (~3 × t3.medium or larger).
  Why: hosts the optional connector runtimes (SonarQube Helm release, Semgrep CronJob, ZAP daemon, Juice Shop demo target).
  Captured as: cluster name and region, plus the OIDC provider ARN for runtimes that use IRSA.

• S3 artifact bucket in the same region as EKS.
  Why: receives scan artifacts written by the Semgrep CronJob and the ZAP CI step; exposed into Unity Catalog as an external location so the connector ingest jobs can read them as Delta-readable paths.
  Captured as: ARTIFACT_BUCKET env var (consumed by src/platform/scripts/bootstrap.sh).

• ECR repository in the same region.
  Why: hosts container images pushed from Juice Shop CI in the cross-scanner end-to-end demo. Required only if the user runs the demo path of the github runtime.
  Captured as: ECR registry URI (consumed by the github runtime, not the platform DAB).

• IAM role for IRSA (Semgrep CronJob), assumable by the semgrep Kubernetes service account and granted s3:PutObject, s3:GetObject, and s3:ListBucket on ARTIFACT_BUCKET.
  Why: lets the Semgrep CronJob write scan results to the artifact bucket without long-lived AWS keys. Required only if the user runs the semgrep runtime.
  Captured as: role ARN (consumed by the semgrep runtime, not the platform DAB).

• IAM role for the UC external location, assumable by the Databricks Unity Catalog managed storage principal and granted s3:GetObject, s3:ListBucket, and s3:PutObject on ARTIFACT_BUCKET. The trust policy must follow the Databricks UC storage credential trust policy (see the AWS CLI sketch below).
  Why: lets Unity Catalog read scanner artifacts from the bucket as a UC external location. The platform bootstrap script creates the storage credential and external location pointing at this role.
  Captured as: EXTERNAL_LOCATION_ROLE_ARN env var (consumed by src/platform/scripts/bootstrap.sh).

Provision these via your existing IaC tooling or by hand. The redesigned platform DAB has no opinions about how. It only reads the bucket name and IAM role ARN. (The repository previously shipped an infra/terraform/ module that did this provisioning. That module was removed in the Databricks-centric redesign and users are expected to bring their own backbone.)
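
As one concrete example, the two pieces the platform DAB actually reads can be created with the AWS CLI alone. This is a minimal, hypothetical sketch: the bucket and role names are examples, and the Databricks principal ARN and external ID placeholders must be copied from the current Databricks storage credential documentation:

# Example bucket; outside us-east-1, add --create-bucket-configuration LocationConstraint=<region>.
aws s3api create-bucket --bucket my-appsec-mvp-artifacts --region us-east-1

# Trust policy skeleton; fill both placeholders from the Databricks UC storage credential docs.
cat > uc-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "<databricks-uc-storage-credential-principal-arn>" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "<databricks-account-id>" } }
  }]
}
EOF

aws iam create-role --role-name uc-external-location \
  --assume-role-policy-document file://uc-trust.json

# Grants the three S3 actions listed for the UC role above, on the bucket and its objects.
cat > uc-s3.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"],
    "Resource": [
      "arn:aws:s3:::my-appsec-mvp-artifacts",
      "arn:aws:s3:::my-appsec-mvp-artifacts/*"
    ]
  }]
}
EOF

aws iam put-role-policy --role-name uc-external-location \
  --policy-name uc-artifact-access --policy-document file://uc-s3.json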

Local tooling

Install the following on your workstation. All steps below assume a Unix-style shell (macOS, Linux, WSL, or Git Bash on Windows):

• Databricks CLI (minimum 0.240). macOS: brew install databricks. Windows: curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh (the official install script; run it from Git Bash).
• AWS CLI v2 (minimum 2.15). macOS: brew install awscli. Windows: winget install Amazon.AWSCLI.
• Terraform (minimum 1.7; only if running connector runtimes). macOS: brew install terraform. Windows: winget install HashiCorp.Terraform.
• kubectl (minimum 1.30; only for runtimes that deploy to EKS). macOS: brew install kubectl. Windows: winget install Kubernetes.kubectl.
• Helm (minimum 3.14; only for runtimes that install Helm charts). macOS: brew install helm. Windows: winget install Helm.Helm.
• git (minimum 2.40). macOS: pre-installed. Windows: winget install Git.Git.
• jq (minimum 1.6). macOS: brew install jq. Windows: winget install stedolan.jq.

The Databricks CLI and AWS CLI are required for every user. Terraform, kubectl, and Helm are required only if the user opts into the connector runtimes. Connectors that consume an existing source system (your own GitHub org, ServiceNow tenant, SonarQube instance) skip those runtimes entirely.
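
A quick version check for the two universally required tools (extend to the optional ones as needed):

databricks --version   # expect 0.240 or later
aws --version          # expect 2.15 or later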

Credential file

The Databricks CLI reads DATABRICKS_HOST and DATABRICKS_TOKEN from the environment, from a ~/.databrickscfg profile, or from a .env file the user sources before each session; pick whichever fits your secrets management. The repository ships no terraform.tfvars and no .env.example. The bundle reads everything through DAB variables passed on the command line at deploy time, or through environment variables resolved via ${env.DATABRICKS_HOST} and similar.
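
If the profile route fits, a minimal ~/.databrickscfg (all values are placeholders):

cat > ~/.databrickscfg <<'EOF'
[DEFAULT]
host  = https://<workspace>.cloud.databricks.com
token = dapi...
EOF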

A minimal session bootstrap:

export DATABRICKS_HOST="https://<workspace>.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi..."
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_REGION="us-east-1"

# Captured from the AWS backbone above:
export ARTIFACT_BUCKET="my-appsec-mvp-artifacts"
export EXTERNAL_LOCATION_ROLE_ARN="arn:aws:iam::123456789012:role/uc-external-location"

# Captured from the Databricks workspace:
export CATALOG="appsec_dev"
export WAREHOUSE_ID="abcd1234..."

databricks bundle deploy reads DATABRICKS_HOST and DATABRICKS_TOKEN. src/platform/scripts/bootstrap.sh reads EXTERNAL_LOCATION_ROLE_ARN, ARTIFACT_BUCKET, and CATALOG. Secret loaders for each connector read whichever env vars their connector documents.
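
Before starting Phase 1 proper, a pre-flight check that nothing above was skipped (a minimal bash sketch; the variable list mirrors the exports above):

# Flags any required variable that is still unset or empty.
for v in DATABRICKS_HOST DATABRICKS_TOKEN ARTIFACT_BUCKET \
         EXTERNAL_LOCATION_ROLE_ARN CATALOG WAREHOUSE_ID; do
  [ -n "${!v}" ] || echo "missing: $v"
done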

Next

Proceed to Bundle deploy to deploy the DAB into the workspace, then Secrets bootstrap to load the cross-cutting platform secrets, then Platform bootstrap job to apply the silver table DDL.