Secrets bootstrap¶
Create the cross-cutting Databricks objects DAB has no native resource type for, then optionally load secret values for each connector. This is step 3 of the four-step Phase 1 platform flow: Prerequisites, then Bundle deploy, then Secrets bootstrap, then Platform bootstrap job.
The split between platform-level and connector-level secret loading is deliberate: the platform script bootstraps shared infrastructure once, and each connector ships its own loader so users only configure the connectors they actually use.
Two scripts, two responsibilities¶
| Script | Scope | Run | Idempotency |
|---|---|---|---|
| `src/platform/scripts/bootstrap.sh` | Cross-cutting platform objects. Creates the secret scope `mvp-connectors`, a UC storage credential, and a UC external location pointing at `s3://${ARTIFACT_BUCKET}/`. No knowledge of any specific connector. | Once per workspace, after Bundle deploy. | Yes. Re-runs are safe; existing objects are skipped. |
| `src/connectors/<source>/scripts/load-secrets.sh` | Secret values for one connector. Each connector script writes only the secret keys that connector reads. | Once per connector, before the first run of that connector. | Yes. Re-runs update existing values. |
The platform script does not load any secret values for individual connectors. The user runs whichever connector loaders apply to their deployment.
Platform bootstrap script¶
Source: `src/platform/scripts/bootstrap.sh`.
Inputs (env vars)¶
| Env var | Purpose |
|---|---|
| `EXTERNAL_LOCATION_ROLE_ARN` | IAM role ARN for the UC external location (provisioned by the user per Prerequisites, AWS backbone). |
| `ARTIFACT_BUCKET` | S3 bucket name (no `s3://` prefix). |
| `CATALOG` | Unity Catalog name (e.g. `appsec_dev`). Used to name the UC objects so they are scoped per target. |
The script runs under `set -u`, so it fails fast with an error when any of these variables is missing.
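A minimal sketch of that guard pattern (illustrative, not the script's verbatim text):

```bash
# Fail fast on missing inputs; ":?" aborts with the given message if unset.
set -euo pipefail
: "${EXTERNAL_LOCATION_ROLE_ARN:?EXTERNAL_LOCATION_ROLE_ARN is required}"
: "${ARTIFACT_BUCKET:?ARTIFACT_BUCKET is required}"
: "${CATALOG:?CATALOG is required}"
```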
What it creates¶
| Databricks object | Name | Why |
|---|---|---|
| Secret scope | `mvp-connectors` | Container for every connector secret. All connector code reads from this scope. |
| Storage credential | `${CATALOG}-artifacts` (e.g. `appsec_dev-artifacts`) | Unity Catalog wrapper around the IAM role provided by the user. |
| External location | `${CATALOG}_artifacts` | UC pointer to `s3://${ARTIFACT_BUCKET}/` using the storage credential. The semgrep and owasp_zap connectors create external volumes inside this location. |
Run¶
# From repo root, with env vars from Prerequisites already exported:
export EXTERNAL_LOCATION_ROLE_ARN="arn:aws:iam::123456789012:role/uc-external-location"
export ARTIFACT_BUCKET="my-appsec-mvp-artifacts"
export CATALOG="appsec_dev"
bash src/platform/scripts/bootstrap.sh
Expected output:
==> Creating secret scope: mvp-connectors
==> Creating storage credential: appsec_dev-artifacts
==> Creating external location: appsec_dev_artifacts
OK: platform bootstrap complete.
Next: populate connector secrets via src/connectors/<source>/scripts/load-secrets.sh
Then: databricks bundle run platform-bootstrap (creates silver tables)
Re-running is safe. RESOURCE_ALREADY_EXISTS errors on the create calls are
swallowed by the grep -v filters in the script.
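The shape of those create calls, sketched (the real script may differ in detail):

```bash
# Create-if-absent: the already-exists error is filtered out of stderr, and
# "|| true" keeps the pipeline's non-zero exit from aborting the script.
databricks secrets create-scope mvp-connectors 2>&1 \
  | grep -v RESOURCE_ALREADY_EXISTS || true
```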
Verify¶
databricks secrets list-scopes | grep mvp-connectors
databricks unity-catalog storage-credentials get "${CATALOG}-artifacts"
databricks unity-catalog external-locations get "${CATALOG}_artifacts"
Secret loaders for each connector¶
Each connector under `src/connectors/<source>/scripts/` ships its own `load-secrets.sh`. The full set:
| Connector | Script | Env vars consumed | Secret keys written |
|---|---|---|---|
| github | `src/connectors/github/scripts/load-secrets.sh` | `GITHUB_PAT`, `GITHUB_ORG` | `github_token`, `github_org` |
| gitlab | `src/connectors/gitlab/scripts/load-secrets.sh` | `GITLAB_BASE_URL`, `GITLAB_TOKEN` | `gitlab_base_url`, `gitlab_token` |
| servicenow | `src/connectors/servicenow/scripts/load-secrets.sh` | `SERVICENOW_URL`, `SERVICENOW_USERNAME`, `SERVICENOW_PASSWORD` | `servicenow_url`, `servicenow_username`, `servicenow_password` |
| sonarqube | `src/connectors/sonarqube/scripts/load-secrets.sh` | `SONARQUBE_URL`, `SONARQUBE_TOKEN` | `sonarqube_url`, `sonarqube_token` |
| semgrep | `src/connectors/semgrep/scripts/load-secrets.sh` | `ARTIFACT_BUCKET`, `SEMGREP_PREFIX` (default `semgrep/`) | `semgrep_artifact_bucket`, `semgrep_artifact_prefix` |
| dependency_track | `src/connectors/dependency_track/scripts/load-secrets.sh` | `DT_APIKEY` | `dependency_track_api_key` |
| trufflehog | `src/connectors/trufflehog/scripts/load-secrets.sh` | `TRUFFLEHOG_ARTIFACT_BUCKET`, optionally `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` | `trufflehog_artifact_bucket`, optionally `trufflehog_aws_credentials` (JSON blob `{access_key_id, secret_access_key}`) |
| owasp_zap | `src/connectors/owasp_zap/scripts/load-secrets.sh` | `ZAP_URL`, `ZAP_API_KEY` | `zap_url`, `zap_api_key` |
| aws_waf | `src/connectors/aws_waf/scripts/load-secrets.sh` | `WAF_LOG_BUCKET`, `AWS_WAF_IAM_ROLE_ARN` | `waf_log_bucket`, `aws_waf_iam_role_arn` |
Run each only when you're ready to install that connector. The page for each connector under Install connectors documents the exact env vars and what each secret value should be.
Example: loading github:
export GITHUB_PAT="github_pat_..."
export GITHUB_ORG="my-org"
bash src/connectors/github/scripts/load-secrets.sh
# OK: github secrets loaded into scope mvp-connectors
Verify¶
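A quick way to confirm the keys landed; `list-secrets` prints key names and last-updated timestamps, never values:

```bash
databricks secrets list-secrets mvp-connectors
```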
Security posture¶
Secret handling in the MVP is functional but not yet production-grade. The lists below cover what the loaders and runbooks already do correctly, and what a production deployment must layer on top before going live with real credentials.
What the loaders do already¶
- No secret values in terraform state. Connector runtimes (gitlab, dependency_track, trufflehog, aws_waf) declare only the handles (`*_secret_scope`/`*_secret_key`) for their secrets in `variables.tf`; the values are loaded directly into the Databricks scope by `load-secrets.sh` and never traverse terraform. Connector runtimes that do take credential variables (github, sonarqube, semgrep, owasp_zap — all pass AWS keys to provision EKS/RDS/IAM) mark every credential variable `sensitive = true`, which suppresses plan/apply printing and masks the value in human-readable state output.
- Single workspace-level scope. All connector secrets live under one named scope (`mvp-connectors`), making it possible to revoke or audit access in one operation rather than chasing per-connector scopes.
- Idempotent loaders. `load-secrets.sh` re-runs overwrite existing values — rotation is just `export NEW_VAL=...; bash load-secrets.sh`, no delete-then-create dance.
What you must add before production¶
1. Avoid leaving secret values in your shell history¶
The runbooks ask you to `export TOKEN_VAR="..."` before running each loader. That export line lands in `~/.bash_history` (or the zsh equivalent), persists after the session ends, and is readable by anything that can read your home directory. Three patterns that don't leave a trace:
# Read interactively without echo (does not enter history):
read -rs -p "GITHUB_PAT: " GITHUB_PAT && export GITHUB_PAT
bash src/connectors/github/scripts/load-secrets.sh
# Or pipe from a credential manager (1Password CLI shown):
export GITHUB_PAT="$(op read 'op://AppSec/GitHub PAT/credential')"
bash src/connectors/github/scripts/load-secrets.sh
# Or temporarily disable history before the export:
set +o history
export GITHUB_PAT="github_pat_..."
bash src/connectors/github/scripts/load-secrets.sh
set -o history
The same applies to the AWS keys exported before applying connector runtimes that need them (github, sonarqube, semgrep, owasp_zap).
2. Don't pass secrets via bundle deploy --var¶
The ServiceNow connector page documents a recovery path where you re-deploy the bundle with --var "servicenow_password=...". That puts the password on the databricks CLI command line — visible in ps aux to any other user on the same machine, and recorded in shell history. Prefer one of:
- Pass via env var: `BUNDLE_VAR_servicenow_password="..." databricks bundle deploy ...`. The Databricks CLI resolves any DAB variable `<name>` from a process env var named `BUNDLE_VAR_<name>`. Env-var passing keeps the value off argv (it lives in `/proc/<pid>/environ`, only readable by the same UID) and out of `~/.bash_history`. This is the pattern documented in Bundle deploy for the ServiceNow credentials; see the sketch after this list.
- Where the underlying resource type accepts the Databricks secret-reference syntax (`{{secrets/scope/key}}`) — currently job parameters, cluster spark conf, and init scripts — wire the resource directly to the secret without a DAB variable in between. UC connection options (the path used by servicenow) do not yet accept this syntax in DAB YAML; the env-var pattern above is the supported workaround there.
3. Configure scope ACLs¶
Out of the box, only the user who created the scope (and workspace admins) can read its secrets. As soon as you grant another principal READ or MANAGE on mvp-connectors, that principal sees every connector's credentials. Lock the scope down to the connector job's service principal:
# Find the principal the connector jobs run as:
databricks jobs get <job-id> --output JSON | jq '.run_as'
# Grant READ to that principal only; revoke from the human user who bootstrapped:
databricks secrets put-acl mvp-connectors <service-principal-id> READ
databricks secrets delete-acl mvp-connectors <bootstrap-user-email>
If multiple connectors with separate trust boundaries share the workspace (e.g. one connector accesses a high-blast-radius source like ServiceNow), split per-connector scopes (mvp-github, mvp-servicenow, …) and grant each job's service principal READ on only its own scope.
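A sketch of that per-connector split (the service-principal IDs are placeholders):

```bash
# One scope per trust boundary; each job's principal reads only its own scope.
for c in github servicenow; do
  databricks secrets create-scope "mvp-${c}"
  databricks secrets put-acl "mvp-${c}" "<${c}-job-service-principal-id>" READ
done
```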
4. Use Databricks audit logs¶
Workspace audit logs include secrets.getSecret events. Stand up a regular review (or a dashboard query) to spot:
- secret reads from principals other than the connector job's service principal,
- bursts of reads outside the connector's scheduled window,
- reads that don't correlate with a connector run.
Enable system-table delivery and query system.access.audit directly, or pipe the workspace audit log to your SIEM.
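One way to run such a review from the shell, sketched against the SQL Statement Execution API; the warehouse ID is a placeholder, and the seven-day window and `request_params` field names are assumptions to adapt:

```bash
# Query system.access.audit for getSecret events over the last 7 days.
cat > /tmp/audit-query.json <<'EOF'
{
  "warehouse_id": "<sql-warehouse-id>",
  "statement": "SELECT event_time, user_identity.email, request_params['scope'] AS scope, request_params['key'] AS secret_key FROM system.access.audit WHERE service_name = 'secrets' AND action_name = 'getSecret' AND event_time > now() - INTERVAL 7 DAYS ORDER BY event_time DESC"
}
EOF
databricks api post /api/2.0/sql/statements --json @/tmp/audit-query.json
```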
5. Rotate on a defined cadence¶
The longer a static credential lives, the weaker the link between it and an accountable holder becomes: copies accumulate and nobody can say who can still use it. Recommended cadences for the credential types in the MVP:
| Credential | Rotate every | Rotation procedure |
|---|---|---|
| GitHub PAT (`github_token`) | 90 days, or immediately on suspected compromise | Generate new fine-grained PAT in GitHub UI → re-run `load-secrets.sh` with the new value → revoke old PAT after one successful job run. |
| GitLab PAT (`gitlab_token`) | 90 days | Same flow against GitLab → re-run loader → revoke old. |
| ServiceNow password | Per the user's corporate password policy (typically 90 days) | Rotate in ServiceNow → re-run loader → re-deploy bundle if the Lakeflow connection caches the value. |
| SonarQube user token (`sonarqube_token`) | 90 days | Generate new token under SonarQube My Account → Security → re-run loader → revoke old. |
| Dependency-Track API key (`dependency_track_api_key`) | 90 days | Generate new key for the connector team in DT → re-run loader → delete old. |
| ZAP API key (`zap_api_key`) | 30 days (especially if the daemon is internet-reachable) | Restart the daemon with `-config api.key=<new>` → re-run loader. |
| AWS access keys (used by github/sonarqube/semgrep/owasp_zap runtimes and the trufflehog `load-secrets.sh`) | 30 days | Generate new key pair in IAM → update terraform tfvars → `terraform apply` → re-run loader for trufflehog → deactivate old key after one successful pipeline run. |
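Rotation with the loaders is just a re-run. For example, rotating the GitHub PAT without touching shell history (pattern from section 1; `my-org` is the placeholder org from the earlier example):

```bash
# New PAT in; revoke the old one in GitHub only after a successful job run.
read -rs -p "New GITHUB_PAT: " GITHUB_PAT && export GITHUB_PAT && echo
export GITHUB_ORG="my-org"
bash src/connectors/github/scripts/load-secrets.sh
```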
6. Plan the migration off long-lived static credentials¶
The MVP uses long-lived static credentials uniformly (PATs, passwords, AWS access keys). Production deployments should migrate each to its short-lived counterpart:
| Source | MVP credential | Production target |
|---|---|---|
| GitHub | Personal Access Token | GitHub App with installation tokens (1-hour TTL) |
| GitLab | Personal Access Token | GitLab project access token with expiry, or OAuth client credentials |
| ServiceNow | Username + password (Basic auth) | OAuth 2.0 client credentials flow (the connector code already supports it; only the loader is single-credential-style) |
| SonarQube | User token | Project analysis token scoped to the connector's read paths |
| AWS | Access key ID + secret access key | IAM role assumed via STS — either an instance profile on the Databricks workspace AWS service credential, or AssumeRoleWithWebIdentity from a workload identity. The trufflehog loader's JSON blob (`{access_key_id, secret_access_key}`) gains a session token after migration. |
These are out of MVP scope but should land before the first real-data run.
7. Loader argv exposure (low-priority)¶
load-secrets.sh invokes databricks secrets put-secret SCOPE KEY --string-value "$VAR", so the secret value briefly appears in argv of the databricks process and is visible to anything with ps access during that window. On a single-user developer machine the risk is negligible. On a shared CI runner or a multi-user dev VM, prefer a stdin-fed loader pattern; track the upgrade as a follow-on.
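A sketch of that stdin-fed pattern against the REST endpoint the CLI wraps, assuming your CLI build accepts `--json @-` for a stdin body (otherwise point it at a `0600` temp file); jq reads the value from the environment rather than from its own argv:

```bash
# Value travels prompt -> env var -> jq (env.VAL) -> stdin -> API; never argv.
read -rs -p "secret value: " VAL && export VAL && echo
jq -n '{scope: "mvp-connectors", key: "github_token", string_value: env.VAL}' \
  | databricks api post /api/2.0/secrets/put --json @-
unset VAL
```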
Common errors¶
| Symptom | Cause | Fix |
|---|---|---|
| `EXTERNAL_LOCATION_ROLE_ARN is required` | Env var not exported. | Export the value from Prerequisites. |
| `RESOURCE_ALREADY_EXISTS` (visible in stderr but script continues) | Object already created on a prior run. | Expected. Idempotency is handled by the `grep -v` filter; the script proceeds and reports `OK: platform bootstrap complete.` |
| `PERMISSION_DENIED: Cannot create storage credential` | The deploying principal lacks the `CREATE STORAGE CREDENTIAL` privilege on the metastore. | Have a metastore admin grant the privilege. |
| External location create returns `INVALID_PARAMETER_VALUE: Storage credential references an IAM role that cannot be assumed by Databricks UC` | Trust policy on `EXTERNAL_LOCATION_ROLE_ARN` doesn't allow the UC managed storage principal. | Update the trust policy per the Databricks UC storage credentials docs. |
| `databricks secrets list-scopes` shows no `mvp-connectors` scope after the script printed OK | Workspace selected by the CLI doesn't match the workspace the script targeted. | Confirm `DATABRICKS_HOST` matches the deployed workspace, then re-run. |
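For that last row, a quick way to check which workspace the CLI is pointed at (assuming a CLI recent enough to ship `auth describe`):

```bash
databricks auth describe          # host, auth method, and effective identity
echo "${DATABRICKS_HOST:-unset}"  # what the bootstrap script would have used
```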
Next¶
Run Platform bootstrap job to apply the silver table DDL.