Bundle deploy¶
Deploy the Databricks Asset Bundle (DAB) defined at the repository root. This is step 2 of the four-step Phase 1 platform flow: Prerequisites, then Bundle deploy, then Secrets bootstrap, then Platform bootstrap job.
The bundle is the single source of truth for every Databricks resource the platform owns. Deploying a clean checkout against an empty workspace ends with the full resource set in place: catalog, schemas, jobs, pipelines, volumes, and connections all created, none yet running.
Inputs this step consumes¶
From Prerequisites:
- DATABRICKS_HOST: workspace URL (env var; resolved by ${env.DATABRICKS_HOST} in databricks.yml).
- DATABRICKS_TOKEN: workspace PAT (env var read by the Databricks CLI).
- WAREHOUSE_ID: SQL warehouse ID for the platform bootstrap job (passed via --var).
- ARTIFACT_BUCKET: S3 bucket name for scanner artifacts (passed via --var).
- ServiceNow connection params (passed via --var, only if the servicenow connector is in scope): servicenow_host, servicenow_username, servicenow_password.
The databricks.yml file in the bundle declares these as DAB variables with no defaults, so any unset variable must be supplied on the command line. The deploy fails fast rather than silently falling back to a default.
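For orientation, the variable declarations in databricks.yml likely follow the standard DAB shape sketched below; the descriptions are illustrative, and the actual file is authoritative. Declaring a variable without a default is what produces the fail-fast behavior.

```yaml
# Sketch of the variable declarations in databricks.yml (descriptions assumed).
# A variable declared without a `default` causes `bundle deploy` to fail fast
# unless a value arrives via --var or a BUNDLE_VAR_<name> env var.
variables:
  warehouse_id:
    description: SQL warehouse ID for the platform bootstrap job
  artifact_bucket:
    description: S3 bucket name for scanner artifacts
  servicenow_host:
    description: ServiceNow instance hostname
  servicenow_username:
    description: ServiceNow integration username
  servicenow_password:
    description: ServiceNow integration password
```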
Targets¶
Three targets are pre-defined in databricks.yml:
| Target | Mode | Catalog | When to use |
|---|---|---|---|
| dev (default) | development | appsec_dev | First-time setup, day-to-day iteration. |
| staging | production | appsec_staging | Optional pre-prod target, deployed under /Shared/appsec-staging. |
| prod | production | appsec_prod | Production deployment under /Shared/appsec. |
The redesigned bundle does not auto-create the workspace, the catalog, or the SQL warehouse. Those live in Prerequisites. It creates only the resources listed below within an existing workspace.
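A minimal targets block consistent with the table above might look like the following sketch; the root_path and catalog-variable wiring are assumptions, not a transcription of the real file.

```yaml
# Hedged sketch of the targets block (root_path and catalog wiring assumed).
targets:
  dev:
    default: true
    mode: development
    variables:
      catalog: appsec_dev
  staging:
    mode: production
    workspace:
      root_path: /Shared/appsec-staging
    variables:
      catalog: appsec_staging
  prod:
    mode: production
    workspace:
      root_path: /Shared/appsec
    variables:
      catalog: appsec_prod
```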
Deploy¶
From a clean checkout, with the env vars from Prerequisites exported:
```bash
# Validate that the bundle resolves and the YAML is well-formed.
databricks bundle validate \
  --target dev \
  --var "warehouse_id=${WAREHOUSE_ID}" \
  --var "artifact_bucket=${ARTIFACT_BUCKET}"

# Deploy. The first run creates the catalog, schemas, jobs, pipelines, volumes,
# and connection. Subsequent runs reconcile any drift.
databricks bundle deploy \
  --target dev \
  --var "warehouse_id=${WAREHOUSE_ID}" \
  --var "artifact_bucket=${ARTIFACT_BUCKET}"
```
If you intend to deploy the Lakeflow pipeline and UC connection for the servicenow connector, pass the host as a flag (it is non-sensitive configuration), but supply the credentials from the process environment rather than argv so they don't land in ~/.bash_history or ps aux output. The Databricks CLI resolves any DAB variable <name> from a process env var named BUNDLE_VAR_<name>:
```bash
# Host: non-sensitive configuration; --var is fine.
# Username + password: pass via env vars so the values stay off argv.
BUNDLE_VAR_servicenow_username="${SERVICENOW_USERNAME}" \
BUNDLE_VAR_servicenow_password="${SERVICENOW_PASSWORD}" \
databricks bundle deploy \
  --target dev \
  --var "warehouse_id=${WAREHOUSE_ID}" \
  --var "artifact_bucket=${ARTIFACT_BUCKET}" \
  --var "servicenow_host=${SERVICENOW_HOST}"
```
Subsequent connector deploys can reuse the same command. The bundle is declarative and idempotent.
Resources the bundle creates¶
The DAB include glob (src/platform/resources/*.yml,
src/connectors/*/resources/*.yml, src/analytics/resources/*.yml) picks up
fragments from each component automatically. After databricks bundle deploy
the workspace contains:
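The include section that produces this pickup is presumably just the three globs listed above, declared once in databricks.yml:

```yaml
# Sketch of the include section in databricks.yml, matching the globs above.
include:
  - src/platform/resources/*.yml
  - src/connectors/*/resources/*.yml
  - src/analytics/resources/*.yml
```

Adding a new connector then requires no edit to databricks.yml: a resources/*.yml fragment under src/connectors/<source>/ is picked up on the next deploy.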
Platform layer (src/platform/resources/)¶
| Resource | Type | Purpose |
|---|---|---|
| appsec | catalog | Unity Catalog (appsec_dev, appsec_staging, or appsec_prod) for Bronze, Silver, and Gold. |
| silver | schema | Cross-source standardized Silver: findings, hwm, repositories, app_repo. |
| platform-bootstrap | job | One-task SQL job that runs src/platform/sql/silver_tables.sql against the SQL warehouse. Run once after secrets are loaded. See Platform bootstrap job. |
Connector layers (src/connectors/<source>/resources/)¶
| Connector | Resources | Notes |
|---|---|---|
| github | bronze_github schema, silver_github schema, github-connector job | Two-task job (ingest then transform), scheduled every 15 minutes. |
| servicenow | bronze_servicenow schema, silver_servicenow schema, servicenow connection, servicenow_ingest Lakeflow pipeline | Lakeflow Connect ingestion of cmdb_ci_business_app and cmdb_rel_ci, daily cron. The connection consumes the servicenow_host, servicenow_username, and servicenow_password DAB variables. |
| sonarqube | bronze_sonarqube schema, sonarqube-connector job | Two-task job (ingest then transform). The connector module is a structural skeleton (ingest() and transform() raise NotImplementedError). |
| semgrep | bronze_semgrep schema, semgrep_artifacts external volume (S3-backed) | Reads scan artifacts from s3://${artifact_bucket}/semgrep/ via the volume. No job. Connector ingest entry points are scaffolded but not wired. |
| owasp_zap | bronze_owasp_zap schema, zap_artifacts external volume (S3-backed) | Same structure as semgrep. Reads artifacts from s3://${artifact_bucket}/zap/. |
Analytics layer (src/analytics/resources/)¶
| Resource | Type | Purpose |
|---|---|---|
| gold | schema | Cross-source analytics outputs (owned by analytics). |
| analytics | job | Placeholder job pointing at src/analytics/sql/gold_findings_summary_placeholder.sql. Full analytics implementation is future work. |
The total resource count after a clean bundle deploy against dev is
roughly: 1 catalog, 9 schemas, 2 volumes, 1 connection, 1 pipeline, 4 jobs.
Verify¶
```bash
# Summarize the deployed bundle in the workspace.
databricks bundle summary --target dev

# Confirm the catalog and schemas exist.
databricks catalogs get appsec_dev
databricks schemas list appsec_dev

# Confirm the jobs are visible.
databricks jobs list --output JSON | jq '.jobs[] | select(.settings.name | startswith("github") or startswith("sonarqube") or startswith("platform-bootstrap")) | .settings.name'

# Confirm the servicenow Lakeflow pipeline is registered (only if the
# servicenow_* variables were supplied at deploy time).
databricks pipelines list-pipelines | grep servicenow_ingest
```
The jobs and pipeline are present but not yet runnable: silver tables have not been created (the platform bootstrap job hasn't run) and secrets for each source are not in the secret scope. Those land in the next two steps.
Common errors¶
| Symptom | Cause | Fix |
|---|---|---|
| Error: variable "warehouse_id" has not been assigned a value | Missing --var "warehouse_id=..." on the deploy command. | Re-run with the variable. The bundle deliberately has no default to force user awareness. |
| INVALID_PARAMETER_VALUE: Catalog 'appsec_dev' already exists with a different owner | Catalog created by a previous attempt under a different principal. | Drop the catalog (databricks catalogs delete appsec_dev --force) and redeploy, or change the catalog variable for this target. |
| PERMISSION_DENIED: Cannot create catalog | The principal behind the PAT lacks the CREATE CATALOG privilege on the metastore. | Have a metastore admin grant CREATE CATALOG to the deploying principal, or switch to an admin PAT for first-time setup. |
| INVALID_PARAMETER_VALUE: Connection 'servicenow' could not be created: authentication failed | Wrong servicenow_* variable values. | Re-validate against the ServiceNow tenant: curl -u $USER:$PASS https://$HOST/api/now/table/cmdb_ci_business_app?sysparm_limit=1. Redeploy with corrected values. |
| Volume external storage location 's3://.../semgrep/' is not found | The UC external location does not exist yet: the deploy ran before src/platform/scripts/bootstrap.sh. | Run Secrets bootstrap, then redeploy. The bootstrap script creates the external location, and redeploys are idempotent. |
Next¶
Run Secrets bootstrap to create the cross-cutting secret scope, storage credential, and external location, then load secrets for each connector.