Bundle deploy¶
Deploy the Databricks Asset Bundle (DAB) defined at the repository root. This is step 2 of the four-step Phase 1 platform flow: Prerequisites, then Bundle deploy, then Secrets bootstrap, then Platform bootstrap job.
The bundle is the single source of truth for every Databricks resource the platform owns. Deploying a clean checkout against an empty workspace ends with the full resource set in place: catalog, schemas, jobs, pipelines, volumes, and connections all created, none yet running.
Inputs this step consumes¶
From Prerequisites:
- DATABRICKS_HOST: workspace URL (env var; resolved by ${env.DATABRICKS_HOST} in databricks.yml).
- DATABRICKS_TOKEN: workspace PAT (env var read by the Databricks CLI).
- WAREHOUSE_ID: SQL warehouse ID for the platform bootstrap job (passed via --var).
- ARTIFACT_BUCKET: S3 bucket name for scanner artifacts (passed via --var).
- ServiceNow connection params (passed via --var, only if the servicenow connector is in scope): servicenow_host, servicenow_username, servicenow_password.
The databricks.yml file in the bundle declares these as DAB variables with no defaults, so any unset variable must be supplied on the command line. The deploy fails fast rather than silently falling back to a default.
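For orientation, the variable declarations in databricks.yml likely follow the standard DAB shape sketched below; the descriptions are illustrative, and the actual file is authoritative. Declaring a variable without a default is what produces the fail-fast behavior.

```yaml
# Sketch of the variable declarations in databricks.yml (descriptions assumed).
# A variable declared without a `default` causes `bundle deploy` to fail fast
# unless a value arrives via --var or a BUNDLE_VAR_<name> env var.
variables:
  warehouse_id:
    description: SQL warehouse ID for the platform bootstrap job
  artifact_bucket:
    description: S3 bucket name for scanner artifacts
  servicenow_host:
    description: ServiceNow instance hostname
  servicenow_username:
    description: ServiceNow integration username
  servicenow_password:
    description: ServiceNow integration password
```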
Targets¶
Three targets are pre-defined in databricks.yml:
| Target | Mode | Catalog | When to use |
|---|---|---|---|
| dev (default) | development | appsec_dev | First-time setup, day-to-day iteration. |
| staging | production | appsec_staging | Optional pre-prod target, deployed under /Shared/appsec-staging. |
| prod | production | appsec_prod | Production deployment under /Shared/appsec. |
The redesigned bundle does not auto-create the workspace, the catalog, or the SQL warehouse. Those live in Prerequisites. It creates only the resources listed below within an existing workspace.
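A minimal targets block consistent with the table above might look like the following sketch; the root_path and catalog-variable wiring are assumptions, not a transcription of the real file.

```yaml
# Hedged sketch of the targets block (root_path and catalog wiring assumed).
targets:
  dev:
    default: true
    mode: development
    variables:
      catalog: appsec_dev
  staging:
    mode: production
    workspace:
      root_path: /Shared/appsec-staging
    variables:
      catalog: appsec_staging
  prod:
    mode: production
    workspace:
      root_path: /Shared/appsec
    variables:
      catalog: appsec_prod
```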
Deploy¶
From a clean checkout, with the env vars from Prerequisites exported:
```bash
# Validate that the bundle resolves and the YAML is well-formed.
databricks bundle validate \
  --target dev \
  --var "warehouse_id=${WAREHOUSE_ID}" \
  --var "artifact_bucket=${ARTIFACT_BUCKET}"

# Deploy. The first run creates the catalog, schemas, jobs, pipelines, volumes,
# and connection. Subsequent runs reconcile any drift.
databricks bundle deploy \
  --target dev \
  --var "warehouse_id=${WAREHOUSE_ID}" \
  --var "artifact_bucket=${ARTIFACT_BUCKET}"
```
If you intend to deploy the Lakeflow pipeline and UC connection for the servicenow connector, pass the host as a flag (it is non-sensitive configuration), but supply the credentials from the process environment rather than argv so they don't land in ~/.bash_history or ps aux output. The Databricks CLI resolves any DAB variable <name> from a process env var named BUNDLE_VAR_<name>:
```bash
# Host: non-sensitive configuration; --var is fine.
# Username + password: pass via env vars so the values stay off argv.
BUNDLE_VAR_servicenow_username="${SERVICENOW_USERNAME}" \
BUNDLE_VAR_servicenow_password="${SERVICENOW_PASSWORD}" \
databricks bundle deploy \
  --target dev \
  --var "warehouse_id=${WAREHOUSE_ID}" \
  --var "artifact_bucket=${ARTIFACT_BUCKET}" \
  --var "servicenow_host=${SERVICENOW_HOST}"
```
Subsequent connector deploys can reuse the same command. The bundle is declarative and idempotent.
Resources the bundle creates¶
The DAB include glob (src/platform/resources/*.yml,
src/connectors/*/resources/*.yml, src/analytics/resources/*.yml) picks up
fragments from each component automatically. After databricks bundle deploy
the workspace contains:
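The include section that produces this pickup is presumably just the three globs listed above, declared once in databricks.yml:

```yaml
# Sketch of the include section in databricks.yml, matching the globs above.
include:
  - src/platform/resources/*.yml
  - src/connectors/*/resources/*.yml
  - src/analytics/resources/*.yml
```

Adding a new connector then requires no edit to databricks.yml: a resources/*.yml fragment under src/connectors/<source>/ is picked up on the next deploy.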
Platform layer (src/platform/resources/)¶
| Resource | Type | Purpose |
|---|---|---|
| appsec | catalog | Unity Catalog (appsec_dev, appsec_staging, or appsec_prod) for Bronze, Silver, and Gold. |
| silver | schema | Cross-source standardized Silver: findings, hwm, repositories, app_repo. |
| platform-bootstrap | job | One-task SQL job that runs src/platform/sql/silver_tables.sql against the SQL warehouse. Run once after secrets are loaded. See Platform bootstrap job. |
Connector layers (src/connectors/<source>/resources/)¶
| Connector | Resources | Notes |
|---|---|---|
| github | bronze_github schema, silver_github schema, github-connector job | Two-task job (ingest then transform), scheduled every 15 minutes. |
| servicenow | bronze_servicenow schema, silver_servicenow schema, servicenow connection, servicenow_ingest Lakeflow pipeline | Lakeflow Connect ingestion of cmdb_ci_business_app and cmdb_rel_ci, daily cron. The connection consumes the servicenow_host, servicenow_username, and servicenow_password DAB variables. |
| sonarqube | bronze_sonarqube schema, sonarqube-connector job | Two-task job (ingest then transform). The connector module is a structural skeleton (ingest() and transform() raise NotImplementedError). |
| semgrep | bronze_semgrep schema, semgrep_artifacts external volume (S3-backed) | Reads scan artifacts from s3://${artifact_bucket}/semgrep/ via the volume. No job. Connector ingest entry points are scaffolded but not wired. |
| owasp_zap | bronze_owasp_zap schema, zap_artifacts external volume (S3-backed) | Same structure as semgrep. Reads artifacts from s3://${artifact_bucket}/zap/. |
Analytics layer (src/analytics/resources/)¶
| Resource | Type | Purpose |
|---|---|---|
| gold | schema | Cross-source analytics outputs (owned by analytics). |
| analytics | job | Placeholder job pointing at src/analytics/sql/gold_findings_summary_placeholder.sql. Full analytics implementation is future work. |
The total resource count after a clean bundle deploy against dev is
roughly: 1 catalog, 9 schemas, 2 volumes, 1 connection, 1 pipeline, 4 jobs.
Verify¶
```bash
# Summarize the deployed bundle in the workspace.
databricks bundle summary --target dev

# Confirm the catalog and schemas exist.
databricks catalogs get appsec_dev
databricks schemas list appsec_dev

# Confirm the jobs are visible.
databricks jobs list --output JSON | jq '.jobs[] | select(.settings.name | startswith("github") or startswith("sonarqube") or startswith("platform-bootstrap")) | .settings.name'

# Confirm the servicenow Lakeflow pipeline is registered (only if the
# servicenow_* variables were supplied at deploy time).
databricks pipelines list-pipelines | grep servicenow_ingest
```
The jobs and pipeline are present but not yet runnable: silver tables have not been created (the platform bootstrap job hasn't run) and secrets for each source are not in the secret scope. Those land in the next two steps.
Common errors¶
| Symptom | Cause | Fix |
|---|---|---|
| Error: variable "warehouse_id" has not been assigned a value | Missing --var "warehouse_id=..." on the deploy command. | Re-run with the variable. The bundle deliberately has no default to force user awareness. |
| INVALID_PARAMETER_VALUE: Catalog 'appsec_dev' already exists with a different owner | Catalog created by a previous attempt under a different principal. | Drop the catalog (databricks catalogs delete appsec_dev --force) and redeploy, or change the catalog variable for this target. |
| PERMISSION_DENIED: Cannot create catalog | The principal behind the PAT lacks the CREATE CATALOG privilege on the metastore. | Have a metastore admin grant CREATE CATALOG to the deploying principal, or switch to an admin PAT for first-time setup. |
| INVALID_PARAMETER_VALUE: Connection 'servicenow' could not be created: authentication failed | Wrong servicenow_* variable values. | Re-validate against the ServiceNow tenant: curl -u $USER:$PASS https://$HOST/api/now/table/cmdb_ci_business_app?sysparm_limit=1. Redeploy with corrected values. |
| Volume external storage location 's3://.../semgrep/' is not found | The UC external location does not exist yet: the deploy ran before src/platform/scripts/bootstrap.sh. | Run Secrets bootstrap, then redeploy. The bootstrap script creates the external location, and redeploys are idempotent. |
Next¶
Run Secrets bootstrap to create the cross-cutting secret scope, storage credential, and external location, then load secrets for each connector.