Setup platform

Phase 1 of the install flow. The platform layer is the Databricks-resident substrate that every connector lands data into and every analytics computation reads from. This section documents the four user steps needed to stand it up.

Platform name disambiguation

The word platform is used in two senses in this repository:

  • 1. Setup phase for users (this section): the workspace bootstrap that produces the catalog, schemas, jobs, and secret scope container deployed by DAB before any connector is installed.
  • 2. src/platform/ Python framework: the shared library (HTTP client, pagination, severity and status normalization, dedup) that every connector imports. See Project layout for the module layout.

Both senses appear throughout the docs. Context disambiguates which is meant.

Phase 1: four steps

The redesigned platform is stood up in four sequential steps. Each page is self-contained: a user can finish the step from that page alone. A consolidated command sketch follows the list.

  • 1. Prerequisites

    Inputs supplied by the user: AWS backbone (VPC, EKS, S3, IAM), a Databricks workspace plus UC metastore, local CLI tooling, and environment variable conventions.

  • 2. Bundle deploy

    databricks bundle deploy --target dev. Creates the catalog, schemas, jobs, pipelines, volumes, and the ServiceNow connection.

  • 3. Secrets bootstrap

    Run src/platform/scripts/bootstrap.sh to create the secret scope, storage credential, and external location. Per-connector secret loaders populate the values later, when each connector is wired up.

  • 4. Platform bootstrap job

    databricks bundle run platform-bootstrap. Applies the silver table DDL via the platform's SQL warehouse.
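
For reference, the four steps collapse into the command sequence below. This is a minimal sketch: it assumes the dev target and a default CLI profile, and the step 1 check (databricks current-user me) is an illustrative sanity check, not a command this repo prescribes.

    # Phase 1 end to end. Adjust --target (and add --profile) for your workspace.

    # 1. Prerequisites: sanity-check that the CLI can reach the workspace.
    databricks current-user me

    # 2. Deploy the bundle: catalog, schemas, jobs, pipelines, volumes,
    #    and the ServiceNow connection.
    databricks bundle deploy --target dev

    # 3. Bootstrap secrets: secret scope, storage credential, external location.
    #    Check the script header for any required environment variables first.
    src/platform/scripts/bootstrap.sh

    # 4. Run the platform bootstrap job: applies the silver table DDL
    #    through the platform's SQL warehouse.
    databricks bundle run platform-bootstrap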

After all four steps land, Phase 1 is complete and the platform is ready to install connectors. Move on to Install connectors. Start with the SCM category because SCM connectors populate silver.repositories, which findings from every other connector reference.

Architecture context

The platform implements a medallion layout across three layers:

  • Bronze: raw landed data, one schema per source (bronze_<source>), schema on read.
  • Silver: standardized entity and finding tables, severity and status normalized. Cross-source tables live in the silver schema. Projections for each source live in silver_<source> schemas.
  • Gold: aggregations, evidence views, and dashboards consumed by Analytics.

The end-to-end data flow, as Mermaid source:

flowchart LR
    src[Source API / artifact] --> ingest[ingest.py<br/>per connector]
    ingest --> bronze[(Bronze table<br/>raw + ingestion metadata)]
    bronze --> transform[transform.py<br/>+ src/platform/silver.py]
    transform --> silver[(Silver table<br/>canonical entities / findings)]
    silver --> dedup[src/platform/dedup.py]
    dedup --> silverlinks[(silver.dedup_links)]
    silver --> gold[SQL in src/analytics/sql/]
    gold --> gold_tables[(Gold<br/>app-level aggregates)]

All tables live in Unity Catalog under a three-tier namespace (<catalog>.bronze_<source>.*, <catalog>.silver.*, <catalog>.silver_<source>.*, <catalog>.gold.*). The <catalog> token varies by environment: appsec_dev, appsec_staging, appsec_prod.
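
To spot-check the namespace once Phase 1 lands, the Unity Catalog CLI commands below list the schemas and cross-source silver tables. This is a minimal sketch assuming the dev catalog (appsec_dev) and a recent unified Databricks CLI; older CLI versions expose different command names.

    # Expect the medallion schemas described above (silver, gold, plus the
    # per-source bronze_<source>/silver_<source> schemas once connectors land).
    databricks schemas list appsec_dev

    # Cross-source silver tables created by the platform bootstrap job.
    databricks tables list appsec_dev silver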
