Setup platform

Phase 1 of the install flow. The platform layer is the Databricks-resident substrate that every connector lands data into and every analytics computation reads from. This section documents the four user steps needed to stand it up.

Platform name disambiguation

The word platform is used in two senses in this repository:

  • 1. Setup phase for users (this section): the workspace bootstrap that produces the catalog, schemas, jobs, and secret scope container deployed by DAB before any connector is installed.
  • 2. src/platform/ Python framework: the shared library (HTTP client, pagination, severity and status normalization, dedup) that every connector imports. See Project layout for the module layout.

Both senses appear throughout the docs. Context disambiguates which is meant.

Phase 1: four steps

The redesigned platform is stood up in four sequential steps. Each page is self-contained: a user can finish the step from that page alone. A consolidated command sketch follows the list.

  • 1. Prerequisites

    Inputs supplied by the user: AWS backbone (VPC, EKS, S3, IAM), a Databricks workspace plus UC metastore, local CLI tooling, and environment variable conventions.

  • 2. Bundle deploy

    databricks bundle deploy --target dev. Creates the catalog, schemas, jobs, pipelines, volumes, and the ServiceNow connection.

  • 3. Secrets bootstrap

    Run src/platform/scripts/bootstrap.sh to create the secret scope, storage credential, and external location. Per-connector secret loaders populate the values later, when each connector is wired up.

  • 4. Platform bootstrap job

    databricks bundle run platform-bootstrap. Applies the silver table DDL via the platform's SQL warehouse.
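
For reference, the four steps collapse into the command sequence below. This is a minimal sketch: it assumes the dev target and a default CLI profile, and the step 1 check (databricks current-user me) is an illustrative sanity check, not a command this repo prescribes.

    # Phase 1 end to end. Adjust --target (and add --profile) for your workspace.

    # 1. Prerequisites: sanity-check that the CLI can reach the workspace.
    databricks current-user me

    # 2. Deploy the bundle: catalog, schemas, jobs, pipelines, volumes,
    #    and the ServiceNow connection.
    databricks bundle deploy --target dev

    # 3. Bootstrap secrets: secret scope, storage credential, external location.
    #    Check the script header for any required environment variables first.
    src/platform/scripts/bootstrap.sh

    # 4. Run the platform bootstrap job: applies the silver table DDL
    #    through the platform's SQL warehouse.
    databricks bundle run platform-bootstrap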

After all four steps land, Phase 1 is complete and the platform is ready to install connectors. Move on to Install connectors. Start with the SCM category because SCM connectors populate silver.repositories, which findings from every other connector reference.

Architecture context

The platform implements a medallion layout across three layers:

  • Bronze: raw landed data, one schema per source (bronze_<source>), schema on read.
  • Silver: standardized entity and finding tables, severity and status normalized. Cross-source tables live in the silver schema. Projections for each source live in silver_<source> schemas.
  • Gold: aggregations, evidence views, and dashboards consumed by Analytics.

The end-to-end data flow, as Mermaid source:

flowchart LR
    src[Source API / artifact] --> ingest[ingest.py<br/>per connector]
    ingest --> bronze[(Bronze table<br/>raw + ingestion metadata)]
    bronze --> transform[transform.py<br/>+ src/platform/silver.py]
    transform --> silver[(Silver table<br/>canonical entities / findings)]
    silver --> dedup[src/platform/dedup.py]
    dedup --> silverlinks[(silver.dedup_links)]
    silver --> gold[SQL in src/analytics/sql/]
    gold --> gold_tables[(Gold<br/>app-level aggregates)]

All tables live in Unity Catalog under a three-tier namespace (<catalog>.bronze_<source>.*, <catalog>.silver.*, <catalog>.silver_<source>.*, <catalog>.gold.*). The <catalog> token varies by environment: appsec_dev, appsec_staging, appsec_prod.
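
To spot-check the namespace once Phase 1 lands, the Unity Catalog CLI commands below list the schemas and cross-source silver tables. This is a minimal sketch assuming the dev catalog (appsec_dev) and a recent unified Databricks CLI; older CLI versions expose different command names.

    # Expect the medallion schemas described above (silver, gold, plus the
    # per-source bronze_<source>/silver_<source> schemas once connectors land).
    databricks schemas list appsec_dev

    # Cross-source silver tables created by the platform bootstrap job.
    databricks tables list appsec_dev silver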
