Setup platform¶
Phase 1 of the install flow. The platform layer is the Databricks-resident substrate that every connector lands data into and every analytics computation reads from. This section documents the four user steps needed to stand it up.
Platform name disambiguation

The word *platform* is used in two senses in this repository.

1. Setup phase for users (this section): the workspace bootstrap that produces the catalog, schemas, jobs, and secret scope container deployed by DAB before any connector is installed.
2. `src/platform/` Python framework: the shared library (HTTP client, pagination, severity and status normalization, dedup) that every connector imports. See Project layout for the module layout.

Both senses appear throughout the docs. Context disambiguates which is meant.
Phase 1: four steps¶
The redesigned platform is stood up in four sequential steps. Each page is self-contained: a user can finish the step from that page alone.
1. Inputs supplied by the user: AWS backbone (VPC, EKS, S3, IAM), Databricks workspace plus UC metastore, local CLI tooling, env var conventions.
2. Run `databricks bundle deploy --target dev`. Creates the catalog, schemas, jobs, pipelines, volumes, and the ServiceNow connection.
3. Run `src/platform/scripts/bootstrap.sh` to create the secret scope, storage credential, and external location. Secret loaders for each connector populate values when each connector is wired.
4. Run `databricks bundle run platform-bootstrap`. Applies the silver table DDL via the SQL warehouse for the platform.
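The command-line portion of the four steps above can be sketched as a single sequence. This is a non-authoritative sketch: it assumes the repository root as the working directory and an already-configured Databricks CLI profile, and it omits step 1, which is manual provisioning of the AWS and workspace inputs.

```shell
#!/usr/bin/env sh
set -eu

# Step 2: deploy the bundle — catalog, schemas, jobs, pipelines, volumes,
# and the ServiceNow connection.
databricks bundle deploy --target dev

# Step 3: create the secret scope, storage credential, and external location.
# (Connector secret values are populated later, when each connector is wired.)
sh src/platform/scripts/bootstrap.sh

# Step 4: apply the silver table DDL via the SQL warehouse.
databricks bundle run platform-bootstrap
```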
After all four steps land, Phase 1 is complete and the platform is ready to
install connectors. Move on to Install connectors.
Start with the SCM category because SCM connectors populate `silver.repositories`, which findings from every other connector reference.
Architecture context¶
The platform implements a medallion layout across three layers:
- Bronze: raw landed data, one schema per source (`bronze_<source>`), schema on read.
- Silver: standardized entity and finding tables, severity and status normalized. Cross-source tables live in the `silver` schema. Projections for each source live in `silver_<source>` schemas.
- Gold: aggregations, evidence views, and dashboards consumed by Analytics.
```mermaid
flowchart LR
  src[Source API / artifact] --> ingest[ingest.py<br/>per connector]
  ingest --> bronze[(Bronze table<br/>raw + ingestion metadata)]
  bronze --> transform[transform.py<br/>+ src/platform/silver.py]
  transform --> silver[(Silver table<br/>canonical entities / findings)]
  silver --> dedup[src/platform/dedup.py]
  dedup --> silverlinks[(silver.dedup_links)]
  silver --> gold[SQL in src/analytics/sql/]
  gold --> gold_tables[(Gold<br/>app-level aggregates)]
```
All tables live in Unity Catalog under a three-tier namespace (`<catalog>.bronze_<source>.*`, `<catalog>.silver.*`, `<catalog>.silver_<source>.*`, `<catalog>.gold.*`). The `<catalog>` token varies by environment: `appsec_dev`, `appsec_staging`, `appsec_prod`.
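As an illustration of the namespace rules above, a fully qualified table name can be assembled from the environment token. The leaf table names `raw_events` and `app_aggregates` are purely hypothetical placeholders, not tables defined by this repository.

```shell
# Assemble fully qualified Unity Catalog names from the environment token.
# Schema patterns follow the section above; leaf table names are illustrative.
env="dev"                      # one of: dev, staging, prod
catalog="appsec_${env}"        # appsec_dev / appsec_staging / appsec_prod

source="servicenow"            # example source name
bronze_table="${catalog}.bronze_${source}.raw_events"
silver_table="${catalog}.silver.findings"
gold_table="${catalog}.gold.app_aggregates"

echo "${bronze_table}"
echo "${silver_table}"
echo "${gold_table}"
```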
Reference¶
- Project layout: top-level directory structure and module structure for each component.
- Recommended mapping: Silver schemas consumed by all connectors.
- REQ catalog: normative requirement identifiers with traceability matrix.
- Connector skills: the four skills (`analyze-source`, `provision-source`, `generate-connector`, `validate-implementation`) that drive the connector lifecycle.
- Source capability matrix: protocol, pagination, HWM, and severity for each source.
- Source characteristics: protocol decision context for each source.
- Connector job template: DAB fragment structure every connector follows.
- Silver table ownership: which connector populates which Silver table.
- Single `silver.findings` rationale: design decision for the cross-source findings table.