Standardized Mapping Requirements

The standardized Silver layer schemas commit the framework to a single vendor-agnostic entity and finding model. This page states, for each schema, the requirements the implementation SHALL satisfy when mapping a source record into Silver.

Silver Entity Mapping Requirements

Entity tables (applications, repositories, teams, commits, pull requests, pipeline runs, dependencies, branch policies) are populated from the entity-emitting sources in the selection. The implementation SHALL union over the native fields these sources expose, according to the table below. Each standardized field maps to the source field shown in that source's column; the rightmost column states the derivation. Fields marked (generated) or (framework SCD2) are assigned by the connector or transformation layer rather than read from the source.

Silver Entity Pattern field derivation across entity-emitting sources

Standard field	ServiceNow CMDB	GitHub	GitLab	Derivation
`id`	(generated)	(generated)	(generated)	surrogate key, framework assigned
`natural_key`	`sys_id`	`node_id`	`id` (`path_with_namespace` for repos)	the stable primary key in the source
`source_system`	`"servicenow"`	`"github"`	`"gitlab"`	literal per connector
`valid_from`	`sys_created_on`	`created_at`	`created_at`	creation timestamp
`valid_to`	(framework SCD2)	(framework SCD2)	(framework SCD2)	set on supersedure
Domain columns	`name`, `business_criticality`, `operational_status`, `owned_by`, ...	`full_name`, `default_branch`, `visibility`, `language`, ...	`path_with_namespace`, `default_branch`, `visibility`, `archived`, ...	attributes specific to the entity type

Silver Finding Mapping Requirements

All findings are written to a single Silver table, silver.findings, and are discriminated by a category column. The implementation SHALL union over the native fields the finding sources expose, according to the tables below. A standard field marked "N/A" for a given source is stored as NULL in records from that source. This is the intended union-over-sources behavior, and the mapping.yml for each source makes every assignment explicit, including the category value of the record. The three tables group sources by finding structure: the first covers code-level sources (SAST and secrets); the second covers package-level and platform-integrated sources (SCA, plus GitHub and GitLab platform-native findings); the third covers runtime edge-event sources (WAF), which project each event as one finding row.
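
The union-over-sources behavior can be sketched in a few lines of Python. This is a minimal illustration, not the framework's implementation: the dictionary layout standing in for mapping.yml, the helper name to_silver_row, the category value "sast", and the sample record are all assumptions made for the example.

```python
from typing import Any, Optional

# Subset of the standardized finding columns, for illustration only.
STANDARD_FIELDS = [
    "source_finding_id", "source_tool", "category",
    "severity", "rule_id", "secret_type", "cve_id",
]

# Hypothetical in-memory stand-in for a per-source mapping.yml.
# The real file layout is not shown on this page; "sast" is an assumed category value.
SONARQUBE_MAPPING = {
    "literals": {"source_tool": "sonarqube", "category": "sast"},
    "fields": {"source_finding_id": "key", "severity": "severity", "rule_id": "rule"},
}


def to_silver_row(record: dict[str, Any],
                  mapping: dict[str, dict[str, str]]) -> dict[str, Optional[Any]]:
    """Project one source record onto the standard columns.

    Columns the source does not map (its "N/A" cells) stay None, i.e. NULL.
    """
    row: dict[str, Optional[Any]] = {name: None for name in STANDARD_FIELDS}
    row.update(mapping["literals"])
    for standard_name, source_name in mapping["fields"].items():
        row[standard_name] = record.get(source_name)
    return row


# Made-up SonarQube issue, reduced to the three mapped fields.
row = to_silver_row(
    {"key": "AX1", "severity": "BLOCKER", "rule": "java:S2077"},
    SONARQUBE_MAPPING,
)
```

Because every row starts from the full column list, records from different sources share one shape and can be appended to the same table without per-source schema handling.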

Silver Finding derivation: code-level sources (SAST and secrets)

Standard fields that are N/A for all three sources (for example cve_id and package_name) are omitted from this table. They appear in the next table.

Standard field	SonarQube	Semgrep	TruffleHog
`finding_id`	(generated)	(generated)	(generated)
`source_finding_id`	`key`	`id` (Cloud) / `check_id`+`path`+`line` (CLI)	`DetectorType`+`commit`+`file`+`line`
`source_tool`	`"sonarqube"`	`"semgrep"`	`"trufflehog"`
`repository_id`	`component` (project)	`repository.name` / git path	`SourceMetadata.Data.Git.repository`
`severity`	`severity` (BLOCKER … INFO)	`severity` (CLI or Cloud)	N/A (convention=high)
`status`	`status` + `resolution`	`triage_state`	N/A
`rule_id`	`rule`	`rule_name` (Cloud) / `check_id` (CLI)	`DetectorName`
`file_path`	`component` (extract)	`location.file_path` / `path`	`SourceMetadata.Data.Git.file`
`line_number`	`line`	`location.line` / `start.line`	`SourceMetadata.Data.Git.line`
`secret_type`	N/A	N/A	`DetectorName`
`validity_status`	N/A	N/A	`Verified` + `VerificationError`
`detected_at`	`creationDate`	`first_seen` (Cloud)	`SourceMetadata.Data.Git.timestamp`
`resolved_at`	(on status transition)	(on `triage_state` transition)	N/A (full-reload)

Silver Finding derivation: package-level and platform sources

Dependency-Track produces package vulnerability findings. GitHub and GitLab expose platform-native findings spanning Dependabot (SCA), code scanning (SAST), and secret scanning.

Standard field	Dependency-Track	GitHub / GitLab (platform)
`finding_id`	(generated)	(generated)
`source_finding_id`	`component.uuid` + `vulnerability.vulnId`	`number` (GH) / `id` (GL)
`source_tool`	`"dependency-track"`	`"github"` / `"gitlab"`
`repository_id`	`project` (via PURL or project prop)	repo reference
`severity`	`vulnerability.severity` (CRITICAL … UNASSIGNED)	`rule.security_severity_level` (GH) / `severity` (GL)
`status`	(derived from analyzer + resolution)	`state` (GH and GL)
`rule_id`	`vulnerability.vulnId`	`rule.id`
`cve_id`	`vulnerability.vulnId` (CVE-*)	`security_advisory.cve_id` (Dependabot)
`file_path`	N/A	`most_recent_instance.location.path`
`line_number`	N/A	`most_recent_instance.location.start_line`
`package_name`	`component.name`	`dependency.package.name` (Dependabot)
`package_version`	`component.version`	`dependency.package.version` (Dependabot)
`ecosystem`	`component.purl` (extract)	`dependency.package.ecosystem`
`secret_type`	N/A	`secret_type` (GH Secret Scanning)
`validity_status`	N/A	`validity` (GH Secret Scanning)
`detected_at`	`attribution.attributedOn`	`created_at`
`resolved_at`	(on project audit)	`fixed_at` / `resolved_at`
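
The "(extract)" step for `ecosystem` could look like the sketch below. It assumes that the ecosystem is taken from the type component of the package URL (the segment between `pkg:` and the first `/`); the page does not spell out the extraction rule, and the function name ecosystem_from_purl is hypothetical.

```python
from typing import Optional


def ecosystem_from_purl(purl: Optional[str]) -> Optional[str]:
    """Return the package-URL type (for example 'npm' or 'maven'), or None if unreadable."""
    if not purl or not purl.startswith("pkg:"):
        return None
    # Tolerate leading slashes after the scheme (pkg://type/...).
    remainder = purl[len("pkg:"):].lstrip("/")
    package_type = remainder.split("/", 1)[0]
    # The type is case-insensitive; store it lowercased. An empty type yields None.
    return package_type.lower() or None
```

Returning None for a missing or malformed PURL keeps the column NULL instead of storing a guessed ecosystem.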

Silver Finding derivation: runtime edge-event sources (WAF)

WAF connectors (AWS WAF) project each edge event as one finding row in silver.findings. The per-event projection follows the TruffleHog convention for sources without a native lifecycle: severity is derived from the action through an action-keyed lookup, status is the literal "open" and never transitions, and finding_id is a deterministic SHA-256 hash, so re-deliveries collapse onto the same row at the Bronze-to-Silver MERGE. WAF telemetry beyond the canonical record (source_ip, country, http_method, response_code, sampling_weight, rule_type, and the action value itself) is intentionally dropped. Operators who need that detail query the upstream WAF logs (S3 / CloudWatch) directly.

Standard field	AWS WAF
`finding_id`	(derived) SHA-256 of `(webaclId, httpRequest.requestId, timestamp)`
`tool_source`	`"aws_waf"`
`category`	`"waf"`
`severity_canonical`	derived from `action` via `severity.yml` (block→high, count→medium, challenge→low, captcha→low, allow→low)
`status_canonical`	literal `"open"` (no native lifecycle)
`rule_id_native`	`terminatingRuleId`
`url`	`httpRequest.uri`
`repository_id`	N/A (WebACLs are not repo-scoped; Gold-side aggregations bucket WAF rows under the `__UNMAPPED__` application sentinel until an operator extends `silver.app_repo_mapping` with a `webacl_arn → application_id` mapping, which is out of scope for the MVP)
`cwe_id` / `cve_id`	N/A
`file_path` / `start_line`	N/A
`first_seen_at` / `last_seen_at`	`timestamp` (epoch ms → UTC datetime at transform)
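
The per-event projection in the table above can be sketched as follows. This is an illustration under stated assumptions: the `|` separator used when hashing the tuple, the function names, the case-folding of the action, and the `medium` fallback for an unknown action are not specified by this table and are chosen here for the example.

```python
import hashlib
from datetime import datetime, timezone

# Action-to-severity lookup as listed in the table above (held in severity.yml).
ACTION_SEVERITY = {
    "block": "high", "count": "medium",
    "challenge": "low", "captcha": "low", "allow": "low",
}


def waf_finding_id(webacl_id: str, request_id: str, timestamp_ms: int) -> str:
    """Deterministic SHA-256 over (webaclId, requestId, timestamp).

    The '|' separator is an assumption; any fixed, unambiguous encoding works.
    """
    key = "|".join([webacl_id, request_id, str(timestamp_ms)])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def project_waf_event(event: dict) -> dict:
    """Project one AWS WAF log event onto the canonical finding columns."""
    request = event["httpRequest"]
    # Epoch milliseconds -> timezone-aware UTC datetime.
    seen_at = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
    return {
        "finding_id": waf_finding_id(
            event["webaclId"], request["requestId"], event["timestamp"]
        ),
        "tool_source": "aws_waf",
        "category": "waf",
        # Actions are case-folded before lookup (assumption); unknown actions
        # fall back to medium here.
        "severity_canonical": ACTION_SEVERITY.get(event["action"].lower(), "medium"),
        "status_canonical": "open",  # no native lifecycle, never transitions
        "rule_id_native": event.get("terminatingRuleId"),
        "url": request.get("uri"),
        "repository_id": None,  # WebACLs are not repo-scoped
        "first_seen_at": seen_at,
        "last_seen_at": seen_at,
    }
```

Because finding_id depends only on fields carried by the event itself, projecting a re-delivered event yields the same key, which is what lets the MERGE collapse replays onto one row.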

REQ-DEDUP is N/A for WAF: WAF rows do not share dedup tuples with SAST / SCA / secret / DAST findings, so no dedup_links rows are emitted. Replay deduplication (recovering from re-delivered events) is achieved by the deterministic finding_id collapsing onto the same row at MERGE, not by a dedup_links entry.

Severity and Status Normalization Requirements

The implementation SHALL harmonize the native severity scale of each source to the standardized four-level model (critical, high, medium, low) through a per-source lookup table. The table is co-located with the connector at src/connectors/{source}/severity.yml. Each lookup SHALL cover every documented source value. Undocumented source values fall through to a configurable default (medium, unless the connector's config.yml overrides it) and SHALL trigger a data quality warning. A null or missing source severity is mapped to medium and flagged in the same way.
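
The lookup-with-fallback rule can be sketched as below. The SonarQube mapping shown is a hypothetical excerpt (the actual severity.yml contents are not given on this page), the function and logger names are illustrative, and a plain log warning stands in for whatever data quality mechanism the framework uses.

```python
import logging
from typing import Optional

logger = logging.getLogger("severity_normalization")

# Hypothetical lookup for SonarQube; the target levels are assumptions for this example.
SONARQUBE_SEVERITY = {
    "BLOCKER": "critical", "CRITICAL": "high",
    "MAJOR": "medium", "MINOR": "low", "INFO": "low",
}


def normalize_severity(native: Optional[str],
                       lookup: dict[str, str],
                       default: str = "medium") -> str:
    """Map a native severity to critical/high/medium/low.

    Missing values map to medium; values absent from the lookup map to the
    configurable default. Both cases emit a warning.
    """
    if native is None or native == "":
        logger.warning("missing source severity; mapped to medium")
        return "medium"
    if native not in lookup:
        logger.warning("undocumented source severity %r; mapped to %s", native, default)
        return default
    return lookup[native]
```

Keeping the fallback inside one function means every connector applies the same rule and differs only in the lookup and default it passes in.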

The implementation SHALL translate the native lifecycle state of each source to the standardized five-state model (open, confirmed, resolved, false_positive, wontfix) through an analogous lookup at src/connectors/{source}/status.yml.

Both severity and status lookup tables SHALL be maintained as configuration files rather than code so that vocabulary updates do not require a pipeline redeploy.

All source timestamps SHALL be converted to UTC during the Bronze-to-Silver transformation. Source-specific formats (ISO 8601 with or without offsets, Unix epoch in seconds or milliseconds, and tool-specific strings) SHALL be parsed during schema mapping and stored as UTC datetime columns.
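
A minimal sketch of the UTC conversion for the two most common shapes, ISO 8601 strings and Unix epochs, is shown below. Two choices are assumptions made for the example rather than rules from this page: numeric values above 1e11 are treated as milliseconds, and ISO strings without an offset are taken to be UTC already. Tool-specific string formats would need their own parsers.

```python
from datetime import datetime, timezone
from typing import Union


def to_utc(value: Union[str, int, float]) -> datetime:
    """Parse an ISO 8601 string or a Unix epoch into a timezone-aware UTC datetime."""
    if isinstance(value, (int, float)):
        # Heuristic (assumption): magnitudes above 1e11 are milliseconds, not seconds.
        seconds = value / 1000 if abs(value) > 1e11 else value
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    text = value.strip()
    if text.endswith("Z"):
        # datetime.fromisoformat accepts a trailing 'Z' only from Python 3.11 onward.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: offset-less timestamps are already expressed in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

In practice each connector would know whether its epoch values are seconds or milliseconds, so the magnitude heuristic could be replaced by an explicit per-source setting.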