
Evidence

Three scenarios that exercise the end-to-end pipeline. Completing all three is your "you're done" signal.

Low-latency consumers use the OLTP path

The queries on this page run against Silver directly via a SQL warehouse — the right shape for ad-hoc evidence runs. Real-time consumers (Slack bot, IDE plugin, CI/CD pre-merge gate) should use the Databricks app instead, which reads the Online Tables replicas at sub-50 ms PK lookup latency and applies the configured pre-merge policy. The OLTP-served metrics are derived from the same Silver record these evidence queries scan, so the numbers reconcile.
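
For orientation, the row shape those real-time consumers read is the same one the warehouse sees; a point lookup against Silver approximates what the replica serves. This is a sketch only: the finding_id literal is a placeholder, and real-time consumers go through the app rather than the warehouse.

-- Point-lookup shape served via the OLTP replica (placeholder key value)
SELECT finding_id, tool_source, severity_canonical, status_canonical
FROM appsec_dev.silver.findings
WHERE finding_id = '<some-finding-id>';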

Evidence 1: Cross-tool deduplication

Claim: two independent SAST tools (SonarQube and Semgrep) pointed at the same repository produce overlapping findings that the pipeline deduplicates.

Setup

You already have SAST target repos (BenchmarkJava, BenchmarkPython) referenced by the github runtime under src/connectors/github/runtime/. These are forks of the OWASP Benchmark projects, which carry a curated catalogue of planted defects (CWE-89, CWE-78, CWE-79, and CWE-22, among others). Scan each repo with sonar-scanner-cli per the SonarQube connector page and run the job, then wait for the next Semgrep CronJob (or trigger one manually per the Semgrep connector page; the runbook section is restored by the production-shape follow-up). Finally, run the ingest jobs for both connectors: databricks bundle run sonarqube-connector and databricks bundle run semgrep-connector.
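
Before running the dedup query, a quick sanity check (same table and columns as the evidence query below) confirms both ingests landed; each tool should report a non-zero count:

-- One row per tool; zero rows for a tool means its ingest did not land
SELECT tool_source, count(*) AS finding_count
FROM appsec_dev.silver.findings
WHERE tool_source IN ('sonarqube', 'semgrep')
  AND category = 'sast'
GROUP BY tool_source;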

Query

WITH raw AS (
  SELECT repository_id, file_path, start_line, cwe_id, tool_source
    FROM appsec_dev.silver.findings
    WHERE tool_source IN ('sonarqube', 'semgrep')
      AND category = 'sast'
)
SELECT
  count(*) FILTER (WHERE tool_source = 'sonarqube') AS sonarqube_count,
  count(*) FILTER (WHERE tool_source = 'semgrep')   AS semgrep_count,
  -- dedup key: same repo, file, line, and CWE
  count(DISTINCT repository_id, file_path, start_line, cwe_id)            AS deduped_count,
  count(*) - count(DISTINCT repository_id, file_path, start_line, cwe_id) AS overlap_count
FROM raw;

Expected

  • sonarqube_count ≥ 4 (four planted defects).
  • semgrep_count ≥ 4 (same four).
  • overlap_count ≥ 4. Sonar and Semgrep both report the same defects at the same locations (the drill-down query below lists them).
  • deduped_count = sonarqube_count + semgrep_count - overlap_count by construction; the dedup key collapses each cross-tool duplicate into a single row.
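
The drill-down referenced above groups on the same dedup key and lists the locations reported by more than one tool. It is an optional check, sketched with the columns already used in the evidence query:

-- Locations where both tools reported a finding
SELECT repository_id, file_path, start_line, cwe_id,
       collect_set(tool_source) AS tools
FROM appsec_dev.silver.findings
WHERE tool_source IN ('sonarqube', 'semgrep')
  AND category = 'sast'
GROUP BY repository_id, file_path, start_line, cwe_id
HAVING count(DISTINCT tool_source) > 1
ORDER BY file_path, start_line;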

Evidence 2: Business application rollup

Claim: the linkage from ServiceNow to GitHub joins findings to business applications, answering "which business apps carry open critical- or high-severity SAST findings?".

Query

SELECT
  app.name AS business_app,
  count(DISTINCT f.finding_id) AS critical_findings
FROM appsec_dev.silver_servicenow.applications app
JOIN appsec_dev.silver.app_repo_mapping ar USING (application_id)
JOIN appsec_dev.silver.repositories r ON r.repository_id = ar.repository_id
JOIN appsec_dev.silver.findings f ON f.repository_id = r.repository_id
WHERE f.severity_canonical IN ('critical', 'high')
  AND f.status_canonical = 'open'
  AND f.category = 'sast'
GROUP BY app.name
ORDER BY critical_findings DESC;

Expected

Two rows: "AppSec Demo Frontend" and "AppSec Demo Backend", each with a non-zero finding count that matches the planted defects in each linked repo.
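
If either count looks off, dropping the aggregate from the same join lists the individual findings behind each rollup row. This is illustrative; the repositories join from the rollup query is skipped because app_repo_mapping already carries repository_id.

-- One row per open critical/high SAST finding, tagged with its business app
SELECT app.name AS business_app,
       f.finding_id,
       f.severity_canonical,
       f.file_path,
       f.start_line
FROM appsec_dev.silver_servicenow.applications app
JOIN appsec_dev.silver.app_repo_mapping ar USING (application_id)
JOIN appsec_dev.silver.findings f ON f.repository_id = ar.repository_id
WHERE f.severity_canonical IN ('critical', 'high')
  AND f.status_canonical = 'open'
  AND f.category = 'sast'
ORDER BY business_app, f.file_path, f.start_line;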

Evidence 3: Variety in finding structure

Claim: DAST findings located by URL (file_path IS NULL) and SAST findings located by code coexist in the same table.

Query

SELECT category,
       count(*)                                AS total,
       count(*) FILTER (WHERE file_path IS NOT NULL) AS code_located,
       count(*) FILTER (WHERE url IS NOT NULL)       AS url_located
FROM appsec_dev.silver.findings
GROUP BY category;

Expected

  • category='sast' row: code_located = total, url_located = 0.
  • category='dast' row: code_located = 0, url_located = total.
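
A complementary check comes back empty when those expectations hold: no SAST or DAST finding should be located both ways or neither way. This sketch assumes the exclusivity implied by the expectations above.

-- Expect zero rows if every finding is located by exactly one of file_path or url
SELECT finding_id, category, file_path, url
FROM appsec_dev.silver.findings
WHERE category IN ('sast', 'dast')
  AND ((file_path IS NULL AND url IS NULL)
    OR (file_path IS NOT NULL AND url IS NOT NULL));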

Done

If all three expectations hold, you have reproduced the MVP.