Suppression rules

Suppression rules are operator-authored rows in silver.suppression_rules that mute specific findings at Gold-layer aggregation time. Silver retains the canonical immutable record — every finding the connectors ever ingested stays in silver.findings — and the suppression mechanism filters at read-time inside each Gold notebook before aggregation.

This separation is deliberate. The connector skill chain has no concept of suppression: connectors emit canonical findings and do not consult operator policy. Suppression is an analytics-layer concern, sitting between "what the tool reported" (Silver) and "what the operator wants the dashboard to show" (Gold).

What suppression rules do

Each row in silver.suppression_rules is a single mute. A mute names:

  • A scope (which finding column the rule matches against — e.g. tool_source).
  • A target pattern (the value or wildcard to match — e.g. semgrep, or myorg/legacy-*).
  • An expiry timestamp (the rule auto-deactivates once this timestamp is reached).
  • A reason (free-text audit note).

When a Gold notebook reads silver.findings, it also reads silver.suppression_rules, drops rules whose expires_at is in the past, and filters out any finding row that matches at least one remaining rule. The aggregation then runs on the filtered set.

The Gold-layer suppression helper that does the matching is at src/analytics/lib/suppression.py. It exposes a pure-Python predicate for unit testing and a Spark-side filter that the notebooks use on the cluster.
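In shape, the Spark-side filter reduces to the sketch below. The names here (apply_suppression and its arguments) are illustrative assumptions, not the helper's actual API; the real implementation is in src/analytics/lib/suppression.py.

from pyspark.sql import DataFrame, functions as F

def apply_suppression(findings: DataFrame, rules: DataFrame) -> DataFrame:
    # Drop expired rules first. The rules table is operator-authored and
    # small, so collecting it to the driver is a reasonable pattern.
    active = rules.filter(F.col("expires_at") > F.current_timestamp()).collect()

    keep = F.lit(True)
    for rule in active:
        scope, pattern = rule["scope"], rule["target_pattern"]
        if scope not in findings.columns:
            continue  # inapplicable scope: silently skipped (see Rule semantics)
        col = F.col(scope)
        if pattern.endswith("*"):
            prefix = pattern[:-1]  # "myorg/*" -> "myorg/"
            matched = col.startswith(prefix) | (col == prefix.rstrip("/"))
        else:
            matched = col == pattern  # literal exact match
        keep = keep & (col.isNull() | ~matched)  # NULL values never match a rule
    return findings.filter(keep)

A Gold notebook would call a filter like this exactly once, immediately before its aggregation step.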

Where they are applied

Suppression is applied once at the start of each Gold notebook's run, before that notebook's aggregation step:

| Gold artefact | Suppression applied? | Notes |
| --- | --- | --- |
| gold.app_risk_posture_daily | Yes | Post-join with silver.app_repo_mapping, so application_id rules resolve. |
| gold.mttr_by_source_severity_weekly | Yes | Pre-join (no application_id available); rules scoped to application_id are silently skipped. |
| gold.coverage_matrix | No (by design) | Coverage tracks tool runs, not finding visibility. A muted finding is still evidence the tool fired. |
| gold.dedup_link_overlap | Yes | Applied before the self-join so muted findings do not inflate overlap counts. |
| gold.cwe_owasp_heatmap | Yes | Post-join with silver.app_repo_mapping. |
| gold.app_repo_findings_open (view) | No | Backs the OLTP pre-merge gate; policy decisions happen at the App layer, not at view-definition time. |

The coverage_matrix exception is the most important to remember. Suppressing a noisy SAST finding will hide it from posture and heatmap rollups but will NOT cause the coverage matrix to flip the repo to "stale" — which is the right behaviour. The tool ran; the operator just chose to ignore one of its findings.

Schema

silver.suppression_rules is an INSERT-only Delta table. The DDL lives in src/platform/sql/silver_tables.sql; the matching Python schema is silver_suppression_rules in src/platform/schemas.py.

| Column | Type | Nullable | Notes |
| --- | --- | --- | --- |
| rule_id | STRING | no | UUID4 generated by the admin notebook. |
| scope | STRING | no | One of tool_source, category, application_id, repository_id, file_path, rule_id_native. |
| target_pattern | STRING | no | Literal value or trailing-wildcard pattern (see Rule semantics). |
| expires_at | TIMESTAMP | no | The rule is active only while expires_at > now(); it deactivates at this timestamp. |
| reason | STRING | yes | Free-text audit note; required by the admin notebook even though the column allows NULL. |
| created_by | STRING | no | Operator email captured from the Databricks notebook context. |
| created_at | TIMESTAMP | no | UTC creation timestamp. |

The scope enum maps directly to the canonical finding columns documented on the Canonical mapping page. Adding a new scope value requires updating the admin notebook's dropdown, the helper at src/analytics/lib/suppression.py, and the canonical-mapping page in lockstep.
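For orientation, a StructType matching the schema table above would look roughly like this. This is an illustrative reconstruction; the canonical definition is silver_suppression_rules in src/platform/schemas.py and may differ in detail.

from pyspark.sql.types import StructField, StructType, StringType, TimestampType

silver_suppression_rules = StructType([
    StructField("rule_id", StringType(), nullable=False),
    StructField("scope", StringType(), nullable=False),
    StructField("target_pattern", StringType(), nullable=False),
    StructField("expires_at", TimestampType(), nullable=False),
    StructField("reason", StringType(), nullable=True),  # NULL allowed; the notebook still requires it
    StructField("created_by", StringType(), nullable=False),
    StructField("created_at", TimestampType(), nullable=False),
])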

Rule semantics

A rule is active when expires_at > now() (strict inequality: a rule whose expires_at equals the current timestamp is already inactive). The Gold notebooks read the rules table at the start of each run; rules added after a run has started take effect on the next run.

A rule applies to a row when the row contains the scope column and the row's value at that column matches the target pattern. Rules whose scope is not a column on the call-site DataFrame are silently skipped — this is how the same rules table can serve both pre-join contexts (no application_id available) and post-join contexts (application_id present) without throwing.

Two target_pattern forms are supported:

  • Literal exact match. Pattern semgrep matches the value semgrep and nothing else.
  • Trailing wildcard. Pattern myorg/* matches the value myorg itself or any string beginning with myorg/ (so myorg/repo-1, myorg/repo-2, etc.). The wildcard is at the end only; mid-string wildcards are not supported in the MVP.

A row is suppressed when at least one active applicable rule matches. There is no rule precedence or override — rules are an "any of" set.
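Spelled out as pure Python, the full semantics (activity, applicability, matching, "any of") fit in a few lines. The function names matches and is_suppressed are hypothetical; the real predicate lives in src/analytics/lib/suppression.py.

from datetime import datetime, timezone

def matches(value: str, pattern: str) -> bool:
    # Trailing wildcard: "myorg/*" matches "myorg" itself and anything
    # under "myorg/"; likewise "test/*" matches "test", "test/foo.py", ...
    if pattern.endswith("*"):
        prefix = pattern[:-1]
        return value == prefix.rstrip("/") or value.startswith(prefix)
    return value == pattern  # literal exact match

def is_suppressed(row: dict, rules: list, now=None) -> bool:
    now = now or datetime.now(timezone.utc)
    return any(
        rule["expires_at"] > now           # active: strict inequality
        and rule["scope"] in row           # inapplicable scopes silently skipped
        and matches(row[rule["scope"]], rule["target_pattern"])
        for rule in rules
    )

Note how an inapplicable scope is simply skipped rather than raising, which is what lets the same rules table serve both pre-join and post-join contexts.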

Operator workflow

The admin notebook at src/analytics/notebooks/admin/manage_suppression.py is the supported write path. Each rule is created by a single notebook run.

  1. Open the admin notebook in the Databricks workspace (src/analytics/notebooks/admin/manage_suppression.py).
  2. Set the widget values at the top of the notebook:
       • catalog: the UC catalog name (e.g. appsec_dev).
       • scope: choose from the dropdown.
       • target_pattern: the value or wildcard.
       • expires_at_days: positive integer; the rule expires that many days from now.
       • reason: free-text audit note (required).
  3. Run the notebook ("Run all"). The notebook validates the inputs, builds a single-row DataFrame, and appends it to <catalog>.silver.suppression_rules (a sketch of this append follows below).
  4. Verify the row landed:

    SELECT rule_id, scope, target_pattern, expires_at, reason, created_by, created_at
    FROM appsec_dev.silver.suppression_rules
    ORDER BY created_at DESC
    LIMIT 5;

  5. Wait for the next Gold refresh (daily at 05:00 UTC) for the suppression to take effect, or trigger an on-demand refresh:

    databricks bundle run analytics --target dev

The created_by column is filled from the Databricks notebook context (the operator's workspace email) — there is no widget for it. If the runtime cannot resolve the email (rare, for service-principal-driven runs), the value unknown is written.
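Concretely, the notebook's write path plus the created_by fallback reduce to something like the sketch below. The structure and helper name are assumptions (the real notebook adds validation); dbutils.widgets.get and current_user() are standard Databricks/Spark APIs, and spark and dbutils are ambient notebook globals.

import uuid
from datetime import datetime, timedelta, timezone
from pyspark.sql import Row

def resolve_created_by(spark) -> str:
    # current_user() returns the workspace identity running the query;
    # fall back to "unknown" when it cannot be resolved.
    try:
        return spark.sql("SELECT current_user()").first()[0] or "unknown"
    except Exception:
        return "unknown"

catalog = dbutils.widgets.get("catalog")
now = datetime.now(timezone.utc)
row = Row(
    rule_id=str(uuid.uuid4()),
    scope=dbutils.widgets.get("scope"),
    target_pattern=dbutils.widgets.get("target_pattern"),
    expires_at=now + timedelta(days=int(dbutils.widgets.get("expires_at_days"))),
    reason=dbutils.widgets.get("reason"),
    created_by=resolve_created_by(spark),
    created_at=now,
)
(spark.createDataFrame([row])          # schema: silver_suppression_rules
      .write.format("delta")
      .mode("append")
      .saveAsTable(f"{catalog}.silver.suppression_rules"))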

Audit

The table is INSERT-only by convention and by design:

  • The admin notebook only ever appends. There is no UPDATE path.
  • Rules do not get edited; they are either active (now < expires_at) or expired.
  • Rules are retired by expiry; the expired row simply remains in the table. Physical deletion is not in scope for the MVP.

Every rule carries created_by and created_at, so the rules table itself is the audit log. To see all rules an operator authored:

SELECT rule_id, scope, target_pattern, expires_at, reason, created_at
FROM appsec_dev.silver.suppression_rules
WHERE created_by = 'ops-engineer@example.com'
ORDER BY created_at DESC;

To see what is currently active:

SELECT rule_id, scope, target_pattern, expires_at, reason, created_by
FROM appsec_dev.silver.suppression_rules
WHERE expires_at > current_timestamp()
ORDER BY expires_at;

Example scenarios

Scenario 1 — Mute false-positive Semgrep findings on a legacy repo for 30 days

The team has been triaging a wave of low-confidence Semgrep findings on myorg/legacy-repo while a refactor is in flight. They want the findings off the dashboard for 30 days without losing the underlying Silver record.

The desired mute is a combination (Semgrep findings on one repo) that a single-column rule cannot express, so the simplest operator action is one rule scoped to repository_id:

| Widget | Value |
| --- | --- |
| scope | repository_id |
| target_pattern | myorg/legacy-repo |
| expires_at_days | 30 |
| reason | Refactor in flight; revisit 2026-05-25 |

This mutes all findings on that repo (across every tool) for 30 days. Rules are an "any of" set, so a second rule with scope = tool_source, target_pattern = semgrep would not narrow this one; it would additionally mute Semgrep program-wide for 30 days. Only add it if that is genuinely what the team wants.
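Under the predicate sketched in Rule semantics, this rule behaves as follows (hypothetical names, with a fixed clock and an arbitrary example expiry so the example is deterministic):

from datetime import datetime, timezone

rule = {
    "scope": "repository_id",
    "target_pattern": "myorg/legacy-repo",
    "expires_at": datetime(2026, 5, 25, tzinfo=timezone.utc),  # arbitrary example expiry
}
clock = datetime(2026, 5, 1, tzinfo=timezone.utc)

is_suppressed({"repository_id": "myorg/legacy-repo", "tool_source": "semgrep"}, [rule], now=clock)  # True
is_suppressed({"repository_id": "myorg/other-repo", "tool_source": "semgrep"}, [rule], now=clock)   # False: literal match only
is_suppressed({"tool_source": "semgrep"}, [rule], now=clock)                                        # False: scope absent, rule skipped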

Scenario 2 — Mute a sandbox application entirely while it is being rebuilt

APP-007 is a sandbox app slated for a 90-day rewrite. Findings against its repos should disappear from posture rollups during the rewrite — not because they are not real, but because the team is not going to act on them until the new app is in place.

| Widget | Value |
| --- | --- |
| scope | application_id |
| target_pattern | APP-007 |
| expires_at_days | 90 |
| reason | Sandbox rewrite in progress; tracked in JIRA ENG-4321 |

This rule resolves only after the join with silver.app_repo_mapping, so it appears on gold.app_risk_posture_daily and gold.cwe_owasp_heatmap but is silently skipped in gold.mttr_by_source_severity_weekly (no application_id column at the MTTR aggregation point). That is correct behaviour: MTTR is program-wide and should not be skewed by what is or is not on the posture dashboard.

Scenario 3 — Mute SQL-injection findings in test files for 14 days

A new linter has flooded the dashboard with CWE-89 findings whose file_path is under test/. The team plans to fix the linter configuration in two weeks and wants to suppress the noise in the meantime.

The cleanest approach is one rule scoped to the offending path prefix:

| Widget | Value |
| --- | --- |
| scope | file_path |
| target_pattern | test/* |
| expires_at_days | 14 |
| reason | Linter mis-configured for test files; PR pending |

The trailing-wildcard form matches test, test/foo.py, test/integration/bar.py, and so on. The rule applies in every Gold notebook whose DataFrame includes a file_path column at suppression time (posture, MTTR, dedup overlap, heatmap). As before, coverage is unaffected by design.

If the team wanted to narrow further to "only CWE-89 findings in test files", the suppression model would need multi-column rules, which are out of scope for the MVP. The workaround is to scope by rule_id_native if the linter rule that fires has a stable identifier.