# Skill scope: what the skill chain generates, and what it does not
The four category-aware skills at .claude/skills/ (analyze-source, provision-source, generate-connector, validate-implementation) generate the integration layer of this platform. They do not generate the layers below or above. This page draws the boundary explicitly and explains why.
The thesis claim is correspondingly narrow:
Every implementation artifact in the integration layer — that is, every per-source connector that adapts a SOC tool to the canonical lakehouse contract — is generated by four category-aware skills, given the source's public API documentation and a per-source operational data layer.
The integration layer is the part that scales with the catalogue of SOC tools an organisation runs. It is also the part where category structure (SAST, DAST, SCA, secrets, scm, cmdb, waf) makes per-source variation tractable. The skills exist because that variation pattern is mechanisable.
The platform layer below and the analytics layer above are deliberately out of scope, and the sections below explain why for each.
## What the skill chain generates
Per source, across all nine connectors in the MVP, the skills produce:
- The connector page at mkdocs/docs/connectors/<category>/<source>.md (six top-level sections plus an Implementation log table that records the four skill runs for the source).
- The source-side runtime: src/connectors/<source>/runtime/{main,variables,outputs,versions}.tf, runtime/README.md, and runtime/install.sh. Operator-authored sidecars under runtime/files/* (demo target forks, scan helper scripts) are referenced from main.tf but are not generated.
- The Databricks-side connector module: the eight-file core (config.yml, ingest.py, transform.py, mapping.yml, severity.yml, status.yml, resources/job.yml, tests/) plus the production shape that puts it on a job (scripts/load-secrets.sh, scripts/install.sh, the top-level install.sh, *_entry.py notebook wrappers where the category requires them, sql/<envelope>.sql where the category requires it, a resources/{schemas,volumes,connection,pipeline}.yml subset per category, and the page's §4 Setup / §Run-the-job / §Verify / §Troubleshooting runbook).
- The validation report: a pytest run against the framework contract, per-REQ outcomes (PASS / FAIL / N/A) on the connector page's §Validation section, and Implementation log row 4.
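For orientation, the per-source layout those bullets imply looks roughly like the sketch below. Treat it as indicative only: this page fixes the operational.yml and runtime/ paths explicitly, whereas placing the Databricks-side core directly alongside them is an assumption, and several entries (entry notebooks, the SQL envelope, the resources/*.yml subset) exist only where the category requires them.

```
src/connectors/<source>/
├── operational.yml                  # operator-authored data layer (not generated)
├── runtime/                         # source-side runtime
│   ├── main.tf, variables.tf, outputs.tf, versions.tf
│   ├── README.md, install.sh
│   └── files/*                      # operator-authored sidecars, referenced but not generated
├── config.yml, ingest.py, transform.py          # eight-file core (assumed to sit at this level)
├── mapping.yml, severity.yml, status.yml
├── resources/job.yml                # plus the per-category {schemas,volumes,connection,pipeline}.yml subset
├── sql/<envelope>.sql, *_entry.py   # only where the category requires them
├── scripts/load-secrets.sh, scripts/install.sh, install.sh
└── tests/
mkdocs/docs/connectors/<category>/<source>.md    # connector page with Implementation log and §Validation
```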
Per-source operator data lives at src/connectors/<source>/operational.yml in two sub-blocks (source_runtime: for provision-source, databricks_runtime: for generate-connector). Schemas are declared per category in .claude/skills/<skill>/references/<category>.md. When a required field is missing, the skill gathers it conversationally via AskUserQuestion, writes the answer to operational.yml, and proceeds. The data layer is the only thing the operator authors directly per source; everything else is generated.
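As a sketch of how a skill might consume that data layer (the helper below and the example field names are hypothetical; the real required-field lists live in the per-category references, and the real gathering happens via AskUserQuestion):

```python
# Hypothetical sketch of reading one sub-block of the per-source operational
# data layer and spotting fields that still need to be gathered from the
# operator. Field names here are illustrative, not the real schema.
from pathlib import Path
import yaml

def load_operational_block(source: str, block: str, required: list[str]) -> tuple[dict, list[str]]:
    path = Path(f"src/connectors/{source}/operational.yml")
    data = yaml.safe_load(path.read_text()) if path.exists() else {}
    values = (data or {}).get(block) or {}
    missing = [field for field in required if field not in values]
    # In the skill chain, each missing field is gathered conversationally,
    # written back to operational.yml, and generation then proceeds.
    return values, missing

# e.g. generate-connector reading its sub-block (source and field names made up):
values, missing = load_operational_block("example-source", "databricks_runtime",
                                         ["workspace_host", "secret_scope"])
```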
## What the skill chain does not generate, and why
### The platform layer (src/platform/, src/platform/resources/, src/platform/sql/, src/platform/scripts/)
Out of scope by design.
The platform layer carries the framework primitives the connector skills run against: the BatchDescriptor / ConnectorState contract (src/platform/contract.py), the bronze schema envelope (src/platform/bronze_schema.py), the canonical Silver schemas (src/platform/schemas.py, src/platform/silver.py), the high-water-mark store (src/platform/hwm.py), the severity/status applicators (normalize_severity / normalize_status in src/platform/silver.py), the CWE helpers, and the UC schema bootstrap SQL. It also carries the Databricks Asset Bundle root (databricks.yml) and the platform-bootstrap job that creates the catalog and Silver schemas before any connector runs.
This layer is a singleton. There is exactly one platform per deployment. A platform-generation skill would be invoked once, ever, against an empty repository. The reuse leverage that makes per-source skill specialisation worthwhile (nine sources, more later) does not apply.
It is also the contract the connector skills depend on by name. The connector SKILL.md files reference src.platform.contract.BatchDescriptor, src.platform.silver.normalize_severity, and so on as load-bearing imports. For those imports to resolve at all, the named symbols must exist before any connector skill runs. A platform-generation skill would have to fix the contract surface first — which is the same as hand-writing the platform with a stable contract.
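Concretely, the load-bearing surface looks something like the imports below (only the symbols this page names; the platform modules contain more than this):

```python
# The platform symbols a generated connector imports by name. They are
# hand-written and must be stable before any connector skill runs.
from src.platform.contract import BatchDescriptor, ConnectorState      # incremental-batch contract
from src.platform.silver import normalize_severity, normalize_status   # canonical severity/status applicators

# Further primitives (bronze envelope, canonical Silver schemas, HWM store)
# live in src/platform/bronze_schema.py, schemas.py and hwm.py.
```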
A skill-generated platform would also have a chicken-and-egg problem with skill validation: the validate-implementation skill's pytest run depends on the platform code being importable. If the platform were itself skill-generated, the validation gate could not run against the platform's own output; the skill that generates the platform would have no equivalent of validate-implementation.
The platform layer is therefore correctly hand-written, with its contract documented canonically at:
- Canonical mapping — Silver entity/finding schemas, severity/status canonical model, dedup-key tuples.
- REQ catalog — the framework requirements every connector binds tests against.
- Connector job template — the canonical two-task Lakeflow shape resources/job.yml is generated against.
- Single silver.findings rationale — why one Silver findings table rather than one per source.
These reference pages serve the same role for the platform layer that the per-category references/<category>.md files serve for the connector layer: a canonical specification the implementation must satisfy. The difference is who writes the implementation. For connectors, a skill does, parameterised by category. For the platform, the platform's authors do, once.
### The analytics layer (src/analytics/, mkdocs/docs/analytics/)
Out of scope by design.
The analytics layer consumes the Silver tables the connectors populate and produces the thesis-evidence artefacts: REQ-coverage matrices, dedup-link analysis, severity-distribution rollups across SAST + secrets + DAST, mapping-fidelity reports, the per-connector validation aggregator. These are highly specific to the thesis's research questions. Their value is in their specificity.
Templating analytics queries via a skill would either:
- Dilute the evidence value by smoothing source-specific findings into a generic shape; or
- Be a thin wrapper over SELECT … FROM silver.findings that adds nothing the skill chain doesn't already document elsewhere.
Analytics is correctly bespoke: each gold-layer aggregation is a thesis claim in code form, written deliberately rather than templated. The analytics layer does not need a skill.
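As an illustration of "a thesis claim in code form", a severity-distribution rollup might be no more than the sketch below; the column names and category values are assumptions about the Silver findings schema, for which the canonical mapping page is authoritative.

```python
# Hypothetical gold-layer rollup: severity distribution across SAST, secrets
# and DAST findings in the single silver.findings table. Column names and
# category values are illustrative, not the canonical schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

severity_rollup = (
    spark.table("silver.findings")
    .where("source_category IN ('sast', 'secrets', 'dast')")
    .groupBy("source_category", "severity")
    .count()
    .orderBy("source_category", "severity")
)
severity_rollup.show()
```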
A focused exception was considered and explicitly deferred: a compose-evidence skill that emits the evidence notebooks/jobs from the connector pages plus the catalog REQ matrix. That skill would have leverage similar to validate-implementation and would mechanise the evidence-generation step. It is a sensible follow-up paper or v2 contribution; it is not in MVP scope because the evidence we present is small enough to author directly and benefits from being authored deliberately.
Why "integration layer" is the right scope for skill mechanisation¶
Three properties make the connector layer well-suited for skills, and the absence of any of them in the platform and analytics layers is why they are not skill-generated:
- Parametric variation. Connectors vary along two clear axes: source (which tool) and category (what shape of integration). Each axis is finite and enumerated in the framework. The variation is mechanisable because it is bounded.
- Public-input availability. The connector skills' inputs are the source's public API documentation and a per-source operational data layer. Both are explicit and external to the framework. A platform-skill or analytics-skill would have no equivalent external input — its inputs would be the framework's own internal decisions about contract and evidence shape.
- Reuse leverage. Each connector skill runs once per source and is invoked at least nine times in the MVP, growing as the catalogue does. A platform skill runs once. An analytics skill runs once per evidence question (which is a small, finite, deliberately-authored set). The leverage that pays back the cost of skill design only applies at the connector layer.
The skill chain is therefore not "everything in the framework is generated" but rather "the part of the framework that scales with the SOC tool catalogue is generated, in a way that documents its own generation as a record of correctness". The platform underneath and the analytics on top are stable foundations either side of that scaling layer, and they are correctly authored once with deliberate care rather than mechanised.