Connector job template¶
Every batch-style connector instantiates the same Lakeflow Job structure: a two-task DAG in which an ingest task produces bronze records and a transform task consumes them to produce silver. The transform task declares a hard dependency on the ingest task, so a failed ingest short-circuits the job without leaving silver partially refreshed. Lakeflow Connect connectors (e.g. ServiceNow) substitute a pipeline resource declared in src/connectors/<source>/resources/pipeline.yml for this job structure.
Bundle fragment¶
src/connectors/<source>/resources/job.yml:
```yaml
resources:
  jobs:
    <source>-connector:
      name: <source>-connector
      parameters:
        - name: source_name
          default: "<source>"
        - name: target_catalog
          default: "${var.catalog}"
        - name: hwm_reset
          default: "false"
      schedule:
        quartz_cron_expression: "0 */15 * * * ?"
        timezone_id: "UTC"
      tasks:
        - task_key: ingest
          notebook_task:
            notebook_path: ../ingest_entry.py
          job_cluster_key: shared
          max_retries: 3
          min_retry_interval_millis: 2000
          retry_on_timeout: true
        - task_key: transform
          depends_on:
            - task_key: ingest
          notebook_task:
            notebook_path: ../transform_entry.py
          job_cluster_key: shared
          max_retries: 3
          min_retry_interval_millis: 2000
          retry_on_timeout: true
      job_clusters:
        - job_cluster_key: shared
          new_cluster:
            spark_version: "14.3.x-scala2.12"
            node_type_id: Standard_DS3_v2
            num_workers: 1
```
Parameters¶
- `source_name`: connector source name, used throughout resource naming (e.g. `github`, `sonarqube`).
- `target_catalog`: Unity Catalog catalog name for the target environment. Each deployment target supplies its own catalog via `var.catalog` at the bundle root (e.g. `appsec_dev`, `appsec_prod`). Passed as `target_catalog` to both the ingest and transform tasks.
- `hwm_reset`: boolean flag (default `"false"`). Set to `"true"` to force high-water-mark re-initialisation on the next run. Intended for manual backfills only.
- `quartz_cron_expression`: the Quartz cron expression driving scheduled runs. Source characteristics govern the cadence: high-change sources (SCM platforms, active scanners) run every 15 minutes (github) or every 3 hours (sonarqube); stable sources (CMDB application inventory) run daily.
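How an ingest task might interpret `hwm_reset` can be sketched in Python. This is an illustrative helper, not code from the bundle: `resolve_hwm`, `stored_hwm`, and the epoch default are assumptions about one reasonable way to act on the flag; the only source-backed detail is that the parameter arrives as the string `"true"`/`"false"`.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical names: EPOCH and resolve_hwm are not part of the bundle.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def resolve_hwm(hwm_reset: str, stored_hwm: Optional[datetime]) -> datetime:
    """Return the watermark the ingest task should read from.

    Job parameters arrive as strings, so hwm_reset is compared
    textually rather than treated as a Python bool.
    """
    if hwm_reset.strip().lower() == "true" or stored_hwm is None:
        return EPOCH  # backfill: re-read the source from the beginning
    return stored_hwm  # normal incremental run

last = datetime(2024, 5, 1, tzinfo=timezone.utc)
assert resolve_hwm("false", last) == last   # incremental run keeps the mark
assert resolve_hwm("true", last) == EPOCH   # manual backfill resets it
```

A first run with no stored watermark falls through to the same epoch default, which is why the flag is only needed for deliberate re-backfills.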
Retry configuration¶
Retry configuration is identical across connectors: up to three retries beyond the initial attempt (max_retries: 3), with min_retry_interval_millis set per the expected transient-failure profile of the source (typically 2000). This isolates transient source faults from pipeline faults. If retries are exhausted, the task fails and downstream tasks in the same job do not execute.
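The declared policy amounts to a simple loop: one initial attempt plus up to `max_retries` retries, pausing `min_retry_interval_millis` between them. Lakeflow executes this itself; the sketch below only illustrates the semantics, and `run_with_retries` and the demo task are hypothetical.

```python
import time

def run_with_retries(task, max_retries=3, min_retry_interval_millis=2000):
    """Initial attempt plus up to max_retries retries; re-raise if all fail."""
    last_exc = None
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception as exc:
            last_exc = exc
            if attempt < max_retries:
                time.sleep(min_retry_interval_millis / 1000)
    raise last_exc

# Demo: a task that fails twice with transient errors, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient source fault")
    return "bronze written"

print(run_with_retries(flaky, min_retry_interval_millis=10))  # prints "bronze written"
```

If the fourth attempt also fails, the last exception propagates, which is what marks the task failed and blocks the downstream transform.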
Credentials¶
Each new connector substitutes the source name and credential reference. Credentials come from the mvp-connectors Databricks secret scope, never from the bundle fragment itself. Secret loading for each connector happens via src/connectors/<source>/scripts/load-secrets.sh. See Secrets bootstrap.
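Credential access from notebook code can be sketched as follows. On Databricks, `dbutils.secrets.get(scope=..., key=...)` performs the actual lookup against the `mvp-connectors` scope; the injectable `lookup` parameter and the `github_token` key here are illustrative assumptions so the sketch runs outside a workspace.

```python
SECRET_SCOPE = "mvp-connectors"

def get_credential(key: str, lookup=None) -> str:
    """Fetch a connector credential from the mvp-connectors secret scope.

    In a notebook the real lookup would be
    lambda k: dbutils.secrets.get(scope=SECRET_SCOPE, key=k);
    `lookup` is injectable so the sketch is testable locally.
    """
    if lookup is None:
        raise RuntimeError(f"no secret backend configured for scope {SECRET_SCOPE!r}")
    return lookup(key)

# Local stand-in for the secret scope (key name is illustrative).
fake_scope = {"github_token": "ghp_example"}
print(get_credential("github_token", lookup=fake_scope.get))  # prints "ghp_example"
```

Keeping the scope name in one place mirrors the rule above: the bundle fragment never carries the secret itself, only the reference.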