Skip to main content

dbt

For context on getting started with ingestion, check out our metadata ingestion guide.

Setup#

Works with acryl-datahub out of the box.

Capabilities#

This plugin pulls metadata from dbt's artifact files:

  • dbt manifest file
    • This file contains model, source and lineage data.
  • dbt catalog file
    • This file contains schema data.
    • dbt does not record schema data for Ephemeral models, as such datahub will show Ephemeral models in the lineage, however there will be no associated schema for Ephemeral models
  • dbt sources file
    • This file contains metadata for sources with freshness checks.
    • We transfer dbt's freshness checks to DataHub's last-modified fields.
    • Note that this file is optional โ€“ if not specified, we'll use time of ingestion instead as a proxy for time last-modified.
  • target_platform:
  • load_schemas:
    • Load schemas from dbt catalog file, not necessary when the underlying data platform already has this data.
  • use_identifiers:
    • Use model identifier instead of model name if defined (if not, default to model name).
  • node_type_pattern:
    • Use this filter to exclude and include node types using allow or deny method

Quickstart recipe#

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:  type: "dbt"  config:    # Coordinates    manifest_path: "./path/dbt/manifest_file.json"    catalog_path: "./path/dbt/catalog_file.json"    sources_path: "./path/dbt/sources_file.json"
    # Options    target_platform: "my_target_platform_id"    load_schemas: True # note: if this is disabled
sink:  # sink configs

Config details#

Note that a . is used to denote nested fields in the YAML recipe.

FieldRequiredDefaultDescription
manifest_pathโœ…Path to dbt manifest JSON. See https://docs.getdbt.com/reference/artifacts/manifest-json
catalog_pathโœ…Path to dbt catalog JSON. See https://docs.getdbt.com/reference/artifacts/catalog-json
sources_pathPath to dbt sources JSON. See https://docs.getdbt.com/reference/artifacts/sources-json. If not specified, last-modified fields will not be populated.
env"PROD"Environment to use in namespace when constructing URNs.
target_platformโœ…The platform that dbt is loading onto.
load_schemasโœ…Whether to load database schemas. If set to False, table schema details (e.g. columns) will not be ingested.
use_identifiersFalseWhether to use model identifiers instead of names, if defined (if not, default to names)
node_type_pattern.allowList of regex patterns for dbt nodes to include in ingestion.
node_type_pattern.denyList of regex patterns for dbt nodes to exclude from ingestion.
node_type_pattern.ignoreCaseTrueWhether to ignore case sensitivity during pattern matching.

Compatibility#

Coming soon!

Questions#

If you've got any questions on configuring this source, feel free to ping us on our Slack!