# dbt

Module `dbt`

## Important Capabilities
| Capability | Status | Notes |
|---|---|---|
| Dataset Usage | ❌ | |
| Detect Deleted Entities | ✅ | Enabled via stateful ingestion |
| Table-Level Lineage | ✅ | Enabled by default |
This plugin pulls metadata from dbt's artifact files and generates:
- dbt Tables: for nodes in the dbt manifest file that are models materialized as tables
- dbt Views: for nodes in the dbt manifest file that are models materialized as views
- dbt Ephemeral: for nodes in the dbt manifest file that are ephemeral models
- dbt Sources: for nodes that are sources on top of the underlying platform tables
- dbt Seed: for seed entities
- dbt Tests as Assertions: for dbt test entities (starting with version 0.8.38.1)
Note:

- It also generates lineage between the dbt nodes (e.g. ephemeral nodes that depend on other dbt sources) as well as lineage between the dbt nodes and the underlying (target) platform nodes (e.g. BigQuery Table -> dbt Source, dbt View -> BigQuery View).
- The previous version of this source (`acryl_datahub<=0.8.16.2`) did not generate dbt entities or lineage between dbt entities and platform entities. For backwards compatibility with the previous version of this source, there is a config flag `disable_dbt_node_creation` that falls back to the old behavior; a minimal sketch of using it appears after this list.
- We also support automated actions (like adding a tag, term, or owner) based on properties defined in dbt meta.
The artifacts used by this source are:

- dbt manifest file
  - This file contains model, source, test, and lineage data.
- dbt catalog file
  - This file contains schema data.
  - dbt does not record schema data for ephemeral models, so DataHub will show ephemeral models in the lineage but without an associated schema.
- dbt sources file
  - This file contains metadata for sources with freshness checks.
  - We transfer dbt's freshness checks to DataHub's last-modified fields.
  - Note that this file is optional; if not specified, we'll use the time of ingestion as a proxy for the last-modified time.
- dbt run_results file
  - This file contains metadata from the result of a dbt run, e.g. `dbt test`.
  - When provided, we transfer dbt test run results into assertion run events, so you can see a timeline of test runs on the dataset.
## Install the Plugin

```shell
pip install 'acryl-datahub[dbt]'
```
## Quickstart Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

```yml
source:
  type: "dbt"
  config:
    # Coordinates
    # To use this as-is, set the environment variable DBT_PROJECT_ROOT to the root folder of your dbt project
    manifest_path: "${DBT_PROJECT_ROOT}/target/manifest_file.json"
    catalog_path: "${DBT_PROJECT_ROOT}/target/catalog_file.json"
    sources_path: "${DBT_PROJECT_ROOT}/target/sources_file.json" # optional for freshness
    test_results_path: "${DBT_PROJECT_ROOT}/target/run_results.json" # optional for recording dbt test results after running dbt test

    # Options
    target_platform: "my_target_platform_id" # e.g. bigquery/postgres/etc.
    load_schemas: False # note: enable this only if you are not ingesting metadata from your warehouse

# sink configs
```
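Deleted-entity detection (see the capability table above) relies on stateful ingestion. Below is a hedged sketch of enabling it; the pipeline name, sink type, and server address are illustrative assumptions:

```yml
pipeline_name: "dbt-ingestion-pipeline" # hypothetical; stateful ingestion needs a stable pipeline name
source:
  type: "dbt"
  config:
    manifest_path: "${DBT_PROJECT_ROOT}/target/manifest_file.json"
    catalog_path: "${DBT_PROJECT_ROOT}/target/catalog_file.json"
    target_platform: "postgres" # placeholder
    stateful_ingestion:
      enabled: true # requires a DataHub-backed state provider
      remove_stale_metadata: true # soft-delete entities missing from the latest run
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080" # placeholder GMS endpoint
```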
## Config Details
Note that a `.` is used to denote nested fields in the YAML recipe.
### View All Configuration Options
| Field | Required | Type | Description | Default |
|---|---|---|---|---|
| env | | string | Environment to use in namespace when constructing URNs. | PROD |
| platform | | string | The platform that this source connects to. | None |
| platform_instance | | string | The instance of the platform that all assets produced by this recipe belong to. | None |
| manifest_path | ✅ | string | Path to dbt manifest JSON. See https://docs.getdbt.com/reference/artifacts/manifest-json. Note this can be a local file or a URI. | None |
| catalog_path | ✅ | string | Path to dbt catalog JSON. See https://docs.getdbt.com/reference/artifacts/catalog-json. Note this can be a local file or a URI. | None |
| sources_path | | string | Path to dbt sources JSON. See https://docs.getdbt.com/reference/artifacts/sources-json. If not specified, last-modified fields will not be populated. Note this can be a local file or a URI. | None |
| test_results_path | | string | Path to the run_results JSON file produced by a dbt test run. See https://docs.getdbt.com/reference/artifacts/run-results-json. If not specified, test execution results will not be populated in DataHub. | None |
| target_platform | ✅ | string | The platform that dbt is loading onto (e.g. bigquery / redshift / postgres). | None |
| target_platform_instance | | string | The platform instance for the platform that dbt is operating on. Use this if you have multiple instances of the same platform (e.g. redshift) and need to distinguish between them. | None |
| load_schemas | | boolean | Only consulted when disable_dbt_node_creation is set to True. Loads schemas for target_platform entities from the dbt catalog file; not necessary when you are already ingesting this metadata from the data platform directly. If set to False, table schema details (e.g. columns) will not be ingested. | True |
| use_identifiers | | boolean | Use model identifier instead of model name if defined (if not, default to model name). | False |
| tag_prefix | | string | Prefix added to tags during ingestion. | dbt: |
| meta_mapping | | Dict | Mapping rules that will be executed against dbt meta properties. Refer to the section below on dbt meta automated mappings. | {} |
| query_tag_mapping | | Dict | Mapping rules that will be executed against dbt query_tag meta properties. Refer to the section below on dbt meta automated mappings. | {} |
| write_semantics | | string | Whether the new tags, terms, and owners to be added will override the existing ones added only by this source or not. Value for this config can be "PATCH" or "OVERRIDE". | PATCH |
| strip_user_ids_from_email | | boolean | Whether or not to strip the email domain from owner ids while adding owners using dbt meta actions. | False |
| owner_extraction_pattern | | string | Regex string to extract the owner from the dbt node using the `(?P<name>...)` syntax of the match object, where the group name must be `owner`. Examples: (1) `r"(?P<owner>(.*)): (\w+) (\w+)"` will extract `jdoe` as the owner from `"jdoe: John Doe"`; (2) `r"@(?P<owner>(.*))"` will extract `alice` as the owner from `"@alice"`. | None |
| delete_tests_as_datasets | | boolean | Prior to version 0.8.38, dbt tests were represented as datasets. If you ingested dbt tests before, set this flag to True (only needed once) to soft-delete tests that were generated as datasets by previous ingestion. | False |
| disable_dbt_node_creation | | boolean | Whether to suppress dbt dataset metadata creation. When set to True, this flag applies the dbt metadata to the target_platform entities (e.g. populating schema and column descriptions from dbt into the postgres / bigquery table metadata in DataHub) and generates lineage between the platform entities. | False |
| enable_meta_mapping | | boolean | When enabled, applies the mappings that are defined through the meta_mapping directives. | True |
| enable_query_tag_mapping | | boolean | When enabled, applies the mappings that are defined through the query_tag_mapping directives. | True |
| stateful_ingestion | | DBTStatefulIngestionConfig (see below for fields) | | |
| stateful_ingestion.enabled | | boolean | Whether or not stateful ingestion is enabled. | False |
| stateful_ingestion.max_checkpoint_state_size | | integer | The maximum size of the checkpoint state in bytes. Default is 16MB. | 16777216 |
| stateful_ingestion.state_provider | | DynamicTypedStateProviderConfig (see below for fields) | The ingestion state provider configuration. | |
| stateful_ingestion.state_provider.type | ✅ | string | The type of the state provider to use. For DataHub use `datahub`. | None |
| stateful_ingestion.state_provider.config | | Generic dict | The configuration required for initializing the state provider. Default: the datahub_api config if set at pipeline level, otherwise the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19). | None |
| stateful_ingestion.ignore_old_state | | boolean | If set to True, ignores the previous checkpoint state. | False |
| stateful_ingestion.ignore_new_state | | boolean | If set to True, ignores the current checkpoint state. | False |
| stateful_ingestion.remove_stale_metadata | | boolean | Soft-deletes entities present in the previous checkpoint but missing from the current run. | True |
| node_type_pattern | | AllowDenyPattern (see below for fields) | Regex patterns for dbt node types to filter in ingestion. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'} |
| node_type_pattern.allow | | Array of string | List of regex patterns to include in ingestion. | ['.*'] |
| node_type_pattern.deny | | Array of string | List of regex patterns to exclude from ingestion. | [] |
| node_type_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| node_type_pattern.alphabet | | string | Allowed alphabets pattern. | [A-Za-z0-9 _.-] |
| node_name_pattern | | AllowDenyPattern (see below for fields) | Regex patterns for dbt model names to filter in ingestion. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'} |
| node_name_pattern.allow | | Array of string | List of regex patterns to include in ingestion. | ['.*'] |
| node_name_pattern.deny | | Array of string | List of regex patterns to exclude from ingestion. | [] |
| node_name_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| node_name_pattern.alphabet | | string | Allowed alphabets pattern. | [A-Za-z0-9 _.-] |
| aws_connection | | AwsConnectionConfig (see below for fields) | Configuration for AWS connection details, used when fetching artifact files from S3. | |
| aws_connection.aws_access_key_id | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_connection.aws_secret_access_key | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_connection.aws_session_token | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_connection.aws_role | | Generic dict | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_connection.aws_profile | | string | Named AWS profile to use; if not set, the default will be used. | None |
| aws_connection.aws_region | ✅ | string | AWS region code. | None |
| aws_connection.aws_endpoint_url | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html | None |
| aws_connection.aws_proxy | | Dict[str,string] | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html | None |
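As an illustration of the nested-field notation and the AllowDenyPattern filters above, here is a sketch of a recipe fragment; the model-name prefix and owner regex are made-up examples:

```yml
source:
  type: "dbt"
  config:
    # ... paths and target_platform as in the quickstart ...
    # node_name_pattern.deny in dotted notation corresponds to this nesting:
    node_name_pattern:
      deny:
        - "stg_.*" # hypothetical prefix for staging models to exclude
    # Extracts "jdoe" as the owner from a meta value like "jdoe: John Doe"
    owner_extraction_pattern: '(?P<owner>(.*)): (\w+) (\w+)'
```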
The JSONSchema for this configuration is inlined below.

```json
{
"title": "DBTConfig",
"description": "Base configuration class for stateful ingestion for source configs to inherit from.",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "Environment to use in namespace when constructing URNs.",
"default": "PROD",
"type": "string"
},
"platform": {
"title": "Platform",
"description": "The platform that this source connects to",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to",
"type": "string"
},
"stateful_ingestion": {
"$ref": "#/definitions/DBTStatefulIngestionConfig"
},
"manifest_path": {
"title": "Manifest Path",
"description": "Path to dbt manifest JSON. See https://docs.getdbt.com/reference/artifacts/manifest-json Note this can be a local file or a URI.",
"type": "string"
},
"catalog_path": {
"title": "Catalog Path",
"description": "Path to dbt catalog JSON. See https://docs.getdbt.com/reference/artifacts/catalog-json Note this can be a local file or a URI.",
"type": "string"
},
"sources_path": {
"title": "Sources Path",
"description": "Path to dbt sources JSON. See https://docs.getdbt.com/reference/artifacts/sources-json. If not specified, last-modified fields will not be populated. Note this can be a local file or a URI.",
"type": "string"
},
"test_results_path": {
"title": "Test Results Path",
"description": "Path to output of dbt test run as run_results file in JSON format. See https://docs.getdbt.com/reference/artifacts/run-results-json. If not specified, test execution results will not be populated in DataHub.",
"type": "string"
},
"target_platform": {
"title": "Target Platform",
"description": "The platform that dbt is loading onto. (e.g. bigquery / redshift / postgres etc.)",
"type": "string"
},
"target_platform_instance": {
"title": "Target Platform Instance",
"description": "The platform instance for the platform that dbt is operating on. Use this if you have multiple instances of the same platform (e.g. redshift) and need to distinguish between them.",
"type": "string"
},
"load_schemas": {
"title": "Load Schemas",
"description": "This flag is only consulted when disable_dbt_node_creation is set to True. Load schemas for target_platform entities from dbt catalog file, not necessary when you are already ingesting this metadata from the data platform directly. If set to False, table schema details (e.g. columns) will not be ingested.",
"default": true,
"type": "boolean"
},
"use_identifiers": {
"title": "Use Identifiers",
"description": "Use model identifier instead of model name if defined (if not, default to model name).",
"default": false,
"type": "boolean"
},
"node_type_pattern": {
"title": "Node Type Pattern",
"description": "regex patterns for dbt nodes to filter in ingestion.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true,
"alphabet": "[A-Za-z0-9 _.-]"
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"tag_prefix": {
"title": "Tag Prefix",
"description": "Prefix added to tags during ingestion.",
"default": "dbt:",
"type": "string"
},
"node_name_pattern": {
"title": "Node Name Pattern",
"description": "regex patterns for dbt model names to filter in ingestion.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true,
"alphabet": "[A-Za-z0-9 _.-]"
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"meta_mapping": {
"title": "Meta Mapping",
"description": "mapping rules that will be executed against dbt meta properties. Refer to the section below on dbt meta automated mappings.",
"default": {},
"type": "object"
},
"query_tag_mapping": {
"title": "Query Tag Mapping",
"description": "mapping rules that will be executed against dbt query_tag meta properties. Refer to the section below on dbt meta automated mappings.",
"default": {},
"type": "object"
},
"write_semantics": {
"title": "Write Semantics",
"description": "Whether the new tags, terms and owners to be added will override the existing ones added only by this source or not. Value for this config can be \"PATCH\" or \"OVERRIDE\"",
"default": "PATCH",
"type": "string"
},
"strip_user_ids_from_email": {
"title": "Strip User Ids From Email",
"description": "Whether or not to strip email id while adding owners using dbt meta actions.",
"default": false,
"type": "boolean"
},
"owner_extraction_pattern": {
"title": "Owner Extraction Pattern",
"description": "Regex string to extract owner from the dbt node using the `(?P<name>...) syntax` of the [match object](https://docs.python.org/3/library/re.html#match-objects), where the group name must be `owner`. Examples: (1)`r\"(?P<owner>(.*)): (\\w+) (\\w+)\"` will extract `jdoe` as the owner from `\"jdoe: John Doe\"` (2) `r\"@(?P<owner>(.*))\"` will extract `alice` as the owner from `\"@alice\"`.",
"type": "string"
},
"aws_connection": {
"title": "Aws Connection",
"description": "When fetching manifest files from s3, configuration for aws connection details",
"allOf": [
{
"$ref": "#/definitions/AwsConnectionConfig"
}
]
},
"delete_tests_as_datasets": {
"title": "Delete Tests As Datasets",
"description": "Prior to version 0.8.38, dbt tests were represented as datasets. If you ingested dbt tests before, set this flag to True (just needed once) to soft-delete tests that were generated as datasets by previous ingestion.",
"default": false,
"type": "boolean"
},
"disable_dbt_node_creation": {
"title": "Disable Dbt Node Creation",
"description": "Whether to suppress dbt dataset metadata creation. When set to True, this flag applies the dbt metadata to the target_platform entities (e.g. populating schema and column descriptions from dbt into the postgres / bigquery table metadata in DataHub) and generates lineage between the platform entities.",
"default": false,
"type": "boolean"
},
"enable_meta_mapping": {
"title": "Enable Meta Mapping",
"description": "When enabled, applies the mappings that are defined through the meta_mapping directives.",
"default": true,
"type": "boolean"
},
"enable_query_tag_mapping": {
"title": "Enable Query Tag Mapping",
"description": "When enabled, applies the mappings that are defined through the `query_tag_mapping` directives.",
"default": true,
"type": "boolean"
}
},
"required": [
"manifest_path",
"catalog_path",
"target_platform"
],
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19)."
}
},
"required": [
"type"
],
"additionalProperties": false
},
"DBTStatefulIngestionConfig": {
"title": "DBTStatefulIngestionConfig",
"description": "Specialization of basic StatefulIngestionConfig to adding custom config.\nThis will be used to override the stateful_ingestion config param of StatefulIngestionConfigBase\nin the SQLAlchemyConfig.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "The type of the ingestion state provider registered with datahub.",
"default": false,
"type": "boolean"
},
"max_checkpoint_state_size": {
"title": "Max Checkpoint State Size",
"description": "The maximum size of the checkpoint state in bytes. Default is 16MB",
"default": 16777216,
"exclusiveMinimum": 0,
"type": "integer"
},
"state_provider": {
"title": "State Provider",
"description": "The ingestion state provider configuration.",
"allOf": [
{
"$ref": "#/definitions/DynamicTypedStateProviderConfig"
}
]
},
"ignore_old_state": {
"title": "Ignore Old State",
"description": "If set to True, ignores the previous checkpoint state.",
"default": false,
"type": "boolean"
},
"ignore_new_state": {
"title": "Ignore New State",
"description": "If set to True, ignores the current checkpoint state.",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns for process groups to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns for process groups to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
},
"alphabet": {
"title": "Alphabet",
"description": "Allowed alphabets pattern",
"default": "[A-Za-z0-9 _.-]",
"type": "string"
}
},
"additionalProperties": false
},
"AwsConnectionConfig": {
"title": "AwsConnectionConfig",
"description": "Common AWS credentials config.\n\nCurrently used by:\n - Glue source\n - SageMaker source\n - dbt source",
"type": "object",
"properties": {
"aws_access_key_id": {
"title": "Aws Access Key Id",
"description": "Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html",
"type": "string"
},
"aws_secret_access_key": {
"title": "Aws Secret Access Key",
"description": "Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html",
"type": "string"
},
"aws_session_token": {
"title": "Aws Session Token",
"description": "Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html",
"type": "string"
},
"aws_role": {
"title": "Aws Role",
"description": "Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html",
"anyOf": [
{
"type": "string"
},
{
"type": "array",
"items": {
"type": "string"
}
}
]
},
"aws_profile": {
"title": "Aws Profile",
"description": "Named AWS profile to use, if not set the default will be used",
"type": "string"
},
"aws_region": {
"title": "Aws Region",
"description": "AWS region code.",
"type": "string"
},
"aws_endpoint_url": {
"title": "Aws Endpoint Url",
"description": "Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html",
"type": "string"
},
"aws_proxy": {
"title": "Aws Proxy",
"description": "Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html",
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"required": [
"aws_region"
],
"additionalProperties": false
}
}
}
```
## dbt meta automated mappings

dbt allows authors to define meta properties for datasets; see the [dbt meta](https://docs.getdbt.com/reference/resource-configs/meta) documentation to learn more. Our dbt source allows users to define actions, such as adding a tag, term, or owner, based on these properties. For example, if a dbt model has a meta config `"has_pii": True`, we can define an action that evaluates whether the property is set to true and adds, say, a `pii` tag.

To leverage this feature, you must define mappings as part of the recipe. The following section describes how you can build these mappings. Listed below is a meta_mapping section that, among other things, looks for keys like `business_owner` and adds owners that are listed there.
YAML:

```yml
meta_mapping:
  business_owner:
    match: ".*"
    operation: "add_owner"
    config:
      owner_type: user
      owner_category: BUSINESS_OWNER
  has_pii:
    match: True
    operation: "add_tag"
    config:
      tag: "has_pii_test"
  int_property:
    match: 1
    operation: "add_tag"
    config:
      tag: "int_meta_property"
  double_property:
    match: 2.5
    operation: "add_term"
    config:
      term: "double_meta_property"
  data_governance.team_owner:
    match: "Finance"
    operation: "add_term"
    config:
      term: "Finance_test"
```

JSON:

```json
"meta_mapping": {
"business_owner": {
"match": ".*",
"operation": "add_owner",
"config": {"owner_type": "user", "owner_category": "BUSINESS_OWNER"},
},
"has_pii": {
"match": True,
"operation": "add_tag",
"config": {"tag": "has_pii_test"},
},
"int_property": {
"match": 1,
"operation": "add_tag",
"config": {"tag": "int_meta_property"},
},
"double_property": {
"match": 2.5,
"operation": "add_term",
"config": {"term": "double_meta_property"},
},
"data_governance.team_owner": {
"match": "Finance",
"operation": "add_term",
"config": {"term": "Finance_test"},
},
}
We support the following operations:

- add_tag - Requires a `tag` property in config.
- add_term - Requires a `term` property in config.
- add_owner - Requires an `owner_type` property in config, which can be either `user` or `group`. Optionally accepts the `owner_category` config property, which you can set to one of `['TECHNICAL_OWNER', 'BUSINESS_OWNER', 'DATA_STEWARD', 'DATAOWNER']` (defaults to `DATAOWNER`).
Note:
- Currently, dbt meta mapping is only supported for meta elements defined at the model level (not supported for columns).
- For string meta properties we support regex matching.
With regex matching, you can also use the matched value to customize how you populate the tag, term or owner fields. Here are a few advanced examples:
### Data Tier - Bronze, Silver, Gold
If your meta section looks like this:
```yml
meta:
  data_tier: Bronze # chosen from [Bronze, Gold, Silver]
```
and you want to attach a glossary term like `urn:li:glossaryTerm:Bronze` to all the models that have this value in their meta section, the following meta_mapping section would achieve that outcome:
```yml
meta_mapping:
  data_tier:
    match: "Bronze|Silver|Gold"
    operation: "add_term"
    config:
      term: "{{ $match }}"
```
This matches any data_tier of Bronze, Silver, or Gold and maps it to a glossary term with the same name.
### Case Numbers - create tags
If your meta section looks like this:
```yml
meta:
  case: PLT-4678 # internal Case Number
```
and you want to generate tags that look like `case_4678` from this, you can use the following meta_mapping section:
```yml
meta_mapping:
  case:
    match: "PLT-(.*)"
    operation: "add_tag"
    config:
      tag: "case_{{ $match }}"
```
### Stripping out the leading @ sign
You can also match specific groups within the value to extract subsets of the matched value. e.g. if you have a meta section that looks like this:
```yml
meta:
  owner: "@finance-team"
  business_owner: "@janet"
```
and you want to mark the finance-team as a group that owns the dataset (skipping the leading @ sign), while marking janet as an individual user (again, skipping the leading @ sign) that owns the dataset, you can use the following meta-mapping section.
```yml
meta_mapping:
  owner:
    match: "^@(.*)"
    operation: "add_owner"
    config:
      owner_type: group
  business_owner:
    match: "^@(?P<owner>(.*))"
    operation: "add_owner"
    config:
      owner_type: user
      owner_category: BUSINESS_OWNER
```
In the examples above, we show two ways of writing the matching regexes. In the first one, `^@(.*)`, the first matching group (a.k.a. `match.group(1)`) is automatically inferred. In the second example, `^@(?P<owner>(.*))`, we use a named matching group (called `owner`, since we are matching an owner) to capture the string we want to provide to the ownership urn.
## dbt query_tag automated mappings
This works similarly to the dbt meta mapping, but for dbt query tags.

We support the following operation:

- add_tag - Requires a `tag` property in config.

The example below sets the value of the `tag` query tag key as a global tag.
"query_tag_mapping":
{
"tag":
"match": ".*"
"operation": "add_tag"
"config":
"tag": "{{ $match }}"
}
## Integrating with dbt test

To integrate with dbt tests, the `dbt` source needs access to the `run_results.json` file generated after a `dbt test` execution. Typically, this is written to the `target` directory. A common pattern you can follow is:

1. Run `dbt docs generate` and upload `manifest.json` and `catalog.json` to a location accessible to the `dbt` source (e.g. s3 or local file system).
2. Run `dbt test` and upload `run_results.json` to a location accessible to the `dbt` source (e.g. s3 or local file system).
3. Run `datahub ingest -c dbt_recipe.dhub.yaml` with the config parameter `test_results_path` pointing to the `run_results.json` file that you just created (see the sketch after this list).
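For illustration, a recipe following this pattern might look like the sketch below; the S3 bucket, platform, and region are hypothetical placeholders:

```yml
source:
  type: "dbt"
  config:
    # Artifacts uploaded in steps 1 and 2 above (hypothetical S3 locations)
    manifest_path: "s3://my-dbt-artifacts/manifest.json"
    catalog_path: "s3://my-dbt-artifacts/catalog.json"
    test_results_path: "s3://my-dbt-artifacts/run_results.json"
    target_platform: "redshift" # placeholder
    # Required when fetching artifact files from S3
    aws_connection:
      aws_region: "us-east-1" # placeholder
```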
The connector will produce the following things:
- Assertion definitions that are attached to the dataset (or datasets)
- Results from running the tests attached to the timeline of the dataset
*(Screenshot: View of dbt tests for a dataset)*

*(Screenshot: Viewing the SQL for a dbt test)*

*(Screenshot: Viewing the timeline for a failed dbt test)*
## Code Coordinates

- Class Name: `datahub.ingestion.source.dbt.DBTSource`
- Browse on GitHub
## Questions

If you've got any questions on configuring ingestion for dbt, feel free to ping us on our Slack!