What’s a transformer?
Often we want to modify metadata before it reaches the ingestion sink. For instance, we might want to add custom tags, ownership, or properties, or patch some fields. A transformer lets us do exactly that.
Moreover, a transformer gives you fine-grained control over the metadata that's ingested without having to modify the ingestion framework's code yourself. Instead, you can write your own module that transforms metadata events however you like. To include a transformer in a recipe, all you need is the transformer's name and any configuration it requires.
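To make the idea of "modifying metadata before it reaches the sink" concrete, here is a minimal, hypothetical sketch. Records are plain dicts, and `add_tags` and `run_pipeline` are illustrative names, not DataHub's API. Each transformer is a function that takes a record and returns a rewritten one. Transformers run in order between the source and the sink.

```python
# Hypothetical sketch: transformers rewrite records between source and sink.
# Record shape and function names are illustrative, not DataHub's real API.
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Transformer = Callable[[Record], Record]


def add_tags(tags: List[str]) -> Transformer:
    """Return a transformer that appends the given tags to each record."""
    def transform(record: Record) -> Record:
        existing = list(record.get("tags", []))
        # Preserve existing order and skip tags that are already present.
        merged = existing + [t for t in tags if t not in existing]
        return {**record, "tags": merged}
    return transform


def run_pipeline(records: Iterable[Record], transformers: List[Transformer]) -> List[Record]:
    """Apply every transformer, in order, before handing each record to the 'sink'."""
    sink: List[Record] = []
    for record in records:
        for transformer in transformers:
            record = transformer(record)
        sink.append(record)
    return sink
```

For example, `run_pipeline([{"urn": "u1", "tags": ["urn:li:tag:PII"]}], [add_tags(["urn:li:tag:PII", "urn:li:tag:Finance"])])` yields a record whose tags are `["urn:li:tag:PII", "urn:li:tag:Finance"]`. The source data itself is never touched; only what the sink receives changes.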
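As a rough illustration of "name plus configuration", the sketch below writes a recipe as a Python dict that mirrors the usual YAML layout (`source`, `transformers`, `sink`). The transformer type `simple_add_dataset_ownership`, its `owner_urns` key, and the source/sink settings are assumptions for illustration. Check the transformer reference for the exact names. `transformer_names` is a hypothetical helper, not part of DataHub.

```python
# Hypothetical recipe expressed as a dict; recipes are usually YAML files.
# Type names and config keys below are assumptions for illustration.
recipe = {
    "source": {"type": "mysql", "config": {"host_port": "localhost:3306"}},
    "transformers": [
        {
            "type": "simple_add_dataset_ownership",  # the transformer's name
            "config": {"owner_urns": ["urn:li:corpuser:username1"]},  # its configuration
        }
    ],
    "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
}


def transformer_names(recipe: dict) -> list:
    """Return transformer types in recipe order; every entry must name a type."""
    entries = recipe.get("transformers", [])
    for entry in entries:
        if "type" not in entry:
            raise ValueError(f"transformer entry missing 'type': {entry}")
    return [entry["type"] for entry in entries]
```

Transformers are listed in a sequence because they are applied in order, so a later transformer sees the output of an earlier one.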
Providing urns for metadata that does not already exist will result in unexpected behavior. Make sure that the urns of any tags, terms, domains, etc. that you want to apply in your transformer already exist in your DataHub instance.
For example, adding a domain urn in your transformer to apply to datasets will not create the domain entity if it doesn't exist. As a result, you can't add documentation to it, and it won't show up in Advanced Search. The same applies to any metadata you apply through transformers.
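One way to guard against this is a pre-flight check before running the recipe. The minimal sketch below assumes you already have the set of urns that exist in your DataHub instance; how you fetch it is left open. It reports every configured urn that is missing. `missing_urns` and the sample urns are hypothetical.

```python
# Hypothetical pre-flight check: find transformer urns that don't exist yet.
# Obtaining `existing` (e.g. by querying your DataHub instance) is assumed.
from typing import Iterable, List, Set


def missing_urns(configured: Iterable[str], existing: Set[str]) -> List[str]:
    """Return configured urns absent from `existing`, in order, without duplicates."""
    seen: Set[str] = set()
    missing: List[str] = []
    for urn in configured:
        if urn not in existing and urn not in seen:
            seen.add(urn)
            missing.append(urn)
    return missing


existing_urns = {"urn:li:domain:marketing", "urn:li:tag:PII"}
configured_urns = ["urn:li:domain:marketing", "urn:li:domain:finance", "urn:li:tag:PII"]
problems = missing_urns(configured_urns, existing_urns)
if problems:
    print("Create these entities before ingesting:", problems)
```

Here `problems` is `["urn:li:domain:finance"]`, so that domain would need to be created before the transformer can apply it meaningfully.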
Aside from the option of writing your own transformer (see below), we provide some simple transformers for common use cases such as adding tags, glossary terms, properties, domains, and ownership information.
The DataHub-provided transformers for datasets are:
- Simple Add Dataset ownership
- Pattern Add Dataset ownership
- Simple Remove Dataset ownership
- Mark Dataset Status
- Simple Add Dataset globalTags
- Pattern Add Dataset globalTags
- Add Dataset globalTags
- Set Dataset browsePath
- Simple Add Dataset glossaryTerms
- Pattern Add Dataset glossaryTerms
- Pattern Add Dataset Schema Field glossaryTerms
- Pattern Add Dataset Schema Field globalTags
- Simple Add Dataset datasetProperties
- Add Dataset datasetProperties
- Simple Add Dataset domains
- Pattern Add Dataset domains
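Many of these come in "Simple" and "Pattern" variants. Going by the names, a reasonable reading is that a Simple transformer applies the same values to every dataset. A Pattern transformer picks values per dataset by matching the dataset urn against regular-expression rules. The sketch below illustrates that distinction under this assumption. The rule format (a mapping from regex to a list of urns) and the use of `re.match` are illustrative choices, not a confirmed description of DataHub's implementation.

```python
# Illustrative contrast between "simple" and "pattern" transformer behavior.
# Rule format and matching semantics are assumptions, not DataHub's code.
import re
from typing import Dict, List


def simple_values(dataset_urn: str, values: List[str]) -> List[str]:
    """'Simple' flavor: every dataset gets the same values (urn is ignored)."""
    return list(values)


def pattern_values(dataset_urn: str, rules: Dict[str, List[str]]) -> List[str]:
    """'Pattern' flavor: collect values from every rule whose regex matches the urn."""
    result: List[str] = []
    for regex, values in rules.items():
        if re.match(regex, dataset_urn):  # re.match anchors at the start of the urn
            for value in values:
                if value not in result:
                    result.append(value)
    return result


rules = {
    ".*example1.*": ["urn:li:corpuser:username1"],
    ".*example2.*": ["urn:li:corpuser:username2"],
}
urn = "urn:li:dataset:(urn:li:dataPlatform:hive,example1.orders,PROD)"
print(pattern_values(urn, rules))  # ['urn:li:corpuser:username1']
```

Pattern rules are useful when different parts of your catalog need different owners, tags, or domains. Simple transformers suit blanket changes across everything a recipe ingests.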