What’s a transformer?
Oftentimes we want to modify metadata before it reaches the ingestion sink – for instance, we might want to add custom tags, ownership, properties, or patch some fields. A transformer allows us to do exactly these things.
A transformer also gives you fine-grained control over the metadata that’s ingested without having to modify the ingestion framework's code. Instead, you can write your own module that transforms metadata events however you like. To include a transformer in a recipe, all that's needed is the name of the transformer and any configuration it requires.
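As a rough illustration of "name plus configuration", the fragment below sketches what a transformer entry in a recipe might look like. It assumes the recipe is written in YAML, that transformers are listed under a top-level `transformers` key, and that the ownership transformer is registered as `simple_add_dataset_ownership` with an `owner_urns` option. The owner URN is a placeholder, and the exact type names and options should be checked against the reference for each transformer.

```yaml
# Sketch of the transformers section of a recipe; the source and sink
# sections are omitted here.
transformers:
  - type: "simple_add_dataset_ownership"   # assumed registered name of the transformer
    config:
      owner_urns:                          # assumed config key for this transformer
        - "urn:li:corpuser:example_user"   # placeholder owner, replace with a real URN
```

Each list entry names one transformer and carries that transformer's own configuration, so adding another transformer means adding another entry to the list.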
Aside from the option of writing your own transformer (see below), we provide some simple transformers for the use cases of adding tags, glossary terms, properties, and ownership information.
The transformers that DataHub provides for datasets are:
- Simple Add Dataset ownership
- Pattern Add Dataset ownership
- Simple Remove Dataset ownership
- Mark Dataset Status
- Simple Add Dataset globalTags
- Pattern Add Dataset globalTags
- Add Dataset globalTags
- Set Dataset browsePath
- Simple Add Dataset glossaryTerms
- Pattern Add Dataset glossaryTerms
- Pattern Add Dataset Schema Field glossaryTerms
- Pattern Add Dataset Schema Field globalTags
- Simple Add Dataset datasetProperties
- Add Dataset datasetProperties
- Simple Add Dataset domains
- Pattern Add Dataset domains