File Based Lineage
This plugin pulls lineage metadata from a YAML-formatted file. An example of such a file is located in the examples directory here.
CLI-based Ingestion
Install the Plugin
pip install 'acryl-datahub[datahub-lineage-file]'
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
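source:
  type: datahub-lineage-file
  config:
    file: /path/to/file_lineage.yml
    # Whether we want to query datahub-gms for upstream data
    preserve_upstream: False

sink:
  # sink configs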
Note that a "." is used to denote nested fields in the YAML recipe.
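The dotted notation maps directly onto the recipe's nesting: a key `b` under a mapping `a` is written `a.b`. The sketch below makes this concrete by flattening a nested recipe into dotted paths. It is a minimal illustration, not part of DataHub. The `dotted_paths` helper is hypothetical, the file path is a placeholder, and the source type `datahub-lineage-file` is assumed to match the plugin's pip extra. A plain dict stands in for the parsed YAML, so no YAML library is needed.

```python
from typing import Any, Dict

# Hypothetical recipe mirroring the structure above; the file path is a placeholder.
recipe: Dict[str, Any] = {
    "source": {
        "type": "datahub-lineage-file",  # assumed to match the pip extra name
        "config": {
            "file": "/path/to/lineage.yml",
            "preserve_upstream": True,
        },
    },
}


def dotted_paths(node: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested mappings into "a.b.c" keys, one entry per leaf value."""
    flat: Dict[str, Any] = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested mappings, extending the dotted prefix.
            flat.update(dotted_paths(value, path))
        else:
            flat[path] = value
    return flat


for path, value in dotted_paths(recipe).items():
    print(f"{path} = {value!r}")
```

Running it prints one line per leaf, for example `source.config.file = '/path/to/lineage.yml'`.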
Configuration Options
| Field | Required | Type | Description | Default |
|---|---|---|---|---|
| file | ✅ | string | Path to lineage file to ingest. | None |
| preserve_upstream | | boolean | Whether to query datahub-gms for the entity's existing upstreams. If False, the upstreams in the file hard-replace any existing upstream lineage for the entity. If True, existing upstreams are fetched from the backend and included in the ingestion run. | True |
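The two `preserve_upstream` modes differ in what an entity's upstream list looks like after ingestion. The sketch below models that difference on plain lists of dataset names. It is a hypothetical illustration of the documented behaviour, not DataHub's implementation. In particular, de-duplicating in first-seen order is an assumption made for this example.

```python
from typing import List


def resolve_upstreams(
    existing: List[str], from_file: List[str], preserve_upstream: bool
) -> List[str]:
    """Return the upstreams an entity ends up with (hypothetical model).

    - preserve_upstream=False: the file's upstreams hard-replace the old ones.
    - preserve_upstream=True: existing upstreams are kept and the file's
      upstreams are added, skipping duplicates, in first-seen order.
    """
    if not preserve_upstream:
        return list(from_file)
    merged: List[str] = []
    for name in existing + from_file:
        if name not in merged:
            merged.append(name)
    return merged


existing = ["raw.orders", "raw.customers"]
from_file = ["raw.customers", "raw.payments"]
print(resolve_upstreams(existing, from_file, preserve_upstream=True))
# ['raw.orders', 'raw.customers', 'raw.payments']
print(resolve_upstreams(existing, from_file, preserve_upstream=False))
# ['raw.customers', 'raw.payments']
```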
The JSONSchema for this configuration is inlined below.
{
  "type": "object",
  "properties": {
    "file": {
      "description": "Path to lineage file to ingest.",
      "type": "string"
    },
    "preserve_upstream": {
      "title": "Preserve Upstream",
      "description": "Whether we want to query datahub-gms for upstream data. False means it will hard replace upstream data for a given entity. True means it will query the backend for existing upstreams and include it in the ingestion run",
      "default": true,
      "type": "boolean"
    }
  },
  "required": ["file"]
}
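The schema boils down to three constraints: `file` is a required string, and `preserve_upstream` is an optional boolean that defaults to True. The sketch below hand-checks a config dict against exactly those constraints. It is a stand-in written for illustration. It is neither the `jsonschema` package nor DataHub's own config validation. Rejecting unknown keys is also an assumption, since the schema fragment above does not say how extra fields are treated.

```python
from typing import Any, Dict


def validate_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Check a lineage-file source config and fill in defaults (hypothetical)."""
    known = {"file", "preserve_upstream"}
    unknown = set(raw) - known
    if unknown:  # assumption: extra keys are treated as errors
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    if "file" not in raw:
        raise ValueError("missing required key: file")
    if not isinstance(raw["file"], str):
        raise TypeError("file must be a string")
    preserve = raw.get("preserve_upstream", True)  # documented default
    if not isinstance(preserve, bool):
        raise TypeError("preserve_upstream must be a boolean")
    return {"file": raw["file"], "preserve_upstream": preserve}


print(validate_config({"file": "lineage.yml"}))
# {'file': 'lineage.yml', 'preserve_upstream': True}
```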
Lineage File Format
The lineage source file should be a .yml file with the following top-level keys:
- version: the version of the lineage file format that the file conforms to. Currently, the only version released
- lineage: the top-level key of the lineage file, containing a list of EntityNodeConfig objects

EntityNodeConfig:
- entity: EntityConfig object
- upstream: (optional) list of child EntityNodeConfig objects, each describing an upstream of this node's entity

EntityConfig:
- name: name of the entity
- type: type of the entity (only dataset is supported as of now)
- env: the environment of this entity. Should match the values in the table here
- platform: a valid platform like kafka, snowflake, etc.
- platform_instance: optional string specifying the platform instance of this entity
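Because each EntityNodeConfig can nest further upstream nodes, a lineage file describes a tree. Walking that tree produces (upstream, downstream) pairs. The sketch below does this on a Python list standing in for the parsed value of the top-level `lineage` key. The dataset names are made up for illustration, and the `version` key is left out because the sketch only reads `lineage`. The helpers `entity_key` and `lineage_edges` are hypothetical. They treat name, type, env and platform as required, because the format description does not mark them optional, and they reject any type other than `dataset`.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]
REQUIRED_ENTITY_KEYS = ("name", "type", "env", "platform")


def entity_key(entity: Dict[str, Any]) -> str:
    """Check an EntityConfig dict and return a readable "platform:name" label."""
    missing = [k for k in REQUIRED_ENTITY_KEYS if k not in entity]
    if missing:
        raise ValueError(f"entity {entity!r} is missing {missing}")
    if entity["type"] != "dataset":
        raise ValueError(f"unsupported entity type: {entity['type']!r}")
    return f"{entity['platform']}:{entity['name']}"


def lineage_edges(nodes: List[Node]) -> Iterator[Tuple[str, str]]:
    """Yield (upstream, downstream) pairs from a list of EntityNodeConfig dicts."""
    for node in nodes:
        downstream = entity_key(node["entity"])
        children = node.get("upstream") or []
        for child in children:
            yield entity_key(child["entity"]), downstream
        # Upstream nodes may declare upstreams of their own, so recurse.
        yield from lineage_edges(children)


# Hypothetical example: a Snowflake table fed by another table, itself fed by Kafka.
lineage = [
    {
        "entity": {"name": "analytics.daily_orders", "type": "dataset",
                   "env": "PROD", "platform": "snowflake"},
        "upstream": [
            {
                "entity": {"name": "raw.orders", "type": "dataset",
                           "env": "PROD", "platform": "snowflake"},
                "upstream": [
                    {"entity": {"name": "orders-topic", "type": "dataset",
                                "env": "PROD", "platform": "kafka"}},
                ],
            },
        ],
    },
]

for up, down in lineage_edges(lineage):
    print(f"{up} -> {down}")
```

For this input, the walk prints `snowflake:raw.orders -> snowflake:analytics.daily_orders` followed by `kafka:orders-topic -> snowflake:raw.orders`.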
You can also view an example lineage file checked in here.
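Each EntityConfig in a lineage file ultimately identifies a dataset in DataHub through its platform, name, env and optional platform_instance. The sketch below assembles those fields into a dataset URN string. It assumes the `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)` layout, with the platform instance prefixed to the name when one is given. That reflects our understanding of DataHub's convention and is not taken from this page. The SDK's own helpers in `datahub.emitter.mce_builder`, such as `make_dataset_urn`, are the authoritative way to build these URNs. The `dataset_urn` function here is hypothetical.

```python
from typing import Optional


def dataset_urn(
    platform: str, name: str, env: str, platform_instance: Optional[str] = None
) -> str:
    """Build a dataset URN from EntityConfig fields (assumed DataHub layout)."""
    if platform_instance:
        # Assumption: the instance is prefixed to the dataset name.
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(dataset_urn("snowflake", "raw.orders", "PROD"))
# urn:li:dataset:(urn:li:dataPlatform:snowflake,raw.orders,PROD)
```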
- Class Name:
If you've got any questions on configuring ingestion for File Based Lineage, feel free to ping us on our Slack.