To install this plugin, run pip install 'acryl-datahub[nifi]'.


This plugin extracts the following:

  • NiFi flow as a DataFlow entity
  • Ingress and egress processors, and remote input and output ports, as DataJob entities
  • Input and output ports receiving remote connections as Dataset entities
  • Lineage between external datasets and ingress/egress processors, derived by analyzing provenance events
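The lineage step can be pictured with a minimal Python sketch. It makes three assumptions:

  • Provenance events are plain dicts carrying NiFi's componentType, eventType and transitUri fields.
  • Ingress processors report RECEIVE or FETCH events, and egress processors report SEND events.
  • The processor type stands in for the processor's DataJob. A real flow would key on the processor instance.

The sample events, the LineageEdge class and lineage_from_events are hypothetical illustrations, not the plugin's implementation.

```python
from dataclasses import dataclass

# Hypothetical, simplified provenance events. Real NiFi provenance events carry
# more fields; only componentType, eventType and transitUri are used here.
EVENTS = [
    {"componentType": "FetchS3Object", "eventType": "FETCH",
     "transitUri": "s3://raw-bucket/orders/2024.csv"},
    {"componentType": "PutSFTP", "eventType": "SEND",
     "transitUri": "sftp://files.example.com/out/orders.csv"},
    {"componentType": "LogAttribute", "eventType": "ATTRIBUTES_MODIFIED",
     "transitUri": None},
]

# Assumption: ingress processors report RECEIVE/FETCH, egress processors report SEND.
INGRESS_EVENTS = {"RECEIVE", "FETCH"}
EGRESS_EVENTS = {"SEND"}


@dataclass(frozen=True)
class LineageEdge:
    upstream: str
    downstream: str


def lineage_from_events(events):
    """Turn provenance events into external-dataset <-> processor edges."""
    edges = set()
    for event in events:
        uri = event.get("transitUri")
        if not uri:
            continue  # no external location, so nothing to link
        processor = event["componentType"]
        if event["eventType"] in INGRESS_EVENTS:
            # data flows from the external dataset into the processor
            edges.add(LineageEdge(upstream=uri, downstream=processor))
        elif event["eventType"] in EGRESS_EVENTS:
            # data flows from the processor out to the external dataset
            edges.add(LineageEdge(upstream=processor, downstream=uri))
    return edges


for edge in sorted(lineage_from_events(EVENTS), key=lambda e: e.upstream):
    print(f"{edge.upstream} -> {edge.downstream}")
```

Because edges are collected in a set of frozen dataclasses, repeated events for the same object and processor collapse into a single edge.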

Current limitations:

  • Only a limited set of ingress/egress processors is supported:
    • S3: ListS3, FetchS3Object, PutS3Object
    • SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP
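The supported processors can be summarized as a lookup from processor type to platform and direction. The following is a small Python sketch. The classify helper is hypothetical, and the direction rule (List*, Fetch* and Get* read from the external system, Put* writes to it) is inferred from the processor names in the list above.

```python
from typing import Optional, Tuple

# Processor types listed as supported, grouped by platform.
SUPPORTED_PROCESSORS = {
    "s3": ["ListS3", "FetchS3Object", "PutS3Object"],
    "sftp": ["ListSFTP", "FetchSFTP", "GetSFTP", "PutSFTP"],
}


def classify(processor_type: str) -> Optional[Tuple[str, str]]:
    """Return (platform, direction) for a supported processor, else None."""
    for platform, types in SUPPORTED_PROCESSORS.items():
        if processor_type in types:
            # Put* writes to the external system; the others read from it.
            direction = "egress" if processor_type.startswith("Put") else "ingress"
            return platform, direction
    return None  # unsupported processors yield no external lineage


print(classify("FetchS3Object"))  # ('s3', 'ingress')
print(classify("PutSFTP"))        # ('sftp', 'egress')
print(classify("PutKafka"))       # None
```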

Quickstart recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
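The credential rules in the config options are easy to get wrong. For example, username and password only apply when auth is SINGLE_USER. The Python sketch below shows one way to check a nifi config dict before running ingestion. The check_auth_config helper is hypothetical and not part of the plugin. It encodes only the rules stated in the config options, plus a flag for credentials supplied without SINGLE_USER, which is a likely misconfiguration.

```python
VALID_AUTH = {"NO_AUTH", "SINGLE_USER", "CLIENT_CERT"}

# Fields that, per the configuration options, must be set for each auth type.
REQUIRED_FIELDS = {
    "NO_AUTH": [],
    "SINGLE_USER": ["username", "password"],
    "CLIENT_CERT": ["client_cert_file"],
}


def check_auth_config(config: dict) -> list:
    """Return a list of problems with the auth-related fields of a nifi config."""
    auth = config.get("auth", "NO_AUTH")  # documented default
    if auth not in VALID_AUTH:
        return [f"auth must be one of {sorted(VALID_AUTH)}, got {auth!r}"]
    problems = [
        f"{field} must be set when auth is {auth}"
        for field in REQUIRED_FIELDS[auth]
        if not config.get(field)
    ]
    # Likely misconfiguration: credentials given without selecting SINGLE_USER.
    if auth == "NO_AUTH" and (config.get("username") or config.get("password")):
        problems.append("username/password are set but auth is NO_AUTH")
    return problems


# Quickstart coordinates and credentials, with auth set to SINGLE_USER.
recipe_config = {
    "site_url": "https://localhost:8443/nifi/",
    "auth": "SINGLE_USER",
    "username": "admin",
    "password": "password",
}
print(check_auth_config(recipe_config))  # []
```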

Config details

Note that a dot (.) is used to denote nested fields in the YAML recipe.
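To make the dot notation concrete, the Python sketch below expands dotted option names into the nested mappings they stand for in the YAML recipe. The nest_dotted helper and the "^Ingest.*" pattern are hypothetical illustrations.

```python
def nest_dotted(flat: dict) -> dict:
    """Expand dotted field names (e.g. "process_group_pattern.allow") into nested dicts."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate mappings as needed
        node[leaf] = value
    return nested


flat_options = {
    "site_url": "https://localhost:8443/nifi/",
    "process_group_pattern.allow": ["^Ingest.*"],
    "process_group_pattern.ignoreCase": True,
}
print(nest_dotted(flat_options))
```

In the recipe, process_group_pattern.allow therefore appears as an allow key indented under a process_group_pattern key inside config.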

  • site_url: URI of the NiFi instance to connect to.
  • site_name (default: "default"): Site name used to identify this site. Useful when the flow has input and output ports that receive remote connections.
  • auth (default: "NO_AUTH"): NiFi authentication type. Must be one of NO_AUTH, SINGLE_USER, CLIENT_CERT.
  • username: NiFi username. Must be set when auth is "SINGLE_USER".
  • password: NiFi password. Must be set when auth is "SINGLE_USER".
  • client_cert_file: Path to a PEM file containing the public certificates for the user/client identity. Must be set when auth is "CLIENT_CERT".
  • client_key_file: Path to a PEM file containing the client's secret key.
  • client_key_password: Password used to decrypt client_key_file.
  • ca_file: Path to a PEM file containing the certificates of the root CA(s) for NiFi.
  • provenance_days: Time window, in days, over which provenance events are analyzed to find external datasets.
  • site_url_to_site_name: Lookup from site_url to site_name. Required if the NiFi flow uses remote process groups.
  • process_group_pattern.allow: List of regex patterns for process groups to include in ingestion.
  • process_group_pattern.deny: List of regex patterns for process groups to exclude from ingestion.
  • process_group_pattern.ignoreCase (default: True): Whether to ignore case during pattern matching.
  • env (default: "PROD"): Environment to use in the namespace when constructing URNs.
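The process_group_pattern options can be illustrated with a stdlib Python sketch of allow/deny filtering. It is an approximation written for this example, not DataHub's own pattern class. It assumes that a matching deny pattern always excludes a group, that otherwise at least one allow pattern must match, and that patterns match from the start of the name as re.match does. The function name, defaults and sample group names are hypothetical.

```python
import re
from typing import Iterable


def process_group_allowed(
    name: str,
    allow: Iterable[str] = (".*",),
    deny: Iterable[str] = (),
    ignore_case: bool = True,
) -> bool:
    """Approximate allow/deny filtering: deny wins, otherwise some allow pattern must match."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.match anchors at the start of the name but not at the end.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)


groups = ["Ingest Orders", "ingest_test", "Reporting"]
kept = [g for g in groups if process_group_allowed(g, allow=["ingest"], deny=[".*test"])]
print(kept)  # ['Ingest Orders']
```

Checking deny first means a single deny pattern can carve exceptions out of a broad allow pattern without rewriting the allow list.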


