Version: Next

NiFi

Certified

Important Capabilities

Capability | Status | Notes
Table-Level Lineage | Supported | See docs for limitations

Concept Mapping

Source Concept | DataHub Concept | Notes
"Nifi" | Data Platform |
NiFi flow | Data Flow |
NiFi Ingress / Egress Processor | Data Job |
NiFi Remote Port | Data Job |
NiFi Port with remote connections | Dataset |
NiFi Process Group | Container | Subtype: Process Group

Caveats

  • This plugin extracts lineage information between external datasets and ingress/egress processors by analyzing provenance events. Check your NiFi configuration to confirm the maximum retention period of provenance events, and make sure that ingestion runs frequently enough to read provenance events before they disappear.

  • Only a limited set of ingress/egress processors is supported:

    • S3: ListS3, FetchS3Object, PutS3Object
    • SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP
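You can turn the two caveats above into a quick pre-flight check. The sketch below is hypothetical. The helper names are illustrative, and so is the assumption that processor types compare by their simple class name. You supply the `retention` value yourself from your NiFi provenance repository settings. None of this is part of the connector.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Processor types listed as supported above, grouped by platform.
SUPPORTED_PROCESSORS = {
    "S3": {"ListS3", "FetchS3Object", "PutS3Object"},
    "SFTP": {"ListSFTP", "FetchSFTP", "GetSFTP", "PutSFTP"},
}


def supported_platform(processor_type: str) -> Optional[str]:
    """Return "S3" or "SFTP" for a supported processor type, else None.

    Accepts a simple name ("PutS3Object") or a fully qualified class name;
    only the part after the last "." is compared (an assumption).
    """
    simple_name = processor_type.rsplit(".", 1)[-1]
    for platform, names in SUPPORTED_PROCESSORS.items():
        if simple_name in names:
            return platform
    return None


def provenance_window_start(provenance_days: int, now: Optional[datetime] = None) -> datetime:
    """Earliest event time a run with this provenance_days setting looks at."""
    now = now or datetime.now(timezone.utc)
    return now - timedelta(days=provenance_days)


def schedule_is_safe(run_interval: timedelta, retention: timedelta) -> bool:
    """True if consecutive runs are closer together than the retention period.

    Otherwise NiFi can purge an event before the next run reads it.
    """
    return run_interval < retention
```

For example, with a 24-hour retention period an hourly schedule passes the check, while a two-day schedule does not.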

CLI-based Ingestion

Install the Plugin

pip install 'acryl-datahub[nifi]'

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"

    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password

sink:
  # sink configs
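You can also express the same recipe as a Python dictionary, which makes the YAML nesting explicit. This sketch assumes the `datahub` package from the install step and uses its `Pipeline.create` entry point. The `console` sink, which prints events instead of writing them to DataHub, is a placeholder for local testing, as are the credentials. The helper names are hypothetical.

```python
from typing import Any, Dict


def build_nifi_recipe(site_url: str, username: str, password: str) -> Dict[str, Any]:
    """Return the starter recipe above as a nested dict."""
    return {
        "source": {
            "type": "nifi",
            "config": {
                # Coordinates
                "site_url": site_url,
                # Credentials
                "auth": "SINGLE_USER",
                "username": username,
                "password": password,
            },
        },
        # Placeholder sink: prints events to stdout instead of sending them to DataHub.
        "sink": {"type": "console"},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run one ingestion pass; requires acryl-datahub[nifi] to be installed."""
    # Imported lazily so build_nifi_recipe works without DataHub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_nifi_recipe("https://localhost:8443/nifi/", "admin", "password"))` runs ingestion once. For real use, replace the console sink with your usual sink configuration.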

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
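To make the dotted notation concrete, the hypothetical helper below expands dotted field names into the nested mapping that the YAML recipe expresses with indentation. For example, `process_group_pattern.deny` ends up as a `deny` key inside `process_group_pattern`.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```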

Field | Description
site_url 
string
URL for NiFi, ending with /nifi/, e.g. https://mynifi.domain/nifi/
auth
Enum
NiFi authentication mode. Must be one of: NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS, BASIC_AUTH
Default: NO_AUTH
ca_file
One of boolean, string
Path to a PEM file containing certificates for the root CA(s) of the NiFi instance. Set to False to disable SSL verification.
client_cert_file
string
Path to a PEM file containing the public certificates for the user/client identity. Must be set for auth = "CLIENT_CERT".
client_key_file
string
Path to a PEM file containing the client’s secret key
client_key_password
string
The password to decrypt the client_key_file
emit_process_group_as_container
boolean
Whether to emit NiFi process groups as container entities.
Default: False
incremental_lineage
boolean
When enabled, emits incremental/patch lineage for NiFi processors. When disabled, restates lineage on each run.
Default: True
password
string
NiFi password. Must be set for auth = "SINGLE_USER".
provenance_days
integer
Time window, in days, over which to analyze provenance events for external datasets
Default: 7
site_name
string
Site name that identifies this site; useful when using input and output ports that receive remote connections
Default: default
site_url_to_site_name
map(str,string)
Mapping of NiFi site URLs to site names
username
string
NiFi username. Must be set for auth = "SINGLE_USER".
env
string
The environment that all assets produced by this connector belong to
Default: PROD
process_group_pattern
AllowDenyPattern
Regex patterns for filtering process groups
Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True}
process_group_pattern.ignoreCase
boolean
Whether to ignore case sensitivity during pattern matching.
Default: True
process_group_pattern.allow
array
List of regex patterns to include in ingestion
Default: ['.*']
process_group_pattern.allow.string
string
process_group_pattern.deny
array
List of regex patterns to exclude from ingestion.
Default: []
process_group_pattern.deny.string
string
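The sketch below approximates how process_group_pattern filters process group names. A name is excluded if any deny pattern matches; otherwise it is kept if any allow pattern matches. ignoreCase controls case sensitivity. The sketch uses Python's `re` module for illustration and is not DataHub's `AllowDenyPattern` class. In particular, matching with `re.match` (anchored at the start of the name) is an assumption about the matching mode.

```python
import re
from typing import List, Optional


def process_group_allowed(
    name: str,
    allow: Optional[List[str]] = None,
    deny: Optional[List[str]] = None,
    ignore_case: bool = True,
) -> bool:
    """Return True if a process group name passes the allow/deny patterns."""
    allow = [".*"] if allow is None else allow  # default: allow everything
    deny = [] if deny is None else deny  # default: deny nothing
    flags = re.IGNORECASE if ignore_case else 0
    # Deny patterns take precedence over allow patterns.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)
```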

Authentication

This connector supports the following authentication mechanisms:

Single User Authentication (auth: SINGLE_USER)

The connector passes the username and password, as used on the NiFi login page, to the /access/token REST endpoint. This mode also works when a Kerberos login identity provider is set up for NiFi.
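For illustration, here is a standard-library sketch of the token exchange described above. It assumes NiFi's usual layout, where the REST API is served under /nifi-api/ next to the /nifi/ UI path from site_url. It also assumes the endpoint accepts form-encoded username and password fields and returns the token as plain text, which is then sent back as a Bearer token. The helper names are hypothetical, and this is not the connector's own implementation.

```python
import ssl
import urllib.parse
import urllib.request
from typing import Dict, Optional


def api_base_url(site_url: str) -> str:
    """Map a UI URL ending with /nifi/ to the assumed REST API base URL."""
    if not site_url.endswith("/nifi/"):
        raise ValueError("site_url must end with /nifi/")
    return site_url[: -len("nifi/")] + "nifi-api/"


def token_request(site_url: str, username: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request to /access/token."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        api_base_url(site_url) + "access/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(
    site_url: str, username: str, password: str, context: Optional[ssl.SSLContext] = None
) -> str:
    """Send the request and return the token from the response body."""
    request = token_request(site_url, username, password)
    with urllib.request.urlopen(request, context=context) as response:
        return response.read().decode()


def bearer_headers(token: str) -> Dict[str, str]:
    """Headers to attach to subsequent REST calls."""
    return {"Authorization": f"Bearer {token}"}
```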

Client Certificates Authentication (auth: CLIENT_CERT)

The connector uses client_cert_file (required), client_key_file (optional), and client_key_password (optional) for mutual TLS authentication.
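The minimal standard-library sketch below shows how ca_file, client_cert_file, client_key_file and client_key_password fit together in a TLS client context. The helper is hypothetical and only illustrates the options; it is not the connector's own code.

```python
import ssl
from typing import Optional, Union


def build_ssl_context(
    ca_file: Union[bool, str, None] = True,
    client_cert_file: Optional[str] = None,
    client_key_file: Optional[str] = None,
    client_key_password: Optional[str] = None,
) -> ssl.SSLContext:
    """Create a client-side TLS context for talking to NiFi."""
    if isinstance(ca_file, str):
        # Trust the root CA(s) in the given PEM file.
        ctx = ssl.create_default_context(cafile=ca_file)
    else:
        # Use the system trust store.
        ctx = ssl.create_default_context()
        if ca_file is False:
            # ca_file: False disables verification; hostname checks must be off first.
            ctx.check_hostname = False
            ctx.verify_mode = ssl.CERT_NONE
    if client_cert_file:
        # Present the client identity for mutual TLS. If client_key_file is None,
        # the private key is read from client_cert_file itself.
        ctx.load_cert_chain(
            client_cert_file, keyfile=client_key_file, password=client_key_password
        )
    return ctx
```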

Kerberos Authentication via SPNEGO (auth: KERBEROS)

If NiFi has been configured to use Kerberos SPNEGO, the connector passes the user's Kerberos ticket to NiFi over the /access/kerberos REST endpoint. The ticket must already be present on the machine where ingestion runs. This is usually done by installing krb5-user and then running kinit for the user:

sudo apt install krb5-user
kinit user@REALM

Basic Authentication (auth: BASIC_AUTH)

The connector uses HTTPBasicAuth with the configured username and password.
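HTTP basic authentication sends the credentials in an Authorization header on every request. The stdlib sketch below builds the standard header value for a username and password. The helper name is hypothetical.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Return the Authorization header for HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
```

The credentials are only base64-encoded, not encrypted, so this mode should be used over HTTPS.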

No Authentication (auth: NO_AUTH)

This is useful for testing purposes.

Access Policies

This connector requires the following access policies to be set in NiFi for the ingestion user.

Global Access Policies

Policy | Privilege | Resource | Action
view the UI | Allows users to view the UI | /flow | R
query provenance | Allows users to submit a Provenance Search and request Event Lineage | /provenance | R

Component-level Access Policies (required to be set on the root process group)

Policy | Privilege | Resource | Action
view the component | Allows users to view component configuration details | /<component-type>/<component-UUID> | R
view the data | Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events | /data/<component-type>/<component-UUID> | R
view provenance | Allows users to view provenance events generated by this component | /provenance-data/<component-type>/<component-UUID> | R
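The component-level policies must be granted on the root process group, so it can help to list the exact resource strings before setting them up in the NiFi UI. The sketch below fills the resource templates from the tables above for a given component type and UUID. The helper name is hypothetical, and so is the example type `process-groups` used in the usage line.

```python
from typing import List, Tuple

# Global policies from the first table: (resource, action) pairs.
GLOBAL_POLICIES: List[Tuple[str, str]] = [("/flow", "R"), ("/provenance", "R")]

# Component-level resource templates from the second table.
COMPONENT_TEMPLATES = [
    "/{type}/{uuid}",
    "/data/{type}/{uuid}",
    "/provenance-data/{type}/{uuid}",
]


def required_policies(component_type: str, component_uuid: str) -> List[Tuple[str, str]]:
    """All (resource, action) pairs the ingestion user needs for one component."""
    component = [
        (template.format(type=component_type, uuid=component_uuid), "R")
        for template in COMPONENT_TEMPLATES
    ]
    return GLOBAL_POLICIES + component
```

For example, `required_policies("process-groups", "<root-group-UUID>")` lists the five read policies for the root process group.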

Code Coordinates

  • Class Name: datahub.ingestion.source.nifi.NifiSource

Questions

If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.