Skip to main content

Kafka Connect

For context on getting started with ingestion, check out our metadata ingestion guide.

Setup#

To install this plugin, run pip install 'acryl-datahub[kafka-connect]'.

Capabilities#

This plugin extracts the following:

  • Kafka Connect connector as individual DataFlowSnapshotClass entity
  • Creating individual DataJobSnapshotClass entity using {connector_name}:{source_dataset} naming
  • Lineage information between source database to Kafka topic

Current limitations:

  • Currently works for JDBC and Debezium source connectors only.

Quickstart recipe#

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:  type: "kafka-connect"  config:    # Coordinates    connect_uri: "http://localhost:8083"    cluster_name: "connect-cluster"
    # Credentials    username: admin    password: password
sink:  # sink configs

Config details#

Note that a . is used to denote nested fields in the YAML recipe.

FieldRequiredDefaultDescription
connect_uri"http://localhost:8083/"URI to connect to.
usernameKafka Connect username.
passwordKafka Connect password.
cluster_name"connect-cluster"Cluster to ingest from.
construct_lineage_workunitsTrueWhether to create the input and output Dataset entities
connector_patterns.denyList of regex patterns for connectors to include in ingestion.
connector_patterns.allowList of regex patterns for connectors to exclude from ingestion.
connector_pattern.ignoreCaseTrueWhether to ignore case sensitivity during pattern matching.
env"PROD"Environment to use in namespace when constructing URNs.

Compatibility#

Coming soon!

Questions#

If you've got any questions on configuring this source, feel free to ping us on our Slack!