
DataHub

DataHub Rest#

For context on getting started with ingestion, check out our metadata ingestion guide.

Setup#

To install this plugin, run pip install 'acryl-datahub[datahub-rest]'.

Capabilities#

Pushes metadata to DataHub using the GMS REST API. The advantage of the REST-based interface is that errors are reported immediately.

Quickstart recipe#

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

```yml
source:
  # source configs

sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```
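
Once the recipe is saved to a file (say, `recipe.yml`; the filename here is just an example), it can be run with the DataHub CLI via `datahub ingest -c recipe.yml`.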

Config details#

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `server` | ✅ | | URL of DataHub GMS endpoint. |
| `timeout_sec` | | 30 | Per-HTTP request timeout. |
| `token` | | | Bearer token used for authentication. |
| `extra_headers` | | | Extra headers which will be added to the request. |
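
For example, a REST sink that raises the request timeout, authenticates with a bearer token, and attaches an extra header could look like the following sketch (the token and header values are placeholders, not working credentials):

```yml
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
    timeout_sec: 60                    # raise the 30-second default
    token: "<your-bearer-token>"       # placeholder bearer token
    extra_headers:
      X-Custom-Header: "custom-value"  # placeholder header name and value
```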

DataHub Kafka#

For context on getting started with ingestion, check out our metadata ingestion guide.

Setup#

To install this plugin, run pip install 'acryl-datahub[datahub-kafka]'.

Capabilities#

Pushes metadata to DataHub by publishing messages to Kafka. The advantage of the Kafka-based interface is that it's asynchronous and can handle higher throughput.

Quickstart recipe#

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

```yml
source:
  # source configs

sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "localhost:9092"
      schema_registry_url: "http://localhost:8081"
```

Config details#

Note that a `.` is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `connection.bootstrap` | ✅ | | Kafka bootstrap URL. |
| `connection.producer_config.<option>` | | | Passed to [confluent_kafka.SerializingProducer](https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.SerializingProducer). |
| `connection.schema_registry_url` | ✅ | | URL of schema registry being used. |
| `connection.schema_registry_config.<option>` | | | Passed to [confluent_kafka.schema_registry.SchemaRegistryClient](https://docs.confluent.io/platform/current/clients/confluent-kafka-python/html/index.html#confluent_kafka.schema_registry.SchemaRegistryClient). |

The options in the producer config and schema registry config are passed to the Kafka SerializingProducer and SchemaRegistryClient respectively.

For a full example with a number of security options, see this example recipe.
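
That recipe isn't reproduced here, but as a rough sketch, a Kafka sink authenticating over SASL_SSL might look like the following; the option names are standard confluent-kafka producer and schema registry settings, and all hostnames and credentials are placeholders:

```yml
sink:
  type: "datahub-kafka"
  config:
    connection:
      bootstrap: "broker:9092"
      schema_registry_url: "http://schema-registry:8081"
      # Passed through to the Kafka SerializingProducer; placeholder credentials.
      producer_config:
        security.protocol: "SASL_SSL"
        sasl.mechanism: "PLAIN"
        sasl.username: "<kafka-api-key>"
        sasl.password: "<kafka-api-secret>"
      # Passed through to the SchemaRegistryClient; placeholder credentials.
      schema_registry_config:
        basic.auth.user.info: "<registry-key>:<registry-secret>"
```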

Questions#

If you've got any questions on configuring this sink, feel free to ping us on our Slack!