

For context on getting started with ingestion, check out our metadata ingestion guide.


To install this plugin, run `pip install 'acryl-datahub[glue]'`.

Note: if you also have files in S3 that you'd like to ingest, we recommend you use Glue's built-in data catalog. See here for a quick guide on how to set up a Glue crawler and ingest its outputs with DataHub.


This plugin extracts the following:

  • Tables in the Glue catalog
  • Column types associated with each table
  • Table metadata, such as owner, description and parameters
  • Jobs and their component transformations, data sources, and data sinks

Quickstart recipe#

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

```yml
source:
  type: glue
  config:
    # Coordinates
    aws_region: "my-aws-region"

sink:
  # sink configs
```
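Once the recipe is saved to a file (the filename `glue_recipe.yml` below is just an example), it can be run with the DataHub CLI's `ingest` command:

```shell
# Run the ingestion recipe with the DataHub CLI
datahub ingest -c glue_recipe.yml
```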

Config details#

Note that a . is used to denote nested fields in the YAML recipe.

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| `aws_region` | ✅ | | AWS region code. |
| `env` | | `"PROD"` | Environment to use in namespace when constructing URNs. |
| `extract_transforms` | | `True` | Whether to extract Glue transform jobs. |
| `database_pattern.allow` | | | List of regex patterns for databases to include in ingestion. |
| `database_pattern.deny` | | | List of regex patterns for databases to exclude from ingestion. |
| `database_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `table_pattern.allow` | | | List of regex patterns for tables to include in ingestion. |
| `table_pattern.deny` | | | List of regex patterns for tables to exclude from ingestion. |
| `table_pattern.ignoreCase` | | `True` | Whether to ignore case sensitivity during pattern matching. |
| `underlying_platform` | | `glue` | Override for platform name. Allowed values: `glue`, `athena`. |
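The allow/deny fields take ordinary regex patterns. As a rough illustration of how such allow/deny filtering with `ignoreCase` typically behaves (a hypothetical sketch, not DataHub's actual implementation — the helper name and defaults here are assumptions):

```python
import re

def is_allowed(name, allow=None, deny=None, ignore_case=True):
    """Illustrative allow/deny regex filter (hypothetical helper,
    not DataHub's actual implementation)."""
    flags = re.IGNORECASE if ignore_case else 0
    allow = allow or [".*"]  # assumed default: allow everything
    deny = deny or []        # assumed default: deny nothing
    # Deny patterns take precedence over allow patterns
    if any(re.match(p, name, flags) for p in deny):
        return False
    return any(re.match(p, name, flags) for p in allow)

# Keep tables starting with "sales_", drop anything ending in "_tmp"
print(is_allowed("sales_orders", allow=["sales_.*"], deny=[".*_tmp"]))      # True
print(is_allowed("sales_orders_tmp", allow=["sales_.*"], deny=[".*_tmp"]))  # False
print(is_allowed("SALES_Q1", allow=["sales_.*"]))                           # True (case ignored)
```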


Compatibility#

Coming soon!


Questions#

If you've got any questions on configuring this source, feel free to ping us on our Slack!