AI Glossary Term Suggestions

Feature Availability
Self-Hosted DataHub
DataHub Cloud
Info: This feature is currently in closed beta. Reach out to your Acryl representative to get access.

The AI Glossary Term Suggestion automation uses LLMs to suggest Glossary Terms for tables and columns in your data.

This is useful for improving coverage of glossary terms across your organization, which is important for compliance and governance efforts.

This automation can:

  • Automatically suggest glossary terms for tables and columns.
  • Go beyond a predefined set of terms and work with your own business glossary.
  • Generate proposals for owners to review, or automatically add terms to tables and columns.
  • Automatically adjust to human-provided feedback and curation (coming soon).

Prerequisites

  • A business glossary with terms defined. Additional metadata, like documentation and existing term assignments, will improve the accuracy of our suggestions.

Configuring

  1. Navigate to Automations: Click on 'Govern' > 'Automations' in the navigation bar.

  2. Create the Automation: Click on 'Create' and select 'AI Glossary Term Suggestions'.

  3. Configure the Automation: Fill in the required fields. The main fields to configure are (1) which terms to use for suggestions and (2) which entities to generate suggestions for.

  4. Enable the Automation: Once it's enabled, that's it! You'll start to see terms show up in the UI, either on assets or on the proposals page.

How it works

The automation scans all datasets matched by the configured filters and generates suggestions for each one. If new entities are added that match the configured filters, those are also classified within 24 hours.
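The scan behavior described above can be pictured with a small sketch. This is not the automation's actual implementation: `Dataset`, `SuggestionPass`, and the `matches` filter are hypothetical names used only for illustration. It shows a pass that classifies every matching dataset it has not seen before, so a dataset added later is picked up on the next pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Dataset:
    urn: str       # unique identifier of the dataset
    platform: str  # used only by the example filter below


@dataclass
class SuggestionPass:
    # Hypothetical stand-in for the filters configured on the automation.
    matches: Callable[[Dataset], bool]
    # URNs that have already received suggestions.
    classified: Set[str] = field(default_factory=set)

    def run(self, catalog: Iterable[Dataset]) -> List[str]:
        """Return URNs of matching datasets not classified by an earlier pass."""
        newly_classified = []
        for dataset in catalog:
            if self.matches(dataset) and dataset.urn not in self.classified:
                self.classified.add(dataset.urn)
                newly_classified.append(dataset.urn)
        return newly_classified


catalog = [Dataset("urn:a", "snowflake"), Dataset("urn:b", "mysql")]
automation = SuggestionPass(matches=lambda d: d.platform == "snowflake")
first = automation.run(catalog)   # only the matching dataset is classified

catalog.append(Dataset("urn:c", "snowflake"))
second = automation.run(catalog)  # only the newly added match is classified
```

Datasets that do not match the filter are never touched, and a dataset that was already classified is skipped on later passes.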

We take into account the following metadata when generating suggestions:

  • Dataset name and description
  • Column name, type, description, and sample values
  • Glossary term name, documentation, and hierarchy
  • Feedback loop: existing assignments and accepted/rejected proposals (coming soon)
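To make the list above concrete, here is a minimal sketch of how those metadata fields could be gathered into a single text context for one dataset. The `Column` and `GlossaryTerm` classes, the `render_context` function, and the output layout are all assumptions made for illustration; the prompt format the automation really uses is not documented here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    type: str
    description: str = ""
    sample_values: List[str] = field(default_factory=list)


@dataclass
class GlossaryTerm:
    name: str
    documentation: str = ""
    parent_path: List[str] = field(default_factory=list)  # hierarchy, root first


def render_context(dataset_name: str, dataset_description: str,
                   columns: List[Column], terms: List[GlossaryTerm]) -> str:
    """Render dataset, column, and glossary metadata as plain text (hypothetical layout)."""
    lines = [
        f"Dataset: {dataset_name}",
        f"Description: {dataset_description or '(none)'}",
        "Columns:",
    ]
    for c in columns:
        # Cap the number of sample values to keep the context short.
        samples = ", ".join(c.sample_values[:3]) or "(none)"
        lines.append(f"- {c.name} ({c.type}): {c.description or '(none)'}; samples: {samples}")
    lines.append("Candidate terms:")
    for t in terms:
        path = " > ".join(t.parent_path + [t.name])
        lines.append(f"- {path}: {t.documentation or '(none)'}")
    return "\n".join(lines)


context = render_context(
    "orders",
    "",
    [Column("email", "string", sample_values=["a@x.com"])],
    [GlossaryTerm("Email", "Email address", ["Classification", "PII"])],
)
```

The sketch also illustrates why the prerequisite matters: empty descriptions render as `(none)`, leaving only names and sample values to work with.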

Data privacy: Your metadata is not sent to any third-party LLMs. We use AWS Bedrock internally, which means all metadata remains within the Acryl AWS account. We do not fine-tune on customer data.

Limitations

  • A single configured automation can classify at most 10,000 entities.
  • We cannot do partial reclassification. If you add a new column to an existing table, we won't regenerate suggestions for that table, so the new column will not receive suggestions.
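When planning around these limits, two small helpers can be useful. This is a sketch under stated assumptions: the function names are hypothetical, splitting a large scope across several automations is an assumed workaround rather than documented guidance, and the column snapshots are something you would record yourself, not data the automation exposes.

```python
from typing import Dict, List, Sequence, Set

MAX_ENTITIES_PER_AUTOMATION = 10_000  # limit stated in this section


def split_by_cap(urns: Sequence[str],
                 cap: int = MAX_ENTITIES_PER_AUTOMATION) -> List[List[str]]:
    """Split entity URNs into groups that each fit within one automation's cap."""
    return [list(urns[i:i + cap]) for i in range(0, len(urns), cap)]


def columns_without_suggestions(classified: Dict[str, Set[str]],
                                current: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Find columns added to already-classified tables since they were classified.

    `classified` maps table name to the columns present at classification time
    (a snapshot you keep yourself); `current` maps table name to today's columns.
    """
    added = {}
    for table, seen in classified.items():
        new_columns = current.get(table, set()) - seen
        if new_columns:
            added[table] = new_columns
    return added


groups = split_by_cap([f"urn:{i}" for i in range(25_000)])
gaps = columns_without_suggestions(
    {"orders": {"id"}},
    {"orders": {"id", "email"}, "users": {"id"}},
)
```

Here `groups` holds three lists, each small enough for one automation, and `gaps` flags the `email` column on `orders` as one that was added after classification and therefore has no suggestions.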