
Hex

This connector ingests Hex assets into DataHub.

Concept Mapping

| Hex Concept | DataHub Concept | Notes |
| --- | --- | --- |
| "hex" | Data Platform | |
| Workspace | Container | |
| Project | Dashboard | Subtype Project |
| Component | Dashboard | Subtype Component |
| Collection | Tag | |

Other Hex concepts are not mapped to DataHub entities yet.

Limitations

Currently, the Hex API has some limitations that affect the completeness of the extracted metadata:

  1. Projects and Components Relationship: The API does not support fetching the many-to-many relationship between Projects and their Components.

  2. Metadata Access: There is no direct method to retrieve metadata for Collections, Status, or Categories. This information is only available indirectly through references within Projects and Components.

Please keep these limitations in mind when working with the Hex connector.

Support status: Testing
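
As a rough illustration of the second limitation, Collections can only be discovered by walking the Projects (or Components) that reference them. The sketch below assumes a simplified, hypothetical payload shape in which each project carries a `collections` list of objects with a `name` key; the real Hex API response may be structured differently.

```python
from typing import Any, Dict, Iterable, Set


def collections_from_projects(projects: Iterable[Dict[str, Any]]) -> Set[str]:
    """Collect the distinct Collection names referenced by Project payloads."""
    names: Set[str] = set()
    for project in projects:
        # "collections" and "name" are hypothetical keys used for illustration.
        for collection in project.get("collections", []):
            names.add(collection["name"])
    return names


# Hypothetical sample payloads, not real Hex API output.
projects = [
    {"title": "Revenue report", "collections": [{"name": "Finance"}]},
    {"title": "Churn model", "collections": [{"name": "Finance"}, {"name": "ML"}]},
    {"title": "Scratchpad"},
]
print(sorted(collections_from_projects(projects)))
```

One consequence of this indirect approach is that a Collection not referenced by any fetched Project or Component would not be discovered.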

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Asset Containers | | Enabled by default |
| Descriptions | | Supported by default |
| Detect Deleted Entities | | Optionally enabled via stateful_ingestion.remove_stale_metadata |
| Extract Ownership | | Supported by default |
| Platform Instance | | Enabled by default |
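
The "Detect Deleted Entities" capability relies on comparing what the previous successful run emitted with what the current run emits. A minimal sketch of that comparison follows; it is not DataHub's implementation, and the function name and URNs are illustrative placeholders.

```python
from typing import Iterable, Set


def stale_urns(previous_run: Iterable[str], current_run: Iterable[str]) -> Set[str]:
    """Return URNs seen in the previous run but missing from the current one."""
    return set(previous_run) - set(current_run)


# Illustrative URNs only; real URNs are generated by the connector.
previous = {
    "urn:li:dashboard:(hex,project-a)",
    "urn:li:dashboard:(hex,project-b)",
}
current = {"urn:li:dashboard:(hex,project-a)"}

# These are the entities that would be candidates for soft-deletion.
print(stale_urns(previous, current))
```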

Prerequisites

Workspace name

The workspace name is required to fetch data from Hex. You can find it in the URL of your Hex home page:

https://app.hex.tech/<workspace_name>

For example, in https://app.hex.tech/acryl-partnership, acryl-partnership is the workspace name.
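
If you prefer to derive the workspace name programmatically, a small helper like the one below can pull it out of the home page URL. This is a sketch that assumes the workspace name is always the first path segment, as in the example above; `workspace_name_from_url` is a hypothetical helper, not part of DataHub.

```python
from urllib.parse import urlparse


def workspace_name_from_url(url: str) -> str:
    """Return the first path segment of a Hex home page URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if not segments:
        raise ValueError(f"no workspace name found in URL: {url!r}")
    return segments[0]


print(workspace_name_from_url("https://app.hex.tech/acryl-partnership"))
```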

Authentication

To authenticate with Hex, you need to provide a Hex API Bearer token. You can obtain a token by following the instructions in the Hex documentation.

Either a PAT (Personal Access Token) or a Workspace Token can be used as the API Bearer token:

  • (Recommended) Workspace Token: a read-only token is sufficient for ingestion.
  • PAT: ingestion runs with the permissions of the user who owns the token.
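
To make the Bearer token usage concrete, the sketch below builds (but does not send) an authenticated request using only the Python standard library. The base URL is the default `base_url` from the config details; the `projects` path and the `build_request` helper are assumptions for illustration, so check the Hex API documentation for the actual endpoints.

```python
import urllib.request

# Default Hex API base URL (see the base_url config field).
BASE_URL = "https://app.hex.tech/api/v1"


def build_request(path: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an HTTP request that carries the token as a Bearer credential."""
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers)


# "projects" is an assumed endpoint path; the request is only constructed here.
req = build_request("projects", token="<your-token>")
print(req.full_url)
```

The same header works for both token types; only the permissions attached to the token differ.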

CLI based Ingestion

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: hex
  config:
    workspace_name: # Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name>
    token: # Your PAT or Workspace token

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
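
To illustrate the dotted notation, the sketch below expands dotted field names into the nested structure they correspond to in a recipe's `config` block. `expand_dotted` is a hypothetical helper written for this example, and the sample values are illustrative.

```python
from typing import Any, Dict


def expand_dotted(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate mappings on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


config = expand_dotted(
    {
        "workspace_name": "acryl-partnership",
        "stateful_ingestion.enabled": True,
        "stateful_ingestion.remove_stale_metadata": True,
        "project_title_pattern.deny": ["^Draft.*"],
    }
)
print(config)
```

In other words, `stateful_ingestion.enabled` in the table refers to the `enabled` key nested under `stateful_ingestion` in the YAML.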

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| token | string(password) | Hex API token; either PAT or Workspace token - https://learn.hex.tech/docs/api/api-overview#authentication | |
| workspace_name | string | Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name> | |
| base_url | string | Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex. | |
| categories_as_tags | boolean | Emit Hex Categories as tags | True |
| collections_as_tags | boolean | Emit Hex Collections as tags | True |
| include_components | boolean | | True |
| page_size | integer | Number of items to fetch per Hex API call. | 100 |
| patch_metadata | boolean | Emit metadata as patch events | False |
| platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. | |
| set_ownership_from_email | boolean | Set ownership identity from owner/creator email | True |
| status_as_tag | boolean | Emit Hex Status as tags | True |
| env | string | The environment that all assets produced by this connector belong to | PROD |
| component_title_pattern | AllowDenyPattern | Regex pattern for component titles to filter in ingestion. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| component_title_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| component_title_pattern.allow | array | List of regex patterns to include in ingestion | ['.*'] |
| component_title_pattern.allow.string | string | | |
| component_title_pattern.deny | array | List of regex patterns to exclude from ingestion. | [] |
| component_title_pattern.deny.string | string | | |
| project_title_pattern | AllowDenyPattern | Regex pattern for project titles to filter in ingestion. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| project_title_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| project_title_pattern.allow | array | List of regex patterns to include in ingestion | ['.*'] |
| project_title_pattern.allow.string | string | | |
| project_title_pattern.deny | array | List of regex patterns to exclude from ingestion. | [] |
| project_title_pattern.deny.string | string | | |
| stateful_ingestion | StatefulStaleMetadataRemovalConfig | Configuration for stateful ingestion and stale metadata removal. | |
| stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False | False |
| stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. | True |
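
The allow/deny semantics of `project_title_pattern` and `component_title_pattern` can be sketched as follows. This is an illustration, not DataHub's `AllowDenyPattern` class: it assumes that deny patterns take precedence over allow patterns and that patterns are matched from the start of the title. `TitlePattern` is a hypothetical stand-in that mirrors the documented defaults.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class TitlePattern:
    """Illustrative stand-in for an allow/deny regex filter."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)
    ignoreCase: bool = True

    def allowed(self, title: str) -> bool:
        flags = re.IGNORECASE if self.ignoreCase else 0
        # Assumption: any deny match excludes the title, regardless of allow.
        if any(re.match(p, title, flags) for p in self.deny):
            return False
        return any(re.match(p, title, flags) for p in self.allow)


pattern = TitlePattern(allow=["^Sales.*"], deny=[".*draft.*"])
for title in ["sales overview", "Sales DRAFT v2", "Marketing"]:
    print(title, pattern.allowed(title))
```

With the default values (`allow: ['.*']`, `deny: []`), every title passes the filter.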

Code Coordinates

  • Class Name: datahub.ingestion.source.hex.hex.HexSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Hex, feel free to ping us on our Slack.