Hex
This connector ingests Hex assets into DataHub.
Concept Mapping
Hex Concept | DataHub Concept | Notes |
---|---|---|
"hex" | Data Platform | |
Workspace | Container | |
Project | Dashboard | Subtype Project |
Component | Dashboard | Subtype Component |
Collection | Tag | |
Other Hex concepts are not mapped to DataHub entities yet.
Limitations
Currently, the Hex API has some limitations that affect the completeness of the extracted metadata:
- Projects and Components Relationship: The API does not support fetching the many-to-many relationship between Projects and their Components.
- Metadata Access: There is no direct method to retrieve metadata for Collections, Status, or Categories. This information is only available indirectly through references within Projects and Components.
Please keep these limitations in mind when working with the Hex connector.
Important Capabilities
Capability | Status | Notes |
---|---|---|
Asset Containers | ✅ | Enabled by default |
Descriptions | ✅ | Supported by default |
Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion.remove_stale_metadata |
Extract Ownership | ✅ | Supported by default |
Platform Instance | ✅ | Enabled by default |
Prerequisites
Workspace name
Workspace name is required to fetch the data from Hex. You can find the workspace name in the URL of your Hex home page.
https://app.hex.tech/<workspace_name>
For example, in https://app.hex.tech/acryl-partnership, the workspace name is acryl-partnership.
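As a minimal sketch of the rule above, the workspace name can be read as the first path segment of the Hex home page URL. The helper name `workspace_name_from_url` is hypothetical and not part of the connector; it only illustrates where the value comes from.

```python
from urllib.parse import urlparse


def workspace_name_from_url(url: str) -> str:
    """Return the first path segment of a Hex home page URL (hypothetical helper)."""
    # Split the URL path on "/" and drop empty segments (leading/trailing slashes).
    path_segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if not path_segments:
        raise ValueError(f"No workspace name found in URL: {url!r}")
    return path_segments[0]


print(workspace_name_from_url("https://app.hex.tech/acryl-partnership"))
# prints: acryl-partnership
```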
Authentication
To authenticate with Hex, provide your Hex API Bearer token. You can obtain a token by following the instructions in the Hex documentation.
Either a PAT (Personal Access Token) or a Workspace Token can be used as the API Bearer token:
- (Recommended) Workspace Token: a read-only token is sufficient for ingestion.
- PAT: ingestion runs with the permissions of the user who owns the token.
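To illustrate what "Bearer token" means in practice, here is a rough sketch that builds (but does not send) an HTTP request carrying the token in an `Authorization: Bearer ...` header. The helper `build_hex_request` and the `projects` path are assumptions made for illustration; the connector handles this internally, and the exact endpoints should be taken from the Hex API documentation.

```python
import urllib.request


def build_hex_request(base_url: str, token: str, path: str) -> urllib.request.Request:
    """Build an unsent request with a Bearer token header (hypothetical helper)."""
    # Join base URL and path without doubling or dropping the slash.
    url = f"{base_url.rstrip('/')}/{path.lstrip('/')}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# "projects" is an assumed endpoint path, and "my-token" is a placeholder value.
request = build_hex_request("https://app.hex.tech/api/v1", "my-token", "projects")
print(request.full_url)
print(request.get_header("Authorization"))
```

In a real setup the token would come from a secret store or environment variable rather than a literal in code.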
CLI based Ingestion
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: hex
  config:
    workspace_name: # Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name>
    token: # Your PAT or Workspace token
sink:
  # sink configs
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
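As a small illustration of the dot notation, the sketch below expands dotted field names into the nested structure that would appear under `config:` in a recipe. The function `expand_dotted_fields` is hypothetical and the values are placeholders; only the field names are taken from this page.

```python
from typing import Any, Dict


def expand_dotted_fields(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {"a.b": 1} into {"a": {"b": 1}} (hypothetical helper)."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        # Walk down (creating as needed) one dict per parent segment.
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


flat_fields = {
    "workspace_name": "acryl-partnership",
    "project_title_pattern.allow": ["^Sales.*"],
    "project_title_pattern.ignoreCase": False,
    "stateful_ingestion.enabled": True,
}
config = expand_dotted_fields(flat_fields)
print(config["project_title_pattern"])
```

So `project_title_pattern.allow` in the reference corresponds to an `allow` key nested under `project_title_pattern` in the recipe.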
Field | Description |
---|---|
token ✅ string(password) | Hex API token; either PAT or Workspace token - https://learn.hex.tech/docs/api/api-overview#authentication |
workspace_name ✅ string | Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name> |
base_url string | Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex. Default: https://app.hex.tech/api/v1 |
categories_as_tags boolean | Emit Hex Categories as tags. Default: True |
collections_as_tags boolean | Emit Hex Collections as tags. Default: True |
include_components boolean | Include Hex Components in the ingestion. Default: True |
page_size integer | Number of items to fetch per Hex API call. Default: 100 |
patch_metadata boolean | Emit metadata as patch events. Default: False |
platform_instance string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. |
set_ownership_from_email boolean | Set ownership identity from owner/creator email. Default: True |
status_as_tag boolean | Emit Hex Status as tags. Default: True |
env string | The environment that all assets produced by this connector belong to. Default: PROD |
component_title_pattern AllowDenyPattern | Regex pattern for component titles to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
component_title_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
component_title_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
component_title_pattern.allow.string string | |
component_title_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
component_title_pattern.deny.string string | |
project_title_pattern AllowDenyPattern | Regex pattern for project titles to filter in ingestion. Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
project_title_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True |
project_title_pattern.allow array | List of regex patterns to include in ingestion. Default: ['.*'] |
project_title_pattern.allow.string string | |
project_title_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] |
project_title_pattern.deny.string string | |
stateful_ingestion StatefulStaleMetadataRemovalConfig | Configuration for stateful ingestion and stale metadata removal. |
stateful_ingestion.enabled boolean | Whether to enable stateful ingestion. Defaults to True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified; otherwise False. |
stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True |
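The title patterns above combine an allow list, a deny list, and a case-sensitivity flag. The following is a minimal sketch of how such a filter could behave, not DataHub's actual implementation. It assumes that a deny match takes precedence over an allow match and that patterns are matched from the start of the title with `re.match`; the function name `title_allowed` is hypothetical.

```python
import re
from typing import List


def title_allowed(title: str, allow: List[str], deny: List[str], ignore_case: bool = True) -> bool:
    """Sketch of allow/deny filtering; assumes deny wins and re.match semantics."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: any deny match excludes the title, regardless of allow.
    if any(re.match(pattern, title, flags) for pattern in deny):
        return False
    # Otherwise the title must match at least one allow pattern.
    return any(re.match(pattern, title, flags) for pattern in allow)


print(title_allowed("Sales Overview", ["^sales.*"], [".*draft.*"]))  # True
print(title_allowed("Sales Draft", ["^sales.*"], [".*draft.*"]))     # False
print(title_allowed("Sales Overview", ["^sales.*"], [], ignore_case=False))  # False
```

With the defaults from the table (allow `.*`, empty deny list), every title passes under these assumptions.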
The JSONSchema for this configuration is inlined below.
{
"title": "HexSourceConfig",
"description": "Base configuration class for stateful ingestion for source configs to inherit from.",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details.",
"type": "string"
},
"stateful_ingestion": {
"title": "Stateful Ingestion",
"description": "Configuration for stateful ingestion and stale metadata removal.",
"allOf": [
{
"$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
}
]
},
"workspace_name": {
"title": "Workspace Name",
"description": "Hex workspace name. You can find this name in your Hex home page URL: https://app.hex.tech/<workspace_name>",
"type": "string"
},
"token": {
"title": "Token",
"description": "Hex API token; either PAT or Workflow token - https://learn.hex.tech/docs/api/api-overview#authentication",
"type": "string",
"writeOnly": true,
"format": "password"
},
"base_url": {
"title": "Base Url",
"description": "Hex API base URL. For most Hex users, this will be https://app.hex.tech/api/v1. Single-tenant app users should replace this with the URL they use to access Hex.",
"default": "https://app.hex.tech/api/v1",
"type": "string"
},
"include_components": {
"title": "Include Components",
"default": true,
"desciption": "Include Hex Components in the ingestion",
"type": "boolean"
},
"page_size": {
"title": "Page Size",
"description": "Number of items to fetch per Hex API call.",
"default": 100,
"type": "integer"
},
"patch_metadata": {
"title": "Patch Metadata",
"description": "Emit metadata as patch events",
"default": false,
"type": "boolean"
},
"collections_as_tags": {
"title": "Collections As Tags",
"description": "Emit Hex Collections as tags",
"default": true,
"type": "boolean"
},
"status_as_tag": {
"title": "Status As Tag",
"description": "Emit Hex Status as tags",
"default": true,
"type": "boolean"
},
"categories_as_tags": {
"title": "Categories As Tags",
"description": "Emit Hex Category as tags",
"default": true,
"type": "boolean"
},
"project_title_pattern": {
"title": "Project Title Pattern",
"description": "Regex pattern for project titles to filter in ingestion.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"component_title_pattern": {
"title": "Component Title Pattern",
"description": "Regex pattern for component titles to filter in ingestion.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"set_ownership_from_email": {
"title": "Set Ownership From Email",
"description": "Set ownership identity from owner/creator email",
"default": true,
"type": "boolean"
}
},
"required": [
"workspace_name",
"token"
],
"additionalProperties": false,
"definitions": {
"DynamicTypedStateProviderConfig": {
"title": "DynamicTypedStateProviderConfig",
"type": "object",
"properties": {
"type": {
"title": "Type",
"description": "The type of the state provider to use. For DataHub use `datahub`",
"type": "string"
},
"config": {
"title": "Config",
"description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
"default": {},
"type": "object"
}
},
"required": [
"type"
],
"additionalProperties": false
},
"StatefulStaleMetadataRemovalConfig": {
"title": "StatefulStaleMetadataRemovalConfig",
"description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
"type": "object",
"properties": {
"enabled": {
"title": "Enabled",
"description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
"default": false,
"type": "boolean"
},
"remove_stale_metadata": {
"title": "Remove Stale Metadata",
"description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
}
},
"additionalProperties": false
}
}
}
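The schema above lists `workspace_name` and `token` as required and sets `additionalProperties` to false at the top level. As a rough sketch of what those two constraints imply, the check below flags missing required fields and unknown top-level keys. It is not a full JSON Schema validator and does not check types or nested objects; `check_hex_config` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Field names copied from the "required" and "properties" sections of the schema.
REQUIRED_FIELDS = {"workspace_name", "token"}
KNOWN_FIELDS = REQUIRED_FIELDS | {
    "env", "platform_instance", "stateful_ingestion", "base_url",
    "include_components", "page_size", "patch_metadata",
    "collections_as_tags", "status_as_tag", "categories_as_tags",
    "project_title_pattern", "component_title_pattern",
    "set_ownership_from_email",
}


def check_hex_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the top-level config keys."""
    problems: List[str] = []
    for name in sorted(REQUIRED_FIELDS - set(config)):
        problems.append(f"missing required field: {name}")
    for name in sorted(set(config) - KNOWN_FIELDS):
        problems.append(f"unknown field: {name}")
    return problems


print(check_hex_config({"workspace_name": "acryl-partnership", "pagesize": 50}))
# prints: ['missing required field: token', 'unknown field: pagesize']
```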
Code Coordinates
- Class Name: datahub.ingestion.source.hex.hex.HexSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Hex, feel free to ping us on our Slack.