Version: 0.14.1

Sigma

Incubating

Important Capabilities

| Capability | Status | Notes |
|---|---|---|
| Asset Containers | | Enabled by default |
| Descriptions | | Enabled by default |
| Detect Deleted Entities | | Optionally enabled via stateful_ingestion.remove_stale_metadata |
| Extract Ownership | | Enabled by default, configured using ingest_owner |
| Extract Tags | | Enabled by default |
| Platform Instance | | Enabled by default |
| Schema Metadata | | Enabled by default |
| Table-Level Lineage | | Enabled by default |

This plugin extracts the following:

  • Sigma Workspaces and Workbooks as Containers
  • Sigma Datasets
  • Pages as Dashboards and their Elements as Charts

Integration Details

This source extracts the following:

  • Workspaces, and the workbooks within those workspaces, as Containers.
  • Sigma Datasets as DataHub Datasets.
  • Pages as DataHub Dashboards, and the elements inside each page as Charts.

Configuration Notes

  1. Refer to Sigma's documentation to generate API client credentials.
  2. Provide the generated Client ID and Secret in the recipe.
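
To keep the generated credentials out of the recipe file itself, one option is DataHub's environment variable substitution in recipes. The fragment below is a minimal sketch of that approach; SIGMA_CLIENT_ID and SIGMA_CLIENT_SECRET are hypothetical variable names chosen for illustration and must be exported in the shell that runs the ingestion.

```yaml
source:
  type: sigma
  config:
    api_url: "https://aws-api.sigmacomputing.com/v2"
    # Hypothetical environment variable names (not defined by the Sigma source);
    # the values are substituted when the recipe is loaded.
    client_id: "${SIGMA_CLIENT_ID}"
    client_secret: "${SIGMA_CLIENT_SECRET}"
```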

Concept mapping

| Sigma | DataHub | Notes |
|---|---|---|
| Workspace | Container | SubType "Sigma Workspace" |
| Workbook | Container | SubType "Sigma Workbook" |
| Page | Dashboard | |
| Element | Chart | |
| Dataset | Dataset | SubType "Sigma Dataset" |
| User | User (a.k.a. CorpUser) | Optionally Extracted |

Advanced Configurations

Chart source platform mapping

To provide platform details (platform name, platform instance, and env) for all of a chart's external upstream data sources, use chart_sources_platform_mapping as shown below:

Example - For just one specific chart's external upstream data sources

    chart_sources_platform_mapping:
      'workspace_name/workbook_name/chart_name_1':
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD

      'workspace_name/folder_name/workbook_name/chart_name_2':
        data_source_platform: postgres
        platform_instance: cloud_instance
        env: DEV

Example - For all charts within one specific workbook

    chart_sources_platform_mapping:
      'workspace_name/workbook_name_1':
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD

      'workspace_name/folder_name/workbook_name_2':
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD

Example - For all charts in all workbooks within one specific workspace

    chart_sources_platform_mapping:
      'workspace_name':
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD

Example - All workbooks use the same connection

    chart_sources_platform_mapping:
      '*':
        data_source_platform: snowflake
        platform_instance: new_instance
        env: PROD

CLI based Ingestion

Install the Plugin

The sigma source works out of the box with acryl-datahub.

Starter Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

    source:
      type: sigma
      config:
        # Coordinates
        api_url: "https://aws-api.sigmacomputing.com/v2"
        # Credentials
        client_id: "CLIENTID"
        client_secret: "CLIENT_SECRET"

        # Optional - filter for certain workspace names instead of ingesting everything.
        # workspace_pattern:
        #   allow:
        #     - workspace_name

        ingest_owner: true

        # Optional - mapping of sigma workspace/workbook/chart folder path to all chart's data sources platform details present inside that folder path.
        # chart_sources_platform_mapping:
        #   folder_path:
        #     data_source_platform: postgres
        #     platform_instance: cloud_instance
        #     env: DEV

    sink:
      # sink configs
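
The sink section of the starter recipe is left as a placeholder. As a minimal sketch, assuming the metadata is sent to a DataHub instance over REST, it could be filled in with the datahub-rest sink. The server URL below is an assumed local deployment address, not something specified by this page; replace it with your own endpoint.

```yaml
sink:
  type: datahub-rest
  config:
    # Assumed address of a locally running DataHub instance; adjust as needed.
    server: "http://localhost:8080"
```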

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
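
As an illustration of the dot notation, a field listed as stateful_ingestion.remove_stale_metadata would be written as nested keys under the source config. In names such as chart_sources_platform_mapping.key.env, key appears to stand for a user-chosen map key (a folder path). The fragment below is a sketch with example values only; 'workspace_name' is a placeholder.

```yaml
source:
  type: sigma
  config:
    # stateful_ingestion.remove_stale_metadata
    stateful_ingestion:
      remove_stale_metadata: true
    # chart_sources_platform_mapping.key.env, with 'workspace_name' as the key
    chart_sources_platform_mapping:
      'workspace_name':
        data_source_platform: snowflake
        env: PROD
```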

| Field | Type | Description | Default |
|---|---|---|---|
| client_id | string | Sigma Client ID. | |
| client_secret | string | Sigma Client Secret. | |
| api_url | string | Sigma API hosted URL. | |
| extract_lineage | boolean | Whether to extract lineage of a workbook's elements and datasets. | True |
| ingest_owner | boolean | Ingest owner from source. This will override owner info entered from the UI. | True |
| ingest_shared_entities | boolean | Whether to ingest shared entities. | False |
| platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. | |
| env | string | The environment that all assets produced by this connector belong to. | PROD |
| chart_sources_platform_mapping | map(str,PlatformDetail) | A mapping of a Sigma workspace/workbook/chart folder path to the platform details of all chart data sources inside that folder path. | |
| chart_sources_platform_mapping.key.env | string | The environment that all assets produced by this connector belong to. | PROD |
| chart_sources_platform_mapping.key.data_source_platform | string | A chart's data source platform name. | |
| chart_sources_platform_mapping.key.platform_instance | string | The instance of the platform that all assets produced by this recipe belong to. This should be unique within the platform. See https://datahubproject.io/docs/platform-instances/ for more details. | |
| workbook_lineage_pattern | AllowDenyPattern | Regex patterns to filter lineage of a workbook's elements and datasets in ingestion. Requires extract_lineage to be enabled. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| workbook_lineage_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| workbook_lineage_pattern.allow | array | List of regex patterns to include in ingestion. | ['.*'] |
| workbook_lineage_pattern.allow.string | string | | |
| workbook_lineage_pattern.deny | array | List of regex patterns to exclude from ingestion. | [] |
| workbook_lineage_pattern.deny.string | string | | |
| workspace_pattern | AllowDenyPattern | Regex patterns to filter Sigma workspaces in ingestion. Include 'My documents' if personal entities also need to be ingested. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True} |
| workspace_pattern.ignoreCase | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| workspace_pattern.allow | array | List of regex patterns to include in ingestion. | ['.*'] |
| workspace_pattern.allow.string | string | | |
| workspace_pattern.deny | array | List of regex patterns to exclude from ingestion. | [] |
| workspace_pattern.deny.string | string | | |
| stateful_ingestion | StatefulStaleMetadataRemovalConfig | Sigma Stateful Ingestion Config. | |
| stateful_ingestion.enabled | boolean | Whether or not to enable stateful ingestion. Default: True if a pipeline_name is set and either a datahub-rest sink or datahub_api is specified, otherwise False. | False |
| stateful_ingestion.remove_stale_metadata | boolean | Soft-deletes the entities that were present in the last successful run but are missing in the current run, when stateful_ingestion is enabled. | True |
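
The filtering and stateful ingestion options above can be combined in one recipe. The following is a sketch only: the pipeline name, workspace names, workbook pattern, and server URL are hypothetical values picked for illustration, and the exact matching behavior of the regex patterns is not described on this page, so the patterns are written out with explicit wildcards.

```yaml
# Hypothetical pipeline name; per the stateful_ingestion.enabled description,
# a pipeline_name must be set for stateful ingestion to be enabled by default.
pipeline_name: sigma_prod_ingestion

source:
  type: sigma
  config:
    api_url: "https://aws-api.sigmacomputing.com/v2"
    client_id: "CLIENTID"
    client_secret: "CLIENT_SECRET"

    # Ingest only selected workspaces (names are placeholders).
    workspace_pattern:
      allow:
        - "Marketing.*"
        - "My documents"   # include personal entities
      deny:
        - ".*Sandbox.*"

    # Lineage filtering only applies when extract_lineage is enabled.
    extract_lineage: true
    workbook_lineage_pattern:
      deny:
        - "Scratch.*"

    # Soft-delete entities that disappear between successful runs.
    stateful_ingestion:
      enabled: true
      remove_stale_metadata: true

sink:
  type: datahub-rest
  config:
    # Assumed local DataHub endpoint; replace with your own.
    server: "http://localhost:8080"
```

Because stateful ingestion compares the current run against the last successful run, the pipeline_name should stay the same across runs of the same recipe.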

Code Coordinates

  • Class Name: datahub.ingestion.source.sigma.sigma.SigmaSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Sigma, feel free to ping us on our Slack.