SageMaker

Module sagemaker

Certified

Important Capabilities

| Capability | Status | Notes |
| --- | --- | --- |
| Table-Level Lineage | | Enabled by default |

This plugin extracts the following:

  • Feature groups
  • Models, jobs, and lineage between the two (e.g. when jobs output a model or a model is used by a job)

Install the Plugin

pip install 'acryl-datahub[sagemaker]'

Quickstart Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: sagemaker
  config:
    # Coordinates
    aws_region: "my-aws-region"

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.
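
As a small illustration of the dot notation, a field listed as `table_pattern.allow` would be written in the recipe as an `allow` key nested under `table_pattern`. The sketch below uses only field names from the options table; the regex values are hypothetical placeholders, not recommended settings.

```yaml
source:
  type: sagemaker
  config:
    aws_region: "my-aws-region"
    # "table_pattern.allow" and "table_pattern.deny" in dotted form
    # become keys nested under "table_pattern".
    table_pattern:
      allow:
        - "^prod_.*"   # hypothetical pattern, for illustration only
      deny:
        - ".*_tmp$"    # hypothetical pattern, for illustration only
      ignoreCase: true
```

The same nesting applies to any other dotted field, such as `database_pattern.allow`.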

View All Configuration Options
| Field | Required | Type | Description | Default |
| --- | --- | --- | --- | --- |
| aws_access_key_id | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_secret_access_key | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_session_token | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_role | | Generic dict | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html | None |
| aws_profile | | string | Named AWS profile to use. If not set, the default profile is used. | None |
| aws_region | | string | AWS region code. | None |
| aws_endpoint_url | | string | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html | None |
| aws_proxy | | Dict[str,string] | Autodetected. See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html | |
| env | | string | The environment that all assets produced by this connector belong to. | PROD |
| extract_feature_groups | | boolean | Whether to extract feature groups. | True |
| extract_models | | boolean | Whether to extract models. | True |
| extract_jobs | | Generic dict | Whether to extract AutoML jobs. | True |
| database_pattern | | AllowDenyPattern (see below for fields) | Regex patterns for databases to filter in ingestion. | `{'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'}` |
| database_pattern.allow | | Array of string | List of regex patterns for databases to include in ingestion. | `['.*']` |
| database_pattern.deny | | Array of string | List of regex patterns for databases to exclude from ingestion. | `[]` |
| database_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| database_pattern.alphabet | | string | Allowed alphabet pattern. | `[A-Za-z0-9 _.-]` |
| table_pattern | | AllowDenyPattern (see below for fields) | Regex patterns for tables to filter in ingestion. | `{'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'}` |
| table_pattern.allow | | Array of string | List of regex patterns for tables to include in ingestion. | `['.*']` |
| table_pattern.deny | | Array of string | List of regex patterns for tables to exclude from ingestion. | `[]` |
| table_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
| table_pattern.alphabet | | string | Allowed alphabet pattern. | `[A-Za-z0-9 _.-]` |
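
To show how several of these options might fit together, here is a minimal sketch of a fuller recipe. The region, profile name, and server URL are assumptions chosen for illustration, and the `datahub-rest` sink is just one possible choice; substitute the values for your own deployment.

```yaml
source:
  type: sagemaker
  config:
    aws_region: "us-west-2"         # placeholder region
    aws_profile: "my-aws-profile"   # hypothetical named profile; omit to use the default
    env: "PROD"                     # matches the documented default
    extract_feature_groups: true
    extract_models: false           # skip models in this example run

sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080" # assumed local DataHub instance
```

Saved under a hypothetical file name such as `sagemaker_recipe.yml`, a recipe like this would typically be run with `datahub ingest -c sagemaker_recipe.yml`.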

Code Coordinates

  • Class Name: datahub.ingestion.source.aws.sagemaker.SagemakerSource
  • Browse on GitHub

Questions

If you have any questions about configuring ingestion for SageMaker, feel free to ping us on our Slack.