Tableau

Module tableau

Incubating

Important Capabilities

Capability | Status | Notes
Data Profiling | |
Dataset Usage | |
Descriptions | | Enabled by default
Detect Deleted Entities | |
Domains | | Requires transformer
Extract Ownership | | Requires recipe configuration
Extract Tags | | Requires recipe configuration
Partition Support | | Not applicable to source
Platform Instance | | Not applicable to source
Table-Level Lineage | | Enabled by default

Install the Plugin

pip install 'acryl-datahub[tableau]'

Quickstart Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: tableau
  config:
    # Coordinates
    connect_uri: https://prod-ca-a.online.tableau.com
    site: acryl
    projects: ["default", "Project 2"]

    # Credentials
    username: "${TABLEAU_USER}"
    password: "${TABLEAU_PASSWORD}"

    # Options
    ingest_tags: True
    ingest_owner: True
    default_schema_map:
      mydatabase: public
      anotherdatabase: anotherschema

sink:
  # sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
Field | Required | Type | Description | Default
connect_uri | | string | Tableau host URL. | None
username | | string | Tableau username, must be set if authenticating using username/password. | None
password | | string | Tableau password, must be set if authenticating using username/password. | None
token_name | | string | Tableau token name, must be set if authenticating using a personal access token. | None
token_value | | string | Tableau token value, must be set if authenticating using a personal access token. | None
site | | string | Tableau Site. Always required for Tableau Online. Use an empty string to connect with the Default site on Tableau Server. |
projects | | Array of string | List of projects | ['default']
default_schema_map | | Dict | Default schema to use when schema is not found. | {}
ingest_tags | | boolean | Ingest Tags from source. This will override Tags entered from the UI. | False
ingest_owner | | boolean | Ingest Owner from source. This will override Owner info entered from the UI. | False
ingest_tables_external | | boolean | Ingest details for tables external to (not embedded in) Tableau as entities. | False
workbooks_page_size | | integer | @deprecated (use page_size instead) Number of workbooks to query at a time using the Tableau API. | None
page_size | | integer | Number of metadata objects (e.g. CustomSQLTable, PublishedDatasource, etc.) to query at a time using the Tableau API. | 10
env | | string | Environment to use in namespace when constructing URNs. | PROD
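
The credential fields above come in two pairs, and each description says the pair must be set when that authentication method is used. A minimal sketch of that rule, assuming the `config` block has been loaded into a plain dict. The helper name `auth_method` is hypothetical and is not part of DataHub, and preferring username/password when both pairs are present is an arbitrary choice made for this sketch:

```python
from typing import Mapping, Optional


def auth_method(config: Mapping[str, Optional[str]]) -> str:
    """Report which credential pair a recipe's config block supplies.

    Hypothetical helper for illustration; not DataHub's own validation.
    """
    # A pair only counts when both of its fields are non-empty.
    has_password = bool(config.get("username")) and bool(config.get("password"))
    has_token = bool(config.get("token_name")) and bool(config.get("token_value"))

    if has_password:
        return "username/password"
    if has_token:
        return "personal access token"
    raise ValueError(
        "set either username and password, or token_name and token_value"
    )
```

A config with only `username` set raises `ValueError`, which mirrors the "must be set" wording in the table.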

Prerequisites

In order to ingest metadata from Tableau, you will need:

Integration Details

This plugin extracts metadata for Sheets, Dashboards, and Embedded and Published Data Sources within Workbooks in a given project on a Tableau site. The plugin is in beta and has only been tested against a PostgreSQL database and sample workbooks on Tableau Online. Metadata is extracted through Tableau's GraphQL interface. The queries used are located in metadata-ingestion/src/datahub/ingestion/source/tableau_common.py.

Concept Mapping

This ingestion source maps the following Source System Concepts to DataHub Concepts:

Source Concept | DataHub Concept | Notes
"Tableau" | Data Platform |
Embedded DataSource | Dataset | SubType "Embedded Data Source"
Published DataSource | Dataset | SubType "Published Data Source"
Custom SQL Table | Dataset | SubTypes "View", "Custom SQL"
Embedded or External Tables | Dataset |
Sheet | Chart |
Dashboard | Dashboard |
User | User (a.k.a CorpUser) |
Workbook | Container | SubType "Workbook"
Tag | Tag |

Workbook

Workbooks from Tableau are ingested as Containers in DataHub.

  • GraphQL query
{
  workbooksConnection(first: 10, offset: 0, filter: {projectNameWithin: ["default", "Project 2"]}) {
    nodes {
      id
      name
      luid
      uri
      projectName
      owner {
        username
      }
      description
      uri
      createdAt
      updatedAt
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}
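
The query above takes `first` and `offset` arguments and returns `pageInfo.hasNextPage`, so a client can walk the result set one page at a time. The following is a rough sketch of such a loop, not how the DataHub source is implemented. `iter_workbooks` is a hypothetical name, `fetch` stands in for whatever callable sends a query string to Tableau and returns the decoded JSON response, and the query is trimmed to a few fields:

```python
import json
from typing import Any, Callable, Dict, Iterator, List

# Trimmed workbook query; doubled braces are literal braces for str.format.
WORKBOOK_QUERY = """
{{
  workbooksConnection(first: {first}, offset: {offset}, filter: {{projectNameWithin: {projects}}}) {{
    nodes {{ id name projectName }}
    pageInfo {{ hasNextPage }}
    totalCount
  }}
}}
"""


def iter_workbooks(
    fetch: Callable[[str], Dict[str, Any]],
    projects: List[str],
    page_size: int = 10,
) -> Iterator[Dict[str, Any]]:
    """Yield workbook nodes page by page until hasNextPage is false."""
    offset = 0
    while True:
        query = WORKBOOK_QUERY.format(
            first=page_size,
            offset=offset,
            # json.dumps renders the list as ["default", "Project 2"].
            projects=json.dumps(projects),
        )
        connection = fetch(query)["data"]["workbooksConnection"]
        yield from connection["nodes"]
        if not connection["pageInfo"]["hasNextPage"]:
            break
        offset += page_size
```

Here `page_size` plays the role of the `page_size` recipe option: a smaller value means more requests, each returning fewer nodes.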

Dashboard

Dashboards from Tableau are ingested as Dashboards in DataHub.

  • GraphQL query
{
  workbooksConnection(first: 10, offset: 0, filter: {projectNameWithin: ["default", "Project 2"]}) {
    nodes {
      .....
      dashboards {
        id
        name
        path
        createdAt
        updatedAt
        sheets {
          id
          name
        }
      }
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}

Sheet

Sheets from Tableau are ingested as Charts in DataHub.

  • GraphQL query
{
  workbooksConnection(first: 10, offset: 0, filter: {projectNameWithin: ["default"]}) {
    nodes {
      .....
      sheets {
        id
        name
        path
        createdAt
        updatedAt
        tags {
          name
        }
        containedInDashboards {
          name
          path
        }
        upstreamDatasources {
          id
          name
        }
        datasourceFields {
          __typename
          id
          name
          description
          upstreamColumns {
            name
          }
          ... on ColumnField {
            dataCategory
            role
            dataType
            aggregation
          }
          ... on CalculatedField {
            role
            dataType
            aggregation
            formula
          }
          ... on GroupField {
            role
            dataType
          }
          ... on DatasourceField {
            remoteField {
              __typename
              id
              name
              description
              folderName
              ... on ColumnField {
                dataCategory
                role
                dataType
                aggregation
              }
              ... on CalculatedField {
                role
                dataType
                aggregation
                formula
              }
              ... on GroupField {
                role
                dataType
              }
            }
          }
        }
      }
    }
    .....
  }
}

Embedded Data Source

Embedded Data Sources from Tableau are ingested as Datasets in DataHub.

  • GraphQL query
{
  workbooksConnection(first: 10, offset: 0, filter: {projectNameWithin: ["default"]}) {
    nodes {
      ....
      embeddedDatasources {
        __typename
        id
        name
        hasExtracts
        extractLastRefreshTime
        extractLastIncrementalUpdateTime
        extractLastUpdateTime
        upstreamDatabases {
          id
          name
          connectionType
          isEmbedded
        }
        upstreamTables {
          name
          schema
          columns {
            name
            remoteType
          }
        }
        fields {
          __typename
          id
          name
          description
          isHidden
          folderName
          ... on ColumnField {
            dataCategory
            role
            dataType
            defaultFormat
            aggregation
            columns {
              table {
                ... on CustomSQLTable {
                  id
                  name
                }
              }
            }
          }
          ... on CalculatedField {
            role
            dataType
            defaultFormat
            aggregation
            formula
          }
          ... on GroupField {
            role
            dataType
          }
        }
        upstreamDatasources {
          id
          name
        }
        workbook {
          name
          projectName
        }
      }
    }
    ....
  }
}

Published Data Source

Published Data Sources from Tableau are ingested as Datasets in DataHub.

  • GraphQL query
{
  publishedDatasourcesConnection(first: 10, offset: 0, filter: {idWithin: ["00cce29f-b561-bb41-3557-8e19660bb5dd", "618c87db-5959-338b-bcc7-6f5f4cc0b6c6"]}) {
    nodes {
      __typename
      id
      name
      hasExtracts
      extractLastRefreshTime
      extractLastIncrementalUpdateTime
      extractLastUpdateTime
      downstreamSheets {
        id
        name
      }
      upstreamTables {
        name
        schema
        fullName
        connectionType
        description
        contact {
          name
        }
      }
      fields {
        __typename
        id
        name
        description
        isHidden
        folderName
        ... on ColumnField {
          dataCategory
          role
          dataType
          defaultFormat
          aggregation
          columns {
            table {
              ... on CustomSQLTable {
                id
                name
              }
            }
          }
        }
        ... on CalculatedField {
          role
          dataType
          defaultFormat
          aggregation
          formula
        }
        ... on GroupField {
          role
          dataType
        }
      }
      owner {
        username
      }
      description
      uri
      projectName
    }
    pageInfo {
      hasNextPage
      endCursor
    }
    totalCount
  }
}

Custom SQL Data Source

For Custom SQL data sources, the query is viewable in the UI under the View Definition tab.

  • GraphQL query
{
  customSQLTablesConnection(first: 10, offset: 0, filter: {idWithin: ["22b0b4c3-6b85-713d-a161-5a87fdd78f40"]}) {
    nodes {
      id
      name
      query
      columns {
        id
        name
        remoteType
        description
        referencedByFields {
          datasource {
            id
            name
            upstreamDatabases {
              id
              name
            }
            upstreamTables {
              id
              name
              schema
              connectionType
              columns {
                id
              }
            }
            ... on PublishedDatasource {
              projectName
            }
            ... on EmbeddedDatasource {
              workbook {
                name
                projectName
              }
            }
          }
        }
      }
      tables {
        id
        name
        schema
        connectionType
      }
    }
  }
}

Lineage

Lineage is emitted as received from Tableau's metadata API for:

  • Sheets contained in Dashboard
  • Embedded or Published datasources upstream to Sheet
  • Published datasources upstream to Embedded datasource
  • Tables upstream to Embedded or Published datasource
  • Custom SQL datasources upstream to Embedded or Published datasource
  • Tables upstream to Custom SQL datasource
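
As an illustration of the first two relationships, the sketch below turns one workbook node, shaped like the Sheet and Dashboard query results shown earlier, into (upstream id, downstream id) pairs. The function name `workbook_lineage_edges` is hypothetical, and the flat pair representation is an assumption made for this example rather than the format DataHub emits:

```python
from typing import Any, Dict, List, Tuple


def workbook_lineage_edges(workbook: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Collect (upstream id, downstream id) pairs from one workbook node.

    Hypothetical helper; covers only datasource -> sheet and
    sheet -> dashboard relationships.
    """
    edges: List[Tuple[str, str]] = []

    # Embedded or Published datasources upstream to each Sheet.
    for sheet in workbook.get("sheets", []):
        for datasource in sheet.get("upstreamDatasources", []):
            edges.append((datasource["id"], sheet["id"]))

    # Sheets contained in each Dashboard.
    for dashboard in workbook.get("dashboards", []):
        for sheet in dashboard.get("sheets", []):
            edges.append((sheet["id"], dashboard["id"]))

    return edges
```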

Caveats

  • The Tableau metadata API might return an incorrect schema name for tables in some databases, leading to incorrect metadata in DataHub. This source attempts to extract the correct schema from the databaseTable's fully qualified name wherever possible. Read "Using the databaseTable object in query" for caveats on using the schema attribute.
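
One way to picture that fallback is sketched below. It assumes the fully qualified name looks like `[schema].[table]`, which is an assumption about the format and not something this page specifies, and it falls back to the reported schema and then to the recipe's `default_schema_map`. The helper `resolve_schema` is hypothetical and may differ from what the source actually does:

```python
from typing import Dict, Optional


def resolve_schema(
    full_name: Optional[str],
    reported_schema: Optional[str],
    database: str,
    default_schema_map: Dict[str, str],
) -> Optional[str]:
    """Pick a schema name, preferring the fully qualified table name.

    Hypothetical helper. Splitting on "." is naive and would mis-handle
    identifiers that themselves contain dots.
    """
    if full_name:
        # "[public].[orders]" -> ["public", "orders"]
        parts = [part.strip("[]") for part in full_name.split(".")]
        if len(parts) >= 2 and parts[-2]:
            return parts[-2]
    if reported_schema:
        return reported_schema
    # Last resort: the per-database default from the recipe, if any.
    return default_schema_map.get(database)
```

With the quickstart recipe's map, a table in `mydatabase` that has neither a qualified name nor a reported schema resolves to `public`.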

Troubleshooting

Why are only some workbooks/custom SQLs/published datasources ingested from the specified project?

This may happen when the Tableau API returns a NODE_LIMIT_EXCEEDED error in response to a metadata query and returns partial results with the message "Showing partial results. , The request exceeded the ‘n’ node limit. Use pagination, additional filtering, or both in the query to adjust results." To resolve this, consider:

  • reducing the page size using the page_size config parameter in the DataHub recipe (defaults to 10).
  • increasing the metadata query node limit in the Tableau configuration to a higher value.
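
To check a response for this condition in your own tooling, something like the following could work. It assumes the response follows the common GraphQL layout, with a top-level `errors` list whose entries carry the code under `extensions.code`. That layout is an assumption here, and both function names are hypothetical:

```python
from typing import Any, Dict


def has_node_limit_error(response: Dict[str, Any]) -> bool:
    """Return True if any error entry reports NODE_LIMIT_EXCEEDED."""
    for error in response.get("errors") or []:
        # Assumed location of the error code; adjust if your server differs.
        code = (error.get("extensions") or {}).get("code")
        if code == "NODE_LIMIT_EXCEEDED":
            return True
    return False


def next_page_size(current: int) -> int:
    """Halve the page size for a retry, never going below 1."""
    return max(1, current // 2)
```

Starting from the default of 10, repeated retries would try page sizes 5, 2, and then 1.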

Code Coordinates

  • Class Name: datahub.ingestion.source.tableau.TableauSource
  • Browse on GitHub

Questions

If you have any questions about configuring ingestion for Tableau, feel free to ping us on our Slack.