Looker

There are two sources that provide integration with Looker:

Source Module | Documentation

looker

This plugin extracts the following:

  • Looker dashboards, dashboard elements (charts) and explores
  • Names, descriptions, URLs, chart types, input explores for the charts
  • Schemas and input views for explores
  • Owners of dashboards
Note: To get complete Looker metadata integration (including Looker views and lineage to the underlying warehouse tables), you must ALSO use the lookml module.

lookml

This plugin extracts the following:

  • LookML views from model files in a project
  • Name, upstream table names, metadata for dimensions, measures, and dimension groups attached as tags
  • If API integration is enabled (recommended), resolves table and view names by calling the Looker API; otherwise, supports offline resolution of these names.

Note: To get complete Looker metadata integration (including Looker dashboards and charts and lineage to the underlying Looker views), you must ALSO use the looker source module.

Module looker

Certified

This plugin extracts the following:

  • Looker dashboards, dashboard elements (charts) and explores
  • Names, descriptions, URLs, chart types, input explores for the charts
  • Schemas and input views for explores
  • Owners of dashboards
Note: To get complete Looker metadata integration (including Looker views and lineage to the underlying warehouse tables), you must ALSO use the lookml module.

Install the Plugin

pip install 'acryl-datahub[looker]'

Quickstart Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "looker"
  config:
    # Coordinates
    base_url: "https://<company>.cloud.looker.com"

    # Credentials
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}

# sink configs
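
The recipe above leaves the sink section as a placeholder. As a minimal sketch of a complete file, the example below pairs the same source block with a `datahub-rest` sink. The sink type and the `http://localhost:8080` server address are assumptions for a local DataHub deployment and are not part of this page; adjust them to your environment.

```yaml
source:
  type: "looker"
  config:
    # Coordinates
    base_url: "https://<company>.cloud.looker.com"

    # Credentials, read from environment variables
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}

# Assumed sink: a DataHub REST endpoint running locally
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```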

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
Field | Required | Type | Description | Default
env |  | string | The environment that all assets produced by this connector belong to | PROD
platform |  | string | The platform that this source connects to | None
platform_instance |  | string | The instance of the platform that all assets produced by this recipe belong to | None
tag_measures_and_dimensions |  | boolean | When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. | True
platform_name |  | string | Default platform name. Don't change. | looker
client_id |  | string | Looker API client id. | None
client_secret |  | string | Looker API client secret. | None
base_url |  | string | Url to your Looker instance: https://company.looker.com:19999 or https://looker.company.com, or similar. Used for making API calls to Looker and constructing clickable dashboard and chart urls. | None
include_deleted |  | boolean | Whether to include deleted dashboards. | False
extract_owners |  | boolean | When enabled, extracts ownership from Looker directly. When disabled, ownership is left empty for dashboards and charts. | True
actor |  | string | This config is deprecated in favor of extract_owners. Previously, was the actor to use in ownership properties of ingested metadata. | None
strip_user_ids_from_email |  | boolean | When enabled, converts Looker user emails of the form name@domain.com to urn:li:corpuser:name when assigning ownership | False
skip_personal_folders |  | boolean | Whether to skip ingestion of dashboards in personal folders. Setting this to True will only ingest dashboards in the Shared folder space. | False
max_threads |  | integer | Max parallelism for Looker API calls. Defaults to cpuCount or 40 | 2
external_base_url |  | string | Optional URL to use when constructing external URLs to Looker if the base_url is not the correct one to use. For example, https://looker-public.company.com. If not provided, the external base URL will default to base_url. | None
explore_naming_pattern |  | NamingPattern (see below for fields) | Pattern for providing dataset names to explores. Allowed variables are {project}, {model}, {name}. Default is {model}.explore.{name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{model}.explore.{name}', 'variables': None}
explore_naming_pattern.allowed_vars |  | Array of string |  | None
explore_naming_pattern.pattern |  | string |  | None
explore_naming_pattern.variables |  | Array of string |  | None
explore_browse_pattern |  | NamingPattern (see below for fields) |  | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/explores/{model}.{name}', 'variables': None}
explore_browse_pattern.allowed_vars |  | Array of string |  | None
explore_browse_pattern.pattern |  | string |  | None
explore_browse_pattern.variables |  | Array of string |  | None
view_naming_pattern |  | NamingPattern (see below for fields) | Pattern for providing dataset names to views. Allowed variables are {project}, {model}, {name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{project}.view.{name}', 'variables': None}
view_naming_pattern.allowed_vars |  | Array of string |  | None
view_naming_pattern.pattern |  | string |  | None
view_naming_pattern.variables |  | Array of string |  | None
view_browse_pattern |  | NamingPattern (see below for fields) | Pattern for providing browse paths to views. Allowed variables are {project}, {model}, {name}, {platform} and {env} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/views/{name}', 'variables': None}
view_browse_pattern.allowed_vars |  | Array of string |  | None
view_browse_pattern.pattern |  | string |  | None
view_browse_pattern.variables |  | Array of string |  | None
github_info |  | GitHubInfo (see below for fields) | Reference to your github location to enable easy navigation from DataHub to your LookML files |
github_info.repo |  | string | Name of your github repo. e.g. repo for https://github.com/datahub-project/datahub is datahub-project/datahub. | None
github_info.branch |  | string | Branch on which your files live by default. Typically main or master. | main
github_info.base_url |  | string | Base url for Github | https://github.com
transport_options |  | TransportOptionsConfig (see below for fields) | Populates the TransportOptions struct for looker client |
transport_options.timeout |  | integer |  | None
transport_options.headers |  | Dict[str,string] |  |
dashboard_pattern |  | AllowDenyPattern (see below for fields) | Patterns for selecting dashboard ids that are to be included | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'}
dashboard_pattern.allow |  | Array of string | List of regex patterns for process groups to include in ingestion | ['.*']
dashboard_pattern.deny |  | Array of string | List of regex patterns for process groups to exclude from ingestion. | []
dashboard_pattern.ignoreCase |  | boolean | Whether to ignore case sensitivity during pattern matching. | True
dashboard_pattern.alphabet |  | string | Allowed alphabets pattern | [A-Za-z0-9 _.-]
chart_pattern |  | AllowDenyPattern (see below for fields) | Patterns for selecting chart ids that are to be included | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'}
chart_pattern.allow |  | Array of string | List of regex patterns for process groups to include in ingestion | ['.*']
chart_pattern.deny |  | Array of string | List of regex patterns for process groups to exclude from ingestion. | []
chart_pattern.ignoreCase |  | boolean | Whether to ignore case sensitivity during pattern matching. | True
chart_pattern.alphabet |  | string | Allowed alphabets pattern | [A-Za-z0-9 _.-]
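
To show how the dotted field names in the table map onto nested YAML, here is a rough sketch of a looker recipe that sets a few of the optional fields. The dashboard ids `12` and `34` are hypothetical, and the external URL is the example value from the table; only field names listed above are used.

```yaml
source:
  type: "looker"
  config:
    base_url: "https://<company>.cloud.looker.com"
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}

    # Only ingest dashboards from the Shared folder space
    skip_personal_folders: true

    # dashboard_pattern.allow / dashboard_pattern.deny written as nested keys.
    # The ids below are hypothetical placeholders.
    dashboard_pattern:
      allow:
        - ".*"
      deny:
        - "^12$"
        - "^34$"

    # Assign ownership to urn:li:corpuser:name instead of the full email
    extract_owners: true
    strip_user_ids_from_email: true

    # Use a different host when building clickable links
    external_base_url: "https://looker-public.company.com"
```

A field such as `dashboard_pattern.deny` in the table therefore corresponds to the `deny` key nested under `dashboard_pattern` in the recipe.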

Configuration Notes

See the Looker authentication docs for the steps to create a client ID and secret. You need to provide the following permissions for ingestion to work correctly.

access_data
explore
manage_models
see_datagroups
see_lookml
see_lookml_dashboards
see_looks
see_pdts
see_queries
see_schedules
see_sql
see_system_activity
see_user_dashboards
see_users

Here is an example permission set after configuration (screenshot: Looker DataHub Permission Set).

Code Coordinates

  • Class Name: datahub.ingestion.source.looker.LookerDashboardSource
  • Browse on GitHub

Module lookml

Certified

This plugin extracts the following:

  • LookML views from model files in a project
  • Name, upstream table names, metadata for dimensions, measures, and dimension groups attached as tags
  • If API integration is enabled (recommended), resolves table and view names by calling the Looker API; otherwise, supports offline resolution of these names.

Note: To get complete Looker metadata integration (including Looker dashboards and charts and lineage to the underlying Looker views), you must ALSO use the looker source module.

Install the Plugin

pip install 'acryl-datahub[lookml]'

Quickstart Recipe

Check out the following recipe to get started with ingestion! See below for full configuration options.

For general pointers on writing and running a recipe, see our main recipe guide.

source:
  type: "lookml"
  config:
    # Coordinates
    base_folder: /path/to/model/files

    # Options
    api:
      # Coordinates for your looker instance
      base_url: "https://YOUR_INSTANCE.cloud.looker.com"

      # Credentials for your Looker connection (https://docs.looker.com/reference/api-and-integration/api-auth)
      client_id: ${LOOKER_CLIENT_ID}
      client_secret: ${LOOKER_CLIENT_SECRET}

    # Alternative to API section above if you want a purely file-based ingestion with no api calls to Looker or if you want to provide platform_instance ids for your connections
    # project_name: PROJECT_NAME # See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what is your project name
    # connection_to_platform_map:
    #   connection_name_1:
    #     platform: snowflake # bigquery, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
    #     platform_instance: snow_warehouse # optional
    #     platform_env: PROD # optional
    #   connection_name_2:
    #     platform: bigquery # snowflake, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
    #     platform_instance: bq_warehouse # optional
    #     platform_env: DEV # optional

    github_info:
      repo: org/repo-name

# sink configs

Config Details

Note that a . is used to denote nested fields in the YAML recipe.

View All Configuration Options
Field | Required | Type | Description | Default
env |  | string | The environment that all assets produced by this connector belong to | PROD
platform |  | string | The platform that this source connects to | None
platform_instance |  | string | The instance of the platform that all assets produced by this recipe belong to | None
tag_measures_and_dimensions |  | boolean | When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. | True
platform_name |  | string | Default platform name. Don't change. | looker
base_folder |  | string | Local filepath where the root of the LookML repo lives. This is typically the root folder where the *.model.lkml and *.view.lkml files are stored. e.g. If you have checked out your LookML repo under /Users/jdoe/workspace/my-lookml-repo, then set base_folder to /Users/jdoe/workspace/my-lookml-repo. | None
parse_table_names_from_sql |  | boolean | See note below. | False
sql_parser |  | string | See note below. | datahub.utilities.sql_parser.DefaultSQLParser
project_name |  | string | Required if you don't specify the api section. The project name within which all the model files live. See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what the Looker project name should be. The simplest way to see your projects is to click on Develop followed by Manage LookML Projects in the Looker application. | None
max_file_snippet_length |  | integer | When extracting the view definition from a lookml file, the maximum number of characters to extract. | 512000
explore_naming_pattern |  | NamingPattern (see below for fields) | Pattern for providing dataset names to explores. Allowed variables are {project}, {model}, {name}. Default is {model}.explore.{name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{model}.explore.{name}', 'variables': None}
explore_naming_pattern.allowed_vars |  | Array of string |  | None
explore_naming_pattern.pattern |  | string |  | None
explore_naming_pattern.variables |  | Array of string |  | None
explore_browse_pattern |  | NamingPattern (see below for fields) |  | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/explores/{model}.{name}', 'variables': None}
explore_browse_pattern.allowed_vars |  | Array of string |  | None
explore_browse_pattern.pattern |  | string |  | None
explore_browse_pattern.variables |  | Array of string |  | None
view_naming_pattern |  | NamingPattern (see below for fields) | Pattern for providing dataset names to views. Allowed variables are {project}, {model}, {name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{project}.view.{name}', 'variables': None}
view_naming_pattern.allowed_vars |  | Array of string |  | None
view_naming_pattern.pattern |  | string |  | None
view_naming_pattern.variables |  | Array of string |  | None
view_browse_pattern |  | NamingPattern (see below for fields) | Pattern for providing browse paths to views. Allowed variables are {project}, {model}, {name}, {platform} and {env} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/views/{name}', 'variables': None}
view_browse_pattern.allowed_vars |  | Array of string |  | None
view_browse_pattern.pattern |  | string |  | None
view_browse_pattern.variables |  | Array of string |  | None
github_info |  | GitHubInfo (see below for fields) | Reference to your github location to enable easy navigation from DataHub to your LookML files |
github_info.repo |  | string | Name of your github repo. e.g. repo for https://github.com/datahub-project/datahub is datahub-project/datahub. | None
github_info.branch |  | string | Branch on which your files live by default. Typically main or master. | main
github_info.base_url |  | string | Base url for Github | https://github.com
connection_to_platform_map |  | Dict[str, LookerConnectionDefinition] | A mapping of Looker connection names to DataHub platform, database, and schema values. |
connection_to_platform_map.key.platform |  | string |  | None
connection_to_platform_map.key.default_db |  | string |  | None
connection_to_platform_map.key.default_schema |  | string |  | None
connection_to_platform_map.key.platform_instance |  | string |  | None
connection_to_platform_map.key.platform_env |  | string | The environment that the platform is located in. Leaving this empty will inherit defaults from the top level Looker configuration | None
model_pattern |  | AllowDenyPattern (see below for fields) | List of regex patterns for LookML models to include in the extraction. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'}
model_pattern.allow |  | Array of string | List of regex patterns for process groups to include in ingestion | ['.*']
model_pattern.deny |  | Array of string | List of regex patterns for process groups to exclude from ingestion. | []
model_pattern.ignoreCase |  | boolean | Whether to ignore case sensitivity during pattern matching. | True
model_pattern.alphabet |  | string | Allowed alphabets pattern | [A-Za-z0-9 _.-]
view_pattern |  | AllowDenyPattern (see below for fields) | List of regex patterns for LookML views to include in the extraction. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'}
view_pattern.allow |  | Array of string | List of regex patterns for process groups to include in ingestion | ['.*']
view_pattern.deny |  | Array of string | List of regex patterns for process groups to exclude from ingestion. | []
view_pattern.ignoreCase |  | boolean | Whether to ignore case sensitivity during pattern matching. | True
view_pattern.alphabet |  | string | Allowed alphabets pattern | [A-Za-z0-9 _.-]
api |  | LookerAPIConfig (see below for fields) |  |
api.client_id |  | string | Looker API client id. | None
api.client_secret |  | string | Looker API client secret. | None
api.base_url |  | string | Url to your Looker instance: https://company.looker.com:19999 or https://looker.company.com, or similar. Used for making API calls to Looker and constructing clickable dashboard and chart urls. | None
api.transport_options |  | TransportOptionsConfig (see below for fields) | Populates the TransportOptions struct for looker client |
api.transport_options.timeout |  | integer |  | None
api.transport_options.headers |  | Dict[str,string] |  |
transport_options |  | TransportOptionsConfig (see below for fields) | Populates the TransportOptions struct for looker client |
transport_options.timeout |  | integer |  | None
transport_options.headers |  | Dict[str,string] |  |
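
As an illustration of the pattern fields in the table, the sketch below restricts which views are ingested and changes how view datasets are named. The `tmp_` prefix and the `master` branch are hypothetical choices for the example, not recommendations from this page.

```yaml
source:
  type: "lookml"
  config:
    base_folder: /path/to/model/files

    api:
      base_url: "https://YOUR_INSTANCE.cloud.looker.com"
      client_id: ${LOOKER_CLIENT_ID}
      client_secret: ${LOOKER_CLIENT_SECRET}

    # Keep every model (the default), but skip views whose name
    # starts with the hypothetical prefix "tmp_"
    model_pattern:
      allow:
        - ".*"
    view_pattern:
      deny:
        - "^tmp_.*"

    # Add the model to view dataset names; the default pattern
    # is {project}.view.{name}
    view_naming_pattern:
      pattern: "{project}.{model}.view.{name}"

    github_info:
      repo: org/repo-name
      branch: master
```

Changing a naming pattern changes the dataset names that are emitted, so it is probably best decided before the first ingestion run.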

Configuration Notes

See the Looker authentication docs for the steps to create a client ID and secret. You need to ensure that the API key is attached to a user that has Admin privileges. If that is not possible, read the configuration section to provide an offline specification of the connection_to_platform_map and the project_name.
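
A minimal sketch of the offline specification mentioned above: the `api` section is left out, and `project_name` plus `connection_to_platform_map` supply what the API would otherwise resolve. The connection name `my_snowflake_connection` is hypothetical; it should match a connection name used in your LookML model files.

```yaml
source:
  type: "lookml"
  config:
    base_folder: /path/to/model/files

    # Required when the api section is not specified
    project_name: PROJECT_NAME

    # Maps each Looker connection name to a DataHub platform
    connection_to_platform_map:
      my_snowflake_connection:  # hypothetical connection name
        platform: snowflake
        default_db: DEFAULT_DATABASE
        default_schema: DEFAULT_SCHEMA
        platform_env: PROD  # optional
```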

Note: The integration can use an SQL parser to try to parse the tables the views depend on.

This parsing is disabled by default, but can be enabled by setting parse_table_names_from_sql: True. The default parser is based on the sqllineage package. As this package doesn't officially support all the SQL dialects that Looker supports, the result might not be correct. You can, however, implement a custom parser and enable it by setting the sql_parser configuration value. A custom SQL parser must inherit from datahub.utilities.sql_parser.SQLParser and must be made available to DataHub by, for example, installing it. The configuration then needs to be set to module_name.ClassName of the parser.
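
The two settings described in this note could look roughly like this in a recipe. `my_parsers.MyLookerSQLParser` is a hypothetical module and class name standing in for a custom parser you have written and installed; omit the `sql_parser` line to keep the default parser.

```yaml
source:
  type: "lookml"
  config:
    base_folder: /path/to/model/files

    api:
      base_url: "https://YOUR_INSTANCE.cloud.looker.com"
      client_id: ${LOOKER_CLIENT_ID}
      client_secret: ${LOOKER_CLIENT_SECRET}

    # Turn on SQL-based table name extraction (disabled by default)
    parse_table_names_from_sql: true

    # Hypothetical custom parser, given as module_name.ClassName
    sql_parser: my_parsers.MyLookerSQLParser
```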

Code Coordinates

  • Class Name: datahub.ingestion.source.lookml.LookMLSource
  • Browse on GitHub

Questions

If you've got any questions on configuring ingestion for Looker, feel free to ping us on our Slack.