Looker
There are 2 sources that provide integration with Looker:
Source Module | Documentation |
---|---|
looker | This plugin extracts Looker dashboards, dashboard elements (charts) and explores. Note: To get complete Looker metadata integration (including Looker views and lineage to the underlying warehouse tables), you must ALSO use the lookml module. |
lookml | This plugin extracts LookML views from model files in a project. Note: To get complete Looker metadata integration (including Looker dashboards and charts and lineage to the underlying Looker views), you must ALSO use the looker source module. |
Module looker
This plugin extracts the following:
- Looker dashboards, dashboard elements (charts) and explores
- Names, descriptions, URLs, chart types, input explores for the charts
- Schemas and input views for explores
- Owners of dashboards
Note: To get complete Looker metadata integration (including Looker views and lineage to the underlying warehouse tables), you must ALSO use the lookml module.
Install the Plugin
pip install 'acryl-datahub[looker]'
Quickstart Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: "looker"
  config:
    # Coordinates
    base_url: "https://<company>.cloud.looker.com"
    # Credentials
    client_id: ${LOOKER_CLIENT_ID}
    client_secret: ${LOOKER_CLIENT_SECRET}
# sink configs
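The same recipe can also be written as a plain Python dictionary, which makes the mandatory fields easy to check. The sketch below is illustrative only: `build_looker_recipe` and `missing_required` are hypothetical helpers, not part of DataHub, and the `${VAR}` references are expanded with the standard library's `os.path.expandvars`, which leaves unset variables untouched.

```python
import os

# Fields marked as required for the looker source in the configuration reference.
REQUIRED_FIELDS = ("client_id", "client_secret", "base_url")


def build_looker_recipe(base_url: str) -> dict:
    """Return the quickstart recipe as a dict, expanding ${VAR} references."""
    return {
        "source": {
            "type": "looker",
            "config": {
                "base_url": base_url,
                "client_id": os.path.expandvars("${LOOKER_CLIENT_ID}"),
                "client_secret": os.path.expandvars("${LOOKER_CLIENT_SECRET}"),
            },
        }
    }


def missing_required(recipe: dict) -> list:
    """List required config fields that are absent, empty, or still unexpanded."""
    config = recipe["source"]["config"]
    return [
        field
        for field in REQUIRED_FIELDS
        if not config.get(field) or str(config[field]).startswith("${")
    ]
```

Calling `missing_required(build_looker_recipe(...))` before running ingestion is one way to catch a forgotten environment variable early.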
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
View All Configuration Options
Field | Required | Type | Description | Default |
---|---|---|---|---|
env | | string | The environment that all assets produced by this connector belong to | PROD |
platform | | string | The platform that this source connects to | None |
platform_instance | | string | The instance of the platform that all assets produced by this recipe belong to | None |
tag_measures_and_dimensions | | boolean | When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. | True |
platform_name | | string | Default platform name. Don't change. | looker |
client_id | ✅ | string | Looker API client id. | None |
client_secret | ✅ | string | Looker API client secret. | None |
base_url | ✅ | string | Url to your Looker instance: https://company.looker.com:19999 or https://looker.company.com, or similar. Used for making API calls to Looker and constructing clickable dashboard and chart urls. | None |
include_deleted | | boolean | Whether to include deleted dashboards. | False |
extract_owners | | boolean | When enabled, extracts ownership from Looker directly. When disabled, ownership is left empty for dashboards and charts. | True |
actor | | string | This config is deprecated in favor of extract_owners. Previously, was the actor to use in ownership properties of ingested metadata. | None |
strip_user_ids_from_email | | boolean | When enabled, converts Looker user emails of the form name@domain.com to urn:li:corpuser:name when assigning ownership | False |
skip_personal_folders | | boolean | Whether to skip ingestion of dashboards in personal folders. Setting this to True will only ingest dashboards in the Shared folder space. | False |
max_threads | | integer | Max parallelism for Looker API calls. Defaults to cpuCount or 40 | 2 |
external_base_url | | string | Optional URL to use when constructing external URLs to Looker if the base_url is not the correct one to use. For example, https://looker-public.company.com. If not provided, the external base URL will default to base_url. | None |
explore_naming_pattern | | NamingPattern (see below for fields) | Pattern for providing dataset names to explores. Allowed variables are {project}, {model}, {name}. Default is {model}.explore.{name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{model}.explore.{name}', 'variables': None} |
explore_naming_pattern.allowed_vars | ✅ | Array of string | | None |
explore_naming_pattern.pattern | ✅ | string | | None |
explore_naming_pattern.variables | | Array of string | | None |
explore_browse_pattern | | NamingPattern (see below for fields) | | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/explores/{model}.{name}', 'variables': None} |
explore_browse_pattern.allowed_vars | ✅ | Array of string | | None |
explore_browse_pattern.pattern | ✅ | string | | None |
explore_browse_pattern.variables | | Array of string | | None |
view_naming_pattern | | NamingPattern (see below for fields) | Pattern for providing dataset names to views. Allowed variables are {project}, {model}, {name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{project}.view.{name}', 'variables': None} |
view_naming_pattern.allowed_vars | ✅ | Array of string | | None |
view_naming_pattern.pattern | ✅ | string | | None |
view_naming_pattern.variables | | Array of string | | None |
view_browse_pattern | | NamingPattern (see below for fields) | Pattern for providing browse paths to views. Allowed variables are {project}, {model}, {name}, {platform} and {env} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/views/{name}', 'variables': None} |
view_browse_pattern.allowed_vars | ✅ | Array of string | | None |
view_browse_pattern.pattern | ✅ | string | | None |
view_browse_pattern.variables | | Array of string | | None |
github_info | | GitHubInfo (see below for fields) | Reference to your github location to enable easy navigation from DataHub to your LookML files | |
github_info.repo | ✅ | string | Name of your github repo. e.g. repo for https://github.com/datahub-project/datahub is datahub-project/datahub. | None |
github_info.branch | | string | Branch on which your files live by default. Typically main or master. | main |
github_info.base_url | | string | Base url for Github | https://github.com |
transport_options | | TransportOptionsConfig (see below for fields) | Populates the TransportOptions struct for looker client | |
transport_options.timeout | ✅ | integer | | None |
transport_options.headers | ✅ | Dict[str,string] | | |
dashboard_pattern | | AllowDenyPattern (see below for fields) | Patterns for selecting dashboard ids that are to be included | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'} |
dashboard_pattern.allow | | Array of string | List of regex patterns for process groups to include in ingestion | ['.*'] |
dashboard_pattern.deny | | Array of string | List of regex patterns for process groups to exclude from ingestion. | [] |
dashboard_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
dashboard_pattern.alphabet | | string | Allowed alphabets pattern | [A-Za-z0-9 _.-] |
chart_pattern | | AllowDenyPattern (see below for fields) | Patterns for selecting chart ids that are to be included | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'} |
chart_pattern.allow | | Array of string | List of regex patterns for process groups to include in ingestion | ['.*'] |
chart_pattern.deny | | Array of string | List of regex patterns for process groups to exclude from ingestion. | [] |
chart_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
chart_pattern.alphabet | | string | Allowed alphabets pattern | [A-Za-z0-9 _.-] |
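To make the dot notation used in the table concrete, the following sketch expands dotted field names such as `github_info.repo` into the nested structure a YAML recipe would contain. `nest_dotted_keys` is a hypothetical helper written for illustration; it is not a DataHub function.

```python
def nest_dotted_keys(flat: dict) -> dict:
    """Expand keys like 'github_info.repo' into nested dictionaries."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        # Everything before the last dot names the enclosing sections.
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


options = {
    "base_url": "https://looker.company.com",
    "github_info.repo": "datahub-project/datahub",
    "github_info.branch": "main",
    "dashboard_pattern.deny": ["^test_.*"],
}
config = nest_dotted_keys(options)
```

Here `config["github_info"]` ends up as `{"repo": "datahub-project/datahub", "branch": "main"}`, which corresponds to a `github_info:` block with `repo:` and `branch:` indented beneath it in the recipe.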
The JSONSchema for this configuration is inlined below.
{
"title": "LookerDashboardSourceConfig",
"description": "Any source that is a primary producer of Dataset metadata should inherit this class",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform": {
"title": "Platform",
"description": "The platform that this source connects to",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to",
"type": "string"
},
"explore_naming_pattern": {
"title": "Explore Naming Pattern",
"description": "Pattern for providing dataset names to explores. Allowed variables are {project}, {model}, {name}. Default is `{model}.explore.{name}`",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "{model}.explore.{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"explore_browse_pattern": {
"title": "Explore Browse Pattern",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "/{env}/{platform}/{project}/explores/{model}.{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"view_naming_pattern": {
"title": "View Naming Pattern",
"description": "Pattern for providing dataset names to views. Allowed variables are `{project}`, `{model}`, `{name}`",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "{project}.view.{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"view_browse_pattern": {
"title": "View Browse Pattern",
"description": "Pattern for providing browse paths to views. Allowed variables are `{project}`, `{model}`, `{name}`, `{platform}` and `{env}`",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "/{env}/{platform}/{project}/views/{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"tag_measures_and_dimensions": {
"title": "Tag Measures And Dimensions",
"description": "When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column.",
"default": true,
"type": "boolean"
},
"platform_name": {
"title": "Platform Name",
"description": "Default platform name. Don't change.",
"default": "looker",
"type": "string"
},
"github_info": {
"title": "Github Info",
"description": "Reference to your github location to enable easy navigation from DataHub to your LookML files",
"allOf": [
{
"$ref": "#/definitions/GitHubInfo"
}
]
},
"client_id": {
"title": "Client Id",
"description": "Looker API client id.",
"type": "string"
},
"client_secret": {
"title": "Client Secret",
"description": "Looker API client secret.",
"type": "string"
},
"base_url": {
"title": "Base Url",
"description": "Url to your Looker instance: `https://company.looker.com:19999` or `https://looker.company.com`, or similar. Used for making API calls to Looker and constructing clickable dashboard and chart urls.",
"type": "string"
},
"transport_options": {
"title": "Transport Options",
"description": "Populates the [TransportOptions](https://github.com/looker-open-source/sdk-codegen/blob/94d6047a0d52912ac082eb91616c1e7c379ab262/python/looker_sdk/rtl/transport.py#L70) struct for looker client",
"allOf": [
{
"$ref": "#/definitions/TransportOptionsConfig"
}
]
},
"dashboard_pattern": {
"title": "Dashboard Pattern",
"description": "Patterns for selecting dashboard ids that are to be included",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true,
"alphabet": "[A-Za-z0-9 _.-]"
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"chart_pattern": {
"title": "Chart Pattern",
"description": "Patterns for selecting chart ids that are to be included",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true,
"alphabet": "[A-Za-z0-9 _.-]"
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"include_deleted": {
"title": "Include Deleted",
"description": "Whether to include deleted dashboards.",
"default": false,
"type": "boolean"
},
"extract_owners": {
"title": "Extract Owners",
"description": "When enabled, extracts ownership from Looker directly. When disabled, ownership is left empty for dashboards and charts.",
"default": true,
"type": "boolean"
},
"actor": {
"title": "Actor",
"description": "This config is deprecated in favor of `extract_owners`. Previously, was the actor to use in ownership properties of ingested metadata.",
"type": "string"
},
"strip_user_ids_from_email": {
"title": "Strip User Ids From Email",
"description": "When enabled, converts Looker user emails of the form name@domain.com to urn:li:corpuser:name when assigning ownership",
"default": false,
"type": "boolean"
},
"skip_personal_folders": {
"title": "Skip Personal Folders",
"description": "Whether to skip ingestion of dashboards in personal folders. Setting this to True will only ingest dashboards in the Shared folder space.",
"default": false,
"type": "boolean"
},
"max_threads": {
"title": "Max Threads",
"description": "Max parallelism for Looker API calls. Defaults to cpuCount or 40",
"default": 2,
"type": "integer"
},
"external_base_url": {
"title": "External Base Url",
"description": "Optional URL to use when constructing external URLs to Looker if the `base_url` is not the correct one to use. For example, `https://looker-public.company.com`. If not provided, the external base URL will default to `base_url`.",
"type": "string"
}
},
"required": [
"client_id",
"client_secret",
"base_url"
],
"additionalProperties": false,
"definitions": {
"NamingPattern": {
"title": "NamingPattern",
"type": "object",
"properties": {
"allowed_vars": {
"title": "Allowed Vars",
"type": "array",
"items": {
"type": "string"
}
},
"pattern": {
"title": "Pattern",
"type": "string"
},
"variables": {
"title": "Variables",
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"allowed_vars",
"pattern"
]
},
"GitHubInfo": {
"title": "GitHubInfo",
"type": "object",
"properties": {
"repo": {
"title": "Repo",
"description": "Name of your github repo. e.g. repo for https://github.com/datahub-project/datahub is `datahub-project/datahub`.",
"type": "string"
},
"branch": {
"title": "Branch",
"description": "Branch on which your files live by default. Typically main or master.",
"default": "main",
"type": "string"
},
"base_url": {
"title": "Base Url",
"description": "Base url for Github",
"default": "https://github.com",
"type": "string"
}
},
"required": [
"repo"
],
"additionalProperties": false
},
"TransportOptionsConfig": {
"title": "TransportOptionsConfig",
"type": "object",
"properties": {
"timeout": {
"title": "Timeout",
"type": "integer"
},
"headers": {
"title": "Headers",
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"required": [
"timeout",
"headers"
],
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns for process groups to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns for process groups to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
},
"alphabet": {
"title": "Alphabet",
"description": "Allowed alphabets pattern",
"default": "[A-Za-z0-9 _.-]",
"type": "string"
}
},
"additionalProperties": false
}
}
}
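The `strip_user_ids_from_email` option described in the schema above maps an email of the form name@domain.com to urn:li:corpuser:name. A minimal sketch of that mapping follows; `owner_urn` is a hypothetical function, and the behavior with the option disabled (keeping the full email as the user id) is an assumption rather than something stated in this reference.

```python
def owner_urn(email: str, strip_user_ids_from_email: bool = False) -> str:
    """Build a corpuser URN from a Looker user email."""
    if strip_user_ids_from_email:
        # Keep only the part before the first '@'.
        user_id = email.split("@", 1)[0]
    else:
        # Assumption: the full email is used as the user id when not stripping.
        user_id = email
    return f"urn:li:corpuser:{user_id}"
```

For example, `owner_urn("name@domain.com", strip_user_ids_from_email=True)` returns `urn:li:corpuser:name`.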
Configuration Notes
See the Looker authentication docs for the steps to create a client ID and secret. You need to provide the following permissions for ingestion to work correctly.
- access_data
- explore
- manage_models
- see_datagroups
- see_lookml
- see_lookml_dashboards
- see_looks
- see_pdts
- see_queries
- see_schedules
- see_sql
- see_system_activity
- see_user_dashboards
- see_users
Here is an example permission set after configuration.
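As a rough cross-check, a permission set copied from the Looker admin UI can be compared against the list of permissions given above. The snippet is a sketch only: `missing_permissions` is a hypothetical helper, and how you obtain the granted permissions is left to you.

```python
# Permissions this page lists as needed for ingestion to work correctly.
REQUIRED_PERMISSIONS = frozenset({
    "access_data",
    "explore",
    "manage_models",
    "see_datagroups",
    "see_lookml",
    "see_lookml_dashboards",
    "see_looks",
    "see_pdts",
    "see_queries",
    "see_schedules",
    "see_sql",
    "see_system_activity",
    "see_user_dashboards",
    "see_users",
})


def missing_permissions(granted) -> list:
    """Return the required permissions absent from `granted`, sorted by name."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))
```

An empty result means every permission from the list is present in the set you passed in.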
Code Coordinates
- Class Name: datahub.ingestion.source.looker.LookerDashboardSource
- Browse on GitHub
Module lookml
This plugin extracts the following:
- LookML views from model files in a project
- Name, upstream table names, metadata for dimensions, measures, and dimension groups attached as tags
- If API integration is enabled (recommended), resolves table and view names by calling the Looker API, otherwise supports offline resolution of these names.
Note: To get complete Looker metadata integration (including Looker dashboards and charts and lineage to the underlying Looker views), you must ALSO use the looker source module.
Install the Plugin
pip install 'acryl-datahub[lookml]'
Quickstart Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: "lookml"
  config:
    # Coordinates
    base_folder: /path/to/model/files
    # Options
    api:
      # Coordinates for your looker instance
      base_url: "https://YOUR_INSTANCE.cloud.looker.com"
      # Credentials for your Looker connection (https://docs.looker.com/reference/api-and-integration/api-auth)
      client_id: ${LOOKER_CLIENT_ID}
      client_secret: ${LOOKER_CLIENT_SECRET}
    # Alternative to API section above if you want a purely file-based ingestion with no api calls to Looker or if you want to provide platform_instance ids for your connections
    # project_name: PROJECT_NAME # See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what is your project name
    # connection_to_platform_map:
    #   connection_name_1:
    #     platform: snowflake # bigquery, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
    #     platform_instance: snow_warehouse # optional
    #     platform_env: PROD # optional
    #   connection_name_2:
    #     platform: bigquery # snowflake, hive, etc
    #     default_db: DEFAULT_DATABASE # the default database configured for this connection
    #     default_schema: DEFAULT_SCHEMA # the default schema configured for this connection
    #     platform_instance: bq_warehouse # optional
    #     platform_env: DEV # optional
    github_info:
      repo: org/repo-name
# sink configs
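The recipe above offers two modes: API-backed resolution through the `api` section, or a file-based setup that supplies `project_name` and `connection_to_platform_map` instead. The sketch below checks a config dictionary against the requirements stated in the configuration reference (`base_folder` always, `project_name` when `api` is absent, `platform` and `default_db` for each connection). `check_lookml_config` is a hypothetical helper, not DataHub's own validation.

```python
def check_lookml_config(config: dict) -> list:
    """Return human-readable problems found in a lookml source config."""
    problems = []
    if not config.get("base_folder"):
        problems.append("base_folder is required")
    # Without the api section, the project name must be given explicitly.
    if "api" not in config and not config.get("project_name"):
        problems.append("project_name is required when api is not set")
    # Each connection entry needs at least a platform and a default database.
    for name, conn in config.get("connection_to_platform_map", {}).items():
        for field in ("platform", "default_db"):
            if field not in conn:
                problems.append(f"{name}: {field} is required")
    return problems


offline_config = {
    "base_folder": "/path/to/model/files",
    "connection_to_platform_map": {
        "connection_name_1": {"platform": "snowflake"},
    },
}
problems = check_lookml_config(offline_config)
```

For `offline_config`, the result reports the missing `project_name` and the missing `default_db` of `connection_name_1`.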
Config Details
Note that a dot (.) is used to denote nested fields in the YAML recipe.
View All Configuration Options
Field | Required | Type | Description | Default |
---|---|---|---|---|
env | | string | The environment that all assets produced by this connector belong to | PROD |
platform | | string | The platform that this source connects to | None |
platform_instance | | string | The instance of the platform that all assets produced by this recipe belong to | None |
tag_measures_and_dimensions | | boolean | When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column. | True |
platform_name | | string | Default platform name. Don't change. | looker |
base_folder | ✅ | string | Local filepath where the root of the LookML repo lives. This is typically the root folder where the *.model.lkml and *.view.lkml files are stored. e.g. If you have checked out your LookML repo under /Users/jdoe/workspace/my-lookml-repo, then set base_folder to /Users/jdoe/workspace/my-lookml-repo. | None |
parse_table_names_from_sql | | boolean | See note below. | False |
sql_parser | | string | See note below. | datahub.utilities.sql_parser.DefaultSQLParser |
project_name | | string | Required if you don't specify the api section. The project name within which all the model files live. See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what the Looker project name should be. The simplest way to see your projects is to click on Develop followed by Manage LookML Projects in the Looker application. | None |
max_file_snippet_length | | integer | When extracting the view definition from a lookml file, the maximum number of characters to extract. | 512000 |
explore_naming_pattern | | NamingPattern (see below for fields) | Pattern for providing dataset names to explores. Allowed variables are {project}, {model}, {name}. Default is {model}.explore.{name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{model}.explore.{name}', 'variables': None} |
explore_naming_pattern.allowed_vars | ✅ | Array of string | | None |
explore_naming_pattern.pattern | ✅ | string | | None |
explore_naming_pattern.variables | | Array of string | | None |
explore_browse_pattern | | NamingPattern (see below for fields) | | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/explores/{model}.{name}', 'variables': None} |
explore_browse_pattern.allowed_vars | ✅ | Array of string | | None |
explore_browse_pattern.pattern | ✅ | string | | None |
explore_browse_pattern.variables | | Array of string | | None |
view_naming_pattern | | NamingPattern (see below for fields) | Pattern for providing dataset names to views. Allowed variables are {project}, {model}, {name} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '{project}.view.{name}', 'variables': None} |
view_naming_pattern.allowed_vars | ✅ | Array of string | | None |
view_naming_pattern.pattern | ✅ | string | | None |
view_naming_pattern.variables | | Array of string | | None |
view_browse_pattern | | NamingPattern (see below for fields) | Pattern for providing browse paths to views. Allowed variables are {project}, {model}, {name}, {platform} and {env} | {'allowed_vars': ['platform', 'env', 'project', 'model', 'name'], 'pattern': '/{env}/{platform}/{project}/views/{name}', 'variables': None} |
view_browse_pattern.allowed_vars | ✅ | Array of string | | None |
view_browse_pattern.pattern | ✅ | string | | None |
view_browse_pattern.variables | | Array of string | | None |
github_info | | GitHubInfo (see below for fields) | Reference to your github location to enable easy navigation from DataHub to your LookML files | |
github_info.repo | ✅ | string | Name of your github repo. e.g. repo for https://github.com/datahub-project/datahub is datahub-project/datahub. | None |
github_info.branch | | string | Branch on which your files live by default. Typically main or master. | main |
github_info.base_url | | string | Base url for Github | https://github.com |
connection_to_platform_map | | Dict[str, LookerConnectionDefinition] | A mapping of Looker connection names to DataHub platform, database, and schema values. | |
connection_to_platform_map.key.platform | ✅ | string | | None |
connection_to_platform_map.key.default_db | ✅ | string | | None |
connection_to_platform_map.key.default_schema | | string | | None |
connection_to_platform_map.key.platform_instance | | string | | None |
connection_to_platform_map.key.platform_env | | string | The environment that the platform is located in. Leaving this empty will inherit defaults from the top level Looker configuration | None |
model_pattern | | AllowDenyPattern (see below for fields) | List of regex patterns for LookML models to include in the extraction. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'} |
model_pattern.allow | | Array of string | List of regex patterns for process groups to include in ingestion | ['.*'] |
model_pattern.deny | | Array of string | List of regex patterns for process groups to exclude from ingestion. | [] |
model_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
model_pattern.alphabet | | string | Allowed alphabets pattern | [A-Za-z0-9 _.-] |
view_pattern | | AllowDenyPattern (see below for fields) | List of regex patterns for LookML views to include in the extraction. | {'allow': ['.*'], 'deny': [], 'ignoreCase': True, 'alphabet': '[A-Za-z0-9 _.-]'} |
view_pattern.allow | | Array of string | List of regex patterns for process groups to include in ingestion | ['.*'] |
view_pattern.deny | | Array of string | List of regex patterns for process groups to exclude from ingestion. | [] |
view_pattern.ignoreCase | | boolean | Whether to ignore case sensitivity during pattern matching. | True |
view_pattern.alphabet | | string | Allowed alphabets pattern | [A-Za-z0-9 _.-] |
api | | LookerAPIConfig (see below for fields) | | |
api.client_id | ✅ | string | Looker API client id. | None |
api.client_secret | ✅ | string | Looker API client secret. | None |
api.base_url | ✅ | string | Url to your Looker instance: https://company.looker.com:19999 or https://looker.company.com, or similar. Used for making API calls to Looker and constructing clickable dashboard and chart urls. | None |
api.transport_options | | TransportOptionsConfig (see below for fields) | Populates the TransportOptions struct for looker client | |
api.transport_options.timeout | ✅ | integer | | None |
api.transport_options.headers | ✅ | Dict[str,string] | | |
transport_options | | TransportOptionsConfig (see below for fields) | Populates the TransportOptions struct for looker client | |
transport_options.timeout | ✅ | integer | | None |
transport_options.headers | ✅ | Dict[str,string] | | |
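The naming and browse patterns in the table are templates with `{variable}` placeholders drawn from the `allowed_vars` list. A minimal sketch of how such a pattern could be rendered and checked is shown below; `render_pattern` is a hypothetical helper built on Python's `string.Formatter`, not the connector's actual code.

```python
from string import Formatter

# The allowed_vars default shown in the table above.
ALLOWED_VARS = ["platform", "env", "project", "model", "name"]


def render_pattern(pattern: str, **values: str) -> str:
    """Fill a naming pattern, rejecting variables outside ALLOWED_VARS."""
    # Formatter().parse yields (literal, field_name, format_spec, conversion).
    used = [field for _, field, _, _ in Formatter().parse(pattern) if field]
    unknown = [var for var in used if var not in ALLOWED_VARS]
    if unknown:
        raise ValueError(f"unsupported variables: {unknown}")
    return pattern.format(**values)


explore_name = render_pattern("{model}.explore.{name}", model="sales", name="orders")
view_name = render_pattern("{project}.view.{name}", project="ecommerce", name="orders")
```

With the default patterns, the illustrative values above yield `sales.explore.orders` and `ecommerce.view.orders`; a pattern such as `{folder}.{name}` raises `ValueError` because `folder` is not an allowed variable.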
The JSONSchema for this configuration is inlined below.
{
"title": "LookMLSourceConfig",
"description": "Any source that is a primary producer of Dataset metadata should inherit this class",
"type": "object",
"properties": {
"env": {
"title": "Env",
"description": "The environment that all assets produced by this connector belong to",
"default": "PROD",
"type": "string"
},
"platform": {
"title": "Platform",
"description": "The platform that this source connects to",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"description": "The instance of the platform that all assets produced by this recipe belong to",
"type": "string"
},
"explore_naming_pattern": {
"title": "Explore Naming Pattern",
"description": "Pattern for providing dataset names to explores. Allowed variables are {project}, {model}, {name}. Default is `{model}.explore.{name}`",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "{model}.explore.{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"explore_browse_pattern": {
"title": "Explore Browse Pattern",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "/{env}/{platform}/{project}/explores/{model}.{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"view_naming_pattern": {
"title": "View Naming Pattern",
"description": "Pattern for providing dataset names to views. Allowed variables are `{project}`, `{model}`, `{name}`",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "{project}.view.{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"view_browse_pattern": {
"title": "View Browse Pattern",
"description": "Pattern for providing browse paths to views. Allowed variables are `{project}`, `{model}`, `{name}`, `{platform}` and `{env}`",
"default": {
"allowed_vars": [
"platform",
"env",
"project",
"model",
"name"
],
"pattern": "/{env}/{platform}/{project}/views/{name}",
"variables": null
},
"allOf": [
{
"$ref": "#/definitions/NamingPattern"
}
]
},
"tag_measures_and_dimensions": {
"title": "Tag Measures And Dimensions",
"description": "When enabled, attaches tags to measures, dimensions and dimension groups to make them more discoverable. When disabled, adds this information to the description of the column.",
"default": true,
"type": "boolean"
},
"platform_name": {
"title": "Platform Name",
"description": "Default platform name. Don't change.",
"default": "looker",
"type": "string"
},
"github_info": {
"title": "Github Info",
"description": "Reference to your github location to enable easy navigation from DataHub to your LookML files",
"allOf": [
{
"$ref": "#/definitions/GitHubInfo"
}
]
},
"base_folder": {
"title": "Base Folder",
"description": "Local filepath where the root of the LookML repo lives. This is typically the root folder where the `*.model.lkml` and `*.view.lkml` files are stored. e.g. If you have checked out your LookML repo under `/Users/jdoe/workspace/my-lookml-repo`, then set `base_folder` to `/Users/jdoe/workspace/my-lookml-repo`.",
"format": "directory-path",
"type": "string"
},
"connection_to_platform_map": {
"title": "Connection To Platform Map",
"description": "A mapping of [Looker connection names](https://docs.looker.com/reference/model-params/connection-for-model) to DataHub platform, database, and schema values.",
"type": "object",
"additionalProperties": {
"$ref": "#/definitions/LookerConnectionDefinition"
}
},
"model_pattern": {
"title": "Model Pattern",
"description": "List of regex patterns for LookML models to include in the extraction.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true,
"alphabet": "[A-Za-z0-9 _.-]"
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"view_pattern": {
"title": "View Pattern",
"description": "List of regex patterns for LookML views to include in the extraction.",
"default": {
"allow": [
".*"
],
"deny": [],
"ignoreCase": true,
"alphabet": "[A-Za-z0-9 _.-]"
},
"allOf": [
{
"$ref": "#/definitions/AllowDenyPattern"
}
]
},
"parse_table_names_from_sql": {
"title": "Parse Table Names From Sql",
"description": "See note below.",
"default": false,
"type": "boolean"
},
"sql_parser": {
"title": "Sql Parser",
"description": "See note below.",
"default": "datahub.utilities.sql_parser.DefaultSQLParser",
"type": "string"
},
"api": {
"$ref": "#/definitions/LookerAPIConfig"
},
"project_name": {
"title": "Project Name",
"description": "Required if you don't specify the `api` section. The project name within which all the model files live. See (https://docs.looker.com/data-modeling/getting-started/how-project-works) to understand what the Looker project name should be. The simplest way to see your projects is to click on `Develop` followed by `Manage LookML Projects` in the Looker application.",
"type": "string"
},
"transport_options": {
"title": "Transport Options",
"description": "Populates the [TransportOptions](https://github.com/looker-open-source/sdk-codegen/blob/94d6047a0d52912ac082eb91616c1e7c379ab262/python/looker_sdk/rtl/transport.py#L70) struct for looker client",
"allOf": [
{
"$ref": "#/definitions/TransportOptionsConfig"
}
]
},
"max_file_snippet_length": {
"title": "Max File Snippet Length",
"description": "When extracting the view definition from a lookml file, the maximum number of characters to extract.",
"default": 512000,
"type": "integer"
}
},
"required": [
"base_folder"
],
"additionalProperties": false,
"definitions": {
"NamingPattern": {
"title": "NamingPattern",
"type": "object",
"properties": {
"allowed_vars": {
"title": "Allowed Vars",
"type": "array",
"items": {
"type": "string"
}
},
"pattern": {
"title": "Pattern",
"type": "string"
},
"variables": {
"title": "Variables",
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"allowed_vars",
"pattern"
]
},
"GitHubInfo": {
"title": "GitHubInfo",
"type": "object",
"properties": {
"repo": {
"title": "Repo",
"description": "Name of your github repo. e.g. repo for https://github.com/datahub-project/datahub is `datahub-project/datahub`.",
"type": "string"
},
"branch": {
"title": "Branch",
"description": "Branch on which your files live by default. Typically main or master.",
"default": "main",
"type": "string"
},
"base_url": {
"title": "Base Url",
"description": "Base url for Github",
"default": "https://github.com",
"type": "string"
}
},
"required": [
"repo"
],
"additionalProperties": false
},
"LookerConnectionDefinition": {
"title": "LookerConnectionDefinition",
"type": "object",
"properties": {
"platform": {
"title": "Platform",
"type": "string"
},
"default_db": {
"title": "Default Db",
"type": "string"
},
"default_schema": {
"title": "Default Schema",
"type": "string"
},
"platform_instance": {
"title": "Platform Instance",
"type": "string"
},
"platform_env": {
"title": "Platform Env",
"description": "The environment that the platform is located in. Leaving this empty will inherit defaults from the top level Looker configuration",
"type": "string"
}
},
"required": [
"platform",
"default_db"
],
"additionalProperties": false
},
"AllowDenyPattern": {
"title": "AllowDenyPattern",
"description": "A class to store allow deny regexes",
"type": "object",
"properties": {
"allow": {
"title": "Allow",
"description": "List of regex patterns for process groups to include in ingestion",
"default": [
".*"
],
"type": "array",
"items": {
"type": "string"
}
},
"deny": {
"title": "Deny",
"description": "List of regex patterns for process groups to exclude from ingestion.",
"default": [],
"type": "array",
"items": {
"type": "string"
}
},
"ignoreCase": {
"title": "Ignorecase",
"description": "Whether to ignore case sensitivity during pattern matching.",
"default": true,
"type": "boolean"
},
"alphabet": {
"title": "Alphabet",
"description": "Allowed alphabets pattern",
"default": "[A-Za-z0-9 _.-]",
"type": "string"
}
},
"additionalProperties": false
},
"TransportOptionsConfig": {
"title": "TransportOptionsConfig",
"type": "object",
"properties": {
"timeout": {
"title": "Timeout",
"type": "integer"
},
"headers": {
"title": "Headers",
"type": "object",
"additionalProperties": {
"type": "string"
}
}
},
"required": [
"timeout",
"headers"
],
"additionalProperties": false
},
"LookerAPIConfig": {
"title": "LookerAPIConfig",
"type": "object",
"properties": {
"client_id": {
"title": "Client Id",
"description": "Looker API client id.",
"type": "string"
},
"client_secret": {
"title": "Client Secret",
"description": "Looker API client secret.",
"type": "string"
},
"base_url": {
"title": "Base Url",
"description": "Url to your Looker instance: `https://company.looker.com:19999` or `https://looker.company.com`, or similar. Used for making API calls to Looker and constructing clickable dashboard and chart urls.",
"type": "string"
},
"transport_options": {
"title": "Transport Options",
"description": "Populates the [TransportOptions](https://github.com/looker-open-source/sdk-codegen/blob/94d6047a0d52912ac082eb91616c1e7c379ab262/python/looker_sdk/rtl/transport.py#L70) struct for looker client",
"allOf": [
{
"$ref": "#/definitions/TransportOptionsConfig"
}
]
}
},
"required": [
"client_id",
"client_secret",
"base_url"
],
"additionalProperties": false
}
}
}
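The `AllowDenyPattern` definition in the schema above (used by `model_pattern` and `view_pattern`) combines allow and deny regex lists with an `ignoreCase` flag. The sketch below shows one plausible evaluation order; `is_allowed` is a hypothetical function, and both the deny-takes-precedence rule and matching from the start of the string are assumptions, not behavior documented on this page.

```python
import re


def is_allowed(value: str, allow=(".*",), deny=(), ignore_case: bool = True) -> bool:
    """Return True if `value` passes the allow list and is not denied."""
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: a single matching deny pattern excludes the value.
    if any(re.match(pattern, value, flags) for pattern in deny):
        return False
    # Otherwise the value must match at least one allow pattern.
    return any(re.match(pattern, value, flags) for pattern in allow)
```

With the defaults every name is allowed; passing `deny=["^test_.*"]` would exclude a view named `Test_view` while `ignore_case` is left at `True`.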
Configuration Notes
See the Looker authentication docs for the steps to create a client ID and secret.
You need to ensure that the API key is attached to a user that has Admin privileges. If that is not possible, read the configuration section to provide an offline specification of the connection_to_platform_map and the project_name.
Note: The integration can use an SQL parser to try to parse the tables the views depend on. This parsing is disabled by default, but can be enabled by setting parse_table_names_from_sql: True. The default parser is based on the sqllineage package. As this package doesn't officially support all the SQL dialects that Looker supports, the result might not be correct. You can, however, implement a custom parser and take it into use by setting the sql_parser configuration value. A custom SQL parser must inherit from datahub.utilities.sql_parser.SQLParser and must be made available to DataHub by, for example, installing it. The configuration then needs to be set to module_name.ClassName of the parser.
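To illustrate what a `module_name.ClassName` value refers to, the sketch below resolves such a dotted string to a class with the standard library's `importlib`. This is not DataHub's loader: `load_class` is a hypothetical helper, and the demonstration uses a standard-library class because a real custom parser module would have to be installed first.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'module_name.ClassName' to the class object it names."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected module_name.ClassName, got {dotted_path!r}")
    # The module must be importable, i.e. installed or on the Python path.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in for a custom parser path such as 'my_package.MyParser' (hypothetical).
example_class = load_class("collections.OrderedDict")
```

If the module cannot be imported, `importlib.import_module` raises `ModuleNotFoundError`, which is a useful hint that the custom parser package is not installed in the environment running ingestion.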
Code Coordinates
- Class Name: datahub.ingestion.source.lookml.LookMLSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for Looker, feel free to ping us on our Slack.