DataHub provides the ability to declare fine-grained access control Policies via the UI & GraphQL API. Access policies in DataHub define who can do what to which resources. A few policies in plain English include:
- Dataset Owners should be allowed to edit documentation, but not Tags.
- Jenny, our Data Steward, should be allowed to edit Tags for any Dashboard, but no other metadata.
- James, a Data Analyst, should be allowed to edit the Links for a specific Data Pipeline he is a downstream consumer of.
- The Data Platform team should be allowed to manage users & groups, view platform analytics, & manage policies themselves.
In this document, we'll take a deeper look at DataHub Policies & how to use them effectively.
There are 2 types of Policy within DataHub:
- Platform Policies
- Metadata Policies
We'll briefly describe each.
Platform policies determine who has platform-level privileges on DataHub. These privileges include:
- Managing Users & Groups
- Viewing the DataHub Analytics Page
- Managing Policies themselves
Platform policies can be broken down into 2 parts:
- Actors: Who the policy applies to (Users or Groups)
- Privileges: Which privileges should be assigned to the Actors (e.g. "View Analytics")
Note that platform policies do not include a specific "target resource" against which the Policies apply. Instead, they simply serve to assign specific privileges to DataHub users and groups.
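To make the "actors plus privileges, no target resource" shape concrete, here is a minimal Python sketch. `PlatformPolicy` and `has_platform_privilege` are hypothetical names, and the privilege strings are the plain-English labels used on this page rather than DataHub's internal identifiers; the real evaluation happens inside DataHub itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformPolicy:
    """Hypothetical model of a platform policy: actors plus privileges, no target resource."""

    name: str
    users: frozenset = frozenset()       # individual actors
    groups: frozenset = frozenset()      # group actors
    privileges: frozenset = frozenset()  # e.g. "View Analytics"


def has_platform_privilege(user, user_groups, privilege, policies):
    """Return True if any policy naming this user (or one of their groups) grants the privilege."""
    for policy in policies:
        is_actor = user in policy.users or not policy.groups.isdisjoint(user_groups)
        if is_actor and privilege in policy.privileges:
            return True
    return False


# Example: the Data Platform team policy from the list at the top of this page.
platform_team = PlatformPolicy(
    name="Data Platform team",
    groups=frozenset({"data-platform"}),
    privileges=frozenset({"Manage Users & Groups", "View Analytics", "Manage Policies"}),
)
print(has_platform_privilege("alice", {"data-platform"}, "View Analytics", [platform_team]))  # True
print(has_platform_privilege("bob", set(), "View Analytics", [platform_team]))  # False
```

Because there is no resource field, the check only asks "is this user an actor?" and "is the privilege granted?".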
Metadata policies determine who can do what to which Metadata Entities. For example,
- Who can edit Dataset Documentation & Links?
- Who can add Owners to a Chart?
- Who can add Tags to a Dashboard?
and so on.
A Metadata Policy can be broken down into 3 parts:
- Actors: The 'who'. The specific users or groups that the policy applies to.
- Privileges: The 'what'. What actions are being permitted by a policy, e.g. "Add Tags".
- Resources: The 'which'. Resources that the policy applies to, e.g. "All Datasets".
Today, the set of supported privileges includes only write privileges; that is, no read restrictions are implemented yet.
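The who/what/which split can be sketched in Python as follows, encoding two of the plain-English examples from the top of this page (Jenny and James). All class and function names, privilege strings, and URNs here are hypothetical illustrations, and owner- and group-based actors are left out for brevity. Only write actions are checked, mirroring the write-only model described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ResourceFilter:
    """Hypothetical 'which': an entity type plus an optional set of specific URNs."""

    entity_type: str                       # e.g. "dashboard", "dataFlow"
    urns: Optional[FrozenSet[str]] = None  # None means "all resources of this type"

    def matches(self, entity_type: str, urn: str) -> bool:
        return entity_type == self.entity_type and (self.urns is None or urn in self.urns)


@dataclass(frozen=True)
class MetadataPolicy:
    """Hypothetical who / what / which triple."""

    name: str
    actors: FrozenSet[str]      # user names only in this sketch
    privileges: FrozenSet[str]  # write privileges, e.g. "Edit Tags"
    resources: ResourceFilter


def can_write(actor, privilege, entity_type, urn, policies):
    """Allow a write if some policy matches the actor, the privilege, and the resource."""
    return any(
        actor in p.actors and privilege in p.privileges and p.resources.matches(entity_type, urn)
        for p in policies
    )


PIPELINE = "urn:li:dataFlow:(airflow,orders_etl,prod)"  # hypothetical URN
DASHBOARD = "urn:li:dashboard:(looker,sales)"           # hypothetical URN

policies = [
    MetadataPolicy("Jenny edits Dashboard Tags", frozenset({"jenny"}),
                   frozenset({"Edit Tags"}), ResourceFilter("dashboard")),
    MetadataPolicy("James edits one pipeline's Links", frozenset({"james"}),
                   frozenset({"Edit Links"}), ResourceFilter("dataFlow", frozenset({PIPELINE}))),
]

print(can_write("jenny", "Edit Tags", "dashboard", DASHBOARD, policies))           # True
print(can_write("jenny", "Edit Documentation", "dashboard", DASHBOARD, policies))  # False
print(can_write("james", "Edit Links", "dataFlow", PIPELINE, policies))            # True
```

Leaving `urns` unset expresses "all resources of a type" (Jenny's policy), while a concrete set pins a policy to specific resources (James's pipeline).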
Policies can be managed under the `/policies` page, or accessed inside the Control Center, a slide-out menu appearing on the left side of the DataHub UI. The Policies tab will only be visible to those users having the Manage Policies privilege.
Out of the box, DataHub is deployed with a set of pre-baked Policies. The default policies are created at deploy time and can be found inside the `policies.json` file within `metadata-service/war/src/main/resources/boot`. This set of policies serves the following purposes:
- Assigns immutable super-user privileges for the root `datahub` user account (Immutable)
- Assigns all Platform privileges for all Users by default (Editable)
The reason for the first policy is to prevent people from accidentally deleting all policies and getting locked out (the `datahub` super-user account serves as a backup).
The reason for the second is to permit administrators to log in via OIDC or another means outside of the `datahub` root account when they are bootstrapping DataHub. This way, those setting up DataHub can start managing policies without friction.
Note that these privileges can, and likely should, be altered inside the Policies page of the UI.
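Why the immutable root policy prevents lockout can be illustrated with a small Python sketch. `StoredPolicy`, `PolicyStore`, and the policy names below are hypothetical and are not the contents of `policies.json`; the sketch only assumes, as described above, that the root policy cannot be removed while the all-users policy can.

```python
from dataclasses import dataclass


@dataclass
class StoredPolicy:
    name: str
    editable: bool  # False for the root super-user policy


class PolicyStore:
    """Hypothetical in-memory store that refuses to delete immutable policies."""

    def __init__(self, policies):
        self._policies = {p.name: p for p in policies}

    def delete(self, name):
        if not self._policies[name].editable:
            raise PermissionError(f"policy {name!r} is immutable and cannot be deleted")
        del self._policies[name]

    def delete_all_editable(self):
        """Simulate an administrator deleting every policy they are allowed to delete."""
        for name in [n for n, p in self._policies.items() if p.editable]:
            del self._policies[name]

    def names(self):
        return sorted(self._policies)


store = PolicyStore([
    StoredPolicy("root user: all privileges", editable=False),
    StoredPolicy("all users: all platform privileges", editable=True),
])
store.delete_all_editable()
print(store.names())  # ['root user: all privileges']
```

Even after every editable policy is gone, the root account still holds all privileges and can recreate the rest.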
Pro-Tip: To log in using the `datahub` account, simply navigate to
datahub. Note that the password can be customized for your deployment by changing the `user.props` file within the `datahub-frontend` module. Note that JaaS authentication must be enabled.
By default, the Policies feature is enabled. This means that the deployment will support creating, editing, removing, and most importantly enforcing fine-grained access policies.
In some cases, these capabilities are not desirable. For example, if your company's users are already used to having free rein, you may want to keep it that way. Or perhaps only your Data Platform team actively uses DataHub, in which case Policies may be overkill.
For these scenarios, we've provided a back door to disable Policies in your deployment of DataHub. This will completely hide the policies management UI and by default will allow all actions on the platform. It will be as though each user has all privileges, both of the Platform & Metadata flavor.
To disable Policies, simply set the `AUTH_POLICIES_ENABLED` environment variable for the `datahub-gms` service container to `false`. For example, in your `docker/datahub-gms/docker.env`, you'd place `AUTH_POLICIES_ENABLED=false`.
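The on/off behaviour described above (Policies enabled by default, every action allowed once they are disabled) can be sketched in Python. This is a hypothetical illustration, not the `datahub-gms` implementation: `policies_enabled` and `authorize` are made-up names, and treating only the case-insensitive string `false` as "off" is an assumption.

```python
import os
from typing import Callable, Mapping, Optional


def policies_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Policies stay on unless AUTH_POLICIES_ENABLED is explicitly 'false' (assumed parsing)."""
    env = os.environ if env is None else env
    return env.get("AUTH_POLICIES_ENABLED", "true").strip().lower() != "false"


def authorize(policy_check: Callable[[], bool], env: Optional[Mapping[str, str]] = None) -> bool:
    """With Policies disabled every action is allowed; otherwise defer to the policy check."""
    if not policies_enabled(env):
        return True
    return policy_check()


def no_matching_policy() -> bool:
    # Stands in for a policy evaluation that finds no policy granting the action.
    return False


print(authorize(no_matching_policy, {"AUTH_POLICIES_ENABLED": "false"}))  # True
print(authorize(no_matching_policy, {}))  # False
```

Defaulting to `"true"` when the variable is absent matches the statement that the Policies feature is enabled by default.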
The DataHub team is hard at work improving the Policies feature. We are planning to build out the following:
- Hide edit action buttons on Entity pages to reflect user privileges
- Ability to define Metadata Policies against multiple resources scoped to a particular "Domain"
- Ability to define Metadata Policies against multiple resources scoped to particular "Containers" (e.g. a "schema", "database", or "collection")
We want to hear from you! For any inquiries, including Feedback, Questions, or Concerns, reach out on Slack!