Tags

Why Would You Use Tags on Datasets?

Tags are informal, loosely controlled labels that help with search and discovery. They can be added to datasets, dataset schemas, or containers as an easy way to label or categorize entities, without having to associate them with a broader business glossary or vocabulary. For more information about tags, refer to About DataHub Tags.

Goal Of This Guide

This guide will show you how to

  • Create: create a tag.
  • Read: read tags attached to a dataset.
  • Add: add a tag to a column of a dataset or to the dataset itself.
  • Remove: remove a tag from a dataset.

Prerequisites

For this tutorial, you need to deploy DataHub Quickstart and ingest sample data. For detailed information, please refer to the DataHub Quickstart Guide.

Note: Before modifying tags, you need to ensure the target dataset is already present in your DataHub instance. If you attempt to manipulate entities that do not exist, your operation will fail. In this guide, we will be using data from sample ingestion.

For more information on how to set up for GraphQL, please refer to How To Set Up GraphQL.

Create Tags

The following mutation creates a tag named Deprecated.

mutation createTag {
  createTag(
    input: {
      name: "Deprecated",
      id: "deprecated",
      description: "Having this tag means this column or table is deprecated."
    }
  )
}

If you see the following response, the operation was successful:

{
  "data": {
    "createTag": "urn:li:tag:deprecated"
  },
  "extensions": {}
}
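
The same mutation can also be sent from a script. The following is a minimal sketch using only the Python standard library. The endpoint URL, the Bearer-token header, and the helper names `build_payload` and `post_graphql` are assumptions for illustration and are not taken from this guide; adjust them to match your deployment.

```python
import json
import urllib.request
from typing import Optional

# Assumption: GraphQL endpoint of a local quickstart deployment.
GRAPHQL_URL = "http://localhost:8080/api/graphql"

# The createTag mutation from this guide, sent verbatim.
CREATE_TAG = """
mutation createTag {
  createTag(
    input: {
      name: "Deprecated",
      id: "deprecated",
      description: "Having this tag means this column or table is deprecated."
    }
  )
}
"""


def build_payload(query: str) -> bytes:
    """Wrap a GraphQL document in the JSON body used for GraphQL over HTTP."""
    return json.dumps({"query": query}).encode("utf-8")


def post_graphql(query: str, url: str = GRAPHQL_URL, token: Optional[str] = None) -> dict:
    """POST the query and return the decoded JSON response (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: token-based authentication uses a Bearer header.
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        url, data=build_payload(query), headers=headers, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Against a running instance, `post_graphql(CREATE_TAG)` should return a dictionary with the same shape as the response above. Nothing is sent until `post_graphql` is called.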

Expected Outcome of Creating Tags

You can now see the new tag Deprecated has been created.

We can also verify this operation programmatically by fetching the Deprecated tag with the datahub CLI after running this code.

datahub get --urn "urn:li:tag:deprecated" --aspect tagProperties

{
  "tagProperties": {
    "description": "Having this tag means this column or table is deprecated.",
    "name": "Deprecated"
  }
}

Read Tags

query {
  dataset(urn: "urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)") {
    tags {
      tags {
        tag {
          name
          urn
          properties {
            description
            colorHex
            name
          }
        }
      }
    }
  }
}

If you see the following response, the operation was successful:

{
  "data": {
    "dataset": {
      "tags": {
        "tags": [
          {
            "tag": {
              "name": "Legacy",
              "urn": "urn:li:tag:Legacy",
              "properties": {
                "description": "Indicates the dataset is no longer supported",
                "colorHex": null,
                "name": "Legacy"
              }
            }
          }
        ]
      }
    }
  },
  "extensions": {}
}
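
To use this result in a script, you need to walk the nested `tags` objects. The following Python sketch extracts the tag names and URNs from a response of this shape. The function name `extract_tags` is hypothetical, and treating a missing or null `tags` field as "no tags" is an assumption about how an untagged dataset is returned.

```python
import json
from typing import List, Tuple

# Trimmed copy of the response shown in this guide.
RESPONSE = """
{
  "data": {
    "dataset": {
      "tags": {
        "tags": [
          {"tag": {"name": "Legacy", "urn": "urn:li:tag:Legacy"}}
        ]
      }
    }
  },
  "extensions": {}
}
"""


def extract_tags(response: dict) -> List[Tuple[str, str]]:
    """Return (name, urn) pairs for every tag attached to the dataset."""
    dataset = (response.get("data") or {}).get("dataset") or {}
    # Assumption: a dataset without tags may return null instead of an empty list.
    associations = (dataset.get("tags") or {}).get("tags") or []
    return [(entry["tag"]["name"], entry["tag"]["urn"]) for entry in associations]


print(extract_tags(json.loads(RESPONSE)))  # [('Legacy', 'urn:li:tag:Legacy')]
```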

Add Tags

Add Tags to a dataset

The following code shows how to add tags to a dataset. Here, we add the tag Deprecated to a dataset named fct_users_created.

mutation addTags {
  addTags(
    input: {
      tagUrns: ["urn:li:tag:deprecated"],
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
    }
  )
}

If you see the following response, the operation was successful:

{
  "data": {
    "addTags": true
  },
  "extensions": {}
}

Add Tags to a Column of a dataset

mutation addTags {
  addTags(
    input: {
      tagUrns: ["urn:li:tag:deprecated"],
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
      subResourceType: DATASET_FIELD,
      subResource: "user_name"
    }
  )
}
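
The dataset-level and column-level mutations differ only in the `subResourceType` and `subResource` fields. The following Python sketch builds either form from the same inputs. The function name `add_tags_mutation` and its `column` parameter are hypothetical. The sketch relies on `json.dumps` to quote strings and lists, which produces valid GraphQL literals for plain values like the ones in this guide.

```python
import json
from typing import List, Optional


def add_tags_mutation(
    tag_urns: List[str], resource_urn: str, column: Optional[str] = None
) -> str:
    """Build an addTags mutation for a dataset, or for one column if given."""
    fields = [
        f"tagUrns: {json.dumps(tag_urns)}",
        f"resourceUrn: {json.dumps(resource_urn)}",
    ]
    if column is not None:
        # DATASET_FIELD is an enum value, so it is left unquoted.
        fields.append("subResourceType: DATASET_FIELD")
        fields.append(f"subResource: {json.dumps(column)}")
    body = ",\n      ".join(fields)
    return (
        "mutation addTags {\n"
        "  addTags(\n"
        "    input: {\n"
        f"      {body}\n"
        "    }\n"
        "  )\n"
        "}"
    )


DATASET = "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)"
print(add_tags_mutation(["urn:li:tag:deprecated"], DATASET, column="user_name"))
```

Omitting `column` yields the dataset-level mutation from the previous section.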

Expected Outcome of Adding Tags

You can now see that the Deprecated tag has been added to the user_name column.

We can also verify this operation programmatically by checking the globalTags aspect using the datahub CLI.

datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect globalTags

Remove Tags

The following code removes a tag from a dataset. After running this code, the Deprecated tag will be removed from the user_name column.

mutation removeTag {
  removeTag(
    input: {
      tagUrn: "urn:li:tag:deprecated",
      resourceUrn: "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)",
      subResourceType: DATASET_FIELD,
      subResource: "user_name"
    }
  )
}

Expected Outcome of Removing Tags

You can now see that the Deprecated tag has been removed from the user_name column.

We can also verify this operation programmatically by checking the globalTags aspect using the datahub CLI.

datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:hive,fct_users_created,PROD)" --aspect globalTags

{
  "globalTags": {
    "tags": []
  }
}
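
This check can be automated by parsing the CLI output. The following Python sketch tests whether a given tag URN appears in a `globalTags` aspect. The function name `has_tag` is hypothetical, and the sketch assumes that each entry in a non-empty `tags` list is an object whose `tag` key holds the tag URN. The empty output above does not show that shape, so verify it against your own instance.

```python
import json


def has_tag(aspect_json: str, tag_urn: str) -> bool:
    """Return True if tag_urn is listed in the globalTags aspect."""
    aspect = json.loads(aspect_json)
    tags = (aspect.get("globalTags") or {}).get("tags") or []
    # Assumption: each entry looks like {"tag": "urn:li:tag:..."}.
    return any(entry.get("tag") == tag_urn for entry in tags)


# Output shown in this guide after the tag is removed.
AFTER_REMOVAL = '{"globalTags": {"tags": []}}'
print(has_tag(AFTER_REMOVAL, "urn:li:tag:deprecated"))  # False
```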