- Start Date: 08/28/2020
- RFC PR: 1842
- Implementation PR(s): TBD
Adding support for a Business Glossary enhances the value of metadata by bringing in the business view. It documents the business terms used across the organization and gives all data stakeholders a common vocabulary. It also encourages the business community to use the Data Catalog to discover relevant data assets, and makes it possible to find relationships between data assets through the business terms attached to them. The following link illustrates the importance of a business glossary: article.
We need to model a Business Glossary in which the business team can define business terms and link them to the data elements being onboarded to data platforms and data catalogs. This gives the following benefits:
- Define a common vocabulary for the organization and make collaboration between the business and technical communities easier
- Organizations can leverage existing industry taxonomies by importing their definitions and then enhancing them or defining their own specific terms and definitions
- The crux of the business glossary is linking datasets and data elements to business terms, so that business users and consumers can easily discover the datasets they are interested in with the help of those terms
- Promote usage and reduce redundancy: the Business Glossary helps users discover datasets quickly through business terms, which also reduces unnecessary onboarding of the same or similar datasets by different consumers
A Business Glossary is a list of business terms with their definitions. It defines business concepts for an organization or industry and is independent of any specific database, platform, or vendor.
A Data Dictionary is a description of a data set; it provides details about the attributes and their data types.
Even though the Data Dictionary and the Business Glossary are separate entities, they work well together to describe different aspects and levels of abstraction of an organization's data environment. Business terms can be linked to specific entities/tables and columns in a data asset or data dictionary to provide more context and a consistent, approved definition for the different instances of a term across platforms and databases.
| URN | Business Term | Definition | Domain/Namespace | Owner | Ext Source | Ext Reference |
|---|---|---|---|---|---|---|
| urn:li:glossaryTerm:instrument.cashInstrument | instrument.cashInstrument | a financial instrument whose value is determined by the market and that is readily transferable (highly liquid) | Finance | firstname.lastname@example.org | fibo | https://spec.edmcouncil.org/fibo/ontology/FBC/FinancialInstruments/FinancialInstruments/CashInstrument |
| urn:li:glossaryTerm:common.dateTime | common.dateTime | time point including a date and a time, optionally including a time zone offset | Foundation | email@example.com | fibo | https://spec.edmcouncil.org/fibo/ontology/FND/DatesAndTimes/FinancialDates/DateTime |
| urn:li:glossaryTerm:market.bidSize | market.bidSize | The bid size represents the quantity of a security that investors are willing to purchase at a specified bid price | Trading | firstname.lastname@example.org | - | - |

| Attribute Name | Data Type | Nullable? | Business Term | Description |
|---|---|---|---|---|
| arrivalTime | TimestampTicks | N | | Time the price book was received by the TickCollector. 100s of nanoseconds since 1st January 1970 (ticks) |
| bid1Price | com.xxxx.yyy.schema.common.Price | N | common.monetoryAmount | The bid price with rank 1/29. |
| bid1Size | int | N | market.bidSize | The amount the bid price with rank 1/29 is good for. |
Business Glossary will be a first-class entity in which one can define GlossaryTerms, similar to entities such as Dataset and CorporateUser. A Business Term can be linked to other entities such as Dataset and DatasetField. In the future, Business Terms could also be linked to Dashboards, Metrics, etc.
The above diagram illustrates how Business Terms will be connected to other entities such as Dataset and DatasetField. The example depicts a set of business terms (up to Term-n) and how they are linked to dataset fields and datasets: fields e11, e12, and e23 are each linked to a business term, another field is linked to Term-5, and the dataset DS-2 is itself linked to a business term.
There will be one top-level GMA entity in the design: glossaryTerm (Business Glossary). It's important to make glossaryTerm a top-level entity because it can exist without a Dataset and can be defined independently by the business team.
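We'll define one URN type, GlossaryTermUrn, which should allow for unique identification of a business term.
Following the examples in the glossary table above, a business term URN has the form urn:li:glossaryTerm:<term name>, for example urn:li:glossaryTerm:instrument.cashInstrument.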
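As a rough illustration of the URN shape, here is a minimal Python sketch that builds and parses URNs of the form used in the glossary table (`urn:li:glossaryTerm:<name>`). The class and method names are hypothetical and are not DataHub's actual implementation; the sketch only assumes that the term name is the single part following the entity type.

```python
from dataclasses import dataclass

# Prefix taken from the example URNs in the glossary table.
PREFIX = "urn:li:glossaryTerm:"


@dataclass(frozen=True)
class GlossaryTermUrn:
    """Hypothetical stand-in for the proposed GlossaryTermUrn."""

    name: str  # e.g. "instrument.cashInstrument"

    def __str__(self) -> str:
        return PREFIX + self.name

    @classmethod
    def parse(cls, raw: str) -> "GlossaryTermUrn":
        # Reject anything that is not a glossary term URN or has an empty name.
        if not raw.startswith(PREFIX) or len(raw) == len(PREFIX):
            raise ValueError(f"not a glossary term URN: {raw!r}")
        return cls(raw[len(PREFIX):])


urn = GlossaryTermUrn.parse("urn:li:glossaryTerm:market.bidSize")
print(urn.name)  # market.bidSize
print(str(urn))  # urn:li:glossaryTerm:market.bidSize
```

Keeping the term name as the only key means two terms are the same entity exactly when their names match, which is what makes the URN usable as a unique identifier.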
A new snapshot object will be added to onboard business terms along with their definitions.
Path: metadata-models/src/main/pegasus/com/linkedin/metadata/snapshot/
Path: metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/
A new aspect will be defined to capture the required attributes and ownership information.
Business Term Entity Definition
Business Terms will be owned by specific business users.
A Business Term can be associated with a Dataset Field as well as with a Dataset. We therefore define an aspect that can be associated with both Dataset and DatasetField.
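To make the shape of the entity concrete, here is a minimal Python sketch of a business term carrying a descriptive aspect plus ownership. It is not the RFC's actual model definition: the class and field names are hypothetical, chosen to mirror the columns of the glossary table (Definition, Domain/Namespace, Owner, Ext Source, Ext Reference), and the owner address is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossaryTermInfo:
    """Hypothetical aspect holding the descriptive attributes of a term."""

    definition: str
    domain: Optional[str] = None      # "Domain/Namespace" column
    source: Optional[str] = None      # "Ext Source" column, e.g. "fibo"
    source_ref: Optional[str] = None  # "Ext Reference" column


@dataclass
class GlossaryTerm:
    """Hypothetical top-level entity: a URN plus its aspects."""

    urn: str
    info: GlossaryTermInfo
    owners: List[str] = field(default_factory=list)  # business users owning the term


bid_size = GlossaryTerm(
    urn="urn:li:glossaryTerm:market.bidSize",
    info=GlossaryTermInfo(
        definition="The bid size represents the quantity of a security that "
        "investors are willing to purchase at a specified bid price",
        domain="Trading",
    ),
    owners=["owner@example.com"],  # placeholder owner
)
print(bid_size.urn, bid_size.info.domain, bid_size.owners)
```

The external source fields are optional because a term such as market.bidSize can be defined in-house without any reference taxonomy.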
We propose changing SchemaField so that a field can optionally be associated with Business Glossary terms.
We propose changing the Dataset aspect so that a dataset can optionally be associated with Business Glossary terms.
This might not be a critical requirement, but it would be nice to have.
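The optional associations could look roughly like the following Python sketch, in which both a schema field and a dataset carry a possibly empty list of term URNs. The class names, attribute names, and the dataset name "quotes" are hypothetical illustrations, not the actual model changes; the field rows are taken from the data dictionary example earlier in this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SchemaField:
    name: str
    data_type: str
    # Optional association: an empty list means no business term is linked.
    glossary_terms: List[str] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    schema_fields: List[SchemaField]
    # A dataset as a whole can also be linked to business terms.
    glossary_terms: List[str] = field(default_factory=list)


quotes = Dataset(
    name="quotes",  # hypothetical dataset name
    schema_fields=[
        SchemaField("arrivalTime", "TimestampTicks"),
        SchemaField(
            "bid1Price",
            "com.xxxx.yyy.schema.common.Price",
            ["urn:li:glossaryTerm:common.monetoryAmount"],
        ),
        SchemaField("bid1Size", "int", ["urn:li:glossaryTerm:market.bidSize"]),
    ],
)

# Fields that have not been linked to any business term yet.
unlinked = [f.name for f in quotes.schema_fields if not f.glossary_terms]
print(unlinked)  # ['arrivalTime']
```

Defaulting to an empty list keeps the association optional, so existing datasets and fields remain valid without any glossary links.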
- Users should be able to search for Business Terms and see all the Datasets that have elements linked to a given Business Term.
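One way to picture this lookup is a reverse index from term URN to the datasets whose fields reference it. The Python sketch below is only an in-memory illustration under that assumption; the function names and the dataset names "quotes" and "orders" are hypothetical, and a real deployment would presumably rely on the catalog's search or graph index instead.

```python
from collections import defaultdict
from typing import Dict, List, Set

# dataset name -> field name -> linked term URNs (hypothetical sample data)
FIELD_TERMS: Dict[str, Dict[str, List[str]]] = {
    "quotes": {
        "bid1Price": ["urn:li:glossaryTerm:common.monetoryAmount"],
        "bid1Size": ["urn:li:glossaryTerm:market.bidSize"],
    },
    "orders": {
        "size": ["urn:li:glossaryTerm:market.bidSize"],
    },
}


def build_term_index(field_terms: Dict[str, Dict[str, List[str]]]) -> Dict[str, Set[str]]:
    """Invert field-to-term links into a term-to-datasets index."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for dataset, fields in field_terms.items():
        for term_urns in fields.values():
            for urn in term_urns:
                index[urn].add(dataset)
    return index


def datasets_for_term(index: Dict[str, Set[str]], term_urn: str) -> List[str]:
    """Return the datasets that have at least one field linked to the term."""
    return sorted(index.get(term_urn, set()))


index = build_term_index(FIELD_TERMS)
print(datasets_for_term(index, "urn:li:glossaryTerm:market.bidSize"))  # ['orders', 'quotes']
```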
We should create or update user guides to educate users on:
- The importance and value that the Business Glossary brings to the Data Catalog
- The search and discovery experience through business terms (how to find relevant datasets quickly in DataHub)
This is a new feature in DataHub that brings a common vocabulary across data stakeholders and makes datasets more discoverable. There is no clear alternative to this feature; at most, users can document business terms outside the Data Catalog and reference those terms as an additional property on a dataset column.
The design is meant to be generic enough that any DataHub user can easily onboard their Business Glossary (a list of terms and definitions) to DataHub, irrespective of their industry. Organizations that subscribe to or download an industry-standard taxonomy should be able to bring in the business glossary quickly with only slight modelling and integration work.
While onboarding datasets, business and technical teams need to link business terms to the data elements. Once users see the value of this, they will be motivated to link elements with the appropriate business terms.
- This RFC does not cover the UI design for Business Glossary Definition.