DataHub Releases
Summary
v0.8.39
Released on Fri Jun 24 2022 by @maggiehays.
Highlights
User Experience
- NEW: support for surfacing outcomes of dbt Tests in dataset entity pages (see it in action here)
- NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
- Improvement to Impact Analysis: When looking at the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)
Developer Experience
- NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your DataHub metadata server by using Kafka as a highly available message bus
Metadata Ingestion
- NEW: Make bulk edits to your metadata via CSV (read more)
- Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
- Managed ingestion update: removed need for sink block
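The Snowflake profiling improvement listed above amounts to a recency filter on each table's last-updated time. A minimal Python sketch of that idea follows; the `TableInfo` record, the `tables_to_profile` helper, and the `max_age_days` parameter are hypothetical names chosen for illustration, not DataHub's implementation or its configuration keys.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class TableInfo:
    # Hypothetical record; a real source would read these from warehouse metadata.
    name: str
    last_altered: datetime


def tables_to_profile(
    tables: List[TableInfo],
    max_age_days: Optional[int],
    now: Optional[datetime] = None,
) -> List[TableInfo]:
    """Return only the tables updated within the prior `max_age_days` days.

    Passing `max_age_days=None` disables the filter and keeps every table.
    """
    if max_age_days is None:
        return list(tables)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [t for t in tables if t.last_altered >= cutoff]


if __name__ == "__main__":
    now = datetime(2022, 6, 24, tzinfo=timezone.utc)
    tables = [
        TableInfo("orders", now - timedelta(days=1)),
        TableInfo("archive_2019", now - timedelta(days=400)),
    ]
    # Only the recently updated table is selected for profiling.
    print([t.name for t in tables_to_profile(tables, max_age_days=7, now=now)])
```

Skipping tables that have not changed recently avoids re-running expensive profiling queries whose results would be the same as the previous run.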
What's Changed
- fix(ui-ingestion): update looker ingestion warning banner by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5142
- chore: Bump Default UI Ingestion Version 0.8.38 by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5145
- feat(schema): support rendering schemas with . in field names by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5141
- feat(dbt): Platform instances for target platform by @skrydal in https://github.com/datahub-project/datahub/pull/5129
- feat(ingest): snowflake profile tables only if they have been updates… by @mayurinehate in https://github.com/datahub-project/datahub/pull/5132
- fix(airflow): fixes DeprecationWarning with hook-class-names by @sayakmaity in https://github.com/datahub-project/datahub/pull/5143
- feat(frontend): Parse JWT access token claims by @chen4119 in https://github.com/datahub-project/datahub/pull/5138
- fix(tokens): Using keyword search filters for ListAccessTokensResolver by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5154
- feat(ui) Update the max text length of Terms/Term Groups by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5162
- docs(policies): add info about Manage User Credentials by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5157
- fix(restore-indices): Do not fail on MAE row count diff by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5165
- fix(Kafka-setup): Make sure it doesn't fail when the new envs are not set by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5168
- chore(deps): Bump Nimbus Jose JWT dependency by @pedro93 in https://github.com/datahub-project/datahub/pull/5158
- fix(recs): Verify that an entity exists before recommending by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5163
- fix(business glossary): setting properties to be empty if the node has no properties aspect by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5166
- refactor(ui): Misc improvements to Dataset Assertions UI by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5155
- chore(guava): force version of guava in client jars per #5134 by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5153
- feat(boot): Make Glossary Term Upgrade Async by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5164
- fix(frontend): Add iam auth jar to frontend by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5171
- docs(features): update & clean up Features page by @maggiehays in https://github.com/datahub-project/datahub/pull/5175
- fix(glue): fix glue profiling config option by @kangseonghyun in https://github.com/datahub-project/datahub/pull/5178
- feat(upgrade) Check version when determining to run RestoreGlossaryIndices step by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5182
- fix(jaas): fixed auth.jaas.enabled option parsing by @alexey-kravtsov in https://github.com/datahub-project/datahub/pull/5179
- feat(ingestion): bigquery - Option to send usage queries as well as Operational metadata by @treff7es in https://github.com/datahub-project/datahub/pull/5151
- feat(build): changes to decrease build time, cancel runs in case of multiple commits by @anshbansal in https://github.com/datahub-project/datahub/pull/5187
- refactor(docs): Update Metadata Events Docs by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5173
- fix(ingest): If there is no manager for a LDAP user (example: system account) by @bda618 in https://github.com/datahub-project/datahub/pull/5180
- bug(ingest): correct case of sys views for mssql description populati… by @BALyons in https://github.com/datahub-project/datahub/pull/5186
- refactor(configs): Simplify Kafka Topic name configurations + docs by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5198
- feat(ingest): dbt - adding support for dbt tests by @shirshanka in https://github.com/datahub-project/datahub/pull/5201
- fix(cli): correct handling of env variables by @anshbansal in https://github.com/datahub-project/datahub/pull/5203
- feat(ci): split integration tests to reduce run time by @anshbansal in https://github.com/datahub-project/datahub/pull/5205
- feat(datahub-client): add java kafka emitter by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5074
- feat(graphql): add metrics capturing for graphql latency by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5200
- test(ingestion): bigquery-usage - Adding tests for bigquery usage filters by @treff7es in https://github.com/datahub-project/datahub/pull/5195
- fix(ui): load monaco-editor as a dependency and not from a third party CDN by @Masterchen09 in https://github.com/datahub-project/datahub/pull/5189
- feat(cli): Add token parameter for sample ingestion by @pedro93 in https://github.com/datahub-project/datahub/pull/5160
- feat(lineage) Update Lineage tab and Impact Analysis feature by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5121
- fix(ingest): add missing ownership types by @afghori in https://github.com/datahub-project/datahub/pull/5209
- feat(ingestion) ldap: make ldap attrs keys configurable by @atulsaurav in https://github.com/datahub-project/datahub/pull/4682
- Remove unnecessary space from application.yml of GMS by @mmmeeedddsss in https://github.com/datahub-project/datahub/pull/5216
- fix(upgrade): fix upgrade when s3 path has = by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5220
- feat(docs) Add and update docs for the new Glossary experience by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5211
- feat(glossary) Add empty state for the Business Glossary home page by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5217
- feat(bootstrap): add bootstrap step to clear out unknown aspect rows from the database by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5148
- feat(ingest): adds csv enricher ingestion source by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5221
- fix(build): pin confluent kafka dependency by @anshbansal in https://github.com/datahub-project/datahub/pull/5224
- fix(ingest): databricks - ingest structs correctly through hive by @shirshanka in https://github.com/datahub-project/datahub/pull/5223
- feat(dbt): add sibling association logic to associate dbt elements with their target systems by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5190
- feat(tableau): use pagination for all connection queries by @mayurinehate in https://github.com/datahub-project/datahub/pull/5204
- Handling 404 page not found by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5227
- refactor(UI): Refactor Dataset Health Status by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5222
- fix(dbt-test): Inconsistency in assertions by @Santhin in https://github.com/datahub-project/datahub/pull/5214
- feat(ingest): remove need for sink block in UI based ingestion by @anshbansal in https://github.com/datahub-project/datahub/pull/5208
- fix(ingest): bigquery - Grouping date named tables at bigquery by @treff7es in https://github.com/datahub-project/datahub/pull/5230
- Add check for 0 rows when profiling datasets from s3 by @Jiafi in https://github.com/datahub-project/datahub/pull/5219
- [bug fix]: disabled create buttons by @xiphl in https://github.com/datahub-project/datahub/pull/5234
- fix(ingest): bigquery - Handling gracefully sql parser error in bq lineage by @treff7es in https://github.com/datahub-project/datahub/pull/5238
- fix(ingest): do not dump password by @anshbansal in https://github.com/datahub-project/datahub/pull/5235
- feat(ingest): dbt - improving dbt_meta mapping by @shirshanka in https://github.com/datahub-project/datahub/pull/5237
- fix(siblings): Force merged urn to viewed entity urn by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5247
- refactor(docs): Move CLI docs to root level by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5246
New Contributors
- @skrydal made their first contribution in https://github.com/datahub-project/datahub/pull/5129
- @sayakmaity made their first contribution in https://github.com/datahub-project/datahub/pull/5143
- @kangseonghyun made their first contribution in https://github.com/datahub-project/datahub/pull/5178
- @alexey-kravtsov made their first contribution in https://github.com/datahub-project/datahub/pull/5179
- @bda618 made their first contribution in https://github.com/datahub-project/datahub/pull/5180
- @BALyons made their first contribution in https://github.com/datahub-project/datahub/pull/5186
- @afghori made their first contribution in https://github.com/datahub-project/datahub/pull/5209
- @Santhin made their first contribution in https://github.com/datahub-project/datahub/pull/5214
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.38...v0.8.39
[!] DataHub v0.8.38
Released on Thu Jun 09 2022 by @jjoyce0510.
Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results to the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. Note that this does not mean that access tokens will not work or are in any way compromised - the functionality of generating and using access tokens is not impacted.
The below release notes are copied from v0.8.37 release notes.
Highlights
User Experience
This release comes packed full of new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary.
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support Snowflake ingest via OAuth
- Misc fixes and improvements to existing ingestion sources
Disclaimers:
With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and lets those users log in.
If you were previously disabling BOTH JaaS (via AUTH_JAAS_ENABLED=false) AND OIDC, and you still do not want to require a username + password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in https://github.com/datahub-project/datahub/pull/5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in https://github.com/datahub-project/datahub/pull/5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in https://github.com/datahub-project/datahub/pull/5086
- chore(dep): upgrade json-smart by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in https://github.com/datahub-project/datahub/pull/5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in https://github.com/datahub-project/datahub/pull/5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in https://github.com/datahub-project/datahub/pull/5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in https://github.com/datahub-project/datahub/pull/4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in https://github.com/datahub-project/datahub/pull/5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in https://github.com/datahub-project/datahub/pull/5057
- feat(oidc): add configurable read timeout by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5063
- fix(profiling): don't stop if some steps fail by @anshbansal in https://github.com/datahub-project/datahub/pull/5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in https://github.com/datahub-project/datahub/pull/5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in https://github.com/datahub-project/datahub/pull/5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5100
- fix(build): reduce time taken for resolution by @anshbansal in https://github.com/datahub-project/datahub/pull/5106
- fix(build): remove dependencies added for compatibility by @anshbansal in https://github.com/datahub-project/datahub/pull/5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in https://github.com/datahub-project/datahub/pull/5109
- Policies page issue by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in https://github.com/datahub-project/datahub/pull/5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in https://github.com/datahub-project/datahub/pull/5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in https://github.com/datahub-project/datahub/pull/5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in https://github.com/datahub-project/datahub/pull/5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in https://github.com/datahub-project/datahub/pull/5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in https://github.com/datahub-project/datahub/pull/5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in https://github.com/datahub-project/datahub/pull/4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in https://github.com/datahub-project/datahub/pull/5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in https://github.com/datahub-project/datahub/pull/5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in https://github.com/datahub-project/datahub/pull/5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5136
- fix(ingest): fix table urn for athena connectionType by @mayurinehate in https://github.com/datahub-project/datahub/pull/5135
- Fixed the UI issue on Deprecated Pop-Up issue by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5130
- fix(ui-ingestion): show warning banner when configuring looker ui-ingestion for the first time by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5139
- fix(tokens): Fix stale cache problem, reduce cache timeout for access tokens + fix listing owner tokens by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5140
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.37...v0.8.38
[!] DataHub v0.8.37
Released on Thu Jun 09 2022 by @jjoyce0510.
Notice! This version has a few known bugs regarding revocable access tokens. Specifically, the UI for listing access tokens does not work properly unless you have a specific platform privilege. Additionally, revoking an access token takes effect only after a delay of 6 hours. We recommend that you skip this version and upgrade directly to v0.8.38.
Highlights
User Experience
This release comes packed full of new features and updates.
- NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
- NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
- NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary.
- UPDATE - Rename “Manage” navigation item to “Govern”
- [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
- [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
- FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
- Minor fixes & improvements to UI for adding policy users + groups.
Metadata Ingestion
- Support Snowflake ingest via OAuth
- Misc fixes and improvements to existing ingestion sources
What's Changed
- feat(docs): auto-open config section for ingestion sources by @shirshanka in https://github.com/datahub-project/datahub/pull/5075
- feat(spark-lineage): coalesce spark jobs by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/5077
- refactor(ui): UI Navigation Refactoring by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5076
- Update docs to alert users to restore indices for their Glossary by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5082
- fix(restore-indices): Do not fail while working with each row by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5084
- fix(ingestion): looker - Handling gracefully invalid json in query dynamic field by @treff7es in https://github.com/datahub-project/datahub/pull/5083
- feat(docs): ingest - add tab for config json schema by @shirshanka in https://github.com/datahub-project/datahub/pull/5086
- chore(dep): upgrade json-smart by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5081
- feat(ingest): rest_emitter - Adding option to rest emitter to disable ssl verification by @treff7es in https://github.com/datahub-project/datahub/pull/5042
- feat(cli): suggest upgrades when appropriate by @shirshanka in https://github.com/datahub-project/datahub/pull/5091
- feat(doc): Generating json schema for ingestion recipes by @treff7es in https://github.com/datahub-project/datahub/pull/5092
- feat(ingest): snowflake using oauth by @saxo-lalrishav in https://github.com/datahub-project/datahub/pull/4647
- fix(ui): do not show copy URN buttons when Clipboard API is not available by @Masterchen09 in https://github.com/datahub-project/datahub/pull/5087
- feat(kafka): use a thread pool executor for kafka for thread reuse by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5079
- Manage Access Tokens by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5067
- tests(lookml): adding tests for model deny patterns by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4934
- feat(model): Add optional context field to tag/term association by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/5085
- fix(glossary) Two quick followup fixes around the new Glossary updates by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5065
- chore(deps): bump eventsource from 1.1.0 to 1.1.1 in /docs-website by @dependabot in https://github.com/datahub-project/datahub/pull/5057
- feat(oidc): add configurable read timeout by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5088
- feat(glossary) Display Incoming 'IsA' Glossary related entities by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5063
- fix(profiling): don't stop if some steps fail by @anshbansal in https://github.com/datahub-project/datahub/pull/5095
- feat(upgrades) Create new DataHubUpgrade + Restore Glossary Entities Bootstrap step by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5099
- fix(deps): ingest - moving packaging to framework_common by @shirshanka in https://github.com/datahub-project/datahub/pull/5096
- feat(frontend) Allow overriding akka-max-header-value-length by @karoliskascenas in https://github.com/datahub-project/datahub/pull/5094
- refactor(graphql): Migrate Visual Config into the Configuration Provider by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4780
- chore(akka): upgrade akka http for vuln by @RyanHolstien in https://github.com/datahub-project/datahub/pull/5100
- fix(build): reduce time taken for resolution by @anshbansal in https://github.com/datahub-project/datahub/pull/5106
- fix(build): remove dependencies added for compatibility by @anshbansal in https://github.com/datahub-project/datahub/pull/5108
- fix(ci): pin google-cloud-logging to avoid pip backtracking by @shirshanka in https://github.com/datahub-project/datahub/pull/5109
- Policies page issue by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5107
- chore(deps): Bump spring to 5.3.20 for vuln fix by @pedro93 in https://github.com/datahub-project/datahub/pull/5110
- fix(cli): Bumping avro-gen3 to 0.7.4 by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5098
- feat(docs): Updating example files with the new ingestion recipe suffix by @treff7es in https://github.com/datahub-project/datahub/pull/5103
- feat(graphql): add graphql endpoint to check whether an entity exists by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5102
- feat(looker): ensure explore name matches looker's display name by @shirshanka in https://github.com/datahub-project/datahub/pull/5111
- fix(ui): Fixing missing homescreen logo by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/5112
- fix(dbt): final fix of dbt platform instance issues by @gabe-lyons in https://github.com/datahub-project/datahub/pull/5115
- feat(ingestion): bigquery-usage - Collect stats from read event reasons by @treff7es in https://github.com/datahub-project/datahub/pull/5118
- feat(terms) Add ability to Add and Remove Related Terms to Glossary Terms by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5120
- Fixed Issue : Add Members Modal by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/5117
- fix(bigquery): handling of empty partitioned tables, improve report message by @anshbansal in https://github.com/datahub-project/datahub/pull/5122
- feat(glossary) Hide self and children from select when moving a GlossaryNode by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/5123
- fix(ingestion): bigquery-usage - Removing filtering at queryevents by @treff7es in https://github.com/datahub-project/datahub/pull/5124
- feat(users): add ability to add native users from the UI by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5097
- fix(ingestion): Looker original view name should be used for explore_joins by @sebkim in https://github.com/datahub-project/datahub/pull/4928
- fix(iceberg): Change how MapType are mapped to Avro to support complex Map key type by @cccs-eric in https://github.com/datahub-project/datahub/pull/5060
- fix(ingestion): bigquery-usage - Only send operational metadata for allowed tables by @treff7es in https://github.com/datahub-project/datahub/pull/5127
- fix(dbt): Validator error fix by @BoyuanZhangDE in https://github.com/datahub-project/datahub/pull/5125
- feat(settings): skip calling graphql hooks if user does not have the right permissions by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/5136
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.36...v0.8.37
DataHub V0.8.36
Released on Thu Jun 02 2022 by @treff7es.
Highlights
User Experience
NEW – Manage Glossary Terms via the DataHub UI! Delivering on our Q2’22 Roadmap item, end users can now create, edit, move, delete, and deprecate Glossary Terms via the UI! With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to restore your indices in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!
Ability to add multiple Owners, Tags, Terms
Developer Experience
The new Revocable Token API supports a new type of Access Token that can be revoked and queried, allowing admins to easily delete tokens for operational and security reasons. Read all about it in the Access Token Management Usage Guide.
Ingestion Updates
This release includes 3 new Metadata Sources:
- Iceberg
- Vertica
- SAP HANA
📣 Massive shoutout to DataHub Community members @cccs-eric, @eburairu, and @buggythepirate for driving these contributions! 📣
These sources are currently marked as “Testing” - we encourage you to try them out & provide feedback in the DataHub #ingestion Slack channel!
We’ve rolled out the following ingestion-related improvements:
- AWS Glue - data profiling is now supported
- S3 ingestion speed-up
- Various bug fixes
Full Commit Log
- #5071 @dexter-mh-lee fix(docker): Fix mysql setup bug
- #5066 @jjoyce0510 refactor(docs): Rename metadata modeling ingestion sidebar titles
- #5036 @mmmeeedddsss fix(mysql-setup-job): add mysql default port override support
- #5056 @nj7 fix: ES Rest Client Creation for non ssl authenticated connection
- #5053 @ShubhamThakre fix(ui): ui bug fix for datasets sidebar stats section
- #5061 @anshbansal feat(redash): add parallelism support for ingestion
- #5017 @anshbansal feat(model): new chart types
- #5047 @RyanHolstien fix(datahub-upgrade): exclude unnecessary configuration from standalone applications
- #5052 @shirshanka feat(ci): datahub-client - add workflow, fix build
- #5054 @jjoyce0510 docs(actions): Adding DataHub Actions to docs website
- #5031 @piyushn-stripe feat(frontend): Allow overriding frontend with a custom akka http server
- #5050 @dexter-mh-lee Remove exception on ingest policies
- #5043 @Masterchen09 fix(docs): hana - rename SAP HANA source and data platform
- #5051 @shirshanka fix(ingest): fix build breakage due to traitlets 5.2.2 bug
- #5045 @anshbansal fix(redash): fix bug with names, add option for page size, debugging info
- #5022 @jjoyce0510 fix(restore): Add RESTATE ChangeType to MCL / MCP to permit restore indices
- #5041 @anshbansal doc(bigquery): fix missing permissions
- #5030 @endeesa fix(doc) - Specify docker-compose version to avoid compatibility issues
- #4879 @BoyuanZhangDE feat(ingest): glue - enable profiling
- #5035 @treff7es fix(profiling): bigquery - Fix for Bigquery temp table creation on GE >= 0.15.3
- #5040 @shirshanka fix(build): m1 build fails to install hdb-cli
- #5026 @chriscollins3456 feat(glossary) Business Glossary updates
- #4940 @MugdhaHardikar-GSLab fix(spark-lineage): remove need for sparksession.stop call
- #5023 @rslanka fix(ingest): common - fix nullability determination for the AVRO fixed type.
- #5012 @anshbansal fix(cli): don't use env for container, add example
- #5021 @maggiehays docs(townhall): update townhall rsvp link and add may townhall detail
- #5038 @shirshanka fix(build): docgen should fail if plugin is not loadable
- #5033 @RyanHolstien fix(timelineAPI): fix issue with semantic versioning
- #5034 @RyanHolstien fix(telemetry): exclude configuration from standalone apps
- #5029 @RyanHolstien feat: telemetry improvements
- #5028 @gabe-lyons dont set platform instances for sources
- #5027 @anshbansal fix(parsing): incorrect parsing for commas
- #4938 @Ankit-Keshari-Vituity refactor(ui): UI Integration to add multiple tags, terms and owners
- #5025 @anshbansal fix(parsing): improve sql parsing, some debugging redash
- #5024 @rslanka fix(ingestion): Remove hana from base_dev_requirements to unblock m1 users
- #5014 @anshbansal fix(bigquery): reduce number of calls for details of partitioning
- #5016 @ShubhamThakre fix(ui): arrow click position update
- #5019 @rslanka fix(build): fix for hana build failure for aarch64.
- #5020 @jjoyce0510 feat(Tests): Make DataHub Tests Feature configurable via env variable
- #5005 @hsheth2 test(ingestion): change class names to avoid unittest warnings
- #5006 @hsheth2 fix(ingestion): use raw strings for regexes
- #5010 @rslanka feat(ingestion): Add Iceberg source
- #5001 @PatrickfBraz fix(bigquery-usage): fix audit metadata query template
- #4997 @anshbansal fix(redash): improve logging for debugging, add validation for dataset urn, some refactoring
- #4376 @buggythepirate feat(ingest): Added new ingestion source SAP HANA
- #5011 @rslanka Fix pulsar source docs.
- #4555 @eburairu feat(ingest): Add Source from Vertica
- #5008 @anshbansal fix(dbt): missing aws dependency
- #5007 @anshbansal fix(bigquery): restrict protobuf version
- #5004 @pedro93 fix(gms): Fix incorrect StatefulTokenService init
- #5002 @ShubhamThakre fix(ui): ui bug fix - fixing search card vertical margin
- #4994 @anshbansal doc(delete): add example for dataflow and datajob
- #4988 @jjoyce0510 feat(DataHub Operations): Adding GraphQL mutation for reporting Dataset operations
- #4998 @shirshanka fix(cli): timeline - adjust for timeline API changes on server
- #5000 @pedro93 fix(docs): Fixes token docs
- #4989 @jjoyce0510 feat(Tests): Metadata Tests Models + APIs + UI (Part 1)
- #4995 @treff7es fix(airflow): Fix for Airflow 1 support
- #4993 @shirshanka chore(deps): upgrade gson version
- #4935 @BoyuanZhangDE feat(dbt): enable dbt read artifacts from s3
- #4833 @treff7es feat(airflow): Airflow lineage ingestion plugin
- #4931 @mayurinehate fix(ingest): tableau - fix chart custom properties None key error, update docs
- #4943 @mayurinehate feat(model): add created, lastModified auditstamps to SchemaField
- #4991 @anshbansal refactor(redash): emit charts first and try with id based dashboard API first
- #4942 @mohdsiddique metabase chart are missing from dashboard
- #4992 @anshbansal doc(ingest): update golden file command
- #4927 @treff7es feat(ingest): s3 - speeding up ingestion with sampling
- #4979 @pedro93 fix(smoke-tests) Increases sleep timeout in rollback test to prevent flakiness
- #4964 @dexter-mh-lee feat(run): Create a describe run endpoint for fetching aspects created by the ingestion run
- #4169 @claudio-benfatto feat(ingestion): optionally disable some kafka schema warnings
- #4972 @mayurinehate feat(great-expectations): allow DATAHUB_DEBUG env var to enable debug logs in GE Action
- #4957 @justinas-marozas refactor(metadata-io): introduce a storage-independent in-memory entity aspect model
- #4982 @jjoyce0510 feat(authorization): Adding AuthorizerContext + ResourceSpecResolver to context
- #4984 @anshbansal doc(ingestion): default boolean fix, broken bigquery docgen
- #4970 @pedro93 feat(graphql) Add new Revokable Token API
- #4987 @anshbansal fix(ingest): remove new schema field usage
- #4985 @anshbansal fix(redash): use dashboard id if slug does not work
- #4986 @pedro93 chore(deps): upgrade datastax libs version
- #4981 @RyanHolstien fix(metadata-service): telemetry - fix hardcoded aspect name, suppress errors when producing MAE
- #4983 @shirshanka fix(ingest): mode - dashboards without creator info fails to process
- #4975 @chriscollins3456 fix(UI) Fix multiple UI usability issues
- #4977 @maggiehays docs(townhall): update invite links and townhall history
- #4980 @MugdhaHardikar-GSLab feat(spark-lineage): support for persist API
- #4974 @anshbansal feat(bigquery): add partition key tag
- #4967 @anshbansal fix(bigquery): add rate limiting for api calls made
- #4971 @shirshanka fix(cli): graph - get_aspect_v2 method fails to deserialize aspects correctly
- #4958 @anshbansal doc(ingest): mysql - describe required grants
- #4969 @RyanHolstien doc(telemetry): fix telemetry doc
- #4878 @MugdhaHardikar-GSLab fix(datahub-client): support utf8 encoding
- #4961 @anshbansal feat(bigquery): reduce logging
- #4909 @ShubhamThakre fix(ui): policy outside modal click issue update
- #4968 @jeffmerrick docs(website): Remove banner and nav item for metadata day 2022
- #4965 @mmmeeedddsss docs(datahub-kafka-sink): add topic_routes config to doc of datahub-kafka-sink
- #4966 @liyuhui666 fix(data platforms): Update data_platforms.json
- #4922 @mayurinehate feat(cli): raise error if get entity api fails
- #4963 @Masterchen09 fix(ui): do not show copy URN buttons when Clipboard API is not available
- #4962 @RyanHolstien feat(release): update CLI version
- #4960 @RyanHolstien feat: updates for 0.8.35
- #4945 @treff7es Revert "feat(spark-lineage): add support for iceberg and cache based plans (#4882)"
- #4954 @dexter-mh-lee fix(ci): remove scheduled artifact deletion run to avoid api rate limiting
- #4932 @anshbansal fix(bigquery): add dataset_id for bigquery
- #4952 @RyanHolstien fix(metadata-service): timeline - ignore platform and schema changes
- #4953 @dexter-mh-lee fix(ci): docker - remove multiplatform builds for unsupported images
- #4950 @dexter-mh-lee fix(ci): add artifact cleaner, make docker publish sections consistent
- #4947 @RyanHolstien fix(workflow): fix mysql credentials
- #4951 @aditya-radhakrishnan fix(frontend): Update run-local-frontend to reflect the new Play changes
- #4936 @gabe-lyons feat(transformers): add transformers to provide tags & terms to schema fields based on regex patterns
- #4948 @dexter-mh-lee Fix docker unified
- #4944 @gabe-lyons make graphql OperationType enum match up w/ pdl
[!] DataHub v0.8.35
Released on Wed May 18 2022 by @dexter-mh-lee.
Notice: Deploying this release creates an incorrectly named aspect entry in the database. As a result, some upgrade jobs may fail to perform full scans of the database. To fix this, either upgrade to a version > v0.8.38 or pull the latest DataHub Upgrade docker image and run the following upgrade:
./datahub-upgrade.sh -u RemoveUnknownAspects
v0.8.35
Highlights
- Reduced vulnerability counts in project
- Various bug fixes
- New streamlined docker workflow
Full Commit Log
- #4937 @RyanHolstien fix(env): provide default for unset telemetry variable
- #4926 @gabe-lyons feat(dbt): enable data platform instance on dbt
- #4933 @anshbansal fix(lint): lint failure due to mypy upgrade
- #4925 @RyanHolstien feat(telemetry): add server side telemetry
- #4917 @jjoyce0510 feat(graphql): Adding resolvers for adding multiple tags, terms, and owners
- #4924 @chen4119 fix(kafka-setup): Check if keystore/truststore location env variables are set
- #4919 @jjoyce0510 feat(ui): Adding Search Bar to all List Views (groups, users, domains, policies, ingestion)
- #4923 @chen4119 fix(kafka-setup): Add ssl.keystore.type and ssl.truststore.type
- #4882 @maggie-zhu feat(spark-lineage): add support for iceberg and cache based plans
- #4918 @RyanHolstien fix(idea): change location of coercer to make intellij not complain about classes
- #4916 @chriscollins3456 fix(ui) Fix some spacing issues on the search card
- #4914 @anshbansal docs(ingest): remove incorrectly annotated lineage capability
- #4912 @mayurinehate docs(transformer): update custom transform example to add missing super init
- #4903 @jjoyce0510 refactor(actions): Migrate to use new datahub-actions container
- #4869 @jjoyce0510 refactor(API): Add "Filter" support for Assertion Run Events, Dataset Profiles, Dataset Operations
- #4860 @anshbansal fix(doc): update doc url to generated docs
- #4910 @chriscollins3456 feat(containers) Get and display all parent containers in header and search
- #4791 @pedro93 feat(gms): Add support for deleting reference pointers when deleting by urn
- #4911 @RyanHolstien docs(frontend): update build command for partial build
- #4839 @BoyuanZhangDE feat(ingestion): For all usage connectors, allow exclusion of top_n_queries from ingestion via a config param.
- #4908 @jeffmerrick fix(docs): Metadata day 2022: Fix year
- #4859 @anshbansal doc(bigquery): add caveat for materialized view
- #4906 @jeffmerrick docs(website): add banner and nav item for metadata day 2022
- #4905 @anshbansal fix(build): Fix breaking changes from GE 0.15.3
- #4884 @shirshanka fix(deps): reduce frontend dependency
- #4902 @anshbansal doc(ingestion): add note for UI ingestion & custom sources
- #4901 @anshbansal revert(bigquery-usage): dataset allow filter impl
- #4824 @gabe-lyons fix(usage): pull usage from environment source rather than args
- #4899 @SagarTiwari24 fix(docs): Update developing.md to mention directory context
- #4892 @gabe-lyons fix(ui): fix side panel resize css
- #4890 @justinas-marozas fix(mxe-consumer): exclude CassandraAutoConfiguration from consumer boot
- #4853 @sebkim fix(ingestion): ElasticSearch when no properties from elastic_mappings, gracefully continue
- #4865 @dependabot chore(deps): bump axios from 0.21.1 to 0.21.4 in /datahub-web-react
- #4898 @treff7es fix(ingestion): bigquery-usage: Fix bigquery usage table deny pattern template
- #4893 @shirshanka fix(ci): remove logging statement
- #4891 @RyanHolstien chore(deps): play - upgrade for CVEs
- #4889 @shirshanka fix(ci): clean up docker workflow for multi-tags
- #4875 @shirshanka fix(ingest): lookml - add view definitions for all views
- #4887 @shirshanka fix(ci): docker - either load or push, don't do both
- #4885 @shirshanka fix(ci): remove buildx and qemu for non multi-platform images
- #4862 @anshbansal fix(sql-parsing): improve error handling
- #4883 @shirshanka fix(ci): remove multiplatform builds from containers that don't support it
- #4881 @shirshanka feat(ci): docker actions simplify, add vulnerability scanner, simplify smoke-tests
- #4867 @chriscollins3456 feat(dataPlatformInstance) - Resolve and display dataPlatformInstance on entities
- #4880 @shirshanka fix(docs): ingest - sort modules, fix small typos
- #4866 @ShubhamThakre fix(ui): search filter entity ui update
- #4855 @treff7es fix(ingestion): dependencies - Downgrading typing-extension dependency to work with Airflow 2.0.2
- #4600 @pedro93 Use ingest proposal to submit status updates
- #4868 @RyanHolstien Revert "chore(deps): upgrade play to remove CVEs (#4864)"
- #4857 @RyanHolstien chore(jetty): upgrade jetty to 9.4.46 for CVE
- #4776 @tha23rd fix(bigquery-usage): dataset allow filter impl
- #4864 @RyanHolstien chore(deps): upgrade play to remove CVEs
- #4843 @cristiancalugaru ssl configuration support for elasticsearch source
- #4861 @RyanHolstien Revert "chore(deps): upgrade play dependencies to remove CVE vulnerabilities (#4820)"
- #4846 @dependabot chore(deps): bump async from 2.6.3 to 2.6.4 in /docs-website
- #4847 @dependabot chore(deps): bump minimist from 1.2.5 to 1.2.6 in /docs-website
- #4820 @RyanHolstien chore(deps): upgrade play dependencies to remove CVE vulnerabilities
- #4842 @rslanka fix(ingestion): Allow profiling of only those tables that are allowed by the table_pattern.
- #4844 @RyanHolstien Revert "fix(jetty): upgrade jetty dependency for CVE (#4838)"
- #4838 @RyanHolstien fix(jetty): upgrade jetty dependency for CVE
- #4840 @rslanka chore(deps): upgrade dependency io.netty:netty-all to address vulnerability
- #4841 @RyanHolstien fix(policies): change order of operations for policies bootstrap step to update index after database
- #4837 @RyanHolstien chore(deps): move from velocity 1.7 to 2.3
- #4821 @ShubhamThakre feat(ui): entity profile add copy url option update
- #4817 @aditya-radhakrishnan docs(schema-history): add usage guide for schema history
- #4835 @gabe-lyons hide soft deleted entities in lineage
- #4836 @shirshanka refactor(metadata-service): remove redundant file
- #4826 @jjoyce0510 chore(deps): pinning jackson dataformat cbor
- #4777 @treff7es feat(ingest): s3 - add support for multiple pathspecs in one recipe
- #4807 @eclaassen-pb chore(deps): upgrade spring and parquet dependencies
- #4813 @pedro93 fix(docs): Adds access policy documentation
- #4832 @mayurinehate feat(ingest): great-expectations - add more logs
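Several entries in the log above are reverts of other entries (for example, #4868 reverts #4864). The following is a minimal sketch, not part of the release tooling, of how one might pair each revert with the PR it undoes. It assumes the `- #<PR> @<author> <title>` line shape used in this log and that a revert title ends with the reverted PR number in parentheses; the function names `parse_entry` and `find_reverts` are hypothetical.

```python
import re

# Assumed line shape, taken from the entries above: "- #<PR> @<author> <title>"
ENTRY = re.compile(r"^- #(?P<pr>\d+) @(?P<author>\S+) (?P<title>.+)$")
# Assumed revert shape: Revert "<original title> (#<PR>)"
REVERTED = re.compile(r'^Revert ".*\(#(?P<target>\d+)\)"$')


def parse_entry(line):
    """Split one commit-log line into PR number, author and title.

    Returns None for lines that are not log entries (headings, notices).
    """
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    return {"pr": int(m["pr"]), "author": m["author"], "title": m["title"]}


def find_reverts(lines):
    """Map each reverting PR number to the PR number it reverts."""
    reverts = {}
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue
        m = REVERTED.match(entry["title"])
        if m:
            reverts[entry["pr"]] = int(m["target"])
    return reverts


# Three entries copied from the log above.
sample = [
    '- #4868 @RyanHolstien Revert "chore(deps): upgrade play to remove CVEs (#4864)"',
    "- #4857 @RyanHolstien chore(jetty): upgrade jetty to 9.4.46 for CVE",
    "- #4864 @RyanHolstien chore(deps): upgrade play to remove CVEs",
]
print(find_reverts(sample))
```

Entries whose titles do not follow the assumed revert shape are simply skipped, so the result is a lower bound on the reverts in a given log.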
v0.8.34
Released on Wed May 04 2022 by @maggiehays.
Release Highlights
Developer Experience
- DataHub Actions Framework is LIVE! The Actions Framework makes responding to real-time changes in your Metadata Graph easy, enabling you to seamlessly integrate DataHub into a broader events-based architecture. Check out the repo here
- This release also introduces OpenAPI endpoints to post, get, and delete entities. Check out the usage guide here
- Metadata Ingestion Source docs have a new look! We now have code-generated documentation to apply consistency in format and contents
User Experience
- New! The Dataset Schema page now supports a “Blame View” to quickly understand how a field has evolved over semantic schema versions. You can find more info about how we compute versions here.
Ingestion Improvements
- New! Now incubating the Apache Pulsar source
- Update to Feast connector to support v0.18
- Ongoing improvements to Snowflake external table support
- Improvements to handling BigQuery audit log SQL queries
- Miscellaneous Tableau fixes for lineage, browse path, non-embedded datasets
What's Changed
- fix(cypress) - enable retries for failed tests to minimize flaking by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4680
- Deprecate an entity by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/4633
- fix(timeline): enhance schema field name change and removal support by @RyanHolstien in https://github.com/datahub-project/datahub/pull/4603
- fix(cli): rest emitter should override config and env variables by @anshbansal in https://github.com/datahub-project/datahub/pull/4622
- fix(docs): elasticsearch secret reference by @felixb in https://github.com/datahub-project/datahub/pull/4314
- fix(mcl-processor): Remove unnecessary log.info by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4686
- fix(datahub-client): avoid parallel execution of metadata-io:test by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/4685
- docs(metadata-models-custom): add example script to show producing cu… by @shirshanka in https://github.com/datahub-project/datahub/pull/4681
- fix(gms): Ensure Ordering by version when fetching next version by @arunvasudevan in https://github.com/datahub-project/datahub/pull/4696
- fix(docker): Fix issue #4683 by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4697
- feat(vulnerability): Upgrade spring libraries to latest version by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4698
- refactor(gms): EbeanAspectDao - make the orderBy clause explicitly ascending in getNextVersions by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4699
- feat(gms): Entity change events v1 (Platform Event) by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4687
- Redesign the login page by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/4684
- fix(snowflake): remove extra lineage edges in reports, change badly named config variable by @anshbansal in https://github.com/datahub-project/datahub/pull/4595
- fix(bigquery): error due to not handling data properly by @anshbansal in https://github.com/datahub-project/datahub/pull/4702
- fix(looker): Fix for Pydantic validation error for Looker TransportOptions on python 3.8 by @treff7es in https://github.com/datahub-project/datahub/pull/4705
- fix(ingest) bigquery: Moving bigquery temporary credential deletion to atexit by @treff7es in https://github.com/datahub-project/datahub/pull/4701
- fix(lineage): Fix lineage entity drawer height UI bug by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/4707
- feat(ingest) - update identity sources to add flags for masking sensitive work units by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4711
- fix(snowflake): deprecate config, update examples by @anshbansal in https://github.com/datahub-project/datahub/pull/4644
- fix(glue): delete CatalogId parameter from get_jobs api call by @BoyuanZhangDE in https://github.com/datahub-project/datahub/pull/4646
- fix(ui): Show deprecate button only for specific entity pages. by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4712
- feat(ml): show custom properties for MLFeatureTable in UI by @maaaikoool in https://github.com/datahub-project/datahub/pull/4706
- fix(glue): fix error for custom connector if ignore_unsupported_conne… by @mayurinehate in https://github.com/datahub-project/datahub/pull/4667
- feat(ingest): add decimal128 custom type for mysql by @kevinhu in https://github.com/datahub-project/datahub/pull/4624
- fix(policy): Use search to fetch all policies by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4713
- fix(transformers): add snapshot aspects from dataset into base_transf… by @shirshanka in https://github.com/datahub-project/datahub/pull/4719
- Revert "fix(policy): Use search to fetch all policies" by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4725
- minor fix(metadata-ingestion): Add new schemas to python codegen by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4726
- fix(ui): Display warning in UI when metadata service auth is disabled. by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4728
- fix(timelineCli): fix naming for timeline cli by @RyanHolstien in https://github.com/datahub-project/datahub/pull/4729
- fix(entity header): Fixes two issues in the EntityHeader - update UI and remove link by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/4720
- Revert "fix(timelineCli): fix naming for timeline cli (#4729)" by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4731
- feat(cli): suppress stacktrace printing on configuration errors by @shirshanka in https://github.com/datahub-project/datahub/pull/4718
- fix(cli): align default sink env variables across ingest and other cl… by @shirshanka in https://github.com/datahub-project/datahub/pull/4739
- feat(ingest) dbt: Dbt query tag mapping and match template by @treff7es in https://github.com/datahub-project/datahub/pull/4744
- fix(cli): telemetry - make config file processing more robust by @shirshanka in https://github.com/datahub-project/datahub/pull/4738
- feat(react theming): stop homepage flicker for env-var based logos by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4730
- feat(Cassandra): add Cassandra implementation of EntityService by @xdl in https://github.com/datahub-project/datahub/pull/3286
- fix(policies): Re-revert the policies fix + ingest documents directly to search by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4733
- feat(cli): Eagerly load datahub actions CLI commands by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4748
- fix(ingest) bigquery: Fix BigQuery Datetime/Timestamp type column partition table profile bug by @sebkim in https://github.com/datahub-project/datahub/pull/4658
- docs: add missing PR numbers by @anshbansal in https://github.com/datahub-project/datahub/pull/4742
- fix(azure_ad): silently discard other Azure AD object types (#4693) by @cccs-eric in https://github.com/datahub-project/datahub/pull/4704
- fix(datahub-frontend): OIDC discovery URL will not have NONE as auth_methods_supported by @chen4119 in https://github.com/datahub-project/datahub/pull/4710
- fix(docs): fix links by @daha in https://github.com/datahub-project/datahub/pull/4703
- feat(ingest): add Feast repository source by @danilopeixoto in https://github.com/datahub-project/datahub/pull/4094
- feat(soft deletes): rephrasing soft delete banner by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4753
- feat(ebeans): Add metrics to track connection pool by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4755
- fix(AWS) When using aws_profile, grab temporary credentials from the session. by @Jiafi in https://github.com/datahub-project/datahub/pull/4751
- feat(metadata-ingestion): Custom endpoint url and proxies in S3. by @pawel3275 in https://github.com/datahub-project/datahub/pull/4708
- fix(tableau): miscellaneous tableau fixes for lineage, browse path, non-embedded datasets by @mayurinehate in https://github.com/datahub-project/datahub/pull/4724
- doc: add warning for JDK by @anshbansal in https://github.com/datahub-project/datahub/pull/4761
- fix(ui): fix expandedName for dataset by @mayurinehate in https://github.com/datahub-project/datahub/pull/4762
- fix(ui): Users and Groups UI bug fixes by @ShubhamThakre in https://github.com/datahub-project/datahub/pull/4746
- fix(azure_ad): make redirect and graph_url optional parameters and update docs by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4754
- docs(glue): clarify that table regex patterns should be fully-qualified by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4747
- fix(ml models): fix features tab by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4769
- fix(lint): lib upgrade caused by @anshbansal in https://github.com/datahub-project/datahub/pull/4773
- fix(lineage) Filter dataset -> dataset lineage edges if data is transformed by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/4732
- fix(build): Fix breaking changes from GE 0.15.3 that are affecting our Python3.6 smoke_tests by @rslanka in https://github.com/datahub-project/datahub/pull/4779
- fix(ingestion): Fixing how we eagerly import DataHub actions by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4784
- fix(ingest): fwk - datahub_api should be initialized by datahub-rest … by @shirshanka in https://github.com/datahub-project/datahub/pull/4786
- feat(ingestion) Ingest Tags from s3 bucket on an AWS Glue job and S3 Data Lake Ingest Job by @Jiafi in https://github.com/datahub-project/datahub/pull/4689
- fix(snowflake): improve debug log for external tables by @anshbansal in https://github.com/datahub-project/datahub/pull/4772
- feat(snowflake): add option to disable checking role grants by @anshbansal in https://github.com/datahub-project/datahub/pull/4760
- fix(m1): tweak m1 preflight by @anshbansal in https://github.com/datahub-project/datahub/pull/4771
- feat(ingestion): add Pulsar source by @vanmeete in https://github.com/datahub-project/datahub/pull/4721
- fix(mae consumer): Fixes delete logic in MAE consumer by @pedro93 in https://github.com/datahub-project/datahub/pull/4790
- feat(analytics): display glossary term percentage coverage by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4782
- refactor(gms): Removing unused source field by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4781
- feat(versionedDataset): adds a versionStamp to timeline response & adds versionStamp param to dataset graphql by @RyanHolstien in https://github.com/datahub-project/datahub/pull/4727
- fix(s3): improved handling for corner cases by @mayurinehate in https://github.com/datahub-project/datahub/pull/4774
- fix(ingest): databricks - hive ingestion should not fail on table com… by @shirshanka in https://github.com/datahub-project/datahub/pull/4787
- fix(ui ingest): Unschedule all sources on ingestion source refresh, fix delete not being enforced by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4792
- feat(tracking) Configure whether mixpanel is enabled with env variable by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/4768
- feat(ingest): docs - overhaul source connector docs to make it code driven by @shirshanka in https://github.com/datahub-project/datahub/pull/4798
- fix(docs): Fixing outdated control-center doc on policies.md by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4799
- fix(ui): update default preview component with new ui design by @ShubhamThakre in https://github.com/datahub-project/datahub/pull/4783
- feat(operation): display the reported time for last updated in the UI by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4800
- feat(blame) - add schema history blame UI by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4793
- fix(ingestion): Fix schema field type for avro logical types by @rslanka in https://github.com/datahub-project/datahub/pull/4801
- Create sample_pii_glossary.yml by @mitchelllovessoftware123 in https://github.com/datahub-project/datahub/pull/4795
- fix(ingestion): Fix presto_on_hive tests. by @rslanka in https://github.com/datahub-project/datahub/pull/4802
- fix(bigquery): improve handling of extracted audit log sql queries by @vgaidass in https://github.com/datahub-project/datahub/pull/4735
- fix(snowflake): get external tables when there is default namespace by @anshbansal in https://github.com/datahub-project/datahub/pull/4803
- fix(snowflake): passing connect args should not cause failures by @anshbansal in https://github.com/datahub-project/datahub/pull/4764
- fix(scrolling) Fixes scrolling and weird heights for embeddedListSearch across entities by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/4805
- fix(ui): update default preview card description text by @ShubhamThakre in https://github.com/datahub-project/datahub/pull/4796
- fix(ui): preview card UI design update by @ShubhamThakre in https://github.com/datahub-project/datahub/pull/4808
- fix(blame): make view blame prior to button work properly by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4810
- fix(docgen): fix failure count incrementing during doc generation by @shirshanka in https://github.com/datahub-project/datahub/pull/4806
- fix(search) Fixes a UI issue so results and filters are always separated by @chriscollins3456 in https://github.com/datahub-project/datahub/pull/4811
- feat(openapi): initial post,get, and delete endpoints for entities by @RyanHolstien in https://github.com/datahub-project/datahub/pull/4775
- feat(protobuf) Adding deprecation support for datasets and fields by @leifker in https://github.com/datahub-project/datahub/pull/4634
New Contributors
- @felixb made their first contribution in https://github.com/datahub-project/datahub/pull/4314
- @chriscollins3456 made their first contribution in https://github.com/datahub-project/datahub/pull/4707
- @sebkim made their first contribution in https://github.com/datahub-project/datahub/pull/4658
- @chen4119 made their first contribution in https://github.com/datahub-project/datahub/pull/4710
- @Jiafi made their first contribution in https://github.com/datahub-project/datahub/pull/4751
- @pawel3275 made their first contribution in https://github.com/datahub-project/datahub/pull/4708
- @vanmeete made their first contribution in https://github.com/datahub-project/datahub/pull/4721
- @mitchelllovessoftware123 made their first contribution in https://github.com/datahub-project/datahub/pull/4795
- @vgaidass made their first contribution in https://github.com/datahub-project/datahub/pull/4735
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.33...v0.8.34
DataHub v0.8.33
Released on Fri Apr 15 2022 by @dexter-mh-lee.
Release Highlights
User Experience
- Refreshed the ML Entity page to match the feel of all other entity types; improved ML lineage functionality
Ingestion Improvements
- Airflow Improvements - as demoed in March Town Hall
  - Add support to capture Airflow execution runs from lineage backend
  - Introduce new high-level API for generating dataflow/job/dataprocessinstance
- MS SQL ingestion now captures table & column descriptions
- Trino platform support for Great Expectations
- New Presto-on-Hive ingestion source
- BigQuery ingestion now supports extraction of usage info from audit logs
- Fix to Looker ingestion to extract Explore Views from join names
- Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
- Simplify & annotate Redshift Usage source
Full Commit Log
- feat(gms): Expose kafka listener concurrency as a GMS setting by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4536
- feat(ingest): add option for external Spark cluster by @kevinhu in https://github.com/datahub-project/datahub/pull/4571
- fix(upgrade): Renaming kafka producer since it clashes with spring-internal by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4573
- feat(GraphQL): Add data platform query to GraphQL API by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4574
- build(ui): Fix Windows UI lint by @mattmatravers in https://github.com/datahub-project/datahub/pull/4556
- doc: make note prominent on quickstart by @anshbansal in https://github.com/datahub-project/datahub/pull/4558
- fix(protobuf) minor bugfixes for protobuf by @leifker in https://github.com/datahub-project/datahub/pull/4553
- feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service by @pedro93 in https://github.com/datahub-project/datahub/pull/4552
- chore: cleanup extra file by @anshbansal in https://github.com/datahub-project/datahub/pull/4541
- feat(snowflake): reduce permissions provisioned by default by @anshbansal in https://github.com/datahub-project/datahub/pull/4543
- fix(ingestion): Redshift usage refactoring - simplify, annotate, fix bugs by @rslanka in https://github.com/datahub-project/datahub/pull/4572
- fix(graphql): Adding PRE FabricType to GraphQL by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4582
- feat(search) - add DATETIME FieldType by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4407
- fix(tableau): fix for incorrect schema returned by tableau api for sn… by @mayurinehate in https://github.com/datahub-project/datahub/pull/4577
- chore: update default cli for managed ingestion by @anshbansal in https://github.com/datahub-project/datahub/pull/4581
- feat(okta) - add support for filtering/searching when ingesting Okta groups and users by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4586
- doc(snowflake): add example of table pattern by @anshbansal in https://github.com/datahub-project/datahub/pull/4580
- fix(doc): try to fix broken link by @daha in https://github.com/datahub-project/datahub/pull/4593
- fix(bigquery): incorrect lineage when views are present by @anshbansal in https://github.com/datahub-project/datahub/pull/4568
- feat(metadata-service): Supporting a configurable Authorizer Chain by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4584
- fix(search): Make sure home page and search pages are consistent by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4588
- fix(browse): Reduce browse aggregation size by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4601
- doc: add page for handling deprecations, breaking changes etc. by @anshbansal in https://github.com/datahub-project/datahub/pull/4590
- docs(GraphQL): fix typo by @Falci in https://github.com/datahub-project/datahub/pull/4605
- feat(search): Add SearchScore annotation to use fields for search ranking by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4596
- feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. by @rslanka in https://github.com/datahub-project/datahub/pull/4585
- feat(tableau): add some logic to normalize table names in tableau by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4609
- fix: urlencode slash in urns too by @daha in https://github.com/datahub-project/datahub/pull/4527
- fix(bigquery): fix lineage bug, improve docs, add dataset filter config by @anshbansal in https://github.com/datahub-project/datahub/pull/4607
- fix(protobuf) fix test instability by @leifker in https://github.com/datahub-project/datahub/pull/4612
- fix(ui): Fix dashboard tags display by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4611
- feat(ui): Adding GraphQL queries to fetch entity deprecation status by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4614
- feat(ingest): enable connection string for all sqlalchemy datasources by @ms32035 in https://github.com/datahub-project/datahub/pull/4508
- fix(docs): add grant statements for redshift-ingestion by @Abhiram98 in https://github.com/datahub-project/datahub/pull/4559
- chore: fix lint and remove incorrect integration mark from unit tests by @anshbansal in https://github.com/datahub-project/datahub/pull/4621
- feat: adding gradle, pip cache via gh cache, docker cache via dockerhub by @anshbansal in https://github.com/datahub-project/datahub/pull/4387
- doc(scheduling): make it easier to find ui ingestion by @anshbansal in https://github.com/datahub-project/datahub/pull/4610
- feat(glue): add CatalogId parameter for cross-account access by @BoyuanZhangDE in https://github.com/datahub-project/datahub/pull/4608
- doc(cli): add env variables and options for ingest command by @anshbansal in https://github.com/datahub-project/datahub/pull/4598
- fix(ingest): Restricting pytest docker version to <0.12 by @treff7es in https://github.com/datahub-project/datahub/pull/4639
- fix(cypress) - add waits for cypress search test to remove flakiness by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4640
- Revert "feat: adding gradle, pip cache via gh cache, docker cache via dockerhub" by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4637
- feat(search): Only reindex if the mappings for an existing field changed by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4629
- feat: add presto-on-hive metadata ingestion source by @jchen0824 in https://github.com/datahub-project/datahub/pull/4625
- feat(ingest): add trino platform for great expectations by @ms32035 in https://github.com/datahub-project/datahub/pull/4594
- fix(kafka): Stop overriding kafka registry props with empty values by @jsotelo in https://github.com/datahub-project/datahub/pull/4604
- [model]: Dataprocess instance entity to model datajob/jobflow runs by @treff7es in https://github.com/datahub-project/datahub/pull/4459
- feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag by @tc350981 in https://github.com/datahub-project/datahub/pull/4618
- fix(ingestion): ensure source/sink reports are always logged by @anshbansal in https://github.com/datahub-project/datahub/pull/4592
- fix(ingestion): extract explore views from join name in Looker by @dyanarose in https://github.com/datahub-project/datahub/pull/4627
- feat(ingestion): Enable lower-casing of the name part of dataset urn if env variable is set. by @rslanka in https://github.com/datahub-project/datahub/pull/4649
- feat: Enable the ingestion of bigquery audit logs to parse usage info… by @tha23rd in https://github.com/datahub-project/datahub/pull/4441
- fix(ingest): Fix snowflake KEY_PAIR auth by @mkamalas in https://github.com/datahub-project/datahub/pull/4638
- fix(home): Fix issue where some browse cards are missing by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4652
- fix(tableau): avoid duplicate schema in URNs for upstream tables by @maaaikoool in https://github.com/datahub-project/datahub/pull/4645
- feat(ingest): capture MSSQL table+column descriptions by @kevinhu in https://github.com/datahub-project/datahub/pull/4579
- feat(ml): bringing ml screens up to date w/ the modern ui layout & improving ml lineage by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4651
- (feat:airflow) Add support to capture airflow executions + high level dataflow/jobs api by @treff7es in https://github.com/datahub-project/datahub/pull/4615
- fix(ingestion): add missing workunit ids by @anshbansal in https://github.com/datahub-project/datahub/pull/4657
- fix(ingestion): Adding missing init.py by @anshbansal in https://github.com/datahub-project/datahub/pull/4659
- fix(bigquery-usage): missing dependency by @anshbansal in https://github.com/datahub-project/datahub/pull/4661
- feat(cypress) - add cypress dashboard view to CI by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4654
- feat(autocomplete): show fully qualified name in autocomplete by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4663
- feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_naming_pattern by @arunvasudevan in https://github.com/datahub-project/datahub/pull/4587
- fix(sqlparser): fix sqlparser breaking due to # sign by @anshbansal in https://github.com/datahub-project/datahub/pull/4662
- fix(ingestion): validate datasource in Tableau connector, before creating its upstream by @nandacamargo in https://github.com/datahub-project/datahub/pull/4613
- Added Relative Routing on the Users & Groups screen by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/4664
- fix(airflow): Not importing emitters directly to eliminate unneeded dependency by @treff7es in https://github.com/datahub-project/datahub/pull/4668
- docs: remove ingestion source summary table by @maggiehays in https://github.com/datahub-project/datahub/pull/4670
- feat(ml): some machine learning followups by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4669
- fix(search): Fix urn component settings by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4672
- fix(ingestion): update example recipes by @anshbansal in https://github.com/datahub-project/datahub/pull/4660
- feat(theming): set custom logo without rebuilding by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4674
- feat(data-platform): Add platform entities for the connectors we support by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4676
- refactor(authorization): Add authorizedActor function to Authorizer interface by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4678
- docs(tags) - add tags usage guide by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4677
- fix(cli): Suppress printing variables to logs during ingestion failure by @atulsaurav in https://github.com/datahub-project/datahub/pull/4566
- fix(docs): Improving Add Users Doc by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4679
- Fix/modal validations by @ShubhamThakre in https://github.com/datahub-project/datahub/pull/4673
New Contributors
- @Falci made their first contribution in https://github.com/datahub-project/datahub/pull/4605
- @ms32035 made their first contribution in https://github.com/datahub-project/datahub/pull/4508
- @jchen0824 made their first contribution in https://github.com/datahub-project/datahub/pull/4625
- @dyanarose made their first contribution in https://github.com/datahub-project/datahub/pull/4627
- @mkamalas made their first contribution in https://github.com/datahub-project/datahub/pull/4638
- @atulsaurav made their first contribution in https://github.com/datahub-project/datahub/pull/4566
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.32...v0.8.33
DataHub v0.8.32
Released on Mon Apr 04 2022 by @dexter-mh-lee.
Release Highlights
User Experience
We're excited to announce View-based RBAC Policies! You can now create and apply view-only permissions to your DataHub end-users, providing more robust access controls.
We've also included some small (but impactful!) improvements to UX, including:
- Display recent search terms when beginning the search flow
- Consistently display entity subtypes for dbt, Looker, Kafka, & more. For example, Kafka entities are displayed as "topics" instead of "datasets"
Ingestion Highlights
- New! Protobuf ingestion (shoutout to @leifker for this Community-led contribution!)
- Initial work to support a "Notebook" entity (shoutout to @tc350981 for spearheading this work!!)
- Stateful ingestion for dbt is now supported
- Ongoing improvements to our Tableau ingestion source from @nandacamargo & @cuong-pham
- Improvements to handling database aliases for Redshift ingestion
- Improvements to S3 source:
  - Add containers for datasets
  - Support platform_instance
  - Support for folder level datasets
  - Increased flexibility to specify dataset paths
- Ingestion Fixes:
  - Snowflake Usage: log a warning instead of erroring out, plus additional error handling
  - Snowflake allow/deny patterns
  - Examples of allow/deny patterns added to docs
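The allow/deny patterns mentioned above are regular-expression filters on names such as databases, schemas, and tables. A minimal sketch of the idea, assuming deny rules take precedence over allow rules; the `AllowDeny` class and the example names are hypothetical stand-ins, not DataHub's own implementation:

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowDeny:
    """Hypothetical allow/deny filter; illustrative only, not DataHub's class."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, name: str) -> bool:
        # Assumption: deny wins, so any matching deny regex rejects the name.
        if any(re.match(pattern, name) for pattern in self.deny):
            return False
        # Otherwise the name must match at least one allow regex.
        return any(re.match(pattern, name) for pattern in self.allow)


# Example names are made up for illustration.
tables = AllowDeny(allow=[r"analytics\..*"], deny=[r".*\.tmp_.*"])
for candidate in (
    "analytics.public.orders",
    "analytics.public.tmp_orders",
    "raw.public.orders",
):
    print(candidate, tables.allowed(candidate))
```

With these rules, only names under `analytics.` that do not contain a `.tmp_` segment pass the filter.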
Full Commit Log
- #4570 @gabe-lyons fix(search): handle commas in search queries in the UI
- #4557 @daha fix: replace direct and indirect references to linkedin with datahub-project
- #4569 @dexter-mh-lee fix(policy): Add view entity page priv to all entity types
- #4567 @anshbansal fix(bigquery): missing dependency
- #4548 @mayurinehate fix(tableau): gracefully stop ingestion if tableau sign in not successful
- #4564 @dexter-mh-lee fix(docs): fix logo links on ingestion docs
- #4396 @Abhiram98 feat(ingestion): schema, table filtering for redshift-usage
- #4554 @darapuk (fix): Update path generated when creating LookML URL
- #4549 @maggiehays docs: add sumup logo
- #4560 @mhw docs: Fix PostgreSQL typo in features.md
- #4562 @gabe-lyons feat(lineage): show fully qualified dataset name on expansion
- #4561 @anshbansal fix: dependencies for usage sources
- #3782 @CorentinDuhamel feat(ingest): indent sql queries for usage sources
- #4551 @pedro93 fix(rollback) Removes status & key aspects from affected aspects count during rollback
- #4538 @dexter-mh-lee fix(policy): Remove all from the resource type choices
- #4544 @anshbansal fix(ingest): snowflake-usage - log warning instead of error out
- #4542 @mattmatravers build(ui): allow custom nodeDistBaseUrl
- #4545 @mayurinehate fix(kafka-connect): add platform for default case in jdbc connector, update tests for platform instance map
- #4547 @anshbansal chore: update pull request template
- #4537 @RyanHolstien fix(dataPlatformInstance): add data platform instance to entity registry
- #4375 @mayurinehate fix(kafka-connect): fix lineage for postgres-like 3-level hierarchy d…
- #4492 @RyanHolstien fix(cli): delete - handle case insensitive entity types
- #4130 @sgomezvillamor feat(ingest): glue - adds platform instance capability
- #4456 @mohdsiddique feat(stateful dbt): add stateful ingestion capability in dbt source
- #4482 @pedro93 feat(platform): adds side-effect report for rollbacks
- #4275 @maggiehays docs: Ingestion Source Docs Template
- #4470 @mayurinehate feat(tableau): emit lineage edge from embedded datasource to upstream…
- #4535 @pppsunil feat(ingestion): Support pluggable Schema Registry for Kafka Source
- #4369 @eburairu feat(ui): Add new loading pattern logo
- #4493 @leifker feat(integration): protobuf - additional annotations and features
- #4435 @zhoxie-cisco perf(docker): datahub-gms - add jetty configuration xml
- #4532 @anshbansal doc: add example of profiling in default example
- #4533 @anshbansal doc: clarify CLI releases
- #4528 @daha fix(doc): Change to forward slash-separated strings, as in the example
- #4315 @daha fix(docs): Minor fixes to pip install commands
- #4521 @anshbansal doc: update docker docs for mentioning Python CLI
- #4523 @kevinhu fix(ingest): mssql - support database_alias
- #4501 @arunvasudevan feat(ingest): kafka-connect - support mapping for multiple DB instances
- #4477 @jjoyce0510 feat(metadata service): Introducing Platform Events
- #4526 @jjoyce0510 Adding has container
- #4494 @cuong-pham fix(ingest): make tableau ingestion more resilient to error
- #4519 @anshbansal doc(ingestion): add examples of running in docker and Kubernetes
- #4485 @andres-lowrie docs(metadata-ingestion): callout props in para
- #4511 @RyanHolstien Oss/urn validation
- #4525 @dexter-mh-lee feat(policy): Add tooltip and view button
- #4507 @mayurinehate feat(assertion): update python example, assertion entity doc
- #4513 @kevinhu feat(ingestion): detect and disable telemetry in CI
- #4516 @dexter-mh-lee feat(policy): Add domain based and view based policies
- #4520 @rslanka Fix: Snowflake Table to View lineage
- #4510 @anshbansal feat(ingest): Add config to improve user exp for initial ingestion and fix docs
- #4517 @anshbansal feat(ingest): option for number of workunits in preview
- #4503 @treff7es feat(ingest): athena - set Athena location as upstream
- #4490 @MugdhaHardikar-GSLab feat(s3): add s3 source
- #4468 @tc350981 feat(notebook): graphql related logic change for notebook
- #4401 @anshbansal doc: update instructions for updating DataHub on quickstart
- #4500 @ShubhamThakre SecretBuilderModal -> name field validation updated
- #4505 @anshbansal docs: add example of database and schema allow/deny patterns
- #4504 @anshbansal fix(snowflake): allow/deny patterns
- #4496 @shirshanka feat(ingest): dbt,looker,sql_common,kafka - moving sources to produce display names and subtypes more consistently
- #4480 @anshbansal feat(snowflake): stop querying for usage data when no min/max dates
- #4483 @anshbansal fix(snowflake-usage): do not ingest for stage as a dataset
- #4497 @shirshanka moving to dockerhub for actions container
- #4475 @darapuk fix: Update GroupProfile to read from properties over deprecated info aspect
- #4489 @anshbansal fix(ingestion): pin Jinja2 to version < 3.1.0
- #4484 @anshbansal fix(ingestion): stop CLI build failures
- #4467 @anshbansal doc: add caveats to snowflake doc
- #4481 @anshbansal fix(snowflake): don't recommend accountadmin role for snowflake
- #4479 @anshbansal fix: change log level to debug
- #4476 @dexter-mh-lee Add recent searches filtering
- #4469 @tc350981 feat(ingest): add python utility classes for NotebookUrn, CorpuserUrn and CorpGroupUrn
- #4439 @ShubhamThakre Feature/modal-validation-and-UI-fixes-updates
- #4398 @kevinhu feat(ingest): simplify event IDs for function invocations
- #4474 @sgomezvillamor chore: acryl-data 0.6.12
- #4473 @treff7es fix(redshift) Properly handling database alias in redshift usage and redshift lineage generation
- #4471 @gabe-lyons enabling ml tabs
- #4460 @eclaassen-pb fix: java dependency vulnerabilities
- #4453 @treff7es feat(ingest) data-lake: Add s3 properties metadata when ingesting s3 files
- #4464 @anshbansal fix: change for repository change
- #4466 @anshbansal fix(snowflake-usage): add more error handling
- #4445 @nandacamargo fix(ingest): add fix to tableau connector when table has None fields
- #4457 @mayurinehate docs(hive): update recipe with example to specify kerberos auth
- #4313 @pedro-iatzky fix(ingest): bigquery - fix ingestion of external tables
- #4462 @gabe-lyons adding final transport options
- #4223 @tc350981 feat(notebook): add data models for Notebook entity
- #4450 @pedro93 feat(frontend) Adds multiple group claim support
- #4442 @anshbansal feat(ingestion): snowflake, bigquery - enhancements to log and bugfix
- #4333 @anshbansal doc: add guide for ui tabs
- #4443 @jjoyce0510 Fixing privilege option display bug
- #4451 @gabe-lyons fix tableau connector when it cannot connect to URI
- #4237 @tc350981 (docs) add RFC file to introduce Notebook entity data model
- #4446 @kevinneville fix: Replace old repository link with new link
- #4447 @cuong-pham getting database directly from upstream tables in case there are multiple databases in upstreamDatabases
- #4436 @treff7es Passing entity properly on deletion
- #4433 @rslanka Fix bug in the SchemaField type computation for AVRO logical types.
DataHub v0.8.31
Released on Thu Mar 17 2022 by @dexter-mh-lee.
Bugfix release to prevent reindexing of the system metadata index in Elasticsearch from failing.
Full Commit Log
- #4440 @pedro93 fix(cli) Makes filtered search deletes include BOTH removed and non-removed
- #4444 @pedro93 fix(cli) Adds elasticsearch mapping
- #4432 @leifker feat(protobuf): Gradle protobuf example project
DataHub v0.8.30
Released on Thu Mar 17 2022 by @rslanka.
v0.8.30
Release Highlights
- Fix for OIDC encryption bug from v0.8.29
- Adds platform instance id to the container id generation, and support for migrating the old container ids to the new ones via the datahub migrate CLI.
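Adding the platform instance to the fields used for container id generation produces a different id for the same container, which is why existing ids need migrating. A minimal sketch of that effect, assuming ids are derived by hashing the sorted key fields; the function name `container_guid`, the key field names, and the choice of SHA-256 are illustrative assumptions, not DataHub's actual scheme:

```python
import hashlib
import json
from typing import Dict


def container_guid(key: Dict[str, str]) -> str:
    """Hash key fields into a stable id (hypothetical scheme, for illustration)."""
    # Sorting keys makes the id independent of dict insertion order.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Field names and values below are made up for this example.
old_key = {"platform": "snowflake", "database": "sales"}
new_key = {**old_key, "instance": "prod_us"}

old_id = container_guid(old_key)
new_id = container_guid(new_key)

# The same container now maps to a different id, so references to the
# old id must be rewritten by a migration step.
print(old_id != new_id)
```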
Notable UI-Based Features
- Showing recent searches in autocomplete.
What's Changed
- fix(ui): some small ui fixes for lineage by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4381
- fix(docs): change cabify link by @maaaikoool in https://github.com/datahub-project/datahub/pull/4373
- Fixed Bug: Alpha slider doesn’t move, only the color slider is movable in tag color picker by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/4359
- feat(GE): add option to disable sql parsing, use default parser by @mayurinehate in https://github.com/datahub-project/datahub/pull/4377
- fix(removed): Make sure removed entities do not appear on recommendations by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4353
- fix(browse): fix browse double click issue by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4382
- fix(oidc): Update group membership each login (and make group extraction disabled by default) by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4380
- feat(ingestion): add java protobuf schema ingestion by @leifker in https://github.com/datahub-project/datahub/pull/4178
- Docs/update docs by @RyanHolstien in https://github.com/datahub-project/datahub/pull/4393
- Revert "Fixed Bug: Alpha slider doesn’t move, only the color slider is movable in tag color picker" by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4390
- feat(ingestion): improve logging, docs for bigquery, snowflake, redshift by @anshbansal in https://github.com/datahub-project/datahub/pull/4344
- fix(ingest) Azure AD: support nested groups (#4367) by @cccs-eric in https://github.com/datahub-project/datahub/pull/4368
- fix: add missing logo by @anshbansal in https://github.com/datahub-project/datahub/pull/4386
- feat(spark-lineage): add support to custom env and platform_instance by @MugdhaHardikar-GSLab in https://github.com/datahub-project/datahub/pull/4208
- fix(containers) - configure domain resolver for containers by @aditya-radhakrishnan in https://github.com/datahub-project/datahub/pull/4404
- feat(*): Support setting owner type when assigning ownership by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4354
- fix: telemetry failure should not cause CLI failure by @anshbansal in https://github.com/datahub-project/datahub/pull/4406
- feat(autocomplete): Show recent searches + improved autocomplete by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4400
- fix(ingestion): Fix mypy error stateful committable & restore mypy version. by @rslanka in https://github.com/datahub-project/datahub/pull/4408
- build(markupsafe): update markupsafe pinning for Airflow compatibility by @set5think in https://github.com/datahub-project/datahub/pull/4388
- feat(search): Add flag to enable caching on search service by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4335
- fix(query_combiner): add try block to handle queries of type str by @WaStCo in https://github.com/datahub-project/datahub/pull/4397
- fix(ingestion): read all tables from redshift by @Abhiram98 in https://github.com/datahub-project/datahub/pull/4345
- fix(ingestion): Invoke SqlLineageSQLParser's implementation in a separate process by @rslanka in https://github.com/datahub-project/datahub/pull/4391
- fix(ingest): handle endpoints without 200 response in openapi by @JorgenEvens in https://github.com/datahub-project/datahub/pull/4332
- feat(ingestion): Add the ability to query the latest timeseries aspect value via the get_cli. by @rslanka in https://github.com/datahub-project/datahub/pull/4395
- Refactoring the queries into a single one to get the search results on Home Page by @Ankit-Keshari-Vituity in https://github.com/datahub-project/datahub/pull/4372
- feat(lineage): hide soft deleted nodes in lineage & adds banner in entity page by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4410
- fix(lineage): Move lineage registry to entity-registry module by @dexter-mh-lee in https://github.com/datahub-project/datahub/pull/4412
- feat(cli) Changes rollback behaviour to apply soft deletes by default by @pedro93 in https://github.com/datahub-project/datahub/pull/4358
- fix(looker): various looker fixes by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4394
- fix(oidc): Fixing OIDC encryption bug in v0.8.29 by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4418
- feat(oidc): Adding support for extracting single string groups claim by @jjoyce0510 in https://github.com/datahub-project/datahub/pull/4419
- fix: change log levels to debug by @anshbansal in https://github.com/datahub-project/datahub/pull/4411
- tests(cypress): reduce cypress flakiness by retrying login on failure by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4423
- fix(ingest): extract redshift platform correctly from sqlalchemy uri by @mayurinehate in https://github.com/datahub-project/datahub/pull/4421
- build: Fix line endings for Windows check-out by @mattmatravers in https://github.com/datahub-project/datahub/pull/4370
- feat(gql): make gql layer resistant to unresolvable relationships by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4424
- fix(ingestion) containers: Adding platform instance to container keys by @treff7es in https://github.com/datahub-project/datahub/pull/4279
- fix: don't set None default by @anshbansal in https://github.com/datahub-project/datahub/pull/4422
- Flexible search on soft delete by @pedro93 in https://github.com/datahub-project/datahub/pull/4405
- fix(no-code metadata models in ui): fixes bug with rendering renderSpec aspects by @gabe-lyons in https://github.com/datahub-project/datahub/pull/4430
New Contributors
- @set5think made their first contribution in https://github.com/datahub-project/datahub/pull/4388
- @Abhiram98 made their first contribution in https://github.com/datahub-project/datahub/pull/4345
- @JorgenEvens made their first contribution in https://github.com/datahub-project/datahub/pull/4332
- @mattmatravers made their first contribution in https://github.com/datahub-project/datahub/pull/4370
Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.29...v0.8.30
DataHub v0.8.29
Released on Thu Mar 10 2022 by @shirshanka.
v0.8.29
NOTICE
This version is affected by an OIDC (SSO) related issue with the following stack trace:
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend Caused by: java.security.InvalidKeyException: Invalid AES key length: 30 bytes
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend at com.sun.crypto.provider.AESCrypt.init(AESCrypt.java:87)
The DataHub core team is working to address this. For now, we recommend staying on v0.8.28 if you are actively using OIDC.
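The stack trace above reports a 30-byte key, and AES only accepts keys of 16, 24, or 32 bytes (AES-128, AES-192, AES-256). A minimal sketch of a pre-flight check on a configured secret, assuming the secret's UTF-8 bytes are used directly as the key; the function name `is_valid_aes_key` is hypothetical, and this is not the fix shipped by the DataHub team:

```python
# Key sizes accepted by AES: AES-128, AES-192, AES-256.
VALID_AES_KEY_LENGTHS = (16, 24, 32)


def is_valid_aes_key(secret: str) -> bool:
    """Return True if the secret's UTF-8 byte length is a valid AES key size."""
    # Assumption: the raw bytes of the secret are used as the key material.
    return len(secret.encode("utf-8")) in VALID_AES_KEY_LENGTHS


print(is_valid_aes_key("x" * 30))  # a 30-byte secret, as in the stack trace
print(is_valid_aes_key("x" * 32))  # a 32-byte secret
```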
Release Highlights
- Fix for MAE & MCE consumer healthcheck
- Upgrade to Java 11 and Gradle 6
Full Commit Log
- #4360 @maaaikoool Add cabify as adopter
- #4365 @dexter-mh-lee fix(vulnerabilities): Fix vulnerabilities in datahub-frontend
- #4361 @jjoyce0510 fix(ui): Supporting unknown data platform type
- #4363 @rslanka feat(ingest): Add memory leak detection capability to the datahub cli command.
- #4366 @RyanHolstien fix(metadata-jobs): fix root context for springboot
- #4340 @leifker feat(build): upgrade to gradle 6 for toolchain to support java 11
- #4357 @anshbansal feat: change quickstart to use head tag for actions
- #4356 @treff7es fix(ingest): bigquery - Fixing missing attribute error if credential was not set
- #4319 @vcs9 feat(ingest): mysql - add database_alias functionality
- #4352 @dexter-mh-lee fix(ci): fix model generation workflow
- #4351 @jjoyce0510 fix(frontend): Fix common OIDC issues
- #4111 @treff7es fix(ingest) bigquery-usage: Adding credential support for bigquery usage
- #4343 @Ankit-Keshari-Vituity Fixed the Small Project issue
- #4350 @MugdhaHardikar-GSLab fix(config-parsing): add support for variable expansion for in variables in between string
- #4330 @anshbansal fix(hive): clean protocol for hive source
- #4338 @maggiehays doc(platforms) adding PowerBI logo to docs website
- #4342 @anshbansal feat(quickstart): restart actions pod in case of failures
- #4347 @mayurinehate fix(GE): fix dependencies for GE DataHubValidationAction, logic for s…
- #4349 @gabe-lyons query for custom properties on containers
- #4341 @anshbansal fix(doc): remove duplicate entry for permission
- #9 @shirshanka fix(ci): fix datahub jar publish action
- #8 @shirshanka feat(ci): fix jar publish action
- #7 @czbernard fix(ci): fixing tag computation for docker image build
- #6 @shirshanka feat(ci): adding dockerfile and action for datahub-airflow image
- #5 @shirshanka fix(ci): pin python version to 3.9.9 for release action
- #4 @dexter-mh-lee fix(ci): docker-ingestion - update acryl workflow
- #3 @shirshanka fix(pypi): fixing package metadata to reflect source and changelog correctly
- #2 @shirshanka fix(ci): check for tagged reference to kick off pypi push
DataHub v0.8.28
Released on Mon Mar 07 2022 by @shirshanka.
Release Highlights
Notable UI-Based Features
Quickly view, search, and filter the downstream dependencies of any Entity! By using the Impact Analysis Lineage view, you can now see the full set of downstream entities that may be impacted by a change to a given entity. You can also search, filter, and export the list of entities to CSV; try it for yourself here.
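Conceptually, the "full set of downstream entities" is the transitive closure over downstream lineage edges. A minimal sketch of that traversal using breadth-first search, assuming lineage is available as a simple adjacency map; the function name `downstream_of` and the entity names are hypothetical, and this is not how DataHub implements Impact Analysis internally:

```python
from collections import deque
from typing import Dict, List, Set


def downstream_of(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Collect every entity reachable from `start` via downstream edges."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            # Skip already-visited nodes so cycles do not loop forever.
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


# Made-up lineage graph: each key lists its direct downstream entities.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.customers"],
    "marts.revenue": ["dashboard.sales"],
}

impacted = downstream_of("raw.orders", lineage)
print(sorted(impacted))
```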
View Dataset- and Column-Level Data Validation outcomes in DataHub. We now support surfacing outcomes from Great Expectations validations in Dataset Entities! Easily view the full history of validation outcomes to understand the trustworthiness of your data.
User Groups, Policies, and Tags have a new look!
- The User Group page has a new look, allowing you to assign an email address, Slack Channel, Group Owner, and more. Easily add/remove Group Members from the UI - test it out here.
- We refreshed the Policies Page, allowing you to see Policy membership and status at a glance.
- The Tag Details page has been overhauled! You can now edit the definition, assigned owners, and tag color via the UI (try it here).
Notable Metadata Model & Ingestion-Based Features
First Milestone: Column-Level Lineage is complete! The Metadata Model now supports “fine-grained” lineage for Datasets; see documentation here for details, including adding fine-grained lineage to a dataset or a datajob.
Define Dataset-to-Dataset lineage via YAML. As demonstrated in the February 2022 Town Hall, you can now set Dataset-level lineage via YAML. This is great for teams that have more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources.
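To illustrate what a declared dataset-to-dataset lineage amounts to, the sketch below turns a small in-memory declaration into (upstream, downstream) dataset URN pairs. The `declared` structure and helper names are hypothetical and do not reproduce the YAML schema of the file-based lineage source; only the dataset URN layout follows DataHub's convention:

```python
from typing import Dict, List, Tuple

# (platform, dataset name) pair; a hypothetical shorthand for this example.
DatasetRef = Tuple[str, str]


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN from platform, name, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def lineage_edges(entries: List[Dict]) -> List[Tuple[str, str]]:
    """Flatten declarations into (upstream_urn, downstream_urn) pairs."""
    edges: List[Tuple[str, str]] = []
    for entry in entries:
        downstream = dataset_urn(*entry["dataset"])
        for upstream in entry["upstreams"]:
            edges.append((dataset_urn(*upstream), downstream))
    return edges


# Hypothetical declaration: topic3 is derived from topic1 and topic2.
declared = [
    {
        "dataset": ("kafka", "topic3"),
        "upstreams": [("kafka", "topic1"), ("kafka", "topic2")],
    },
]

for upstream_urn, downstream_urn in lineage_edges(declared):
    print(upstream_urn, "->", downstream_urn)
```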
Track all changes to entities using the Timeline API. This unified timeline of changes to entities in the metadata graph provides a robust picture of how your metadata has evolved over time. Upcoming work will support surfacing this detail via the DataHub UI. See the overview from Town Hall here.
Miscellaneous Metadata Ingestion Updates:
- Incubating: PowerBI Ingestion Source
- BigQuery Profiling: ability to disable profiling by partition
- Tableau improvements: Workbooks are now modeled as “Containers”
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elasticsearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
- fix(ui): minor ui fixes by @jjoyce0510 in #4325
- fix(lineage): Fix issue where downstream of datajobs do not appear by @dexter-mh-lee in #4326
- fix(ingestion): Insulate 'datahub' and child loggers from external modules. by @rslanka in #4324
- feat(aws-docs): Add section on attaching policies to the datahub-actions pod by @dexter-mh-lee in #4334
- feat(ingest): transformers - add support for processing MCP-s by @swaroopjagadish in #4337
- Allow elasticsearch to authenticate without username and password by @salihcaan in #4329
- docs: Update postgres.md by @BoyuanZhangDE in #4292
- fix(ci): wait more for add/remove user test by @gabe-lyons in #4339
- feat(impact analysis): bugfixes for Impact Analysis by @gabe-lyons in #4336
- fix(ingestion): add logging, make job more resilient to errors by @anshbansal in #4331
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
- @salihcaan made their first contribution in #4329
- @BoyuanZhangDE made their first contribution in #4292
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.27...v0.8.28
DataHub Release Candidate v0.8.28 (rc1)
Released on Sat Mar 05 2022 by @shirshanka.
DataHub v0.8.28 Release Candidate 1
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.27...v0.8.28rc1
Release Candidate v0.8.28
Released on Sat Mar 05 2022 by @shirshanka.
Release Candidate for Version 0.8.28.
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.27...RC-v0.8.28
DataHub v0.8.27
Released on Wed Feb 23 2022 by @shirshanka.
Release Highlights
Notable UI-Based Features
- The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.
- Search for Entities by Owner - Easily filter search results by User/Group Owner
- Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!
- Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!
Notable Metadata Model & Ingestion-Based Features
- ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!
- Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!
- Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!
- BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.
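The idea behind profiling only the latest shard can be sketched in a few lines. This is an illustration only, not DataHub's implementation: it assumes date-sharded tables follow a `name_YYYYMMDD` naming convention, and the helper `latest_shards` is a hypothetical name introduced here.

```python
import re
from typing import Dict, Iterable

# Assumption: sharded tables end in an 8-digit date suffix, e.g. events_20220101.
SHARD_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<shard>\d{8})$")


def latest_shards(table_names: Iterable[str]) -> Dict[str, str]:
    """Map each base table name to its most recent shard.

    Tables without a date suffix are treated as unsharded and map to themselves.
    """
    latest: Dict[str, str] = {}
    for name in table_names:
        match = SHARD_SUFFIX.match(name)
        base = match.group("base") if match else name
        current = latest.get(base)
        # YYYYMMDD suffixes sort chronologically when compared as strings.
        if current is None or name > current:
            latest[base] = name
    return latest


tables = ["events_20220101", "events_20220103", "events_20220102", "users"]
to_profile = sorted(latest_shards(tables).values())
```

Here only `events_20220103` and `users` would be handed to the profiler, so the cost of profiling no longer grows with the number of historical shards.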
Notable Docs Updates
- NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl
- Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.
What's Changed
- feat(deprecation): Entity Deprecation Backend by @jjoyce0510 in #4073
- Fixed auto complete pr coments by @Ankit-Keshari-Vituity in #4072
- fix(ingestion): enforce correct behaviour for commit policy by @claudio-benfatto in #4092
- fix(aggregate): Fix NPE in aggregate api by @dexter-mh-lee in #4095
- add Haibo corp by @wangqinghuan in #4082
- fix(ingestion): Add psutil dependency required for stateful ingestion reporting. by @rslanka in #4099
- docs(kafka): add example for using domains, change for clarity by @anshbansal in #4100
- feat(ui): Add display name & title to editable corp user properties. by @jjoyce0510 in #4097
- fix(ingestion): Enhance BigQuery source logging. by @rslanka in #4101
- fix(glossary terms): fix add glossary term flow by @gabe-lyons in #4106
- (docs) Add Zynga & Tableau logos by @maggiehays in #4109
- fix(ingestion): Add sql lineage to redshift-usage plugin by @dexter-mh-lee in #4103
- feat(ui): Add svg datahub satellite loading logo by @eburairu in #4067
- fix(ingestion): resolve oracle issue with large view definitions by @hsheth2 in #4027
- fix(ingest): ignore Postgres information_schema tables by default by @kevinhu in #4069
- fix(ingest) - close event loops in Okta source and add additional debug logging by @aditya-radhakrishnan in #4077
- chore(ingest): remove unused groupby_unsorted utility by @hsheth2 in #4011
- fix(docs): fixing metadata model doc generation script and updating png by @swaroopjagadish in #4120
- fix(ci): fix formatting in doc generation action yaml by @swaroopjagadish in #4121
- fix(ci): fix formatting for action yaml by @swaroopjagadish in #4122
- feat(Tags/Terms): Backend support for tag & term mutations by @jjoyce0510 in #4096
- docs(backup): add doc for taking backup by @anshbansal in #3917
- fix(docs): make intro to metadata ingestion easier for beginners by @anshbansal in #4039
- fix(ingest) Athena: db filter was not applied by @treff7es in #4127
- fix(ui) - move book logo to right of glossary term by @aditya-radhakrishnan in #4125
- fix(docs) Fix doc on modelDocUpload by @daha in #4112
- fix(cypress): force clicks on tag mutation test by @gabe-lyons in #4102
- feat(ingest) Athena: Getting table properties for Athena datasets by @treff7es in #4123
- fix(logging): Fix Restli Logging Filter to print full stack trace on error by @dexter-mh-lee in #4136
- docs : markdown fixes for db retention table by @satyamkrishna in #4133
- docs : markdown fixes for db retention table by @satyamkrishna in #4148
- feat(ingestion): Kafka stateful ingestion by @claudio-benfatto in #4028
- fix(docs): update graphql docs to reference new graphql file by @gabe-lyons in #4139
- Feature/oss/update to v2 endpoints by @RyanHolstien in #4128
- fix(cli): add timeout for telemetry calls by @anshbansal in #4135
- chore(cli): update default cli version pinned in the UI based ingestion by @anshbansal in #4150
- fix(docs): fix example of delta lake by @anshbansal in #4149
- fix(ui): Fix cutoff profiling axis labels by @jjoyce0510 in #4154
- feat(ingest): Glue - Support for domains and containers by @treff7es in #4110
- feat(ui): Host platform images on datahub-web-react by @ngamanda in #4118
- bug(seedData): adds a key to the root user seed data and fixes corner case check for missing key aspects by @RyanHolstien in #4162
- UI Fix: Modal close on Enter press, autofocus on modal, added split panel, alignment of button by @Ankit-Keshari-Vituity in #4155
- feat(ui): Edit glossary term descriptions via UI by @jjoyce0510 in #4156
- Update querying-entities.md -> Documentation Error by @buggythepirate in #4157
- refactor(metadata-io/test): common ElasticsearchContainer and ability to override from environment. by @stephenp-gr in #4152
- feat(ingestion): Add support for snowflake view lineage. by @rslanka in #4163
- Update the doc to including options to include Views by @cuong-pham in #4164
- fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. by @rslanka in #4140
- chore(ingestion): upgrade mypy by @hsheth2 in #4141
- ci(ingestion): fix airflow 1 deps for tox by @hsheth2 in #4083
- fix(ingest) Glue: Removing sqlalchemy dependency from glue by @treff7es in #4168
- fix(ingest) Athena: Generating propert containers for Athena by @treff7es in #4167
- Feature/users and groups UI updated as per new design by @ShubhamThakre in #4134
- chore(docs): various cleanup for docs-website by @hsheth2 in #4143
- bugfix(logging): reduce log noise from authentication chain by @RyanHolstien in #4173
- bug(glossaryTermLabels): fix glossary term labels missing and add cypress test by @RyanHolstien in #4171
- fixes(ui): Misc UI fixes + Adding Owners to Search Filters by @jjoyce0510 in #4175
- BugFixes/user-and-groups-minor-ui-fixes by @ShubhamThakre in #4181
- feat(groups): Adding editable group properties in the backend by @jjoyce0510 in #4166
- fix(python build): Pinning markupsafe by @treff7es in #4188
- feat(analytics): Improve analytics page by adding more charts regarding metadata ingested by @dexter-mh-lee in #4176
- docs(model): auto-generated docs and hand-written docs for the metada… by @swaroopjagadish in #4189
- minor fixes(ui): Small UI display fixes by @jjoyce0510 in #4190
- fix(ui): Return empty search response on invalid characters in search by @jjoyce0510 in #4193
- refactor(spark-lineage): enhance logging and documentation by @MugdhaHardikar-GSLab in #4113
- fix(ui): Correctly display user photo on "list users" screen. by @jjoyce0510 in #4195
- fix(ingest) Snowflake: Handle external S3 bucket lineage for "External Tables". by @rslanka in #4192
- fix(Ingestion): Elastic http/https host support by @abiwill in #4191
- Pinning down elasticsearch to less than 8.0.0 by @pppsunil in #4182
- test(airflow): fix airflow version parsing by @hsheth2 in #4142
- fix(delete): Fixing NPE on delete urns path by @jjoyce0510 in #4197
- docs(website): Move company logos into tabs categorized by industry by @jeffmerrick in #4174
- fix(profile) Bigquery: Setting bigquery temp schema if it is set to fix limit and offset in profiling by @treff7es in #4161
- feat(ingest): record bucketed profiling runtimes by @kevinhu in #4068
- (docs) How to search better with search bar by @xiphl in #4200
- feat(ingest) Bigquery: Ignore temporary tables from lineage and connect edges directly by @treff7es in #4160
- feat(ingest): Not failing on table/view ingestion error by @treff7es in #4185
- fix(ingest): map additional Postgres types by @kevinhu in #4179
- feat(lineage): Add feature for lineage to capture airflow task description. by @guidoturtu in #4147
- add initial clickhouse support by @ne1r0n in #4057
- ci: update rename-namespace.sh to specify /bin/bash by @stephenp-gr in #4153
- fix(ingest): superset - adding missing greenlet dep by @swaroopjagadish in #4203
- fix(docs): fix config typo for stateful ingestion by @jieqiu0630 in #4202
- fix(dbt): dont product key aspects if the entity has no other aspects by @gabe-lyons in #4217
- feat(ingest): Add support for non-default schema registry subject name strategies to the Kafka source by @rslanka in #4215
- fix(ingest): Revert use lower-case dataset names in the dataset urns for all SQL-styled datasets. by @rslanka in #4218
- Add pagination to group ownerships by @mmmeeedddsss in #4199
- feat(Spark-smoke-test): add spark smoke test by @MugdhaHardikar-GSLab in #4158
- feat(ingest): add Python libs for Urns by @tc350981 in #4172
- feat(GraphQL API): Adding group ownership by @jjoyce0510 in #4219
- fix(ui): Wrap homepage cards on long text by @jjoyce0510 in #4220
- fix(docs): add mention of list-runs command in CLI page by @anshbansal in #4210
- fix(ingestion): enable compat with avro 1.11 by @hsheth2 in #4205
New Contributors
- @Ankit-Keshari-Vituity made their first contribution in #4072
- @wangqinghuan made their first contribution in #4082
- @daha made their first contribution in #4112
- @satyamkrishna made their first contribution in #4133
- @ngamanda made their first contribution in #4118
- @buggythepirate made their first contribution in #4157
- @stephenp-gr made their first contribution in #4152
- @cuong-pham made their first contribution in #4164
- @pppsunil made their first contribution in #4182
- @guidoturtu made their first contribution in #4147
- @ne1r0n made their first contribution in #4057
- @jieqiu0630 made their first contribution in #4202
- @mmmeeedddsss made their first contribution in #4199
- @tc350981 made their first contribution in #4172
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.26...v0.8.27
DataHub v0.8.26
Released on Tue Feb 08 2022 by @shirshanka.
This is a Bugfix release meant to address the issue with adding Glossary Terms to Dataset fields present in version 0.8.25.
Release Highlights
- Fixes the bug in v0.8.25 where Glossary Terms could not be added to Dataset fields.
DataHub v0.8.25
Released on Mon Feb 07 2022 by @shirshanka.
Known Issues
- Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.
Release Highlights
Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.
Notable UI-Based Features
- UI-based Ingestion - as demoed in the December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
- Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
- Data Containers are now supported! A Container is a physical grouping of entities, e.g. a Schema is a container of 1 or more Datasets; a Dashboard is a container of 1 or more Charts.
Notable Metadata Model & Ingestion-Based Features
- Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
- Avro files are now supported in the Data Lake File ingestion source
- Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
- Ignore users from Top Users calculation
- BigQuery - Data Profiling on only the latest partition/shard
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
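To picture why platform instances matter, the sketch below builds dataset identifiers for two MySQL deployments that happen to contain a table with the same name. It is a minimal illustration under stated assumptions: the function `dataset_urn` is hypothetical, and the URN layout (instance name prefixed to the dataset name) is assumed here for illustration; check the platform instance documentation for the exact format your version emits.

```python
from typing import Optional


def dataset_urn(
    platform: str,
    name: str,
    env: str = "PROD",
    platform_instance: Optional[str] = None,
) -> str:
    """Build a dataset URN string, optionally qualified by a platform instance.

    Assumption: the instance is prepended to the dataset name with a dot.
    """
    qualified = f"{platform_instance}.{name}" if platform_instance else name
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{qualified},{env})"


# Without an instance, the same table in two deployments collides on one URN.
plain = dataset_urn("mysql", "sales.orders")

# With instances, each deployment gets its own identifier.
emea = dataset_urn("mysql", "sales.orders", platform_instance="emea_cluster")
apac = dataset_urn("mysql", "sales.orders", platform_instance="apac_cluster")
```

The instance names `emea_cluster` and `apac_cluster` are made up for the example. Entities ingested before this release have no instance in their identifiers, which is why the note above points to a migration step for pre-existing entries.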
Notable Fixes
- Fix to support "View in Looker": feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
DataHub Usage Guides
- docs(domains): Adding a User Guide for Domains by @jjoyce0510 in #4038
- docs(ingest): Adding UI ingestion guide by @jjoyce0510 in #4048
What's Changed
- fix(vulnerability): Upgrade gms base image by @dexter-mh-lee in #3962
- logging(frontend): Improve OIDC debug logs by @jjoyce0510 in #3967
- docs(delete): add curl request example to delete entity by @anshbansal in #3928
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
- Feature/dynamic platform icons by @RyanHolstien in #3968
- refactor(ingestion): remove duplicate aspect type by @hsheth2 in #3972
- fix(example): fix typo by @anshbansal in #3907
- fix(ingestion): Restrict python to <=3.9.9 by @treff7es in #3961
- feat(build): remove requirement for git directory for builds by @swaroopjagadish in #3977
- fix(ingestion): tighten conditions for restli json transformation by @hsheth2 in #3973
- fix(ingestion): don't dump variables for config errors by @hsheth2 in #3974
- Bugfix/increase socket timeout by @RyanHolstien in #3982
- feat(ingest): support for Avro data lake files by @kevinhu in #3913
- fix(build): exclude old log4j core by @RickardCardell in #3966
- fix(quickstart): Pin Quickstart version to v0.8.23. by @jjoyce0510 in #3983
- feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(quickstart): Assign correct mysql-setup container for M1 and remove "head" default version. by @jjoyce0510 in #3987
- feat(embedded search results): support custom endpoints in embedded search result by @gabe-lyons in #3986
- fix(docker): datahub-gms - build in native, copy to target by @swaroopjagadish in #3992
- fix(ci): moving defaults back to head now that docker builds are green by @swaroopjagadish in #3993
- feat(ui): UI-based ingestion (as featured in Dec Townhall) by @jjoyce0510 in #3975
- quickstart: Adding UI ingestion to quickstart YAML by @jjoyce0510 in #3994
- feat(domains): Adding backend for Asset Domains (p1) by @jjoyce0510 in #3952
- Bug: a bug fix to bigquery_to_datahub.yml file by @dipeshmaurya in #3988
- fix(ingest): check if feature data type is present by @maaaikoool in #3932
- feat(platform-instance): a simple client-only change to support platf… by @swaroopjagadish in #3996
- docs(metadata-model): Adding to Metadata model docs by @jjoyce0510 in #3998
- Add Stash Logo & new Source Icons by @maggiehays in #4002
- feat(domains): UI for Asset Domains (p2) by @jjoyce0510 in #3995
- docs: add missing back tick for metadata-ingestion/README.md by @nickwu241 in #4003
- Bugfix/add missing classes by @RyanHolstien in #4000
- fix(superset): fix connection for redshift by @anshbansal in #3944
- fix(setup): fix setup for M1 by @anshbansal in #3958
- docs:add Optum logo by @maggiehays in #4005
- Refining Metadata Model docs further by @jjoyce0510 in #4001
- fix(docker): Alpine based multiplatform docker build for kafka-setup by @treff7es in #3991
- Bugfix/graph concurrency issue by @RyanHolstien in #4007
- feat(ingest): Add additional snowflake auth by @MikeSchlosser16 in #4009
- fix(ci): Reverting unnecessary domain test changes by @jjoyce0510 in #4013
- fix(metrics): Add metrics for mcl hooks by @dexter-mh-lee in #4008
- feat(platform) - Update FabricType enum to represent more fabrics by @aditya-radhakrishnan in #3997
- feat(ingest): emit flags and stats for profiling telemetry by @kevinhu in #3969
- fix(formatting): fix linting lib version requirement by @anshbansal in #3939
- fix(docs): fix business glossary docs by @anshbansal in #3916
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(docs): update gms link by @lhvubtqn in #3927
- fix(ingest): lint fix a few files by @swaroopjagadish in #4016
- fix(ingest): adding platform instance urn to data platform instance aspects by @swaroopjagadish in #4015
- feat(ingest): use trino python client for sqlalchemy, supports python… by @mayurinehate in #3888
- fix(spark-lineage): select mock server port dynamically for unit test by @MugdhaHardikar-GSLab in #4018
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
- Test/add concurrency issue smoke test by @RyanHolstien in #4014
- feat(glossary-terms): Index glossary term custom properties by @jjoyce0510 in #3960
- feat(ingestion): Adding ability to ignore users from top users calculation by @treff7es in #3735
- Docs/remote deploy and auto render by @RyanHolstien in #4020
- fix(ingest): snowflake - Run authentication validation if default value used by @treff7es in #4024
- feat(nifi): handle provenance api variation for older versions by @mayurinehate in #4022
- feat(ingestion) bigquery: Profiling only the latest partition/shard on bigquery by @treff7es in #3930
- fix(groups): Fix UI encoding of groups with spaces in urns by @jjoyce0510 in #4021
- fix(text): fix confusing text by @anshbansal in #4025
- fix(clean): add missing cleanup by @anshbansal in #4023
- feat(containers): Backend for Asset Containers (as demo'd in townhall) by @jjoyce0510 in #4019
- fix(docs): Adding Initiate login uri to okta docs (Okta OIDC) by @jjoyce0510 in #4030
- fix: docker-compose now persists kafka broker data by @icy in #4031
- feat(ingestion): Support Kafka confluent external schema resolution by name or subject by @rslanka in #4035
- docs(domains): Adding a User Guide for Domains by @jjoyce0510 in #4038
- feat(Stateful Ingestion-3/3): Client side changes for Monitoring/Reporting by @rslanka in #3807
- feat(containers): Adding Containers UI (as demo'd in Jan Townhall) by @jjoyce0510 in #4037
- feat(users): adding user graphql mutation by @gabe-lyons in #4033
- feat(ingest): add tests for platform instance by @swaroopjagadish in #4047
- feat(model): Data quality model by @ksrinath in #3787
- Bugfix/prevent invalid urn by @RyanHolstien in #4045
- refactor(spark-lineage): remove dependency of spark from McpEmitter by @MugdhaHardikar-GSLab in #4042
- feat(analytics): add more analytics for entities by @anshbansal in #4040
- docs(ingest): Adding UI ingestion guide by @jjoyce0510 in #4048
- fix(mae-consumer-docker): Fix condition for skipping elasticsearch check by @dexter-mh-lee in #4052
- feat(ci): pin tox requirements to speed up ci runs, remove airflow-1 … by @swaroopjagadish in #4055
- feat(container): Add domains aspect to container. by @jjoyce0510 in #4059
- feat(profile) - bigquery: Fix for hitting limit with too many partitioned tables by @treff7es in #4056
- [Docs] Mark data lake metadata source as Beta by @pedro93 in #4061
- feat(ingest): log CLI invocations and completions by @kevinhu in #4062
- fix(ingest): Add aws dependencies for data lake by @kevinhu in #4060
- fix(ingest) - add aws_common as a snowflake_common dependency by @aditya-radhakrishnan in #4054
- feat(ui): Add svg datahub loading logo by @eburairu in #4065
- refactor(models): Refactoring new Assertion models by @jjoyce0510 in #4064
- feat(cli): add --force option to ingest rollback subcommand by @danilopeixoto in #4032
- fix(analytics): fix missing events from UI by @anshbansal in #4026
- Data domain containers ingestion by @treff7es in #4051
- docs(ingestion) glue: document required IAM permissions by @iasoon in #3929
- fix(profile):bigquery - Check for every table if it is partitioned to not hit table quota by @treff7es in #4074
New Contributors
- @dipeshmaurya made their first contribution in #3988
- @maaaikoool made their first contribution in #3932
- @icy made their first contribution in #4031
- @ksrinath made their first contribution in #3787
- @eburairu made their first contribution in #4065
- @danilopeixoto made their first contribution in #4032
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.24...v0.8.25
DataHub v0.8.24
Released on Mon Jan 24 2022 by @shirshanka.
Release Highlights
- Adding support for nested Glue schemas
- Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
- Improved readability of large numbers in the UI: thousands separators are added, and large numbers are rounded to millions with the raw value available via tooltip
- Miscellaneous bug fixes & improvements
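As a rough illustration of the large-number formatting described in the highlights above, the sketch below combines thousands separators with rounding to millions. It is written in Python for brevity and is not DataHub's UI code; the function names `format_count` and `tooltip_text`, the one-million threshold, and the single decimal place are all assumptions made for this example.

```python
def format_count(value: int) -> str:
    """Return a compact display label for a non-negative count.

    Hypothetical helper: the threshold and precision are assumptions.
    """
    if value >= 1_000_000:
        # Round to one decimal place of millions.
        return f"{value / 1_000_000:.1f}M"
    # Below one million, keep the exact value with thousands separators.
    return f"{value:,}"


def tooltip_text(value: int) -> str:
    """Return the raw value, with separators, for display in a tooltip."""
    return f"{value:,}"


if __name__ == "__main__":
    for n in (1234, 999_999, 12_345_678):
        print(format_count(n), "| tooltip:", tooltip_text(n))
```

Keeping the compact label and the tooltip text as separate functions mirrors the idea that rounding is only a display choice and the exact value stays available.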
What's Changed
- fix(workflow) docker-ingestion is failing bc of an invalid sed command by @dexter-mh-lee in #3896
- refactor(graphql): Migrating Datasets, Charts, Dashboards, Jobs, Flows to Entity V2 endpoint by @jjoyce0510 in #3897
- fix(ingest): populate system metadata for all metadata events (mcp, mcpw) by @swaroopjagadish in #3900
- perf: add/change scripts for tests by @anshbansal in #3840
- fix(glossary): owner should be optional as per docs by @anshbansal in #3858
- feat(ingestion): Support for nested glue schemas by @rslanka in #3895
- docs: change roadmap link by @jeffmerrick in #3904
- feat(kafka): support confluent references by @anshbansal in #3862
- docs (elasticsearch): config error by @Jiwei0 in #3901
- feat(ingestion): Data lake profiling by @kevinhu in #3656
- refactor(search): refactor NUM_RETRIES in esindexbuilder to be configurable by @senni0418 in #3870
- fix(ingest): nifi - replace hardcode password with config variable by @lhvubtqn in #3902
- feat(authentication): propagate expired token exceptions to end user by @gabe-lyons in #3894
- fix(docs): update data lake docs with path_spec details by @kevinhu in #3905
- ci(smoke-test): make tags&terms smoke test wait for ingestion to complete by @gabe-lyons in #3812
- Revert "fix(glossary): owner should be optional as per docs (#3858)" by @anshbansal in #3910
- fix(ingest): operational stats - check if optional fields are present by @aditya-radhakrishnan in #3911
- fix(typo): fix typo in docs by @anshbansal in #3908
- refactor(gql/ui): Misc refactorings by @jjoyce0510 in #3921
- feat(config): make check for frontend instead of gms more robust by @anshbansal in #3919
- feat(spark-lineage): simplified jars, config, auto publish to maven by @swaroopjagadish in #3924
- Bugfix/telemetry soft fail by @RyanHolstien in #3934
- fix(log): fix log levels and formats by @anshbansal in #3943
- docs(metadata-ingestion): fix command for running fast unit tests by @anshbansal in #3942
- fix(ui): update login title css to fit on one line by @aditya-radhakrishnan in #3922
- fix(docs): Clarify available no-code rendering formats in DataQualityRules.pdl by @gabe-lyons in #3912
- docs(links): add links to some recent case studies and blog posts by @anshbansal in #3941
- fix(docs): fix openapi docs by @anshbansal in #3940
- Adding Snappy Lib and JKS File by @arunvasudevan in #3898
- Feature/Issue resolved- Improve table stats readability in UI by @ShubhamThakre in #3889
- refactor(ui): Allow DocumentationTab to optionally use updateDescription mutation by @jjoyce0510 in #3935
- (docs)add moloco logo by @maggiehays in #3945
- refactor(bootstrap data): Add usage and profiles to bootstrap_mce.json by @jjoyce0510 in #3947
- docs(metadata): update relationship query in docs by @gabe-lyons in #3951
- fix(ingestion): Snowflake Usage should continue to emit usage workunits with include_operational_stats enabled. by @rslanka in #3949
- feat(ingestion): Add support for extracting S3->Snowflake and S3->Glue lineages. by @rslanka in #3946
- fix(graphQL): Fixing set ordering in batchGet of entities by @jjoyce0510 in #3950
- feat(elastic-search): changing default bulk index request batch to 1000 by @swaroopjagadish in #3957
- docs (metadata modeling): Fix broken links and doc fixes by @arunvasudevan in #3954
New Contributors
- @Jiwei0 made their first contribution in #3901
- @senni0418 made their first contribution in #3870
- @lhvubtqn made their first contribution in #3902
- @ShubhamThakre made their first contribution in #3889
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.23...v0.8.24
DataHub v0.8.23
Released on Fri Jan 14 2022 by @shirshanka.
Release Highlights
- Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
- Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
- Robustness improvements to DataHub Java Client Package
- Introducing a new Elasticsearch ingestion connector!
- Misc bug fixes & improvements.
What's Changed
- build: include correct version in metadata-ingestion docker image by @hsheth2 in #3857
- fix(metabase): fix crashes on missing values by @iasoon in #3859
- fix(datahub-client): fix shadow jar build, correct spark-lineage url … by @swaroopjagadish in #3871
- feat(git-version): Add version to the UI and config endpoint by @dexter-mh-lee in #3866
- fix(build): fix shadow jar checker to allow new git.properties by @swaroopjagadish in #3875
- feat(metadata-ingestion): Make datahub-rest client more robust by configurable retries. (#3826) by @RickardCardell in #3860
- fix(github-workflow): Remove duplicate context in kafka setup workflow by @dexter-mh-lee in #3876
- docs(azure-ad): correct default value for username attr by @iasoon in #3861
- docs: fix endpoint URL by @anshbansal in #3852
- fix(cli): disable telemetry in CLI tests by @kevinhu in #3877
- feat(metabase): allow configuring how database engines get mapped to platforms by @iasoon in #3869
- doc(graphql): add some examples by @anshbansal in #3867
- fix(search): Fix issue with filters and autocomplete by @dexter-mh-lee in #3868
- fix(build): remove jcenter from gradle build by @aditya-radhakrishnan in #3882
- (docs)Roadmap, Townhall, & Feature Request link updates by @maggiehays in #3873
- doc(kafka): add permissions required for confluent cloud by @anshbansal in #3850
- feat(ingest): ingestion-specific telemetry by @kevinhu in #3881
- Add AWS MSK Iam Auth Jar to GMS by @arunvasudevan in #3872
- docs(ingestion) azure: specify required permission type by @iasoon in #3886
- feat(ingestion) dbt: support spark sql types by @iasoon in #3880
- update dependency for bigquery. by @varunbharill in #3874
- fix(field-extraction): Fix extraction for unions by @dexter-mh-lee in #3892
- fix(ingest): sqlparser - Not lowercasing looker source's special table name by @treff7es in #3891
- feat(ingest): Support for spectrum external array types by @treff7es in #3890
- feat(Ingestion): Add Elasticsearch Source by @rslanka in #3893
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.22...v0.8.23
DataHub v0.8.22
Released on Sat Jan 08 2022 by @shirshanka.
Disclaimers!
- Ingesting Chart Inputs was broken in a PR that got into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading to v0.8.23 directly.
Release Highlights:
- Support for mapping dbt meta properties of a dataset to metadata operations such as add_owner, add_term, and add_tag.
- Java REST emitter library to programmatically generate metadata events from Java-based clients such as from Spark jobs.
- Data freshness indication via Last Updated Timestamp.
- Improvements to data profiling performance and lineage extraction
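To make the dbt meta mapping highlight above more concrete, here is a minimal Python sketch of the general idea: each rule names a meta property, a regular expression its value must match, and the operation to apply. The rule layout, the `plan_operations` helper, and the sample property names are hypothetical and chosen for illustration; they are not the actual recipe schema, which is described in the dbt source docs.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rules: meta property name -> match pattern and operation.
META_MAPPING: Dict[str, Dict[str, str]] = {
    "business_owner": {"match": ".*", "operation": "add_owner"},
    "has_pii": {"match": "true", "operation": "add_tag", "value": "pii"},
    "term": {"match": "Finance.*", "operation": "add_term"},
}


def plan_operations(
    meta: Dict[str, Any], mapping: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return (operation, argument) pairs for every rule whose pattern matches."""
    operations: List[Tuple[str, str]] = []
    for key, rule in mapping.items():
        if key not in meta:
            continue
        value = str(meta[key])
        # The whole value must match the rule's pattern (case-insensitive).
        if re.fullmatch(rule["match"], value, flags=re.IGNORECASE):
            # Use the rule's fixed value if present, else the meta value itself.
            operations.append((rule["operation"], rule.get("value", value)))
    return operations


if __name__ == "__main__":
    print(plan_operations({"business_owner": "alice", "has_pii": True}, META_MAPPING))
```

A rule only fires when its property is present and its pattern matches, so meta entries that no rule covers are ignored.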
What's Changed
- feat(snowflake-usage): Generate email address if not exists by @treff7es in #3791
- feat(java datahub-client): add Java REST emitter by @MugdhaHardikar-GSLab in #3781
- fix(docker): Fix path to elastic definition in dev docker compose by @MikeSchlosser16 in #3808
- feat(nocode): Add get entities v2 endpoint that can get without snapshot by @dexter-mh-lee in #3738
- docs(modeling): Add a link to MXE page inside the Metadata Modeling page by @pramodbiligiri in #3765
- docs(fix): fix broken reference by @RyanHolstien in #3814
- feat(java-emitter): improvements to builder API-s, moving spark-linea… by @swaroopjagadish in #3819
- fix(ingestion): Make url an optional field of the DefaultConfig for business glossary by @rslanka in #3817
- fix(ingest): Handle string redshift type by @treff7es in #3811
- feat(gms): add schema registry support for tls in gms by @MikeSchlosser16 in #3804
- Add table, changed formatting and wording by @dannylee8 in #3802
- feat(mae/mcl): Make ingestAspect produce both MCLs and MAEs by @dexter-mh-lee in #3737
- docs(confluent): Add new topic names by @anshbansal in #3825
- (feat)(glossary): Increase number of autocomplete results shown to 25 by @aditya-radhakrishnan in #3821
- feat(sql-parser): Replacing sqlmetadata sql parser lib with sqlineage parser lib by @treff7es in #3806
- feat(profiler): using approximate queries for profiling by @treff7es in #3752
- docs: improve docs for kafka configuration by @abiwill in #3828
- test(fixEbeanEntityServiceTest): fix bug on verification for EbeanEntityService by @RyanHolstien in #3829
- fix(ingest): ignore custom connectors for Glue ingestion by @kevinhu in #3805
- fix(java-emitter): check for null callback by @swaroopjagadish in #3830
- feat(dbt-meta): add support for dbt meta mapping by @swaroopjagadish in #3832
- fix(ingestion): Fix the datetime parsing issue in the metabase source. by @rslanka in #3831
- feat(removeGMA): remove all dependencies on gma libraries by @RyanHolstien in #3835
- perf(ingest): changes to improve ingest performance a bit by @anshbansal in #3837
- fix(azure AD): fix problem with missing key causing failures in ingestion by @anshbansal in #3824
- docs: fix typo by @anshbansal in #3848
- docs(cli): fix wrong heading, add link to release notes by @anshbansal in #3700
- feat(ci): split metadata-ingestion ci to streamline build by @swaroopjagadish in #3854
- fix(dbt): fix warning due to struct type not being mapped by @anshbansal in #3846
- fix(ingest): bigquery-usage - fix remove_extras to remove all partitions by @gfalcone in #3842
- fix(ingestion): handle database=None for dbt ingestion by @iasoon in #3851
- feat(ingest): last updated - show last updated for sql usage sources by @aditya-radhakrishnan in #3845
- feat(lineage): allow for expanding of lineage node titles in the lineage explorer by @gabe-lyons in #3856
New Contributors
- @MikeSchlosser16 made their first contribution in #3808
- @pramodbiligiri made their first contribution in #3765
- @aditya-radhakrishnan made their first contribution in #3821
- @abiwill made their first contribution in #3828
- @gfalcone made their first contribution in #3842
- @iasoon made their first contribution in #3851
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.21...v0.8.22
v0.8.21
Released on Tue Dec 28 2021 by @shirshanka.
This release includes a fix for timeouts during reindexing of large indices, which occur when new fields are added to an index.
Release Highlights
- Getting Started Modal + Empty State: Improves the empty-state experience by showing a "Getting Started" guide when no data has been ingested into DataHub yet.
- Provide BigQuery credentials via recipe config: Previously BigQuery credentials were provided via environment variable. Going forward they can be provided directly inside the Recipe config.
- Increase re-indexing 30s timeout: Previously, Elasticsearch re-indexing was capped at a 30-second synchronous timeout, which caused some GMS upgrades to fail. This release increases that timeout to one hour.
What's Changed
- fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
- fix(react-ui): fix header min height by @gabe-lyons in #3784
- docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
- Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
- feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
- feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
- Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
- docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
- fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
- doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
- feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
- fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
- docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789
New Contributors
- @cccs-eric made their first contribution in #3780
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.20...v0.8.21
v0.8.20
Released on Mon Dec 20 2021 by @shirshanka.
This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes and improvements.
Release Highlights
- Configurable aspect retention in application.yml (disabled by default)
- Metabase Ingestion Source connector
- Constrain log4j to version 2.17.0
- Upgrade logback to 1.2.9
What's Changed
- feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
- feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
- feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
- feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
- feat(ingest): cleanup deprecated datahub.integrations.airflow.* imports by @hsheth2 in #3732
- feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
- fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
- feat(perf-test): changes for perf testing by @anshbansal in #3728
- ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
- (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
- Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
- fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
- fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
- fix(ingest): Remove unecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
- feat(snowflake-usage): add knob for direct objects accesssed vs base objects accessed by @gabe-lyons in #3744
- fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
- refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
- test(ingest): fix pytest warning for class starting with Test by @hsheth2 in #3745
- feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
- fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
- feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
- Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
- build(ingest): restrict latest mypy version by @hsheth2 in #3756
- doc: Add IOMED as a DataHub adopter by @merqurio in #3758
- docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
- feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
- feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
- feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
- refactor(test): replace CliRunner with run_datahub_cmd method by @hsheth2 in #3746
- feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
- feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
- Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
- fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
- docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
- docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
- Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
- fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
- fix(ingest): fix compatibility with google composer by @anshbansal in #3774
Known Issues
We've been made aware that in large deployments the re-indexing step required at boot-up time exceeds the 30 second timeout. We've since made changes to loosen this timeout limit, with these changes coming in 0.8.21.
New Contributors
- @MugdhaHardikar-GSLab made their first contribution in #3664
- @jawadqu made their first contribution in #3602
- @nsbala-tw made their first contribution in #3733
- @merqurio made their first contribution in #3758
- @hyunminch made their first contribution in #3680
- @sudotty made their first contribution in #3227
- @xiphl made their first contribution in #3772
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.19...v0.8.20
v0.8.19
Released on Mon Dec 13 2021 by @shirshanka.
This release is a fast follow-up to the more substantial 0.8.18 release, addressing bugs that a few folks in the Community are facing.
Release Highlights
- Fix base64 CLI command issue on systems that do not have it.
- Fix usage user extraction where the email domain was repeated twice.
What's Changed
- fix(recommendations): don't show a 0 character when there are no suggestions by @gabe-lyons in #3720
- fix(mode): support definitions in mode query by @gabe-lyons in #3721
- fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
- docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
- fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
- fix(ingest): get mysql geotypes properly by @treff7es in #3726
- fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
- feat(ingest) Trim long sql queries in usage by @treff7es in #3725
- fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
- fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
- fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
- feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
- fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714
New Contributors
- @lvicentesanchez made their first contribution in #3702
- @grumbler made their first contribution in #3714
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.18...v0.8.19
v0.8.18
Released on Fri Dec 10 2021 by @shirshanka.
DataHub Release 0.8.18 is here!
Release Highlights
Metadata Service Authentication: Make authenticated requests to the Metadata Service APIs (GraphQL + Rest.li)
Redshift Lineage: Out-of-the-box support for ingesting Dataset->Dataset lineage from Redshift system tables. Includes Tables, Views, and COPY from S3
Apache Nifi Connector (Beta): Integration with Apache Nifi to extract DataJobs and DataFlows! Read the source docs here. This source is currently incubating in beta.
Mode Connector (Beta): Integration with Mode Analytics to extract reports, charts, and more! Read the source docs here. This source is currently incubating in beta.
Add Aspects without a fork: This is a major milestone towards No-Code UI
- Watch the No Code UI Sneak Peek
Glossary Term Transformer: Allows users to add tags or glossary terms to entities based on a regex match filter (Shoutout to Community Member ecooklin!)
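The regex-based matching behind the Glossary Term Transformer highlight can be sketched in a few lines of Python. This is an illustration of the idea only, not the transformer's implementation or configuration format: the `RULES` table, the `terms_for` helper, and the sample URNs are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical rules: regex over the entity URN -> glossary term URNs to add.
RULES: Dict[str, List[str]] = {
    r".*\.customers\..*": ["urn:li:glossaryTerm:Customer"],
    r".*\.payments\..*": ["urn:li:glossaryTerm:Finance"],
}


def terms_for(entity_urn: str, rules: Dict[str, List[str]]) -> List[str]:
    """Collect the terms of every rule whose pattern matches the URN."""
    matched: List[str] = []
    for pattern, terms in rules.items():
        if re.match(pattern, entity_urn):
            for term in terms:
                # Avoid duplicates when several rules add the same term.
                if term not in matched:
                    matched.append(term)
    return matched


if __name__ == "__main__":
    urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.customers.orders,PROD)"
    print(terms_for(urn, RULES))
```

Entities that match no pattern are left unchanged, which keeps the filter safe to apply across a whole ingestion run.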
Bug Fixes:
- [metadata service] Empty search query fails to resolve
- [metadata service] Log4j vulnerability (https://www.lunasec.io/docs/blog/log4j-zero-day/) addressed! We highly recommend upgrading to the latest version.
- [metadata ingestion][bigquery] Fix handling of partitioned & snapshotted tables for lineage usage, and basic table indexing.
- [metadata-service][recommendations] Fix issue where recently viewed and most popular recommendations were not showing up when user urn contains special chars.
- [metadata ingestion] Add config to specify ca certificate path for datahub-rest sink
- [metadata ingestion][snowflake] Handling for special characters in snowflake databases and schemas.
- [ui] Fix Groups page not showing asset ownership correctly
- [ui] Fix issue where markdown links were not clickable.
- [metadata service] Improve search & recommendations performance by ~50%, homepage load by ~50%.
- [cli] Fix issue where deletes by search could not accept an auth token
- [metadata service][policies] Fix invalid Tag creation policy
- [metadata service][upgrade] Fix Spring injection of Entity Client inside datahub-upgrade
Backwards Incompatible Changes
- The standalone Spring GraphQL Service has been removed. (Replaced in full by Metadata Service GraphQL API)
New Contributors
- @robscriva made their first contribution in #3600
- @adriangb made their first contribution in #3582
- @bartlomiejolma made their first contribution in #3650
- @anshbansal made their first contribution in #3653
- @ecooklin made their first contribution in #3657
What's Changed
- style(react-app): add default monospace font to font-family by @robscriva in #3600
- feat(boot): Ingest datahub root user info on boot by @jjoyce0510 in #3603
- [refactor] - Remove GMS GraphQL Service by @arunvasudevan in #3605
- feat(auth): Metadata Service Authentication! by @jjoyce0510 in #3598
- docs:remove hubspot form and instead link to acryldata.io by @jeffmerrick in #3488
- fix(docs): Move transformers to be under metadata ingestion by @aseembansal-gogo in #3591
- fix(bigquery-usage): Fix filters and event joining logic. by @varunbharill in #3610
- feat(cli): adding a put command and docs by @swaroopjagadish in #3614
- feat(elastic): adding es logo by @gabe-lyons in #3611
- feat(profiler): dynamically combine queries by @hsheth2 in #3572
- doc(components): Adding DataHub components overview by @jjoyce0510 in #3606
- fix(java client): Fix Profiling NPE + misc improvements by @jjoyce0510 in #3621
- fix(docs-website): fix incorrect managed url by @jeffmerrick in #3618
- fix(ingest): rectify platform urn in kafka connect source by @mayurinehate in #3624
- docs(okta): Added Okta Logout Settings by @serefacet in #3627
- fix(search): Fix issue when query is empty by @dexter-mh-lee in #3620
- fix(redshift-usage): Add docs for redshift usage ingestion. by @varunbharill in #3617
- fix(ci): pin great expectations version by @swaroopjagadish in #3629
- fix(delete): Remove logic that adds an invalid filter for platform field by @dexter-mh-lee in #3619
- feat(metadata-service): support for custom model extensions without forks by @shirshanka in #3630
- fix(kafka-producer): fix debug logging by @claudio-benfatto in #3626
- fix(tests): fix typo in test name by @adriangb in #3582
- feat(cfg): Add configurable GCP log page size by @jjoyce0510 in #3556
- fix(recommendations): Fix issue with recently viewed and most popular recs not showing up by @dexter-mh-lee in #3631
- fix(ingestion): Add config to specify ca certificate path for datahub-rest sink by @dexter-mh-lee in #3632
- fix(ingest): workaround great-expectations compatibility issue by @hsheth2 in #3634
- fix(ingestion): Handling for special characters in snowflake databases and schemas. by @rslanka in #3635
- fix(group ownership): Fixing Groups Profile ownership by @jjoyce0510 in #3638
- feat(autorender): Auto render aspects that don't have frontend components in the UI by @gabe-lyons in #3597
- docs(business glossary): document the business glossary file format by @gabe-lyons in #3639
- fix(ingestion): Enhance supported and unsupported base_objects_accessed for Snowflake Usage by @rslanka in #3608
- feat(quickstart): Simplify docker generate and compare script by @EnricoMi in #3434
- fix(docs): small fixes to docs and docker images for custom metadata … by @swaroopjagadish in #3640
- fix(mongodb): enable version check for document size filter. by @varunbharill in #3644
- docs: Update to DataHub Adopter logos & Townhall details by @maggiehays in #3648
- feat(build): adds support for incremental build in ingestion by @swaroopjagadish in #3647
- fix(description): fix issue where markdown links are unclickable by @gabe-lyons in #3646
- fix(schema): fix bug where key/value toggle would appear on schema tabs with no fields by @gabe-lyons in #3643
- feat(build): Preflight script for metadata ingestion setup on m1 by @treff7es in #3652
- docs(graphql) Adding additional GraphQL docs by @jjoyce0510 in #3649
- docs: correct title of postgres gms by @bartlomiejolma in #3650
- fix(cli): fix for deletion cli by @anshbansal in #3653
- fix(metadata-io) Adds docker engine configuration checks before running docker-based tests by @pedro93 in #3654
- fix(model): Remove unused PDL from pre-nocode days by @dexter-mh-lee in #3659
- fix(docs): fix docs build on m1 by @anshbansal in #3662
- feat(ingest): add --strict-warnings option by @hsheth2 in #3665
- fix(search): Improve search and recs performance by @dexter-mh-lee in #3660
- feat(metadata-model): adding metadata model doc generation and upload… by @swaroopjagadish in #3667
- fix(ingestion): black formatting by @hsheth2 in #3676
- fix(metadata-ingestion): fix requirements for m1 preflight checks by @gabe-lyons in #3677
- fix(kafka): Add back changes to centralize kafka config by @dexter-mh-lee in #3675
- feat(ingestion): anonymous usage stats by @kevinhu in #3668
- docs(scheduling): re-arrange docs related to scheduling, lineage, CLI by @anshbansal in #3669
- feat(delete): support deleting by search w/ tokens by @gabe-lyons in #3684
- docs: change roadmap link in docs by @jeffmerrick in #3685
- docs(business glossary): fix specification of the file by @anshbansal in #3679
- refactor(profiling): clean up SQL query analysis by @hsheth2 in #3674
- fix(tags): Fixing Tag Create Privileges (issue #3609) by @jjoyce0510 in #3683
- fix(elasticsearch): Use auth tokens to authorize curl requests in dockerize by @dexter-mh-lee in #3596
- fix(snowflake): support geo types by @gabe-lyons in #3686
- feat(profiler): add query combiner report statistics by @hsheth2 in #3678
- feat(transformer) Adds glossary terms transformer by @ecooklin in #3657
- fix(deletes): Fixing system metadata index deletes by @jjoyce0510 in #3693
- feat(ingest): add nifi source in metadata-ingestion by @mayurinehate in #3681
- feat(bigquery): support snapshot and partition tables in bigquery ingest & lineage by @gabe-lyons in #3695
- fix(ingest): refactor urn deletion by @kevinhu in #3694
- fix(perf-test): fix for M1 by @anshbansal in #3689
- fix(bootstrap): revert accidental change to file_to_datahub_rest.yml by @gabe-lyons in #3698
- feat(ingestion): Add lineage support for Redshift source by @gabe-lyons in #3697
- fix(ingestion): Disable query parser failure reporting to Datahub in redshift lineage by default by @treff7es in #3699
- docs(airflow): add some troubleshooting for error by @anshbansal in #3687
- docs(redshift): Adding requirements for redshift permissions by @treff7es in #3707
- fix(nifi): add env in nifi config, add unit tests, fix nifi doc by @mayurinehate in #3703
- feat(mode): add mode analytics ingestion source by @gabe-lyons in #3710
- fix(url encoding): also encode square brackets by @gabe-lyons in #3709
- fix(datahub-upgrade): Fix Spring injection issue with datahub-upgrade by @dexter-mh-lee in #3688
- docs(guide): add example for adding user in DataHub by @anshbansal in #3682
- fix(home): Change docs count to not count removed datasets by @dexter-mh-lee in #3711
- Fix CVE-2021-44228 by @frsann in #3716
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.17...v0.8.18
v0.8.17
Released on Fri Nov 19 2021 by @shirshanka.
Notable Changes
- Added Recommendations and redesigned the home page!
- Modular way to add recommendations throughout the application
- Recommendation modules for top platforms, recently viewed, popular entities, top tags/terms were added to home page
- Search page also has top tags/terms module on the bottom
- Ingestion Sources
- DBT enhancements
- Create dbt platform entities to capture dbt node types such as models, tests, sources, and seeds, and link dbt entities with other dbt or underlying platform entities.
- OpenAPI specs
- Kafka Connect (Regex based transformers, BigQuery sink)
- Trino Usage (Starburst)
- Improved lineage viz performance and lineage viz UX
- Improved layout logic
- Nodes can be dragged and dropped
- Fixes for delete API not always deleting all of an entity's data
- Improved documentation for adding a custom Metadata Ingestion Source
- Fixes description rendering for Charts, Dashboards, Flows, Jobs
- Add YAML configuration file for Metadata Service
- Filter search results by Sub-Type (Looker Explore, View, etc)
- Support proxying DataHub Frontend requests to Metadata Service at /api/gms
- Multi-platform (x86, arm64) support for Docker images (Apple M1 support)
- Graph Service: DGraph support (phase 1)
What's Changed
- fix(docs): fix image paths and company logo link by @jeffmerrick in #3435
- feat(docs-site): two small tweaks by @gabe-lyons in #3437
- feat(ingestion): support custom properties to be ingested via business glossary yaml by @gabe-lyons in #3438
- fix(restli entity client): fix case where sortCriterion is null by @gabe-lyons in #3436
- feat(lineage): improved lineage performance + simplified layout logic + some easter eggs by @gabe-lyons in #3357
- docs(metamodel): added DataHub's metadata model diagram by @swaroopjagadish in #3449
- fix(tag+terms): improved error messaging & rules on tag + term mutations by @gabe-lyons in #3448
- fix(browse): disable breadcrumb links on non-browsable entities by @gabe-lyons in #3447
- fix(ingest): fix lookml derived tables parsing by @remisalmon in #3443
- docs(docs-site): small nits for docs site homepage by @gabe-lyons in #3444
- perf(ingest): lazy load ingestion plugins by @hsheth2 in #3430
- Fix docs website by @jeffmerrick in #3446
- fix(restore): Fix restore backup jobs by @dexter-mh-lee in #3445
- fix(ingest): lineage for Airflow subdags by @kevinhu in #3351
- docs: Update to Q3 2021 accomplishments by @maggiehays in #3420
- fix(bigquery): Add gcp logging dependency for bigquery source. by @varunbharill in #3451
- build(frontend): unzip depend on yarnBuild by @gabe-lyons in #3452
- feat(react): add handy webpack analyze command by @gabe-lyons in #3454
- test(CI): show test results on GitHub by @EnricoMi in #3362
- docs(transformers): add exemple of custom tag function by @WaStCo in #3354
- docs: add guide for using custom sources by @DSchmidtDev in #3324
- feat(dbt-ingestion): added possibility to skip specific models by @AndreasTA-AW in #3340
- fix(mongodb): Support filtering mongodb documents as per size. by @varunbharill in #3456
- fix(mysql): Update default mysql collation to utf8mb4_bin by @jjoyce0510 in #3459
- fix(ingestion): Workaround for Python 3.8/3.9 mypy invalid syntax issue with airflow 2.2.0 by @rslanka in #3460
- fix(ui): Fixing UI User + Group display name by @jjoyce0510 in #3461
- fix(react): fix up `yarn test` error reporting by @gabe-lyons in #3462
- docs(frontend): remove confusing suggestion to manually create users by @gabe-lyons in #3465
- docs: Overhaul of DataHub Features page by @maggiehays in #3439
- docs: Update TownHall Agenda and TownHall History by @maggiehays in #3463
- fix(tags): fix links to tags when there are special chars in the urls by @gabe-lyons in #3464
- fix(CI): Stabalize gradle build by @EnricoMi in #3413
- docs: update next Townhall date in README.md by @maggiehays in #3466
- perf(react bundle): decrease bundle size by 15% by @gabe-lyons in #3468
- fix(graphql): fixing Graphql engine factory when analytics are disabled by @gabe-lyons in #3467
- feat(recommendations): Recommendations infra P1 by @jjoyce0510 in #3455
- refactor(styling): Improving recommendation Tag / Search query list styling by @jjoyce0510 in #3472
- fix(docs): fix transformer doc example by @aseembansal-gogo in #3469
- fix(ingest): redshift source gets external table types properly by @treff7es in #3371
- fix(recs): Remove removed entities from aggregation by @dexter-mh-lee in #3473
- fix(ui): fix double formatting of entity count on home page by @jjoyce0510 in #3474
- fix(subtypes): fix case where subtypes are not being fetched for leaf datasets by @gabe-lyons in #3476
- feat(ingestion): User configurable dataset profiling. by @rslanka in #3453
- styling(ui): improve tag list, glossary term list recommendation styling by @jjoyce0510 in #3475
- feat(ui): Provide filtering capability for Sub Types inside the UI by @jjoyce0510 in #3479
- fix(ingest): correctly support multiple snowflake databases by @hsheth2 in #3482
- fix(datajobs): fetch dataflow properties from a relationship by @gabe-lyons in #3487
- fix(fk): fix schemaField urn construction in foreign keys by @gabe-lyons in #3486
- fix(fk): trim whitespace from fk constraints in the case the fieldspec has leading or trailing whitespace characters by @gabe-lyons in #3485
- feat(dbt): add dbt logo and platform. by @varunbharill in #3483
- feat(lineage): some ux improvements to lineage interactions by @gabe-lyons in #3478
- refactor(nocode): Final part of No-Code cleanup by @jjoyce0510 in #3477
- fix(browse paths): Adjust Default browse path logic for datasets by @jjoyce0510 in #3495
- fix(lineage backend): fix ownership timestamps by @gabe-lyons in #3498
- tests(smoke): introducing first isolated smoke test: updating tags & terms by @gabe-lyons in #3496
- feat(graphql): extend entity client to support aspect methods directly via java by @gabe-lyons in #3489
- fix(aspects): fix null aspects case by @gabe-lyons in #3501
- Docs: Update to Slack & Townhall details by @maggiehays in #3502
- refactor(profiler): add PerfTimer class and fix typos by @hsheth2 in #3497
- fix tiny typo by @andrewm4894 in #3484
- fix(ingestion): Glue job names by @kevinhu in #3503
- fix(fk): fix foreign key styling with modals by @gabe-lyons in #3500
- docs: add path fix for 'command not found' by @dannylee8 in #3490
- docs: nit, grammar by @dannylee8 in #3491
- docs: nit by @dannylee8 in #3492
- Docs: nits by @dannylee8 in #3493
- add tooltip for owner category in dataset profile page by @saxo-lalrishav in #3470
- feat(ingest) : kafka connect source improvements by @mayurinehate in #3481
- feat(ingest): adding support for read-modify-write capabilities durin… by @swaroopjagadish in #3506
- feat(dbt): Dbt enhancements - dbt nodes, lineage, subtype, etc. by @varunbharill in #3519
- docs (Metadata Model): nits by @dannylee8 in #3525
- fix(ingestion): Enhance logging and error-handling in bigquery usage connector. by @rslanka in #3521
- docs: nits and added hyperlinks by @dannylee8 in #3526
- (Docs) Updated troubleshooting tip by @dannylee8 in #3516
- test(profiler): make profiling tests more comprehensive by @hsheth2 in #3513
- doc (React) Node 17 openssl change causes error by @dannylee8 in #3523
- feat(cli): add support for deletion based on filters, soft deletes an… by @swaroopjagadish in #3527
- feat(frontend): Proxy GMS API requests by @jjoyce0510 in #3509
- feat(deletes): support deletion of non-snapshot aspects by @gabe-lyons in #3518
- fix(ingest): restrict botocore version to fix urllib3 build issue by @hsheth2 in #3534
- ui: Migrate UI to use "properties" field of entity for descriptions by @jjoyce0510 in #3515
- fix(cli): fix name of cli arg by @aseembansal-gogo in #3536
- feat(git-version): Encode the latest release included in the build by @dexter-mh-lee in #3535
- Revert "feat(git-version): Encode the latest release included in the build" by @dexter-mh-lee in #3539
- [feat] Add multiplatform docker support by @treff7es in #3537
- feat(gms): Adding yaml configuration for metadata-service by @jjoyce0510 in #3514
- fix(docker): fix multi-platform build for arm by @treff7es in #3543
- fix(glossary terms): Fix 'Glossary Term on Dataset Columns doesn't show in Related Entities' by @jjoyce0510 in #3542
- fix(ingestion): Make AVRO schema parsing robust to exceptions. by @rslanka in #3541
- docs: cleanup old links, misc updates by @swaroopjagadish in #3545
- docs(graphql): Updating example CURLs to work on copy and paste + misc FAQ doc improvements by @jjoyce0510 in #3538
- feat(cli): add support for m1 laptops during quickstart by @swaroopjagadish in #3547
- feat(ingestion): Support for converting AVRO schemas with logical types to MCE fields. by @rslanka in #3546
- feat(profiler): streamline profiler by @hsheth2 in #3510
- feat(ingest): add transformer to add properties by @nomarlo in #3480
- Adding openapi ingestion by @vlavorini in #2706
- [fix] Disabling arm frontend build temporary by @treff7es in #3551
- fix `com.linkedin.dashboard.TagProperties` by @andrewm4894 in #3550
- [fix] Build frontend docker on build platform instead on target platform by @treff7es in #3552
- fix(docker): create multiplatform docker build from elasticsearch-setup by @treff7es in #3562
- fix(docs-website): Fix company logo urls by @jjoyce0510 in #3568
- fix(frontend): Hush noisy datahub-frontend warnings by @jjoyce0510 in #3559
- fix(docs): use absolute links by @swaroopjagadish in #3570
- docs: Adding a custom Data Platform doc by @jjoyce0510 in #3561
- fix: Glue lineage compatibility by @kevinhu in #3555
- fix(bigquery_usage ingestion): add partition decorator to regex, move exception handling to after matching, add table snapshots by @courtney-lang in #3533
- fix(ingest): fix bigquery-usage regex for both partitioned and sharde… by @swaroopjagadish in #3571
- fix(ingestion, redshift-usage): Do not append email domain to the username if the username is already an email. by @rslanka in #3569
- docs: adding links to help with metadata model visualization and documentation by @swaroopjagadish in #3573
- feat(GraphService): Add Dgraph implementation of GraphService by @EnricoMi in #3261
- feat(ingest): adding snowflake app name to connection string by @swaroopjagadish in #3574
- Run metadata-io tests in parallel by @EnricoMi in #3577
- fix(users): fix ownership count on user page by @gabe-lyons in #3575
- fix(graphql): making glossaryTermInfo nullable in glossaryTerm. by @varunbharill in #3576
- fix(autocomplete): fix case where autocomplete interferes with search by @gabe-lyons in #3580
- fix(cli): m1 check breaks on windows by @swaroopjagadish in #3579
- Feat: Allow logs to be filtered in Bigquery Usage plugin by @tha23rd in #3567
- fix(ingestion): Fix snowflake documentation. by @rslanka in #3585
- feat(ci): adding support for env variables in python release script by @swaroopjagadish in #3587
- fix(docs): Add documentation on how to connect to custom ES instance. by @varunbharill in #3581
- fix(ci): SKIP_RELEASE_UPLOAD flag was not being respected by python r… by @swaroopjagadish in #3588
- feat(ingestion): Adds --dry-run and --preview options to datahub ingest command. by @rslanka in #3584
- fix(ingest): fix dbt source platform when disable_dbt_node_creation is False by @remisalmon in #3592
- Add docs nav links by @jeffmerrick in #3594
- feat(ingest): add bigquery sink connector lineage in kafka connect source by @mayurinehate in #3590
- feat(model): adding a field to capture unmodeled field level properties by @swaroopjagadish in #3593
- fix(browse): Fix browse response size issue when there are too many browse paths by @dexter-mh-lee in #3595
- fix(docs): Add docs on accessing datahub CLI by @aseembansal-gogo in #3589
- feat(ingest): Starburst Trino usage by @treff7es in #3558
- fix(ingestion): Emitter api examples + Documentation by @rslanka in #3599
New Contributors
- @maggiehays made their first contribution in #3420
- @WaStCo made their first contribution in #3354
- @DSchmidtDev made their first contribution in #3324
- @andrewm4894 made their first contribution in #3484
- @dannylee8 made their first contribution in #3490
- @nomarlo made their first contribution in #3480
- @courtney-lang made their first contribution in #3533
- @tha23rd made their first contribution in #3567
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.16...v0.8.17
DataHub v0.8.16
Released on Thu Oct 21 2021 by @shirshanka.
Release Highlights
- Important bug-fixes: `properties` for DataJob and DataFlow, `descriptions` for Datasets should now correctly show in the UI
- Search redesign! Single search experience across all entity types with left filter bar
- Added searchAcrossEntities endpoint on both GraphQL and Rest.li that pulls search results for all entity types and mixes them together
- Dataset level lineages - Added support for ingesting dataset level lineages for bigquery. Added support for linking external tables in redshift to the corresponding table in the external data catalog.
- Performance optimization: graphql will now directly call the entity service instead of calling the entity resource over http to hydrate graphql models.
- The “filter” input model used for the “search” API now supports disjunctive normal form (OR of ANDs). The previous filter model (criteria array) should continue to work as expected.
- Adding foundations (models) for search insights, or highlights shown in the search result previews.
- Add owner experience improvements: using full text search to find users and groups.
- User & Group Management Screens!
- View all users (and those who have logged in)
- View all groups
- Create new groups
- Add and remove group members
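The disjunctive normal form mentioned in the highlights above can be pictured as a list of AND-groups that are OR-ed together. The following is a minimal Python sketch of those semantics only. The field names and the `matches_dnf` helper are assumptions made for illustration; this is not DataHub's filter model or API.

```python
from typing import Dict, List

# One AND-group: every listed field must equal the given value.
Conjunction = Dict[str, str]


def matches_dnf(entity: Dict[str, str], or_groups: List[Conjunction]) -> bool:
    """Return True if the entity satisfies at least one AND-group (OR of ANDs)."""
    return any(
        all(entity.get(field) == value for field, value in group.items())
        for group in or_groups
    )


# Hypothetical filter: (platform = bigquery AND env = PROD) OR (platform = snowflake)
example_filter = [
    {"platform": "bigquery", "env": "PROD"},
    {"platform": "snowflake"},
]

print(matches_dnf({"platform": "bigquery", "env": "PROD"}, example_filter))  # True
print(matches_dnf({"platform": "bigquery", "env": "DEV"}, example_filter))   # False
print(matches_dnf({"platform": "snowflake", "env": "DEV"}, example_filter))  # True
```

A single AND-group in the list behaves like a plain list of criteria that must all hold, which is one way to read the note that the previous criteria-array model keeps working.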
Breaking Changes
None
What's Changed
- feat(ui): Improve add owner search experience by @jjoyce0510 in #3306
- (fix) Set ebean transaction level to be repeatable read by @xdl in #3285
- fix(fonts): fix manrope styling by @gabe-lyons in #3311
- docs(datahub-frontend): add build instructions for the datahub-frontend docker image by @thebouv in #3314
- feat(ingest): support for primary and foreign key extraction from sql sources by @swaroopjagadish in #3316
- feat(transform): adds replace_existing config to set_dataset_browse_path by @sgomezvillamor in #3313
- feat(redshift): added ability to extract external schema from Redshift spectrum by @varunbharill in #3321
- fix(docs): patch link to Airflow Docker compose file by @kevinhu in #3322
- docs: Fix topic_pattern typo in kafka ingestion docs by @serefacet in #3317
- fix(graphql): add ElasticSearch path prefix configuration by @zhoxie-cisco in #3297
- fix(ingest): more robust error handling in lookml sql parsing by @swaroopjagadish in #3325
- fix(ingest): Fix sasl exception for hive ingestion by @serefacet in #3326
- fix(ingest): no error when there are no partition keys by @aseembansal-gogo in #3328
- fix(docs): fix graphql deprecated comment by @gabe-lyons in #3327
- feat(dbt-ingestion): added tags and owner from dbt by @AndreasTA-AW in #3270
- fix(oidc): Tolerate null emails by @jjoyce0510 in #3330
- feat(Snowflake Lineage Ingestion) by @rslanka in #3331
- feat(ingest): support user group filtering for Azure AD by @vlavorini in #3312
- feat(ingest): Redash add parse_table_names_from_sql feature and multiple refactor by @taufiqibrahim in #3267
- feat(ingest): add support for github and looker links in looker views… by @swaroopjagadish in #3332
- fix(git-ignore): Git ignore generated python and avro artifacts by @dexter-mh-lee in #3320
- fix(ingestion): make dbt tag prefix configurable by @remisalmon in #3334
- feat(ingest): add trino source in metadata-ingestion by @mayurinehate in #3307
- feat(ingestion): support Airflow cluster config by @hsheth2 in #3336
- feat: add support for specialization of models through subtypes with … by @swaroopjagadish in #3338
- feat(search): Redesign search page - left filter pane by @dexter-mh-lee in #3337
- feat(users & groups): User & Groups Management GraphQL APIs + UI by @jjoyce0510 in #3318
- fix(pk + autocomplete): some ui fixes by @gabe-lyons in #3347
- fix(urns): prevent corrupted urns from being created by @gabe-lyons in #3348
- fix(ingestion-docker): Codegen and build again by @dexter-mh-lee in #3342
- docs(ingest): fix trino doc by @mayurinehate in #3339
- fix(docker-quickstart): Fix volume mount paths when using quickstart by @dexter-mh-lee in #3341
- fix(autocomplete): Fix empty autocomplete server error by @jjoyce0510 in #3346
- fix(Add custom elastic field mappings for all timeseries fields) by @rslanka in #3350
- fix(gitignore): Fix gitignore to ignore whole directory by @dexter-mh-lee in #3361
- fix(mce_builder): deleted alias by @vlavorini in #3356
- feat(data-platform): Add science and airflow data platform by @dexter-mh-lee in #3363
- fix(ui): fix url encoding issues by @gabe-lyons in #3359
- fix(gitignore): Update gitignore again - remove metadata-ingestion objects by @dexter-mh-lee in #3365
- fix(ci): add run_id to the task instance constructor for airflow by @swaroopjagadish in #3366
- fix(aws-deploy-docs): Fix documentation for elasticsearch by @dexter-mh-lee in #3360
- fix(bigquery_usage): Gracefully failing while parsing GCP log events. by @varunbharill in #3367
- feat(ingest): allow disabling sample values in profiling by @aseembansal-gogo in #3355
- fix(docs): fix docs for developing on metadata ingestion by @aseembansal-gogo in #3353
- test(CI): Timeout build job by @EnricoMi in #3364
- docs(OIDC): add note that root user is still accessible by @aseembansal-gogo in #3372
- test(metadata-io): Run metadata-io tests in parallel by @EnricoMi in #3358
- test(ElasticSearch): Retry ES requests by @EnricoMi in #3377
- fix(ingest): redshift usage properly count queries by @treff7es in #3370
- feat(subtypes): Support Viz for "view" subtypes by @jjoyce0510 in #3376
- fix(graphql): Correctly return tags and legacy global tags field by @jjoyce0510 in #3378
- fix(ingest): fixing support for kafka key schemas when only key schemas are present by @swaroopjagadish in #3379
- fix(search): Small bug fixes for search redesign by @dexter-mh-lee in #3381
- test(airflow): remove unneeded execution_date parameter from test by @hsheth2 in #3368
- feat(ingest): add mariadb as possible source by @aseembansal-gogo in #3245
- fix(search): fixing user and group links in search results by @gabe-lyons in #3383
- fix(subtypes): Fix subtypes tab visibility by @jjoyce0510 in #3386
- Revert "test(ElasticSearch): Retry ES requests" by @gabe-lyons in #3385
- Revert "Revert "test(ElasticSearch): Retry ES requests"" by @gabe-lyons in #3392
- Adding kafka connect data platform by @jjoyce0510 in #3388
- Replace big query logo with the latest by @jjoyce0510 in #3387
- oidc: Add "name" claim extraction if present by @jjoyce0510 in #3384
- feat(ingest): teaching lookml source that athena has 2 parts in its dataset names by @swaroopjagadish in #3393
- fix(ingest): fix issues with lookml view file resolution on non-view … by @swaroopjagadish in #3397
- feat(search): Search insights foundations by @jjoyce0510 in #3391
- fix(graphQL): Populating deprecated Dataset description field by @jjoyce0510 in #3403
- feat(search): Support Boolean OR Filters in Rest.li APIs by @jjoyce0510 in #3344
- fix(lookml): Fixing lookml integration test. by @varunbharill in #3405
- fix(browse): Add more special character handling by @dexter-mh-lee in #3404
- fix(search): Reduce default batch size by @dexter-mh-lee in #3407
- fix(ui): Extract customProperties map from "properties" OR "info" entity field by @jjoyce0510 in #3410
- fix(gms): Add Rest.li Validation to ingestProposal by @jjoyce0510 in #3409
- fix(ingest): set athena dataset name with 2 parts in redash source by @Rukesh-Kapuluru in #3406
- feat(bigquery): Ingest lineage metadata from Bigquery logs. by @varunbharill in #3389
- docs(ingest): Add required permissions to Azure AD source doc by @jjoyce0510 in #3414
- fix(ingest): switch to avro from deprecated avro-python3 by @hsheth2 in #3412
- feat(spark): add spark logo and dataplatform. by @varunbharill in #3417
- fix(graph service): fix case where certain mcps can incorrectly delete graph edges by @gabe-lyons in #3418
- fix(datahub-upgrade): Update datahub upgrade to use MCL instead of MAE by @dexter-mh-lee in #3411
- feat(ingest): add complex types support in hive and trino source by @mayurinehate in #3375
- fix(docs): Add disk usage req to quickstart doc by @dexter-mh-lee in #3415
- test(modelValidation): Enhance Error Message by @RyanHolstien in #3394
- feat(metadata-service): Introducing EntityClient interface to avoid unnecessary HTTP calls. by @jjoyce0510 in #3421
- fix(deletes): make sure deletion removes lineage by @gabe-lyons in #3423
- feat(react): dynamically hide entity types that haven't been ingested by @gabe-lyons in #3419
- feat(ingest): support profiling tables in parallel by @hsheth2 in #3369
- fix(ingest): allow database alias, remove extra removal from connect_… by @aseembansal-gogo in #3352
- fix(bigquery): Fix error when computing lineage in bigquery is turned off by @varunbharill in #3428
- fix(oidc): Fix the oidc lastModifiedAt bug by @jjoyce0510 in #3429
- fix(dupe edges): Fix datajob duplicate edges in elastic by @jjoyce0510 in #3426
- fix(ingest): resolve click-default-group deprecation warning by @hsheth2 in #3427
- fix(browse): fix browse for entities without default browse logic by @gabe-lyons in #3422
- feat(ingest): add parallelism to looker source and datahub rest sink by @swaroopjagadish in #3431
- docs: Peloton adoption of Datahub by @arunvasudevan in #3433
- Docs branding by @jeffmerrick in #3432
New Contributors
- @xdl made their first contribution in #3285
- @thebouv made their first contribution in #3314
- @varunbharill made their first contribution in #3321
- @serefacet made their first contribution in #3317
- @zhoxie-cisco made their first contribution in #3297
- @AndreasTA-AW made their first contribution in #3270
- @mayurinehate made their first contribution in #3307
- @treff7es made their first contribution in #3370
- @Rukesh-Kapuluru made their first contribution in #3406
- @jeffmerrick made their first contribution in #3432
Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.15...v0.8.16
DataHub v0.8.15
Released on Wed Sep 29 2021 by @shirshanka.
Notable Changes
- Support the “NONE” Client Authentication Method for OIDC login.
- Migrated to the new UI for Charts, Dashboards, Data Flows (Pipelines), Data Jobs (Tasks) profile pages
- Primary and Foreign Keys rendered in the UI
- Ingestion
- Support for `redshift-usage` source
- Fixes for `looker` ingestion
- `datahub` cli supports -f/--force option to skip confirmations
Changelog
- #3310 @jjoyce0510 Updating logo
- #3309 @jjoyce0510 Fixing lineage
- #3308 @jjoyce0510 Attach Client ID to token request in Authentication Mode none
- #3256 @aseembansal-gogo feat(ingest): add -f option to skip confirmations for automation en…
- #3298 @gabe-lyons feat(react): show primary keys & foreign keys in the schema
- #3172 @gabe-lyons marking data process aspects as deprecated
- #3301 @jjoyce0510 fix(upgrade): Improving NoCodeUpgrade logic to account for Bootstrap logic
- #3305 @jjoyce0510 feat(oidc): Support NONE client auth method in OIDC (stopgap)
- #3304 @gabe-lyons fix(docs): fix entity doc link
- #3303 @jjoyce0510 feat(UI): UI Migration for Charts, Dashboards, Pipelines, Tasks + Glossary Terms and Links for all.
- #3276 @bboylen feat(react): add groups tab to user profile
- #3299 @swaroopjagadish feat(build): adding support for python codegen for all aspects, not just the snapshot ones
- #3294 @swaroopjagadish fix(ingest): looker explores with joins, parsing failures on lateral flatten
- #3277 @chinmay-bhat feat(ingest): add redshift usage source
- #3290 @adriaanslechten feat(ingest): optional custom headers REST emitter
- #3293 @chinmay-bhat fix(build): update tox.ini to allow new dependencies to be installed
- #3292 @gabe-lyons fix(ingest): update generated files
- #3278 @jjoyce0510 refactor(graphql): GraphQL Public API Refactor + Documentation
- #3287 @swaroopjagadish fix(ingest): fix typo in looker tag generation
- #3275 @gabe-lyons feat(foreign keys): add foreign key models
- #3283 @aseembansal-gogo feat(ingest): add athena logo
- #3280 @gabe-lyons fix(react): fix updates from the UI
- #3279 @swaroopjagadish feat(ingest): add nice semantic run-ids that use source type and time of ingestion
- #3274 @gabe-lyons fix(chartinfo): only map chartinfo inputs if exists
- #3272 @gabe-lyons docs(adoption): updating adoption logos
- #3271 @jjoyce0510 fix(policies): Always ingest non-editable policies on boot
- #3259 @gabe-lyons feat(graphql): Adding write side validation and tests for add+remove API
- #3264 @swaroopjagadish fix(ingest): making lookml recursive and nested includes work
- #3262 @swaroopjagadish fix(ingest): looker cascading derived tables should express lineage to view not underlying table
- #3254 @abdvl fix(web): upgrade remove-markdown package to fix a ReDoS security issue
- #3011 @EnricoMi test(GraphService): Thorough graph service tests
- #3258 @jensenity chore: add banksalad to datahub adoption readme
DataHub v0.8.14
Released on Fri Sep 17 2021 by @shirshanka.
Release Highlights
- Small bug fixes over 0.8.13
Notable Changes
- Fix bug in OIDC config for setting response type
- Add WAU chart in the analytics page
- Starting with `acryl_datahub==0.8.13.1` (pypi), Looker and LookML ingestion will now name views differently from before. You will need to delete old LookML metadata to start with a clean slate or specify `view_naming_pattern = "{name}"` in both your Looker and LookML ingestion recipes to get the old behavior.
- Populate the user email field in usage statistics to correctly show top users on the entity page
- Full changelog below
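To keep the old view names, the `view_naming_pattern` setting named above has to be applied to both recipes. The following is a minimal Python sketch, assuming recipes are held as dictionaries with the usual `source` / `config` layout. The helper `with_legacy_view_naming` and the base recipes are hypothetical, and real recipes need connection settings that are omitted here.

```python
import copy
from typing import Any, Dict

Recipe = Dict[str, Any]


def with_legacy_view_naming(recipe: Recipe) -> Recipe:
    """Return a copy of the recipe with view_naming_pattern set to "{name}"."""
    updated = copy.deepcopy(recipe)
    config = updated.setdefault("source", {}).setdefault("config", {})
    config["view_naming_pattern"] = "{name}"
    return updated


# Hypothetical, incomplete recipes: connection details are left out.
looker_recipe = with_legacy_view_naming({"source": {"type": "looker", "config": {}}})
lookml_recipe = with_legacy_view_naming({"source": {"type": "lookml", "config": {}}})

print(looker_recipe["source"]["config"])
print(lookml_recipe["source"]["config"])
```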
Changelog
- #3215 @aseembansal-gogo feat(ingest): support for env variable in cli
- #3253 @remisalmon fix(ingest): allow ingestion of glossary terms without nodes
- #3255 @swaroopjagadish feat(ingest): looker and lookml improvements - connection, explores, folders
- #3010 @EnricoMi refactor(dao/utils): Move general createRelationshipFilter from Neo4jUtil to QueryUtils
- #2736 @jjoyce0510 rfc(RBAC): Fine-Grained Access Controls in GMS
- #3251 @jjoyce0510 Fixing response type bug
- #3249 @dexter-mh-lee Fix OIDC doc
- #3252 @dexter-mh-lee feat(analytics): Add WAU over the last 2 months chart
- #3250 @gabe-lyons feat(glossary): splitting apart tags & terms into their own visual sections
- #3244 @rslanka fix(usage statistics): populate the email field
- #3238 @aseembansal-gogo fix(ingest): add missing partition keys in schema for glue sources
- #3243 @swaroopjagadish fix(ingest): fixing snowflake and bigquery usage connectors to use real user urns
- #3241 @claudio-benfatto fix(docker): use wait-http-header to avoid printing cleartext credentials
- #3220 @dexter-mh-lee fix(frontend): Add additional sasl config for kafka producer in datahub-frontend
DataHub v0.8.13
Released on Wed Sep 15 2021 by @shirshanka.
Release Highlights
- Support for aggregated statistics wrt the timeseries aspect. Moved usage stats functionality to use the new framework.
- Auto-ingest common data platforms on GMS boot! No more generic logos.
- Fixes re-ingestion of modified policies at startup
- Full changelog below
Breaking Changes
- Usage stats endpoint now uses the time-series aspect index in Elastic, meaning that statistics ingested previously will be lost. Please re-run usage ingestion (e.g. bigquery-usage, snowflake-usage) to backfill your usage statistics history.
Changelog
- #3242 @rslanka fix(dataset profiles): compatibility with older indices.
- #3235 @gabe-lyons fix(glossary terms): some cosmetic fixes for glossary terms
- #3234 @jjoyce0510 fix(policies): Only ingest bootstrap policies on clean start
- #3207 @rslanka feat(Analytics): Support for Timeseries Aggregated Statistics
- #3160 @EnricoMi test(metadata-io): Improve speed of ElasticSearch tests
- #3232 @gabe-lyons fix(glossary terms): add glossary terms privilege to COMMON_ENTITY_PRIVILEGES
- #3230 @gabe-lyons fix(react): fix jitter on schema when adding description
- #3219 @chinmay-bhat feat(ingest): auto-ingest common data platforms on GMS boot
- #3229 @jjoyce0510 fix(upgrade): Check whether tables exist using findList
- #3213 @gabe-lyons feat(business glossary): Add support to add & remove glossary terms from the UI
- #3221 @dexter-mh-lee fix(oidc): add more oidc config
- #3223 @gabe-lyons fix(graphql): fix tag mapper
- #3224 @gabe-lyons fix(react): fix lineage button highlighting
- #3222 @gabe-lyons fix(react): add owner modal title
- #3216 @gabe-lyons fix(react): fix proxy for login route
DataHub v0.8.12
Released on Thu Sep 09 2021 by @shirshanka.
Release Highlights
- RBAC Phase 1: Added abilities to control access through policies in the UI and backend
- Dataset page refresh!!! + improved home page, search and browse screens
- Added the ability to monitor DataHub through Prometheus and provided example Grafana dashboards
- GraphQL API browser hosted on /api/graphql endpoint.
- Support for Business Glossary ingestion through yml file
- Support for Azure AD ingestion source
Notable Changes
- Fixed unicode rendering bug introduced in v0.8.11
- Added the ability to search by properties in the customProperties bag: supports case-insensitive matches of the form ‘key=value’
- For instance, query “encoding=utf-8” will return entities with “encoding”: “utf-8” in the property bag
- Full changelog below
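The ‘key=value’ matching described above can be illustrated with a short Python sketch. It mirrors only the behavior stated in the notes (case-insensitive match against the property bag). The function name is hypothetical, and the sketch says nothing about how the search index implements it.

```python
from typing import Dict


def matches_property_query(query: str, custom_properties: Dict[str, str]) -> bool:
    """Case-insensitive match of a 'key=value' query against a property bag."""
    key, sep, value = query.partition("=")
    if not sep:
        return False  # not a key=value query
    lowered = {k.lower(): v.lower() for k, v in custom_properties.items()}
    return lowered.get(key.strip().lower()) == value.strip().lower()


print(matches_property_query("encoding=utf-8", {"encoding": "utf-8"}))    # True
print(matches_property_query("Encoding=UTF-8", {"encoding": "utf-8"}))    # True
print(matches_property_query("encoding=latin-1", {"encoding": "utf-8"}))  # False
```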
Changelog
- #3214 @dexter-mh-lee fix(docker): pin setuptools version in docker ingestion build
- #3212 @gabe-lyons fix(metadata-ingestion): fixing lint issues
- #3196 @abdvl fix(react): safely access caught Error properties
- #3195 @dexter-mh-lee feat(perf): Add perf testing and monitoring framework
- #3136 @dexter-mh-lee feat(search): Add searchable annotation to maps
- #3158 @karoliskascenas feat(ingest): optionally ingest deleted looker dashboards
- #3210 @gabe-lyons fix(admin): moving admin links to header
- #3211 @dexter-mh-lee fix(build): specify setuptools version for dev install
- #3208 @dexter-mh-lee fix(search): Move filters to query instead of post query
- #3209 @gabe-lyons fix(react): fix tag schema search on tag profile
- #3190 @jjoyce0510 fix(graphql): fix ml model properties resolver
- #3200 @jjoyce0510 fix(bootstrap): making bootstrap manager run once
- #3197 @jjoyce0510 feat(access control): Adding "authorizedActors" method to AuthorizationManager
- #3201 @EnricoMi ci: upload test reports
- #3199 @jjoyce0510 Fix GraphQL Variables
- #3193 @abdvl refactor(test): remove the `datahub-frontend.graphql`
- #3198 @dexter-mh-lee fix(platform): fix kafka env name for MCL_timeseries
- #3194 @jjoyce0510 fix(react): fix add links
- #3192 @gabe-lyons fix(react): fixing format of search snippets
- #3191 @jjoyce0510 fix(react): pin the control center menu icon
- #3189 @jjoyce0510 fix(404): Fix 404 Exit Error.
- #3182 @jjoyce0510 feat(access control): Fine-Grained Access Control M1
- #3187 @gabe-lyons fix(react): Fix the fieldPath grouping logic in the front-end
- #3188 @nickwu241 docs: fix "data platforms" link in dbt.md
- #3184 @dexter-mh-lee fix(kafka): Change env variable name for MCL_versioned to be consistent
- #3185 @gabe-lyons fix(react): removing preview artifact from platform logo
- #3183 @chinmay-bhat fix(business_glossary): added `__init__.py`
- #3181 @chinmay-bhat refactor(ingest): rename azure source to azure_ad
- #3159 @sgomezvillamor feat(ingest): add optional config for ownership type in ownership transformers
- #3179 @remisalmon fix(dbt): use_identifiers option and avoid duplicate descriptions
- #3164 @shirshanka feat(ingest): Add a business glossary source
- #3178 @gabe-lyons fix(react): show schema-attached description
- #3177 @dexter-mh-lee Revert "fix(search): move filters to query instead of postFilter (#3112)"
- #3173 @dexter-mh-lee fix(docs): Add documentation for AWS MSK
- #3176 @dexter-mh-lee feat(airflow): add example docker setup for airflow
- #3175 @gabe-lyons fix(dataflow): optimize topological sort logic
- #3170 @chinmay-bhat docs(ingestion): updated hive ingestion docs with Databricks recipe
- #3171 @chinmay-bhat fix(doc): add use_odbc to mssql doc example
- #3169 @gabe-lyons feat(react): Dataset page refresh + improved homepage, search and browse screens
- #3168 @gabe-lyons fix(frontend): fix utf8 encoding bug
- #3167 @shirshanka docs: update Aug townhall details and announce Sep townhall
- #3112 @dexter-mh-lee fix(search): move filters to query instead of postFilter
- #3148 @frsann feat(ingest): Minor Kafka Connect source improvements
- #3161 @chinmay-bhat feat(ingest): Adding Azure Source integration to ingest users, groups and group memberships
- #3165 @jjoyce0510 feat(graphql): add GraphQL Explorer (GraphiQL)