DataHub Releases

Summary

| Version | Release Date | Links |
|---|---|---|
| v0.8.39 | Fri Jun 24 2022 | Release Notes, View on GitHub |
| v0.8.38 | Thu Jun 09 2022 | Release Notes, View on GitHub |
| v0.8.37 | Thu Jun 09 2022 | Release Notes, View on GitHub |
| v0.8.36 | Thu Jun 02 2022 | Release Notes, View on GitHub |
| v0.8.35 | Wed May 18 2022 | Release Notes, View on GitHub |
| v0.8.34 | Wed May 04 2022 | Release Notes, View on GitHub |
| v0.8.33 | Fri Apr 15 2022 | Release Notes, View on GitHub |
| v0.8.32 | Mon Apr 04 2022 | Release Notes, View on GitHub |
| v0.8.31 | Thu Mar 17 2022 | Release Notes, View on GitHub |
| v0.8.30 | Thu Mar 17 2022 | Release Notes, View on GitHub |
| v0.8.29 | Thu Mar 10 2022 | Release Notes, View on GitHub |
| v0.8.28 | Mon Mar 07 2022 | Release Notes, View on GitHub |
| v0.8.28rc1 | Sat Mar 05 2022 | Release Notes, View on GitHub |
| RC-v0.8.28 | Sat Mar 05 2022 | Release Notes, View on GitHub |
| v0.8.27 | Wed Feb 23 2022 | Release Notes, View on GitHub |
| v0.8.26 | Tue Feb 08 2022 | Release Notes, View on GitHub |
| v0.8.25 | Mon Feb 07 2022 | Release Notes, View on GitHub |
| v0.8.24 | Mon Jan 24 2022 | Release Notes, View on GitHub |
| v0.8.23 | Fri Jan 14 2022 | Release Notes, View on GitHub |
| v0.8.22 | Sat Jan 08 2022 | Release Notes, View on GitHub |
| v0.8.21 | Tue Dec 28 2021 | Release Notes, View on GitHub |
| v0.8.20 | Mon Dec 20 2021 | Release Notes, View on GitHub |
| v0.8.19 | Mon Dec 13 2021 | Release Notes, View on GitHub |
| v0.8.18 | Fri Dec 10 2021 | Release Notes, View on GitHub |
| v0.8.17 | Fri Nov 19 2021 | Release Notes, View on GitHub |
| v0.8.16 | Thu Oct 21 2021 | Release Notes, View on GitHub |
| v0.8.15 | Wed Sep 29 2021 | Release Notes, View on GitHub |
| v0.8.14 | Fri Sep 17 2021 | Release Notes, View on GitHub |
| v0.8.13 | Wed Sep 15 2021 | Release Notes, View on GitHub |
| v0.8.12 | Thu Sep 09 2021 | Release Notes, View on GitHub |

v0.8.39

Released on Fri Jun 24 2022 by @maggiehays.

Highlights

User Experience

  • NEW: support for surfacing outcomes of dbt Tests in dataset entity pages (see it in action here)
  • NEW: Improved navigation of dbt resources: dbt models and their associated warehouse tables are now merged into a unified entity (see it here). This will automatically be enabled for all newly ingested entities. To view this for entities you have already ingested, you will need to run a restore indices job.
  • Improvement to Impact Analysis: When looking at the Lineage tab, you can now easily toggle between “Upstream” and “Downstream” entities (try it out here)

Developer Experience

  • NEW: Java Kafka Emitter – Use this when you want to decouple your metadata producer from the uptime of your datahub metadata server by utilizing Kafka as a highly available message bus

Metadata Ingestion

  • NEW: Make bulk edits to your metadata via CSV (read more)
  • Snowflake ingestion improvements: configure profiling to run only on tables that have been updated within the prior N days
  • Managed ingestion update: removed need for sink block
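
The Snowflake profiling option above amounts to a recency filter on tables. Here is a minimal sketch of that idea, assuming each table carries a last-altered timestamp. The helper name `tables_to_profile` and the `(name, last_altered)` pair shape are hypothetical and are not part of the DataHub API or its recipe format:

```python
from datetime import datetime, timedelta, timezone


def tables_to_profile(tables, max_age_days, now=None):
    """Return names of tables altered within the last `max_age_days` days.

    `tables` is an iterable of (name, last_altered) pairs. Both the shape and
    this helper are illustrative assumptions, not DataHub's implementation.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [name for name, last_altered in tables if last_altered >= cutoff]


now = datetime(2022, 6, 24, tzinfo=timezone.utc)
tables = [
    ("orders", now - timedelta(days=1)),
    ("archive_2019", now - timedelta(days=400)),
]
print(tables_to_profile(tables, max_age_days=7, now=now))  # ['orders']
```

Skipping stale tables keeps profiling cost proportional to the data that actually changed.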

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.38...v0.8.39

[!] DataHub v0.8.38

Released on Thu Jun 09 2022 by @jjoyce0510.

Notice: There is a known issue in this release. Listing access tokens for a user may not return the correct results to the UI due to an unreliable query to DataHub's search backend. This will be resolved in v0.8.39. Note that this does not mean that access tokens will not work or are in any way compromised - the functionality of generating and using access tokens is not impacted.

The below release notes are copied from v0.8.37 release notes.

Highlights

User Experience

This release comes packed full of new features and updates.

  • NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
  • NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
  • NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
  • UPDATE - Rename “Manage” navigation item to “Govern”
  • [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
  • [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
  • FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
  • Minor fixes & improvements to UI for adding policy users + groups.

Metadata Ingestion

  • Support Snowflake ingest via Oauth
  • Misc fixes and improvements to existing ingestion sources

Disclaimers:

With this upgrade, we've added a new mechanism for authenticating users: native authentication. It is enabled by default, which allows Admins to create new users and allows those users to log in.

If you were previously disabling BOTH JAAS (via AUTH_JAAS_ENABLED=false) AND OIDC, and you still do not want to require a username + password to log in, you'll need to add a new environment variable to the datahub-frontend-react container: AUTH_NATIVE_ENABLED=false.
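
To make the interaction of these flags concrete, here is a small sketch of the decision described above. It is an illustration only: the rule "a login is required if any mechanism is enabled", the default for JAAS, and the `AUTH_OIDC_ENABLED` variable name are assumptions made for this example and are not taken from these release notes.

```python
def login_required(env):
    """Return True if any login mechanism is enabled (illustrative logic only)."""

    def flag(name, default):
        return env.get(name, default).strip().lower() == "true"

    jaas = flag("AUTH_JAAS_ENABLED", "true")      # default assumed for this sketch
    oidc = flag("AUTH_OIDC_ENABLED", "false")     # variable name assumed for this sketch
    native = flag("AUTH_NATIVE_ENABLED", "true")  # enabled by default per the note above
    return jaas or oidc or native


# Disabling JAAS alone is no longer enough once native authentication exists.
print(login_required({"AUTH_JAAS_ENABLED": "false"}))
print(login_required({"AUTH_JAAS_ENABLED": "false", "AUTH_NATIVE_ENABLED": "false"}))
```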

What's Changed

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.37...v0.8.38

[!] DataHub v0.8.37

Released on Thu Jun 09 2022 by @jjoyce0510.

Notice! This version has a few known bugs regarding revocable access tokens. Specifically, the UI for listing access tokens does not work properly unless you have a specific platform privilege. Additionally, there is a delay in revoking access tokens of 6 hours. We recommend that you skip this version and upgrade directly to v0.8.38.

Highlights

User Experience

This release comes packed full of new features and updates.

  • NEW – Create & Revoke Access Tokens via the UI - Find this under Settings > Developer. This replaces the previous stateless tokens UI.
  • NEW – Create and Invite Users to DataHub via the UI - Find this under Users & Groups > Invite DataHub users. Admins can also now generate password reset links for their users.
  • NEW - Manage Related Glossary Terms via the UI - Add and remove Glossary Terms Contained By and Inherited From a parent via the UI. Find this under Glossary
  • UPDATE - Rename “Manage” navigation item to “Govern”
  • [IMPORTANT] UPDATE - Move “Users & Groups” navigation item into Settings > Access
  • [IMPORTANT] UPDATE - Move “Policies” navigation item into Settings > Access (Privileges)
  • FIX - You no longer need to run a reindexing job to start using the new Business Glossary UI. This process is handled for you at boot time.
  • Minor fixes & improvements to UI for adding policy users + groups.

Metadata Ingestion

  • Support Snowflake ingest via Oauth
  • Misc fixes and improvements to existing ingestion sources

What's Changed

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.36...v0.8.37

DataHub V0.8.36

Released on Thu Jun 02 2022 by @treff7es.

V0.8.36

Highlights

User Experience

NEW: Manage Glossary Terms via the DataHub UI! Delivering on our Q2’22 Roadmap item, end users can now create, edit, move, delete, and deprecate Glossary Terms via the UI! With this new experience comes some new ways of indexing data in order to make viewing and traversing the different levels of your Glossary possible. Therefore, you will have to restore your indices in order for the new Glossary experience to work for users that already have existing Glossaries. If this is your first time using DataHub Glossaries, you're all set!

Ability to add multiple Owners, Tags, Terms

Developer Experience

The new Revokable Token API supports a new type of Access Token which can be revoked & queried, allowing admins to easily delete tokens for operational & security reasons. Read all about it in the Access Token Management Usage Guide.
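
As a rough sketch of how a client might present such an access token, the example below builds (but does not send) an HTTP request carrying the token as a bearer credential. The base URL, the `/api/graphql` path, the placeholder query, and the helper name `build_graphql_request` are assumptions for illustration; consult the Access Token Management Usage Guide for the authoritative details.

```python
import json
import urllib.request


def build_graphql_request(base_url, token, query):
    """Build a POST request that carries the access token as a Bearer credential."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/graphql",  # path assumed for this sketch
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_graphql_request(
    "http://localhost:8080",        # hypothetical local deployment
    "<access-token>",               # placeholder, never hard-code real tokens
    "{ me { corpUser { urn } } }",  # placeholder query
)
print(req.get_method(), req.full_url)
```

Because a revocable token can be deleted server-side, a client built this way should expect an authentication failure once an admin revokes the token.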

Ingestion Updates

This release includes 3 new Metadata Sources:

  • Iceberg
  • Vertica
  • SAP HANA

📣 Massive shoutout to DataHub Community members @cccs-eric, @eburairu, and @buggythepirate for driving these contributions! 📣

These sources are currently marked as “Testing” - we encourage you to try them out & provide feedback in the DataHub #ingestion Slack channel!

We’ve rolled out the following ingestion-related improvements:

  • AWS Glue - data profiling is now supported
  • S3 ingestion speed-up
  • Various bug fixes

Full Commit Log

[!] DataHub v0.8.35

Released on Wed May 18 2022 by @dexter-mh-lee.

Notice: Deploying this release will result in an incorrectly named aspect entry existing in the database. The impact is that some upgrade jobs may fail to perform full scans of the database. This will be fixed by upgrading to > v0.8.38 OR by pulling the latest DataHub Upgrade docker image and executing the following upgrade: ./datahub-upgrade.sh -u RemoveUnknownAspects

v0.8.35

Highlights

  • Reduced vulnerability counts in project
  • Various bug fixes
  • New streamlined docker workflow

Full Commit Log

v0.8.34

Released on Wed May 04 2022 by @maggiehays.

Release Highlights

Developer Experience

  • DataHub Actions Framework is LIVE! The Actions Framework makes responding to real-time changes in your Metadata Graph easy, enabling you to seamlessly integrate DataHub into a broader events-based architecture. Check out the repo here
  • This release also introduces OpenAPI endpoints to post, get, and delete entities. Check out the usage guide here
  • Metadata Ingestion Source docs have a new look! We now have code-generated documentation to apply consistency in format and contents

User Experience

  • New! The Dataset Schema page now supports a “Blame View” to quickly understand how a field has evolved over semantic schema versions. You can find more info about how we compute versions here.

Ingestion Improvements

  • New! Now incubating the Apache Pulsar source
  • Update to Feast connector to support v0.18
  • Ongoing improvements to Snowflake external table support
  • Improvements to handling BigQuery audit log SQL queries
  • Miscellaneous Tableau fixes for lineage, browse path, non-embedded datasets

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.33...v0.8.34

DataHub v0.8.33

Released on Fri Apr 15 2022 by @dexter-mh-lee.

Release Highlights

User Experience

Refreshed the ML Entity page to match the feel of all other entity types; improved ML lineage functionality

Ingestion Improvements

  • Airflow Improvements - as demoed in March Town Hall
    • Add support to capture Airflow execution runs from lineage backend
    • Introduce new High level API for generating dataflow/job/dataprocessinstance
  • MS SQL ingestion now captures table & column descriptions
  • Trino platform support for Great Expectations
  • New Presto-on-Hive ingestion source
  • BigQuery ingestion now supports extraction of usage info from audit logs
  • Fix to Looker ingestion to extract Explore Views from join names
  • Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
  • Simplify & annotate Redshift Usage source

Full Commit Log

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.32...v0.8.33

DataHub v0.8.32

Released on Mon Apr 04 2022 by @dexter-mh-lee.

Release Highlights

User Experience

We're excited to announce View-based RBAC Policies! You can now create and apply view-only permissions to your DataHub end-users, providing more robust access controls.

We've also included some small (but impactful!) improvements to UX, including:

  • Display recent search terms when beginning the search flow
  • Consistently display entity subtypes for dbt, Looker, Kafka, & more. For example, Kafka entities are displayed as "topics" instead of "datasets"

Ingestion Highlights

  • New! Protobuf ingestion (shoutout to @leifker for this Community-led contribution!)
  • Initial work to support a "Notebook" entity (shoutout to @tc350981 for spearheading this work!!)
  • Stateful ingestion for dbt is now supported
  • Ongoing improvements to our Tableau ingestion source from @nandacamargo & @cuong-pham
  • Improvements to handling database aliases for Redshift ingestion
  • Improvements to S3 source:
    • Add containers for datasets
    • Support platform_instance
    • Support for folder level datasets
    • Increased flexibility to specify dataset paths
  • Ingestion Fixes:
    • Snowflake Usage - log warning instead of error out & other error handling
    • Snowflake allow/deny patterns
    • Examples of allow/deny patterns added to docs

Full Commit Log

DataHub v0.8.31

Released on Thu Mar 17 2022 by @dexter-mh-lee.

Bugfix release to prevent failures when reindexing the system metadata index in Elasticsearch

Full Commit Log

  • #4440 @pedro93 fix(cli) Makes filtered search deletes include BOTH removed and non-removed
  • #4444 @pedro93 fix(cli) Adds elasticsearch mapping
  • #4432 @leifker feat(protobuf): Gradle protobuf example project

DataHub v0.8.30

Released on Thu Mar 17 2022 by @rslanka.

V0.8.30

Release Highlights

  • Fix for OIDC encryption bug from v0.8.29
  • Adds platform instance id to the container id generation, and support for migrating the old container ids to the new ones via the datahub migrate CLI.
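
To see why including the platform instance changes container ids (and why a migration is needed), consider this sketch. It assumes ids are derived by hashing a dictionary of identifying fields; the function name `container_guid`, the key names, and the choice of MD5 over canonical JSON are illustrative assumptions rather than a statement of DataHub's exact scheme.

```python
import hashlib
import json


def container_guid(key):
    """Hash a key dictionary into a stable hex id (illustrative scheme)."""
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


old_id = container_guid({"platform": "snowflake", "database": "analytics"})
new_id = container_guid(
    {"platform": "snowflake", "instance": "prod_us", "database": "analytics"}
)

# Adding the instance to the key yields a different id for the same database,
# so containers ingested earlier need to be migrated to the new id.
print(old_id)
print(new_id)
```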

Notable UI-Based Features

  • Showing recent searches in autocomplete.

What's Changed

New Contributors

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.8.29...v0.8.30

DataHub v0.8.29

Released on Thu Mar 10 2022 by @shirshanka.

v0.8.29

NOTICE

This version is affected by an OIDC (SSO) related issue with the following stack trace:

datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend Caused by: java.security.InvalidKeyException: Invalid AES key length: 30 bytes
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend at com.sun.crypto.provider.AESCrypt.init(AESCrypt.java:87)

The DataHub core team is working to address this. For now, we recommend staying on 0.8.28 if you are actively using OIDC!
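
For context on the error message: AES only accepts keys of 16, 24, or 32 bytes (AES-128, AES-192, AES-256), so a 30-byte key is rejected. The check below is a minimal sketch of that rule. The helper name `check_aes_key` is hypothetical, and the sketch says nothing about how DataHub derives its key.

```python
VALID_AES_KEY_LENGTHS = (16, 24, 32)  # bytes: AES-128, AES-192, AES-256


def check_aes_key(key: bytes) -> None:
    """Raise ValueError if `key` is not a valid AES key length."""
    if len(key) not in VALID_AES_KEY_LENGTHS:
        raise ValueError(f"Invalid AES key length: {len(key)} bytes")


check_aes_key(b"k" * 32)  # accepted
try:
    check_aes_key(b"k" * 30)  # same length as in the stack trace above
except ValueError as exc:
    print(exc)  # Invalid AES key length: 30 bytes
```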

Release Highlights

  • Fix for MAE & MCE consumer healthcheck
  • Upgrade to Java 11 and Gradle 6

Full Commit Log

DataHub v0.8.28

Released on Mon Mar 07 2022 by @shirshanka.

Release Highlights

Notable UI-Based Features

Quickly view, search, and filter the downstream dependencies of any Entity! By using the Impact Analysis Lineage view, you can now see the full set of downstream entities that may be impacted by a change to a given entity. You can also search, filter, and export the list of entities to CSV; try it for yourself here.
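
Conceptually, collecting the full set of downstream entities is a graph traversal over lineage edges. The sketch below uses a breadth-first search over a plain dictionary. The function name `downstream_entities`, the entity names, and the in-memory edge map are made up for illustration and do not reflect how DataHub stores or queries lineage.

```python
from collections import deque


def downstream_entities(edges, start):
    """Return every entity reachable downstream of `start`, in BFS order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue": ["dashboard.finance"],
}
print(downstream_entities(lineage, "raw.orders"))
# ['staging.orders', 'mart.revenue', 'mart.churn', 'dashboard.finance']
```

The `seen` set keeps the traversal from revisiting entities that are reachable through more than one path.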

View Dataset- and Column-Level Data Validation outcomes in DataHub. We now support surfacing outcomes from Great Expectations validations in Dataset Entities! Easily view the full history of validation outcomes to understand the trustworthiness of your data.

User Groups, Policies, and Tags have a new look!

  • The User Group page has a new look, allowing you to assign an email address, Slack Channel, Group Owner, and more. Easily add/remove Group Members from the UI - test it out here.
  • We refreshed the Policies Page, allowing you to see Policy membership and status at a glance.
  • The Tag Details page has been overhauled! You can now edit the definition, assigned owners, and tag color via the UI (try it here).

Notable Metadata Model & Ingestion-Based Features

First Milestone: Column-Level Lineage is complete! The Metadata Model now supports “fine-grained” lineage for Datasets; see documentation here for details, including adding fine-grained lineage to a dataset or a datajob.

Define Dataset-to-Dataset lineage via YAML. As demonstrated in the February 2022 Town Hall, you can now set Dataset-level lineage via YAML. This is great for teams that have more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources.

Track all changes to entities using the Timeline API. This unified timeline of changes to entities in the metadata graph provides a robust picture of how your metadata has evolved over time. Upcoming work will support surfacing this detail via the DataHub UI. See the overview from Town Hall here.

Miscellaneous Metadata Ingestion Updates:

  • Incubating: PowerBI Ingestion Source
  • BigQuery Profiling: ability to disable profiling by partition
  • Tableau improvements: Workbooks are now modeled as “Containers”

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.27...v0.8.28

DataHub Release Candidate v0.8.28 (rc1)

Released on Sat Mar 05 2022 by @shirshanka.

DataHub v0.8.28 Release Candidate 1

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.27...v0.8.28rc1

Release Candidate v0.8.28

Released on Sat Mar 05 2022 by @shirshanka.

Release Candidate for Version 0.8.28.

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.27...RC-v0.8.28

DataHub v0.8.27

Released on Wed Feb 23 2022 by @shirshanka.

Release Highlights

Notable UI-Based Features

  • The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.

  • Search for Entities by Owner - Easily filter search results by User/Group Owner

  • Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!

  • Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!

Notable Metadata Model & Ingestion-Based Features

  • ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!

  • Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!

  • Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!

  • BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.
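
The "latest shard" idea can be sketched as follows, assuming date-sharded tables follow the `<base>_YYYYMMDD` naming convention. The helper name `latest_shards` is hypothetical and this is not the ingestion source's actual code:

```python
import re

# Date-sharded tables are assumed to be named <base>_YYYYMMDD.
SHARD_PATTERN = re.compile(r"^(?P<base>.+)_(?P<date>\d{8})$")


def latest_shards(table_names):
    """Map each sharded base name to its most recent shard."""
    newest = {}  # base name -> (date suffix, full table name)
    for name in table_names:
        match = SHARD_PATTERN.match(name)
        if match is None:
            continue  # not a date-sharded table
        base, date = match.group("base"), match.group("date")
        # YYYYMMDD strings sort chronologically, so string comparison suffices.
        if base not in newest or date > newest[base][0]:
            newest[base] = (date, name)
    return {base: full_name for base, (_, full_name) in newest.items()}


names = ["events_20220101", "events_20220215", "users", "clicks_20211231"]
print(latest_shards(names))
# {'events': 'events_20220215', 'clicks': 'clicks_20211231'}
```

Profiling only the tables returned here avoids scanning every historical shard.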

Notable Docs Updates

  • NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl

  • Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.26...v0.8.27

DataHub v0.8.26

Released on Tue Feb 08 2022 by @shirshanka.

This is a Bugfix release meant to address the issue with adding Glossary Terms to Dataset fields present in version 0.8.25.

Release Highlights

  • Fixes a bug in the previous release (v0.8.25) where Glossary Terms could not be added to Dataset fields.

DataHub v0.8.25

Released on Mon Feb 07 2022 by @shirshanka.

Known Issues

  • Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.

Release Highlights

Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.

Notable UI-Based Features

  • UI-based Ingestion - as demoed in December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
  • Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
  • Data Containers are now supported! This is the physical grouping of entities, ex. a Schema is a container of 1 or more Datasets; a Dashboard is a container of 1 or more Charts.

Notable Metadata Model & Ingestion-Based Features

  • Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
  • Avro files are now supported in the Data Lake File ingestion source
  • Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
  • Ignore users from Top Users calculation
    • feat(ingestion): Adding ability to ignore users from top users calculation by @treff7es in #3735
  • BigQuery - Data Profiling on only the latest partition/shard
    • feat(ingestion) bigquery: Profiling only the latest partition/shard on bigquery by @treff7es in #3930
  • (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
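
To illustrate the platform instance item above, here is a sketch of how two instances of the same platform can yield distinct dataset identifiers. It assumes the instance name is prefixed onto the dataset name inside the URN; the helper `dataset_urn` and the example names are hypothetical.

```python
def dataset_urn(platform, name, env="PROD", platform_instance=None):
    """Build a dataset URN, optionally qualified by a platform instance."""
    if platform_instance:
        # Assumption for this sketch: the instance is prefixed to the name.
        name = f"{platform_instance}.{name}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


print(dataset_urn("mysql", "db.users"))
# urn:li:dataset:(urn:li:dataPlatform:mysql,db.users,PROD)
print(dataset_urn("mysql", "db.users", platform_instance="eu_cluster"))
# urn:li:dataset:(urn:li:dataPlatform:mysql,eu_cluster.db.users,PROD)
```

Because the identifier changes, entries ingested before platform instances were configured keep their old identifiers until they are migrated.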

Notable Fixes

  • Fix to support View in Looker
    • feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
  • fix(graphql): support group display name in ownership by @thomasplarsson in #3979
  • fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
  • fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926

DataHub Usage Guides

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.24...v0.8.25

DataHub v0.8.24

Released on Mon Jan 24 2022 by @shirshanka.

Release Highlights

  • Adding support for nested Glue schemas
  • Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
  • Improvements to readability in UI to format large numbers, including: adding thousands separators & rounding large numbers to millions with raw value available via tooltip
  • Miscellaneous bug fixes & improvements
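
The number formatting described above can be approximated like this. The sketch is in Python rather than the UI's own code, and the one-decimal rounding, the "M" suffix, and the function name `format_count` are assumptions:

```python
def format_count(value):
    """Return (display, tooltip) for a count, abbreviating millions."""
    raw = f"{value:,}"  # thousands separators, e.g. 12,345
    if abs(value) >= 1_000_000:
        return f"{value / 1_000_000:.1f}M", raw
    return raw, raw


print(format_count(12345))      # ('12,345', '12,345')
print(format_count(3_200_000))  # ('3.2M', '3,200,000')
```

Returning the raw value alongside the abbreviated one mirrors the tooltip behavior: the short form is shown, while the exact number stays available.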

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.23...v0.8.24

DataHub v0.8.23

Released on Fri Jan 14 2022 by @shirshanka.

Release Highlights

  • Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
  • Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
  • Robustness improvements to DataHub Java Client Package
  • Introducing a new Elasticsearch ingestion connector!
  • Misc bug fixes & improvements.

What's Changed

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.22...v0.8.23

DataHub v0.8.22

Released on Sat Jan 08 2022 by @shirshanka.

Disclaimers!

  • Ingesting Chart Inputs was broken in a PR that got into this release. This will be fixed in v0.8.23. If you plan to ingest Charts / Dashboards, we recommend skipping this version and upgrading to v0.8.23 directly.

Release Highlights:

  • Support for mapping dbt meta properties of a dataset to metadata operations, such as add_owner, add_term, add_tag, etc.
  • Java REST emitter library to programmatically generate metadata events from Java-based clients such as from Spark jobs.
  • Data freshness indication via Last Updated Timestamp.
  • Improvements to data profiling performance and lineage extraction
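
The meta-property mapping in the first highlight can be pictured as a set of rules that match a property value and emit an operation. The sketch below is an illustration under assumptions: the rule structure, the property names, and the helper `operations_for` are hypothetical, and only the operation names come from the notes above.

```python
import re

# Hypothetical rules: meta property name -> pattern to match and operation to emit.
META_MAPPING = {
    "business_owner": {"match": ".*", "operation": "add_owner"},
    "has_pii": {"match": "true", "operation": "add_tag"},
}


def operations_for(meta, mapping=META_MAPPING):
    """Return (operation, property, value) triples for every matching rule."""
    ops = []
    for key, rule in mapping.items():
        value = meta.get(key)
        if value is not None and re.fullmatch(rule["match"], str(value)):
            ops.append((rule["operation"], key, str(value)))
    return ops


print(operations_for({"business_owner": "alice", "has_pii": "false"}))
# [('add_owner', 'business_owner', 'alice')]
```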

What's Changed

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.21...v0.8.22

v0.8.21

Released on Tue Dec 28 2021 by @shirshanka.

This release includes a fix for timeouts when reindexing large indices, which occur when new fields are added to an index.

Release Highlights

  • Getting Started Modal + Empty State: Improve the experience of having no data ingested in DataHub by providing a "Getting Started" Guide when there is no data yet ingested.
  • Provide BigQuery credentials via recipe config: Previously BigQuery credentials were provided via environment variable. Going forward they can be provided directly inside the Recipe config.
  • Increase re-indexing 30s timeout: Previously elastic reindexing was maxed at a 30 second synchronous timeout. This was causing some upgrades of GMS to fail. This PR increases that timeout to one hour.

What's Changed

  • fix(lkml): bump lkml version up to 1.1.2 to support sql_preamble expression by @hyunminch in #3757
  • fix(react-ui): fix header min height by @gabe-lyons in #3784
  • docs(auth): add Microsoft Azure as an SSO provider (#3779) by @cccs-eric in #3780
  • Add azure OIDC doc to sidebar by @jjoyce0510 in #3785
  • feat(UI): Add "Getting Started" Modal on fresh deployment by @jjoyce0510 in #3773
  • feat(transform): adds simple add dataset properties transform by @sgomezvillamor in #3778
  • Update troubleshooting steps for local development with docker by @RyanHolstien in #3788
  • docs(redshift): Updating Redshift permission prerequisites in doc by @treff7es in #3777
  • fix(superset): fix Superset chart ingestion with an empty metric label by @cccs-eric in #3793
  • doc(transforms): adds doc for simple_add_dataset_properties transformer by @sgomezvillamor in #3790
  • feat(ingest): Add config option to set Bigquery credential in source config by @treff7es in #3786
  • fix(elastic): allow more time for re-indexing tasks by @gabe-lyons in #3794
  • docs(kafka): add example for ingestion from confluent cloud by @anshbansal in #3789

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.20...v0.8.21

v0.8.20

Released on Mon Dec 20 2021 by @shirshanka.

This release includes the patch for CVE-2021-44228, pinning log4j to 2.17.0. Otherwise, it contains small bug fixes & improvements.

Release Highlights

  • Configurable aspect retention in application.yml (disabled by default)
  • Metabase Ingestion Source connector
  • Constrain log4j to version 2.17.0
  • Upgrade logback to 1.2.9

What's Changed

  • feat(spark-lineage): add ability to push data lineage from spark to d… by @MugdhaHardikar-GSLab in #3664
  • feat(cli): allow to nuke without deleting data in quickstart by @anshbansal in #3655
  • feat(Dgraph): Make Dgraph a proper Neo4j alternative by @EnricoMi in #3578
  • feat(retention): Add retention to Local DB by @dexter-mh-lee in #3715
  • feat(ingest): cleanup deprecated datahub.integrations.airflow.* imports by @hsheth2 in #3732
  • feat(ingestion) : Add Metabase Source Connector by @jawadqu in #3602
  • fix(ingest): count profiled tables separately in report by @hsheth2 in #3731
  • feat(perf-test): changes for perf testing by @anshbansal in #3728
  • ci(cypress): adding the foundation for cypress integration tests & some starter coverage for login, search & updates by @gabe-lyons in #3672
  • (fix) Elastic search container log4j CVE-2021-44228 vulnerability by @nsbala-tw in #3733
  • Revert "feat(Dgraph): Make Dgraph a proper Neo4j alternative" by @gabe-lyons in #3740
  • fix(CI): Regenerate Docker Quickstart by @jjoyce0510 in #3741
  • fix(DataHubGraph): changing datahub-graph to use underlying session connection. by @varunbharill in #3743
  • fix(ingest): Remove unecessary isalpha check for data platforms + warnings by @jjoyce0510 in #3742
  • feat(snowflake-usage): add knob for direct objects accesssed vs base objects accessed by @gabe-lyons in #3744
  • fix(snowflake): support snowflake allow/deny pattern for lineage and usage by @varunbharill in #3748
  • refactor(gms auth): Remove base64 decoding of token service signing key by @jjoyce0510 in #3747
  • test(ingest): fix pytest warning for class starting with Test by @hsheth2 in #3745
  • feat: enables dbt metadata files to be loaded from URIs by @sgomezvillamor in #3739
  • fix(ingestion): Skipping duplicate tables from ingestion by @treff7es in #3753
  • feat(Stateful Ingestion): 1/3 Stateful ingestion server changes by @rslanka in #3749
  • Fix CVE-2021-44228 continued: log4j constraints to version 2.16.0 by @jjoyce0510 in #3755
  • build(ingest): restrict latest mypy version by @hsheth2 in #3756
  • doc: Add IOMED as a DataHub adopter by @merqurio in #3758
  • docs(spark-lineage): update artifact name and version by @MugdhaHardikar-GSLab in #3760
  • feat(profiler): add upper bound on combined query size by @hsheth2 in #3762
  • feat(ingestion): Mode retry wait logic to avoid hitting Mode API rate limit by @jawadqu in #3761
  • feat(Stateful Ingestion-2/3): Client side changes for checkpointing a source job state. by @rslanka in #3763
  • refactor(test): replace CliRunner with run_datahub_cmd method by @hsheth2 in #3746
  • feat(bigquery): add support for parsing exported bigquery audit logs by @hyunminch in #3680
  • feat(ingest): Adding support for Elasticsearch and Clickhouse by @sudotty in #3227
  • Upgrade to logback 1.2.9 to address CVE-2021-42550 by @jjoyce0510 in #3771
  • fix(profiling): Disabling expensive profilers by default by @treff7es in #3759
  • docs(ingestion): Add details of sensitive info handling by @anshbansal in #3767
  • docs(snowflake): Adding documentation about required Snowflake Privileges by @jjoyce0510 in #3770
  • Upgrade to 3rd Apache patch for log4j by @xiphl in #3772
  • fix(ingestion): Fix for same schema foreign key reference by @treff7es in #3769
  • fix(ingest): fix compatibility with google composer by @anshbansal in #3774

Known Issues

We've been made aware that in large deployments the re-indexing step required at boot-up time exceeds the 30 second timeout. We've since made changes to loosen this timeout limit, with these changes coming in 0.8.21.

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.19...v0.8.20

v0.8.19

Released on Mon Dec 13 2021 by @shirshanka.

This release is a fast followup to the more substantial 0.8.18 release addressing bugs a few folks are facing in the Community.

Release Highlights

  • Fix an issue where the base64 CLI command was not available on some systems.
  • Fix usage user extraction where the email domain was repeated twice.

What's Changed

  • fix(recommendations): don't show a 0 character when there are no suggestions by @gabe-lyons in #3720
  • fix(mode): support definitions in mode query by @gabe-lyons in #3721
  • fix(doc): fixing doc in datahub cli for corpuser urn. by @varunbharill in #3717
  • docs(redshift): Adding svv_table privilege requirement to redshift source doc by @treff7es in #3708
  • fix(profiler): Fixing division by zero in pct_unique calculation by @treff7es in #3727
  • fix(ingest): get mysql geotypes properly by @treff7es in #3726
  • fix(ingest): update trino source error handling in get_table_comment by @mayurinehate in #3712
  • feat(ingest) Trim long sql queries in usage by @treff7es in #3725
  • fix(ingestion): adds missing port to the connection bootstrap by @sgomezvillamor in #3706
  • fix(ingest): add source.config.connection.schema_registry_config to SchemaRegistryClient creation by @lvicentesanchez in #3702
  • fix(docker): Fix issues with base64 not working on some platforms by @dexter-mh-lee in #3723
  • feat(DataHubGraph): Adding utilities methods to DataHubGraph class. by @varunbharill in #3729
  • fix(superset): handle dashboards without charts (#3713) by @grumbler in #3714

New Contributors

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.18...v0.8.19

v0.8.18

Released on Fri Dec 10 2021 by @shirshanka.

DataHub Release 0.8.18 is here!

Release Highlights

  1. Metadata Service Authentication: Make authenticated requests to the Metadata Service APIs (GraphQL + Rest.li)

    1. Video Demo
    2. Technical Deep Dive
  2. Redshift Lineage: Out-of-the-box support for ingesting Dataset->Dataset lineage from Redshift system tables. Includes Tables, Views, and COPY from S3

    1. Video Demo
  3. Apache NiFi Connector (Beta): Integration with Apache NiFi to extract DataJobs and DataFlows! Read the source docs here. This source is currently incubating in beta.

  4. Mode Connector (Beta): Integration with Mode Analytics to extract reports, charts, and more! Read the source docs here. This source is currently incubating in beta.

  5. Add Aspects without a fork: This is a major milestone towards No-Code UI

    1. Watch the No Code UI Sneak Peek
  6. Glossary Term Transformer: Allows users to add tags or glossary terms to entities based on a regex match filter (Shoutout to Community Member ecooklin!)

  7. Bug Fixes:

    1. [metadata service] Empty search query fails to resolve
    2. [metadata service] Log4j vulnerability (https://www.lunasec.io/docs/blog/log4j-zero-day/) addressed! We highly recommend that folks upgrade to the latest release.
    3. [metadata ingestion][bigquery] Fix handling of partitioned & snapshotted tables for lineage usage, and basic table indexing.
    4. [metadata service][recommendations] Fix issue where recently viewed and most popular recommendations were not showing up when the user urn contains special characters.
    5. [metadata ingestion] Add config to specify the CA certificate path for the datahub-rest sink
    6. [metadata ingestion][snowflake] Handling for special characters in snowflake databases and schemas.
    7. [ui] Fix Groups page not showing asset ownership correctly
    8. [ui] Fix issue where markdown links were not clickable.
    9. [metadata service] Improve search & recommendations performance by ~50%, homepage load by ~50%.
    10. [cli] Fix issue where delete-by-search could not accept an auth token
    11. [metadata service][policies] Fix invalid Tag creation policy
    12. [metadata service][upgrade] Fix Spring injection of Entity Client inside datahub-upgrade

Backwards Incompatible Changes

  • The standalone Spring GraphQL Service has been removed. (Replaced in full by Metadata Service GraphQL API)

v0.8.17

Released on Fri Nov 19 2021 by @shirshanka.

Notable Changes

  • Added Recommendations and redesigned the home page!
    • Modular way to add recommendations throughout the application
    • Recommendation modules for top platforms, recently viewed, popular entities, top tags/terms were added to home page
    • Search page also has top tags/terms module on the bottom
  • Ingestion Sources
    • dbt enhancements
      • Create dbt platform entities to capture dbt node types such as models, tests, sources, and seeds, and link dbt entities with other dbt or underlying platform entities.
    • OpenAPI specs
    • Kafka Connect (Regex based transformers, BigQuery sink)
    • Trino Usage (Starburst)
  • Improved lineage viz performance and lineage viz UX
    • Improved layout logic
    • Nodes can be dragged and dropped
  • Fixes for the delete API not always deleting all of an entity's data
  • Improved documentation for adding a custom Metadata Ingestion Source
    • Fixes description rendering for Charts, Dashboards, Flows, Jobs
  • Add YAML configuration file for Metadata Service
  • Filter search results by Sub-Type (Looker Explore, View, etc)
  • Support proxying DataHub Frontend requests to Metadata Service at /api/gms
  • Multi-platform (x86, arm64) support for Docker images (Apple M1 support)
  • Graph Service: DGraph support (phase 1)

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.16...v0.8.17

DataHub v0.8.16

Released on Thu Oct 21 2021 by @shirshanka.

Release Highlights

  • Important bug-fixes: properties for DataJob and DataFlow, descriptions for Datasets should now correctly show in the UI
  • Search redesign! Single search experience across all entity types with left filter bar
  • Added searchAcrossEntities endpoint on both GraphQL and Rest.li that pulls search results for all entity types and mixes them together
  • Dataset-level lineage: added support for ingesting dataset-level lineage for BigQuery, and for linking external tables in Redshift to the corresponding table in the external data catalog.
  • Performance optimization: GraphQL now calls the entity service directly, instead of calling the entity resource over HTTP, to hydrate GraphQL models.
  • The “filter” input model used by the “search” API now supports disjunctive normal form (OR of ANDs). The previous filter model (criteria array) should continue to work as expected.
  • Adding foundations (models) for search insights, or highlights shown in the search result previews.
  • Improvements to the Add Owner experience: full-text search to find users and groups.
  • User & Group Management Screens!
    • View all users (and those who have logged in)
    • View all groups
    • Create new groups
    • Add and remove group members
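
The "OR of ANDs" filter form mentioned in the highlights above can be illustrated with a minimal sketch. The field names used here ("or", "and", "criteria", "field", "value") and the exact-equality matching are assumptions made for illustration, not a confirmed description of the DataHub filter model; the flat criteria array is assumed to be ANDed.

```python
from typing import Any, Dict

# Hypothetical filter shape: the keys "or", "and", "criteria", "field" and
# "value" are assumptions for illustration only.
Criterion = Dict[str, str]


def matches_criterion(doc: Dict[str, Any], criterion: Criterion) -> bool:
    """True when the document's field equals the criterion's value."""
    return doc.get(criterion["field"]) == criterion["value"]


def matches_filter(doc: Dict[str, Any], flt: Dict[str, Any]) -> bool:
    """Evaluate an OR-of-ANDs filter, falling back to a flat criteria array."""
    if flt.get("or"):
        # Disjunctive normal form: at least one AND-group must match in full.
        return any(
            all(matches_criterion(doc, c) for c in group["and"])
            for group in flt["or"]
        )
    # Older form (assumed ANDed): every criterion in the flat list must match.
    return all(matches_criterion(doc, c) for c in flt.get("criteria", []))


# (platform == snowflake) OR (platform == bigquery AND origin == PROD)
dnf_filter = {
    "or": [
        {"and": [{"field": "platform", "value": "snowflake"}]},
        {
            "and": [
                {"field": "platform", "value": "bigquery"},
                {"field": "origin", "value": "PROD"},
            ]
        },
    ]
}
legacy_filter = {"criteria": [{"field": "platform", "value": "bigquery"}]}
```

Under these assumptions, a document such as {"platform": "bigquery", "origin": "PROD"} satisfies both dnf_filter and legacy_filter, while the same document with origin "DEV" satisfies only legacy_filter.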

Breaking Changes

None

Full Changelog: https://github.com/linkedin/datahub/compare/v0.8.15...v0.8.16

DataHub v0.8.15

Released on Wed Sep 29 2021 by @shirshanka.

Notable Changes

  • Support the “NONE” Client Authentication Method for OIDC login.
  • Migrated to the new UI for Charts, Dashboards, Data Flows (Pipelines), Data Jobs (Tasks) profile pages
  • Primary and Foreign Keys rendered in the UI
  • Ingestion
    • Support for redshift-usage source
    • Fixes for looker ingestion
    • datahub cli supports -f/--force option to skip confirmations

Changelog

DataHub v0.8.14

Released on Fri Sep 17 2021 by @shirshanka.

Release Highlights

  • Small bug fixes over 0.8.13

Notable Changes

  • Fix bug in OIDC config for setting response type
  • Add WAU chart in the analytics page
  • Starting with acryl_datahub==0.8.13.1 (PyPI), Looker and LookML ingestion name views differently than before. To start with a clean slate, delete the old LookML metadata; to keep the old behavior, specify view_naming_pattern = "{name}" in both your Looker and LookML ingestion recipes.
  • Populate the user email field in usage statistics to correctly show top users on the entity page
  • Full changelog below
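
As a rough illustration of the view_naming_pattern note above, the sketch below sets the option on two recipes expressed as Python dicts. Only the view_naming_pattern key and its "{name}" value come from the release note; the helper name with_legacy_view_names and the placement of the key under source.config are assumptions, so check the Looker and LookML source docs before relying on them.

```python
from typing import Any, Dict


def with_legacy_view_names(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the recipe with view_naming_pattern set to "{name}".

    Hypothetical helper: placing the key under source.config is an assumption.
    """
    source = dict(recipe.get("source", {}))
    config = dict(source.get("config", {}))
    config["view_naming_pattern"] = "{name}"
    source["config"] = config
    # Build a new top-level dict so the caller's recipe is left untouched.
    return {**recipe, "source": source}


# The note asks for the setting in BOTH recipes, so apply it to each.
looker_recipe = with_legacy_view_names({"source": {"type": "looker", "config": {}}})
lookml_recipe = with_legacy_view_names({"source": {"type": "lookml", "config": {}}})
```

Copying the nested dicts rather than mutating them keeps any recipe that is shared elsewhere unchanged.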

Changelog

DataHub v0.8.13

Released on Wed Sep 15 2021 by @shirshanka.

Release Highlights

  • Support for aggregated statistics with respect to the timeseries aspect. Moved usage stats functionality to use the new framework.
  • Auto-ingest common data platforms on GMS boot! No more generic logos.
  • Fixes re-ingestion of modified policies at startup
  • Full changelog below

Breaking Changes

  • The usage stats endpoint now uses the time-series aspect index in Elastic, meaning that statistics ingested previously will be lost. Please re-run usage ingestion (e.g. bigquery-usage, snowflake-usage) to backfill your usage statistics history.

Changelog

DataHub v0.8.12

Released on Thu Sep 09 2021 by @shirshanka.

Release Highlights

  • RBAC Phase 1: Added abilities to control access through policies in the UI and backend
  • Dataset page refresh!!! + improved home page, search and browse screens
  • Added the ability to monitor DataHub through Prometheus and provided example Grafana dashboards
  • GraphQL API browser hosted on /api/graphql endpoint.
  • Support for Business Glossary ingestion through yml file
  • Support for Azure AD ingestion source

Notable Changes

  • Fixed unicode rendering bug introduced in v0.8.11
  • Added the ability to search by properties in the customProperties bag: supports case-insensitive matches of the form ‘key=value’
    • For instance, query “encoding=utf-8” will return entities with “encoding”: “utf-8” in the property bag
  • Full changelog below
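
The key=value matching described above can be sketched as follows. This is a minimal illustration, not the search implementation: the function name matches_property_query is hypothetical, and treating both the key and the value case-insensitively is an assumption.

```python
from typing import Dict


def matches_property_query(query: str, properties: Dict[str, str]) -> bool:
    """Check a 'key=value' query against a property bag, ignoring case."""
    key, sep, value = query.partition("=")
    if not sep:
        # Not of the form key=value, so this sketch treats it as no match.
        return False
    key = key.strip().casefold()
    value = value.strip().casefold()
    return any(
        k.casefold() == key and v.casefold() == value
        for k, v in properties.items()
    )


custom_properties = {"encoding": "UTF-8", "owner_team": "Data-Platform"}
found = matches_property_query("encoding=utf-8", custom_properties)
```

Here casefold is used instead of lower so that comparison also behaves sensibly for non-ASCII property values.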

Changelog