
DataHub Releases

Summary

| Version | Release Date | Links |
|---|---|---|
| v0.15.0 | 2025-01-15 | Release Notes, View on GitHub |
| v0.14.1 | 2024-09-17 | View on GitHub |
| v0.14.0.2 | 2024-08-21 | View on GitHub |
| v0.14.0 | 2024-08-13 | View on GitHub |
| v0.13.3 | 2024-05-23 | View on GitHub |
| v0.13.2 | 2024-04-16 | View on GitHub |
| v0.13.1 | 2024-04-02 | View on GitHub |
| v0.13.0 | 2024-02-29 | View on GitHub |
| v0.12.1 | 2023-12-08 | View on GitHub |
| v0.12.0 | 2023-10-25 | View on GitHub |
| v0.11.0 | 2023-09-08 | View on GitHub |
| v0.10.5 | 2023-08-02 | View on GitHub |
| v0.10.4 | 2023-06-09 | View on GitHub |
| v0.10.3 | 2023-05-25 | View on GitHub |
| v0.10.2 | 2023-04-13 | View on GitHub |
| v0.10.1 | 2023-03-23 | View on GitHub |
| v0.10.0 | 2023-02-07 | View on GitHub |
| v0.9.6.1 | 2023-01-31 | View on GitHub |
| v0.9.6 | 2023-01-13 | View on GitHub |
| v0.9.5 | 2022-12-23 | View on GitHub |
| v0.9.4 | 2022-12-20 | View on GitHub |
| v0.9.3 | 2022-11-30 | View on GitHub |
| v0.9.2 | 2022-11-04 | View on GitHub |
| v0.9.1 | 2022-10-31 | View on GitHub |
| v0.9.0 | 2022-10-11 | View on GitHub |
| v0.8.45 | 2022-09-23 | View on GitHub |
| v0.8.44 | 2022-09-01 | View on GitHub |
| v0.8.43 | 2022-08-09 | View on GitHub |
| v0.8.42 | 2022-08-03 | View on GitHub |

v0.15.0

Released on 2025-01-15 by @pedro93.

DataHub v0.15.0 Release Notes

User Experience

  • Structured Properties

    • Added comprehensive support for managing structured properties, including creation, editing, deletion, and display preferences. Introduced timestamps for tracking creation and modification. [#12100, #11419]
    • Enhanced property display options with badge styling, custom column types, and configurable visibility settings in asset sidebars and schema fields. [#12111, #12052]
    • Added structured property filtering in UI with improved aggregation logic and entity metadata display. Introduced new property validators and display settings. [#12097, #12099]
  • UI Enhancements

    • Enhanced container organization with parent hierarchy labels. [#11705]
    • Added support for markdown in incident descriptions, enabling rich formatting capabilities. [#11759]
    • Improved ingestion reporting with better visibility of successful ingestions with warnings. Enhanced browse paths display for business attributes and schema fields. [#11704, #11585]
    • Added support for timeseries aspects in OpenAPI and customizable date range fields for Analytics charts. [#12096, #11366]
  • Authorization & Authentication

    • Enabled authentication and API authorization by default, with support for URN-wildcard-based policies using STARTS_WITH condition. [#11484, #11441]
    • Added authorization checks for managing Glossary terms, including privileges for ownership, domain management, and link actions. [#11337]
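
To illustrate what a URN-wildcard policy with a STARTS_WITH condition means in practice, here is a minimal Python sketch. It is not DataHub's policy engine: the function name `urn_matches` and the dictionary shape of the condition are hypothetical, chosen only to show prefix matching against a resource URN.

```python
from typing import Dict, List, Union

# Hypothetical condition shape: {"condition": "STARTS_WITH", "values": [...]}
Condition = Dict[str, Union[str, List[str]]]


def urn_matches(condition: Condition, resource_urn: str) -> bool:
    """Return True if resource_urn satisfies the (hypothetical) condition."""
    kind = condition["condition"]
    values = condition["values"]
    if kind == "EQUALS":
        # Exact match: the URN must be listed verbatim.
        return resource_urn in values
    if kind == "STARTS_WITH":
        # Wildcard-style match: any listed prefix covers the URN.
        return any(resource_urn.startswith(prefix) for prefix in values)
    raise ValueError(f"Unsupported condition: {kind}")


# One prefix covers every Snowflake dataset under the sample database prod_db.
policy_condition = {
    "condition": "STARTS_WITH",
    "values": ["urn:li:dataset:(urn:li:dataPlatform:snowflake,prod_db."],
}
orders = "urn:li:dataset:(urn:li:dataPlatform:snowflake,prod_db.sales.orders,PROD)"
events = "urn:li:dataset:(urn:li:dataPlatform:bigquery,analytics.events,PROD)"

print(urn_matches(policy_condition, orders))
print(urn_matches(policy_condition, events))
```

The design point is that a single prefix value can stand in for a long, changing list of exact URNs, so the policy does not need to be edited each time a new asset appears under that prefix.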

Metadata Ingestion

Ingestion Framework Improvements
  • Enhanced Data Source Support: Expanded ingestion capabilities for multiple platforms, including Superset (with dataset entities, schema fields, and column-level lineage), Feast (supporting tags and owners ingestion), Neo4j, and Cassandra. Added stateful ingestion support for file sources. [#11688, #11784, #11804, #11526, #11822]

  • SQL Processing Improvements: Replaced vulnerable sqlparse dependency with an in-house SQL parser, optimized CLL generation with reduced memory usage, and added special handling for MSSQL case sensitivity. Enhanced multi-query lineage support for Snowflake temporary tables. [#11645, #11708, #11920, #12020]

  • CLI Enhancements: Introduced new commands for managing ingestion, including listing source runs with filtering capabilities, undoing soft deletes with platform filtering, and listing structured properties. Added an offline flag to the SQL parser CLI. [#11740, #11980, #12012, #12283, #11635]

  • Ownership and Metadata Management: Extended ownership transformer capabilities across entities, improved glossary sync to preserve custom ownership types, and added support for multiple ownership types in glossaries and terms. Enhanced Forms CLI with additional filters for subtypes, platform instances, owners, tags, and glossary terms. [#11700, #11545, #12050, #10979]

  • Core Infrastructure Improvements: Implemented unique URN generation for all entities, added support for efficient entity ingestion through get_entity_as_mcps, improved empty field handling, and introduced progress reporting during ingestion. Added execution request cleanup job and support for dropping duplicate schema fields. [#11676, #11425, #11613, #12117, #11765, #12308]

Source-Specific Ingestion Improvements

Airflow

  • Added support for Airflow 2.10, deprecated versions below 2.3, and improved template handling with Jinja support. Added configuration options for DAG patterns and environment variables. [#11300, #11371, #11472, #11537, #11579, #12056]
  • Enhanced error handling and debugging with improved logging, fixed plugin stability issues on EMR, and added support for AthenaOperator lineage extraction. Introduced ability to disable plugin without restart. [#11857, #11877, #11880, #12098]

BigQuery

  • Enhanced data modeling capabilities with support for foreign/primary keys, BigLake tables, and improved handling of external tables. Added support for region qualifiers and partition management. [#11686, #11728, #11874, #11940]
  • Improved lineage tracking with GCS data source support and optimized query performance. Added platform resource entity generation from BigQuery labels. [#11442, #11492, #11534, #11602]
  • Enhanced profiling and performance with better type handling and size limits. Fixed issues with tag synchronization and platform instance settings. [#11807, #12060]

Dagster

  • Added support for skipping Asset ingestion, fixed input/output value formatting, and improved compatibility with latest Dagster versions (v1.9.6). Deprecated Python 3.8 support. [#11262, #11481, #12121, #12189]

dbt

  • Improved performance and functionality with node_name_patterns for faster CLL processing, support for multiple test paths, and better handling of custom owner types. [#11450, #11460, #11848]
  • Enhanced lineage handling by preventing cycles in SQL parsing and supporting multiple dataset assertions for tests. Added support for dbt Cloud's Explore page. [#11666, #11451, #12223]
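
As a rough illustration of why name patterns can speed up column-level lineage (CLL) processing, the sketch below filters node names with allow/deny regular expressions before any expensive work would run. This is an assumption about the general technique, not the dbt source's actual implementation: the function `filter_nodes`, its parameters, and the sample node names are all hypothetical.

```python
import re
from typing import Iterable, List, Sequence


def filter_nodes(
    names: Iterable[str],
    allow: Sequence[str],
    deny: Sequence[str] = (),
) -> List[str]:
    """Keep names matching any allow pattern and no deny pattern."""
    allow_res = [re.compile(p) for p in allow]
    deny_res = [re.compile(p) for p in deny]
    kept = []
    for name in names:
        # Deny patterns take precedence over allow patterns.
        if any(r.match(name) for r in deny_res):
            continue
        if any(r.match(name) for r in allow_res):
            kept.append(name)
    return kept


# Hypothetical dbt-style node names.
nodes = ["model.shop.orders", "model.shop.tmp_orders", "source.shop.raw"]
selected = filter_nodes(nodes, allow=[r"model\..*"], deny=[r".*\.tmp_.*"])
print(selected)
```

Only the names that survive the filter would be handed to the lineage step, so a narrow pattern set reduces the number of nodes whose SQL has to be parsed.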

Snowflake

  • Expanded support for various table types, including secure, dynamic, and hybrid tables. Enhanced lineage capabilities for renames, swaps, and external tables. [#11600, #12039, #12094, #12179]
  • Improved authentication with OAuth support and token management. Added incremental property processing and structured property support for tags. [#11888, #12048, #12080, #12285]
  • Enhanced error handling and logging with better parse failure reporting and dot handling in table names. [#12105, #12110, #12153]

Tableau

  • Enhanced project management with new path pattern filtering and improved handling of hidden assets. Added support for access roles and group permissions. [#10855, #11157, #11559]
  • Improved API integration with retry logic for various error codes (502, 504), better authentication handling, and consistent page size application. [#12213, #12216, #12233]
  • Enhanced reporting and debugging capabilities while maintaining efficient performance and proper permission handling. [#12015, #12024, #12175]
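
The retry behavior mentioned above for 502 and 504 responses can be sketched in plain Python. This is a generic illustration, not the Tableau source's code: `call_with_retry`, the `request` callable returning a `(status, body)` tuple, and the exponential backoff schedule are assumptions made for the example.

```python
import time
from typing import Callable, Tuple

# Status codes treated as transient, per the release note (502, 504).
RETRYABLE_STATUS = {502, 504}


def call_with_retry(
    request: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call request() until it returns a non-retryable status or attempts run out."""
    status, body = 0, ""
    for attempt in range(1, max_attempts + 1):
        status, body = request()
        if status not in RETRYABLE_STATUS:
            return status, body
        if attempt < max_attempts:
            # Exponential backoff (an assumption): 1x, 2x, 4x the base delay.
            sleep(base_delay * 2 ** (attempt - 1))
    # All attempts failed with a retryable status; return the last response.
    return status, body


# Simulated server: two gateway errors, then success. Delays are recorded
# instead of slept so the example runs instantly.
responses = iter([(502, ""), (504, ""), (200, "ok")])
delays = []
result = call_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the function easy to test; in real use the default `time.sleep` would apply the delays.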

PowerBI

  • Improved M-query parsing with support for comments, better handling of quotes, and DatabricksMultiCloud native query functionality. [#12177, #11743, #11756]
  • Enhanced workspace management with cross-workspace dataset linking and app ingestion support. Added timeouts for M-query parsing. [#11560, #11629, #11753]
  • Improved error reporting and performance optimization with reduced type casting and better organization of responsibilities. [#11763, #12004]

Developer Experience

  • Entity Management: Introduced entity versioning for Datasets and ML Models, with support for version set linking. Improved timeline functionality with better handling of primary key changes and rename events. Added data transformation logic models to enhance data processing capabilities. [#11819, #11843, #12166, #12198]

  • Enhanced Configuration Management: Added new customization options through environment variables and Helm charts, including editable dataset names and configurable garbage collection scheduling. The bootstrap process has been optimized to reduce latency during installation. [#11391, #11518]

  • Development Environment Updates: Added Git support to the ingestion-base image, enabling better source control integration for ingestion workflows. [#11477]

  • Security Logging Enhancement: Improved security audit trails by adding actor URN tracking for unauthorized access attempts. [#12030]

NEW: Garbage Collection
  • Comprehensive Metadata Cleanup: Introduced a new ingestion source, DataHubGC, which acts as a garbage collector for dataflows, data jobs, and data process instances, with configurable retention policies and deletion parameters. Added a dry run mode for testing cleanup operations. [#11102, #11413]

  • Performance Optimizations: Reduced processing time from 1 hour to 15 minutes by implementing batch processing, optimizing queries, and removing unnecessary operations. Increased the default hard delete limit from 10k to 25k entities. [#11809, #12093, #12238]

  • Reliability Improvements: Enhanced garbage collection stability with additional validation checks, improved error handling, and better process visibility through ingestion stage reporting. Fixed issues with entity deletion logic and reference handling to preserve critical lineage relationships. [#12011, #12013, #12027, #12049, #12124, #12226]
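
The combination of batch processing, a hard delete limit, and a dry run mode described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the DataHubGC source: `collect_garbage`, its parameters, the `delete_batch` callback, and the sample URNs are hypothetical. Only the 25k default limit comes from the notes above.

```python
from itertools import islice
from typing import Callable, Iterable, List


def collect_garbage(
    urns: Iterable[str],
    delete_batch: Callable[[List[str]], None],
    batch_size: int = 500,
    hard_delete_limit: int = 25_000,
    dry_run: bool = True,
) -> int:
    """Delete candidate URNs in batches; return how many were (or would be) deleted."""
    # Never consider more entities than the hard delete limit allows.
    selected = list(islice(urns, hard_delete_limit))
    processed = 0
    for start in range(0, len(selected), batch_size):
        batch = selected[start:start + batch_size]
        if not dry_run:
            # One call per batch instead of one call per entity.
            delete_batch(batch)
        processed += len(batch)
    return processed


# Sample identifiers for illustration only.
candidates = [f"urn:li:dataProcessInstance:run-{i}" for i in range(7)]

dry_calls: List[List[str]] = []
would_delete = collect_garbage(
    candidates, dry_calls.append, batch_size=3, hard_delete_limit=5, dry_run=True
)

real_calls: List[List[str]] = []
deleted = collect_garbage(
    candidates, real_calls.append, batch_size=3, hard_delete_limit=5, dry_run=False
)
print(would_delete, deleted, [len(batch) for batch in real_calls])
```

In dry run mode the function reports the same count it would delete without ever invoking the delete callback, which is what makes it safe for testing retention settings.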

Thank You to Our Contributors!

First-Time Contributors

@AColocho, @alberttwong, @Alice-608, @Bumyu, @chakru-r, @chriscc2, @dejan2609, @donovan-acryl, @eagle-25, @hwmarkcheng, @k-bartlett, @kanavnarula, @kartikey-visa, @kevinkarchacryl, @kousiknandy, @kris48k, @llance, @margaridafernandes-trip, @mikeburke24, @raudzis, @ronybony1990, @ryota-cloud, @shepherd44, @siong-tcha, @ssidorenko, @tanguyantoine, @th0ger, @udays-visa, @udbhav-hbk, @vejeta

Repeat Contributors

@aviv-julienjehannet, @bda618, @bossenti, @darnaut, @deepgarg-visa, @DSchmidtDev, @dushayntAW, @eboneil, @ethan-cartwright, @feldjay, @githendrik, @haeniya, @Jorricks, @Masterchen09, @mkamalas, @Nbagga14, @nicholas-fwang, @noggi, @pankajmahato-visa, @pinakipb2, @rtekal, @sagar-salvi-apptware, @steffengr

DataHub Maintainers

@acrylJonny, @anshbansal, @asikowitz, @chriscollins3456, @david-leifker, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @maggiehays, @mayurinehate, @pedro93, @RyanHolstien, @sakethvarma397, @sgomezvillamor, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.14.1...v0.15.0

v0.14.1

Released on 2024-09-17 by @david-leifker.

View the release notes for v0.14.1 on GitHub.

v0.14.0.2

Released on 2024-08-21 by @RyanHolstien.

View the release notes for v0.14.0.2 on GitHub.

v0.14.0

Released on 2024-08-13 by @RyanHolstien.

View the release notes for v0.14.0 on GitHub.

v0.13.3

Released on 2024-05-23 by @david-leifker.

View the release notes for v0.13.3 on GitHub.

v0.13.2

Released on 2024-04-16 by @david-leifker.

View the release notes for v0.13.2 on GitHub.

v0.13.1

Released on 2024-04-02 by @david-leifker.

View the release notes for v0.13.1 on GitHub.

v0.13.0

Released on 2024-02-29 by @RyanHolstien.

View the release notes for v0.13.0 on GitHub.

DataHub v0.12.1

Released on 2023-12-08 by @david-leifker.

View the release notes for DataHub v0.12.1 on GitHub.

v0.12.1rc2

Released on 2023-11-28 by @david-leifker.

View the release notes for v0.12.1rc2 on GitHub.

v0.12.0

Released on 2023-10-25 by @pedro93.

View the release notes for v0.12.0 on GitHub.

v0.11.0

Released on 2023-09-08 by @iprentic.

View the release notes for v0.11.0 on GitHub.

v0.10.5

Released on 2023-08-02 by @david-leifker.

View the release notes for v0.10.5 on GitHub.

v0.10.4

Released on 2023-06-09 by @pedro93.

View the release notes for v0.10.4 on GitHub.

v0.10.3

Released on 2023-05-25 by @iprentic.

View the release notes for v0.10.3 on GitHub.

DataHub v0.10.2

Released on 2023-04-13 by @iprentic.

View the release notes for DataHub v0.10.2 on GitHub.

DataHub v0.10.1

Released on 2023-03-23 by @aditya-radhakrishnan.

View the release notes for DataHub v0.10.1 on GitHub.

DataHub v0.10.0

Released on 2023-02-07 by @david-leifker.

View the release notes for DataHub v0.10.0 on GitHub.

DataHub v0.9.6.1

Released on 2023-01-31 by @david-leifker.

View the release notes for DataHub v0.9.6.1 on GitHub.

DataHub v0.9.6

Released on 2023-01-13 by @maggiehays.

View the release notes for DataHub v0.9.6 on GitHub.

DataHub v0.9.5

Released on 2022-12-23 by @jjoyce0510.

View the release notes for DataHub v0.9.5 on GitHub.

[Known Issues] DataHub v0.9.4

Released on 2022-12-20 by @maggiehays.

View the release notes for [Known Issues] DataHub v0.9.4 on GitHub.

DataHub v0.9.3

Released on 2022-11-30 by @maggiehays.

View the release notes for DataHub v0.9.3 on GitHub.

DataHub v0.9.2

Released on 2022-11-04 by @maggiehays.

View the release notes for DataHub v0.9.2 on GitHub.

DataHub v0.9.1

Released on 2022-10-31 by @maggiehays.

View the release notes for DataHub v0.9.1 on GitHub.

DataHub v0.9.0

Released on 2022-10-11 by @szalai1.

View the release notes for DataHub v0.9.0 on GitHub.

DataHub v0.8.45

Released on 2022-09-23 by @gabe-lyons.

View the release notes for DataHub v0.8.45 on GitHub.

DataHub v0.8.44

Released on 2022-09-01 by @jjoyce0510.

View the release notes for DataHub v0.8.44 on GitHub.

DataHub v0.8.43

Released on 2022-08-09 by @maggiehays.

View the release notes for DataHub v0.8.43 on GitHub.

v0.8.42

Released on 2022-08-03 by @gabe-lyons.

View the release notes for v0.8.42 on GitHub.