DataHub Releases

Summary

| Version | Release Date | Links |
| --- | --- | --- |
| v0.13.2 | 2024-04-16 | Release Notes, View on GitHub |
| v0.13.1 | 2024-04-02 | Release Notes, View on GitHub |
| v0.13.0 | 2024-02-29 | Release Notes, View on GitHub |
| v0.12.1 | 2023-12-08 | View on GitHub |
| v0.12.0 | 2023-10-25 | View on GitHub |
| v0.11.0 | 2023-09-08 | View on GitHub |
| v0.10.5 | 2023-08-02 | View on GitHub |
| v0.10.4 | 2023-06-09 | View on GitHub |
| v0.10.3 | 2023-05-25 | View on GitHub |
| v0.10.2 | 2023-04-13 | View on GitHub |
| v0.10.1 | 2023-03-23 | View on GitHub |
| v0.10.0 | 2023-02-07 | View on GitHub |
| v0.9.6.1 | 2023-01-31 | View on GitHub |
| v0.9.6 | 2023-01-13 | View on GitHub |
| v0.9.5 | 2022-12-23 | View on GitHub |
| v0.9.4 | 2022-12-20 | View on GitHub |
| v0.9.3 | 2022-11-30 | View on GitHub |
| v0.9.2 | 2022-11-04 | View on GitHub |
| v0.9.1 | 2022-10-31 | View on GitHub |
| v0.9.0 | 2022-10-11 | View on GitHub |
| v0.8.45 | 2022-09-23 | View on GitHub |
| v0.8.44 | 2022-09-01 | View on GitHub |
| v0.8.43 | 2022-08-09 | View on GitHub |
| v0.8.42 | 2022-08-03 | View on GitHub |
| v0.8.41 | 2022-07-15 | View on GitHub |
| v0.8.40 | 2022-06-30 | View on GitHub |
| v0.8.39 | 2022-06-24 | View on GitHub |
| v0.8.38 | 2022-06-09 | View on GitHub |
| v0.8.37 | 2022-06-09 | View on GitHub |

v0.13.2

Released on 2024-04-16 by @david-leifker.

Hotfix Release

Fixes an MCL (MetadataChangeLog) message deserialization bug that occurs when using the internal schema registry and running specific upgrade jobs. The affected jobs and the environment variables that control them are:

  • policyFields (enabled by default): BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED:true
  • dataJobNodeCLL (disabled by default): BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED:false
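
As a rough illustration of the defaults listed above, the following sketch resolves the two flags from a set of environment variables. Only the variable names and their default values come from these release notes; the helper names `read_bool_flag` and `effective_flags` are hypothetical and are not part of DataHub, and the parsing rule (only the string "true" enables a flag) is an assumption.

```python
import os

# Variable names and defaults as stated in the v0.13.2 release notes.
UPGRADE_JOB_DEFAULTS = {
    "BOOTSTRAP_SYSTEM_UPDATE_POLICY_FIELDS_ENABLED": True,       # policyFields
    "BOOTSTRAP_SYSTEM_UPDATE_DATA_JOB_NODE_CLL_ENABLED": False,  # dataJobNodeCLL
}


def read_bool_flag(name, default, env=None):
    """Hypothetical helper: return the flag value, or the default if unset.

    Assumption: a value of "true" (case-insensitive) enables the flag and
    any other non-empty value disables it.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() == "true"


def effective_flags(env=None):
    """Resolve both upgrade-job flags against the given environment."""
    return {
        name: read_bool_flag(name, default, env)
        for name, default in UPGRADE_JOB_DEFAULTS.items()
    }
```

Calling `effective_flags({})` returns the documented defaults, which makes it easy to see which upgrade jobs would run in a deployment that sets neither variable.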

Example Error:

Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id 1
Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 13 out of bounds for length 2
at org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:188)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:260)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)

Recovery Directions:

If you are currently affected, remove the topic before upgrading to v0.13.2 so that the corrupted message is discarded. The default topic name is MetadataChangeLog_Versioned_v1; if you have customized the topic name, be sure to remove that topic instead.

If you are running Kafka per the example Helm chart for prerequisites, the following command deletes the topic.

kubectl exec -it prerequisites-kafka-broker-0 -c kafka -- kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic MetadataChangeLog_Versioned_v1
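
If the topic name, pod, or broker address has been customized, the same command can be assembled programmatically. The sketch below only builds the argument list; it does not run anything. The function name `build_delete_topic_command` and its parameters are hypothetical, and the defaults simply mirror the values in the command above.

```python
import shlex

# Default topic name as given in the recovery directions.
DEFAULT_TOPIC = "MetadataChangeLog_Versioned_v1"


def build_delete_topic_command(
    topic=DEFAULT_TOPIC,
    pod="prerequisites-kafka-broker-0",
    container="kafka",
    bootstrap_server="localhost:9092",
):
    """Hypothetical helper: return the kubectl argument list for deleting a topic.

    The defaults reproduce the command from the release notes; override
    them if your deployment uses different names.
    """
    return [
        "kubectl", "exec", "-it", pod, "-c", container, "--",
        "kafka-topics.sh",
        "--bootstrap-server", bootstrap_server,
        "--delete", "--topic", topic,
    ]


# Render the command as a single shell-safe string for review before running it.
print(shlex.join(build_delete_topic_command()))
```

Printing the command first, rather than executing it directly, leaves room to confirm the topic name before a destructive delete. The list can then be passed to `subprocess.run` if desired.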

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.13.1...v0.13.2

v0.13.1

Released on 2024-04-02 by @david-leifker.

DataHub Release Notes

User Experience

  • Capture and Manage Common Joins between Datasets: Users can now view and manage common join relationships between datasets, making it easier than ever to capture best practices and bespoke join logic. Watch the walkthrough here! 8325
    • Heads up: you'll need to enable the ER_MODEL_RELATIONSHIP_FEATURE_ENABLED env variable to use this feature!
  • Enhanced UI Interactions: Users can now enjoy an improved markdown editor and filter policies by active/inactive statuses, resulting in a more intuitive and manageable interface. 9949, 9958
  • Visual Context for Groups: You can now include picture links for groups in the UI, adding a richer visual context and enhancing the navigational experience. 9882
  • Improved Error Visibility: The UI now displays error messages related to data size limitations, allowing for better troubleshooting and user experience. 10038

Developer Experience

  • Enhanced Kafka Compatibility: Updated client version for Kafka setup ensures better compatibility and functionality for developers. 9962
  • Optimized Docker Build: Docker setups now respect pip mirrors, optimizing the build process especially in restricted network environments. 9963
  • Advanced Error Handling: New error handling for duplicate class names and improved fspath lint error management enhance code reliability and quality. 9960, 9976
  • Latest OpenSearch Image: Incorporation of OpenSearch image version 2.11.0 aligns with the latest stable releases, boosting performance and security. 9984

Metadata Ingestion

  • NEW: Dagster Integration: You can now seamlessly ingest your Dagster Pipelines, Jobs, Ops, and lineage into DataHub. 10071
  • Expanded Field Classification Support: This release introduces support for field-level classification during ingestion for Redshift, BigQuery, DynamoDB, and SQL Sources. 10013, 10031
  • Enhanced Ingestion Capabilities: DataHub now offers stateful ingestion by default, optimizing routines for REST sinks and improving metadata accuracy across diverse sources like dbt and BigQuery. 9934, 10158, 10080
  • Better Data Lineage: This release introduces support for OpenLineage in service of the Spark Lineage Beta Plugin; additionally, we now support incremental Column-Level Lineage, improving the accuracy of detecting column-level relationships during ingestion. 9870, 9967, 10090
  • Schema Clarity: New description support for JSON schema arrays and a mechanism to escape special characters in BigQuery table descriptions aid in clearer schema validation and ingestion processes. Databricks ingestion now supports Hive Metastore schemas with special characters. 9757, 9932, 10049

Version Upgrades

  • Kafka client and OpenSearch image were updated to the latest versions.

Breaking Changes

This release introduces default settings for stateful ingestion and updates in handling dbt ingestion. For details on all breaking changes, view the full documentation here.

Contributors

MASSIVE shoutout to our contributors!

First-Time Contributors

akarsh991, alexs-101, AvaniSiddhapuraAPT, diegmonti, dushayntAW, filipe-caetano-ovo, HuanjieGuo, jayacryl, k7ragav, kopax-polyconseil, LePuppy, Nelvin73, pinakipb2, poorvi767, rae89, trialiya, valeral.

Repeat Contributors

ANich, shubhamjagtap639, sgomezvillamor, siladitya2, skrydal, sumitappt, Masterchen09, mayurinehate, ngamanda, gaurav2733, githendrik, jayasimhankv.

DataHub Maintainers

anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, pedro93, RyanHolstien, treff7es, yoonhyejin.

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.13.0...v0.13.1

v0.13.0

Released on 2024-02-29 by @RyanHolstien.

DataHub v0.13.0 Release Notes Summary

User Experience

  • NEW - Asset Documentation Forms & UI-Editable Properties: Define specific documentation requirements via a Form, and empower your asset owners to capture their valuable knowledge via UI-Editable Properties. Watch the demo here!
  • NEW - DataHub Incidents: Create and communicate data quality and observability incidents when they inevitably arise. Watch the demo here!
  • UI Improvements: Editing secrets, handling forms, and rendering token pages and lineage diagrams have been improved for a smoother user interface experience.

Developer Experience

  • Security Upgrades: Core dependencies like shiro-core and FastAPI have been upgraded to fix vulnerabilities, ensuring a safer development environment.
  • GraphQL/OpenAPI Enhancements: New GraphQL endpoints and better OpenAPI documentation provide more powerful tools for API interaction, making developers' jobs easier.
  • Performance Tuning: Backend improvements for search operations and ingestion processes make the platform faster and more reliable.

Metadata Ingestion

  • Platform Integrations: Enhanced support for dbt, Metabase, BigQuery, AWS Glue, Oracle, and Redshift allows for more comprehensive metadata capture, making integration with these platforms smoother.
  • Ingestion Framework: The reliability of ingestion has been improved, with new capabilities like support for tags from Tableau datasources and compatibility with Airflow 2.5.0, facilitating a broader range of data synchronization tasks.
  • Connector Improvements: Ingestion connectors for external data tools have been streamlined, ensuring easier integration and data synchronization.

Other Improvements and Fixes

  • Enhanced internal testing frameworks with Cypress and pytest-random-order for ingestion tests.
  • Simplified developer workflows with configurable Docker Compose project names in CLI.
  • Addressed various ingestion-related bugs for platforms like Feast and Snowflake.
  • Enhanced the UI codebase with TypeScript compilation linting and updated styles.
  • Streamlined CI processes for pull requests and linting conditions.
  • Version Upgrades: Upgraded pytest-docker, Pegasus, and SQLglot, among others, to improve stability and performance. Security vulnerabilities were addressed by upgrading FastAPI, gitdb, and follow-redirects.

Notable Breaking Changes

  • Updates to MySQL version for quickstarts and migration to Neo4j 5.x may impact existing setups.
  • JDK17 build requirement and Docker Compose > 2.20 needed for building DataHub.
  • Python 3.8+ requirement for the acryl-datahub CLI.
  • Changes in Unity Catalog ingestion source configs and Redshift lineage generation.
  • Deprecation of Spark 2.x and associated JDK8 build requirements.

For full details on breaking changes, please visit DataHub's update guide.
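
For the Python 3.8+ requirement listed above, a pre-flight check along the following lines can flag an unsupported interpreter before installing the CLI. This is only a sketch: the names `parse_version` and `python_meets_minimum` are hypothetical and are not part of the acryl-datahub package, and the only fact taken from these notes is the 3.8 minimum.

```python
import re
import sys

# Minimum interpreter version stated in the v0.13.0 breaking changes.
MIN_PYTHON = (3, 8)


def parse_version(text):
    """Extract the leading numeric components, e.g. '3.8.10' -> (3, 8, 10)."""
    match = re.match(r"v?(\d+(?:\.\d+)*)", text.strip())
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def python_meets_minimum(version=None):
    """Return True if the given (or the running) Python is at least 3.8.

    Pass a version string to check another interpreter; with no argument,
    the interpreter running this script is checked.
    """
    if version is None:
        current = tuple(sys.version_info[:2])
    else:
        current = parse_version(version)[:2]
    return current >= MIN_PYTHON


if not python_meets_minimum():
    print("Python 3.8 or newer is required for the acryl-datahub CLI.")
```

Comparing `(major, minor)` tuples avoids the string-comparison pitfall in which "3.10" would sort before "3.8".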

Acknowledgements

A huge thank you to all our contributors for making this release possible. Your hard work and dedication are greatly appreciated.

First-Time Contributors

7onn, Adityamalik123, atjones0011, BlueHorn07, diegoreico, dim-ops, fer-marino, Gerrit-K, gp1105739, ilpianista, ingthorb, KaYunKIM, Kunal-kankriya, muzzacode, nnnkkk7, pankajmahato-visa, rubiojr, ryaminal, scalvanese452, sleeperdeep, stevenayers.

Repeat Contributors

allizex, arunvasudevan, cburroughs, feldjay, gaurav2733, iprentic, KulykDmytro, kushagra-apptware, mayurinehate, nmbryant, noggi, purnimagarg1, rinzool, sgomezvillamor, shubhamjagtap639, siddiquebagwan-gslab, siladitya2, skrydal, sumitappt, TonyOuyangGit, wngus606, yangjiandan, Salman-Apptware.

DataHub Maintainers

anshbansal, asikowitz, chriscollins3456, darnaut, david-leifker, eboneil, ethan-cartwright, gabe-lyons, hsheth2, jjoyce0510, maggiehays, pedro93, RyanHolstien, shirshanka, sid-acryl, treff7es, yoonhyejin.

Full Changelog: https://github.com/datahub-project/datahub/compare/v0.12.1...v0.13.0

DataHub v0.12.1

Released on 2023-12-08 by @david-leifker.

View the release notes for DataHub v0.12.1 on GitHub.

v0.12.1rc2

Released on 2023-11-28 by @david-leifker.

View the release notes for v0.12.1rc2 on GitHub.

v0.12.0

Released on 2023-10-25 by @pedro93.

View the release notes for v0.12.0 on GitHub.

v0.11.0

Released on 2023-09-08 by @iprentic.

View the release notes for v0.11.0 on GitHub.

v0.10.5

Released on 2023-08-02 by @david-leifker.

View the release notes for v0.10.5 on GitHub.

v0.10.4

Released on 2023-06-09 by @pedro93.

View the release notes for v0.10.4 on GitHub.

v0.10.3

Released on 2023-05-25 by @iprentic.

View the release notes for v0.10.3 on GitHub.

DataHub v0.10.2

Released on 2023-04-13 by @iprentic.

View the release notes for DataHub v0.10.2 on GitHub.

DataHub v0.10.1

Released on 2023-03-23 by @aditya-radhakrishnan.

View the release notes for DataHub v0.10.1 on GitHub.

DataHub v0.10.0

Released on 2023-02-07 by @david-leifker.

View the release notes for DataHub v0.10.0 on GitHub.

DataHub v0.9.6.1

Released on 2023-01-31 by @david-leifker.

View the release notes for DataHub v0.9.6.1 on GitHub.

DataHub v0.9.6

Released on 2023-01-13 by @maggiehays.

View the release notes for DataHub v0.9.6 on GitHub.

DataHub v0.9.5

Released on 2022-12-23 by @jjoyce0510.

View the release notes for DataHub v0.9.5 on GitHub.

[Known Issues] DataHub v0.9.4

Released on 2022-12-20 by @maggiehays.

View the release notes for [Known Issues] DataHub v0.9.4 on GitHub.

DataHub v0.9.3

Released on 2022-11-30 by @maggiehays.

View the release notes for DataHub v0.9.3 on GitHub.

DataHub v0.9.2

Released on 2022-11-04 by @maggiehays.

View the release notes for DataHub v0.9.2 on GitHub.

DataHub v0.9.1

Released on 2022-10-31 by @maggiehays.

View the release notes for DataHub v0.9.1 on GitHub.

DataHub v0.9.0

Released on 2022-10-11 by @szalai1.

View the release notes for DataHub v0.9.0 on GitHub.

DataHub v0.8.45

Released on 2022-09-23 by @gabe-lyons.

View the release notes for DataHub v0.8.45 on GitHub.

DataHub v0.8.44

Released on 2022-09-01 by @jjoyce0510.

View the release notes for DataHub v0.8.44 on GitHub.

DataHub v0.8.43

Released on 2022-08-09 by @maggiehays.

View the release notes for DataHub v0.8.43 on GitHub.

v0.8.42

Released on 2022-08-03 by @gabe-lyons.

View the release notes for v0.8.42 on GitHub.

v0.8.41

Released on 2022-07-15 by @anshbansal.

View the release notes for v0.8.41 on GitHub.

v0.8.40

Released on 2022-06-30 by @gabe-lyons.

View the release notes for v0.8.40 on GitHub.

v0.8.39

Released on 2022-06-24 by @maggiehays.

View the release notes for v0.8.39 on GitHub.

[!] DataHub v0.8.38

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.38 on GitHub.

[!] DataHub v0.8.37

Released on 2022-06-09 by @jjoyce0510.

View the release notes for [!] DataHub v0.8.37 on GitHub.