Here is DataHub's roadmap for the next six months (starting Jan 2021).
We publish only a short roadmap because the project is evolving quickly and we want to adapt to the community's needs. We will check items off this roadmap as we make progress over the next few months.
Caveat: ETAs are subject to change. Please check with us before you commit to your stakeholders on deploying these capabilities at your company.
- Build a new UI based on React
- Deprecate open-source support for Ember UI
- Build a Python-based Ingestion Framework
- Support common people repositories (LDAP)
- Support common data repositories (Kafka, SQL databases, AWS Glue, Hive)
- Support common transformation sources (dbt, Looker)
- Support for push-based metadata emission from Python (e.g. Airflow DAGs)
- Support for dashboard and chart entity pages
- Support browse, search and discovery
- Support for Authentication (login) using OIDC providers (Okta, Google, etc.)
Use Case: Support for free-form global tags for social collaboration and aiding discovery
- Edit / Create new tags
- Attach tags to relevant constructs (e.g. datasets, dashboards, users, schema_fields)
- Search using tags (e.g. find all datasets with this tag, find all entities with this tag)
- Support for business glossary model (definition + storage)
- Browse taxonomy
- UI support for attaching business terms to entities and fields
Use Case: Search and discover your pipelines (e.g. Airflow DAGs) and understand lineage with datasets
- Support for Metadata Models + Backend Implementation
- Metadata Integrations with systems like Airflow.
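
To make the lineage use case more concrete, here is a minimal Python sketch of how pipeline tasks and datasets could be linked and traversed. The `Task` class, the dataset names, and the `upstream_datasets` helper are hypothetical illustrations, not DataHub's metadata model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical pipeline task (e.g. one Airflow task) with dataset inputs and outputs."""
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def upstream_datasets(dataset, tasks):
    """Return every dataset that `dataset` depends on, directly or transitively."""
    # Map each dataset to the datasets read by the tasks that produce it.
    produced_from = {}
    for task in tasks:
        for out in task.outputs:
            produced_from.setdefault(out, set()).update(task.inputs)

    # Walk the mapping depth-first, collecting each ancestor once.
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        for parent in produced_from.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Assumed example pipeline: two tasks chained through a staging table.
tasks = [
    Task("extract", inputs=["raw.events"], outputs=["staging.events"]),
    Task("aggregate", inputs=["staging.events", "dim.users"], outputs=["mart.daily_activity"]),
]
lineage = upstream_datasets("mart.daily_activity", tasks)
```

In this sketch, lineage is derived entirely from what each task declares as inputs and outputs, which is the kind of information an orchestrator integration could plausibly supply.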
Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability, etc.)
- Support for data profiling and preview extraction through the ingestion pipeline
- Out of scope for Q1: Access control of data profiles and sample data
- Production-grade Helm charts for Kubernetes-based deployment
- How-to guides for deploying DataHub to all the major cloud providers
- Support for data quality visualization
- Support for data health score based on data quality results and pipeline observability
- Integration with systems like Great Expectations, AWS Deequ, etc.
- Helping you understand how your users are interacting with DataHub
- Integration with common systems like Google Analytics, etc.
- Display frequently used datasets and dashboards
- Improved search relevance through usage data
- Support for fine-grained access control for metadata operations (read, write, modify)
- Scope: Access control at the entity level, at the aspect level, and within aspects.
- This provides the foundation for Tag Governance, Dataset Preview access control, etc.
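
As a rough illustration of what entity-level and aspect-level access control could look like, the following Python sketch checks a request against a list of policies and denies by default. The `Policy` fields, the `is_allowed` function, and the example principals and aspect names are all assumptions made for this sketch, not a DataHub design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    """Hypothetical grant: a principal may perform one operation on an entity type."""
    principal: str
    operation: str                 # "read", "write" or "modify"
    entity_type: str               # e.g. "dataset"
    aspect: Optional[str] = None   # None means the grant covers every aspect


def is_allowed(policies, principal, operation, entity_type, aspect):
    """Return True if any policy grants the request; deny otherwise."""
    for policy in policies:
        if (
            policy.principal == principal
            and policy.operation == operation
            and policy.entity_type == entity_type
            and policy.aspect in (None, aspect)
        ):
            return True
    return False


# Assumed example grants: one entity-level, one restricted to a single aspect.
policies = [
    Policy("alice", "read", "dataset"),
    Policy("bob", "write", "dataset", aspect="globalTags"),
]
alice_reads_schema = is_allowed(policies, "alice", "read", "dataset", "schemaMetadata")
bob_writes_tags = is_allowed(policies, "bob", "write", "dataset", "globalTags")
bob_writes_schema = is_allowed(policies, "bob", "write", "dataset", "schemaMetadata")
```

Here an entity-level grant is simply a policy with no aspect set, while an aspect-level grant names the single aspect it covers. Control within an aspect would need a finer-grained selector that this sketch leaves out.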
Use Case: Developers should be able to add new entities and aspects to the metadata model easily
- No need to write any code (in Java or Python) to store, retrieve, search and query metadata
- No need to write any code (in GraphQL or UI) to visualize metadata