A Generalized Metadata Search & Discovery Tool
Data ecosystems are diverse — too diverse. DataHub is a data discovery application built on an extensible metadata platform that helps you tame this complexity.

Open Source
DataHub was originally built at LinkedIn and subsequently open-sourced under the Apache 2.0 License. It now has a thriving community with over 75 contributors, and many organizations are trying out or already using DataHub internally.
Forward-Looking Architecture
DataHub follows a push-based architecture, which means it's built for continuously changing metadata. The modular design lets it scale with data growth at any organization, from a single database under your desk to multiple data centers spanning the globe.
Massive Ecosystem
DataHub has prebuilt integrations with your favorite systems: Kafka, MySQL, SQL Server, Postgres, LDAP, Snowflake, Hive, BigQuery, and many others. The community is continuously adding more integrations, so this list keeps growing.
How does it work?
Automated Metadata Ingestion
Push-based ingestion can use a prebuilt emitter or can emit custom events using our framework.
Pull-based ingestion crawls a metadata source. We have prebuilt integrations with Kafka, MySQL, SQL Server, Postgres, LDAP, Snowflake, Hive, BigQuery, and more. Ingestion can be automated using our Airflow integration or another scheduler of your choice.
DataHub's push-based architecture also supports pull, but pull-first systems cannot support push. Learn more about metadata ingestion with DataHub in the docs.
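
To make the push and pull distinction concrete, here is a minimal Python sketch. It is not DataHub's API: `MetadataEvent`, `MetadataStore`, `crawl`, and `list_mysql_tables` are hypothetical names invented for illustration, and the store is just an in-memory dictionary. It only aims to show why a push-first design can also serve pull: a crawler is a loop that pushes whatever it finds.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class MetadataEvent:
    """Hypothetical change event: one entity plus the metadata that changed."""
    entity_id: str
    aspect: str
    value: str


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the metadata platform (latest value per aspect)."""
    aspects: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def emit(self, event: MetadataEvent) -> None:
        # Push path: a source reports a change as soon as it happens.
        self.aspects[(event.entity_id, event.aspect)] = event.value


def crawl(source: Callable[[], Iterable[MetadataEvent]], store: MetadataStore) -> int:
    """Pull path: scan a source (e.g. on a schedule) and push what is found."""
    count = 0
    for event in source():
        store.emit(event)
        count += 1
    return count


def list_mysql_tables() -> List[MetadataEvent]:
    # Hypothetical source; a real crawler would query the database catalog here.
    return [
        MetadataEvent("mysql.sales.orders", "owner", "data-team"),
        MetadataEvent("mysql.sales.orders", "schema", "id INT, total DECIMAL"),
    ]


store = MetadataStore()
# Push: the producing system emits its own change event.
store.emit(MetadataEvent("kafka.page_views", "owner", "web-team"))
# Pull: a crawler scans a source and reuses the same emit call.
crawled = crawl(list_mysql_tables, store)
```

In this sketch both paths end in the same `emit` call, so adding a new pull source means writing only the function that lists its metadata. For the real emitters and ingestion sources, see the DataHub docs.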

Discover Trusted Data
Browse and search over a continuously updated catalog of datasets, dashboards, charts, ML models, and more.
Understand Data in Context
DataHub is the one-stop shop for documentation, schemas, ownership, and lineage. Pipeline, usage, and quality information is coming soon.
