
A Metadata Platform for the Modern Data Stack

Data ecosystems are diverse. Too diverse. DataHub's extensible metadata platform enables data discovery, data observability and federated governance that help you tame this complexity.

Open Source

DataHub was originally built at LinkedIn and subsequently open-sourced under the Apache 2.0 License. It now has a thriving community of over a hundred contributors and is widely used across many companies.

Forward-Looking Architecture

DataHub follows a push-based architecture, which means it's built for continuously changing metadata. The modular design lets it scale with data growth at any organization, from a single database under your desk to multiple data centers spanning the globe.
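To make "push-based" concrete, here is a minimal Python sketch, not DataHub's actual implementation: producers push change events whenever metadata changes, and the catalog simply applies them, so it stays current without re-crawling the source. The `ChangeEvent` and `Catalog` classes and the last-write-wins rule are illustrative assumptions; only the URN string follows DataHub's dataset URN convention.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ChangeEvent:
    """A hypothetical metadata change: one aspect of one entity."""
    urn: str
    aspect: str
    value: Any


@dataclass
class Catalog:
    """Toy catalog that keeps the latest value of every (urn, aspect) pair."""
    state: dict[tuple[str, str], Any] = field(default_factory=dict)
    applied: int = 0

    def apply(self, event: ChangeEvent) -> None:
        # Producers push events as metadata changes; the catalog just overwrites
        # the stored value (last write wins), so no periodic re-crawl is needed.
        self.state[(event.urn, event.aspect)] = event.value
        self.applied += 1

    def get(self, urn: str, aspect: str) -> Any:
        return self.state.get((urn, aspect))


catalog = Catalog()
urn = "urn:li:dataset:(urn:li:dataPlatform:mysql,shop.orders,PROD)"
for event in [
    ChangeEvent(urn, "description", "Raw orders"),
    ChangeEvent(urn, "owner", "data-eng"),
    ChangeEvent(urn, "description", "Raw orders, deduplicated nightly"),
]:
    catalog.apply(event)

print(catalog.get(urn, "description"))  # Raw orders, deduplicated nightly
```

Because each event carries only what changed, the catalog's cost scales with the rate of change rather than with the size of the source systems.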

Massive Ecosystem

DataHub has prebuilt integrations with your favorite systems: Kafka, Airflow, MySQL, SQL Server, Postgres, LDAP, Snowflake, Hive, BigQuery, and many others. The community is continuously adding more integrations, so this list keeps growing.

ADLS, Airflow, Athena, BigQuery, CouchBase, DBT, Druid, Feast, Glue, Hadoop, Hive, Kafka, Kusto, Looker, MongoDB, MSSQL, MySQL, Oracle, Pinot, PostgreSQL, Presto, Redash, Redshift, S3, SageMaker, Snowflake, Spark, Superset, Teradata

How it Works

Automated Metadata Ingestion

Push-based ingestion can use a prebuilt emitter or can emit custom events using our framework.
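A custom emitter can be as small as a buffer plus a transport. The sketch below assumes a hypothetical `BufferedEmitter` class and a plain-dict event shape; neither is DataHub's real emitter API or event schema. `dataset_urn` builds an identifier in DataHub's `urn:li:dataset:(...)` form, and the `send` callable stands in for whatever transport (for example REST or Kafka) a real emitter would use.

```python
import json
from typing import Callable


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's urn:li:dataset:(...) form."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


class BufferedEmitter:
    """Hypothetical emitter: batches custom events, then hands them to a transport."""

    def __init__(self, send: Callable[[str], None], batch_size: int = 2) -> None:
        self.send = send  # stand-in for an HTTP POST or a Kafka produce call
        self.batch_size = batch_size
        self.buffer: list[dict] = []

    def emit(self, event: dict) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Serialize the whole batch as one JSON payload and clear the buffer.
        if self.buffer:
            self.send(json.dumps(self.buffer))
            self.buffer = []


sent: list[str] = []
emitter = BufferedEmitter(sent.append)
urn = dataset_urn("mysql", "shop.orders")
emitter.emit({"urn": urn, "aspect": "description", "value": "Raw orders"})
emitter.emit({"urn": urn, "aspect": "owner", "value": "data-eng"})
emitter.emit({"urn": urn, "aspect": "tag", "value": "pii"})
emitter.flush()  # push the final, partial batch
print(len(sent))  # 2
```

Batching trades a little latency for fewer round trips; calling `flush()` at shutdown ensures the last partial batch is not lost.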

Pull-based ingestion crawls a metadata source. We have prebuilt integrations with Kafka, MySQL, MS SQL, Postgres, LDAP, Snowflake, Hive, BigQuery, and more. Ingestion can be automated using our Airflow integration or any other scheduler of your choice.
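Pull-based ingestion, by contrast, queries the source system's own catalog. This sketch uses Python's built-in `sqlite3` as a stand-in for MySQL and a hypothetical `crawl` function; DataHub's real connectors are far more involved, but the rough idea (list tables, read their schemas, emit one record per dataset) is the same.

```python
import sqlite3


def crawl(conn: sqlite3.Connection, platform: str = "sqlite") -> list[dict]:
    """Pull-based sketch: read the source's own catalog, one record per table."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        records.append({
            "urn": f"urn:li:dataset:(urn:li:dataPlatform:{platform},{table},PROD)",
            "schema": [(col[1], col[2]) for col in columns],
        })
    return records


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")
for record in crawl(conn):
    print(record["urn"], record["schema"])
```

Automating this with a scheduler simply means running the crawl periodically; anything that changed between runs is only picked up on the next run, which is the main trade-off against push-based ingestion.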

Learn more about metadata ingestion with DataHub in the docs.

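recipe.yml
# Pull metadata from a local MySQL instance...
source:
  type: "mysql"
  config:
    username: "datahub"
    password: "datahub"
    host_port: "localhost:3306"
# ...and push it to DataHub over its REST interface.
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

Run the recipe with the DataHub CLI:
datahub ingest -c recipe.yml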
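A recipe pairs a source with a sink, each selected by its `type`. The following sketch illustrates that dispatch under stated assumptions: the `fake-mysql` and `memory` plugin names, the `SOURCES`/`SINKS` registries and `run` are all hypothetical, and this is not how the `datahub` CLI is implemented internally.

```python
from typing import Callable, Iterable

# Hypothetical plugin registries keyed by the recipe's "type" field.
SOURCES: dict[str, Callable[[dict], Iterable[dict]]] = {}
SINKS: dict[str, Callable[[dict], Callable[[dict], None]]] = {}


def register(registry: dict, name: str):
    """Decorator that files a plugin factory under `name` in `registry`."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register(SOURCES, "fake-mysql")
def fake_mysql_source(config: dict) -> Iterable[dict]:
    # A real source would connect to config["host_port"]; this one yields canned records.
    for table in ("orders", "customers"):
        yield {"name": f"shop.{table}", "host": config["host_port"]}


@register(SINKS, "memory")
def memory_sink(config: dict) -> Callable[[dict], None]:
    # Store records in a list kept inside the sink's own config dict.
    store = config.setdefault("store", [])
    return store.append


def run(recipe: dict) -> int:
    """Resolve source and sink by type, then stream every record from one to the other."""
    source = SOURCES[recipe["source"]["type"]](recipe["source"].get("config", {}))
    write = SINKS[recipe["sink"]["type"]](recipe["sink"].get("config", {}))
    count = 0
    for record in source:
        write(record)
        count += 1
    return count


recipe = {
    "source": {"type": "fake-mysql", "config": {"host_port": "localhost:3306"}},
    "sink": {"type": "memory", "config": {}},
}
print(run(recipe))  # 2
```

Keeping sources and sinks independent is what lets any source be combined with any sink by editing two `type` fields in the recipe.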

Discover Trusted Data

Browse and search over a continuously updated catalog of datasets, dashboards, charts, ML models, and more.
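Keeping search current over a continuously updated catalog mostly comes down to re-indexing an entity whenever its metadata changes. A minimal sketch, assuming a hypothetical `CatalogIndex` with simple term-count ranking and made-up entity keys; DataHub's actual search backend and ranking are different.

```python
import re
from collections import defaultdict
from typing import Optional


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class CatalogIndex:
    """Hypothetical inverted index over entity names and descriptions."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)
        self.tokens: dict[str, set[str]] = {}
        self.kinds: dict[str, str] = {}

    def upsert(self, key: str, kind: str, name: str, description: str = "") -> None:
        # Drop the entity's old tokens first so edited descriptions stop matching.
        for token in self.tokens.get(key, set()):
            self.postings[token].discard(key)
        new_tokens = tokenize(f"{name} {description}")
        for token in new_tokens:
            self.postings[token].add(key)
        self.tokens[key] = new_tokens
        self.kinds[key] = kind

    def search(self, query: str, kind: Optional[str] = None) -> list[str]:
        scores: dict[str, int] = defaultdict(int)
        for token in tokenize(query):
            for key in self.postings.get(token, set()):
                scores[key] += 1
        hits = [k for k in scores if kind is None or self.kinds[k] == kind]
        # Rank by number of matching query terms; ties are broken alphabetically.
        return sorted(hits, key=lambda k: (-scores[k], k))


index = CatalogIndex()
index.upsert("dataset:orders", "dataset", "orders", "Raw customer orders")
index.upsert("dashboard:revenue", "dashboard", "Revenue", "Daily revenue from orders")
index.upsert("dataset:customers", "dataset", "customers", "Customer profiles")
print(index.search("customer orders"))  # ['dataset:orders', 'dashboard:revenue', 'dataset:customers']
```

Passing `kind` narrows results to one entity type, which is roughly how a catalog lets you browse datasets separately from dashboards or ML models.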

Understand Data in Context

DataHub is the one-stop shop for documentation, schemas, ownership, lineage, pipelines and usage information. Data quality and data preview information are coming soon.
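Lineage is essentially a directed graph, so answering "where does this dashboard's data come from?" is a graph traversal. The sketch below assumes hypothetical entity keys and an in-memory `UPSTREAMS` map; it is a plain breadth-first search, not DataHub's lineage API.

```python
from collections import deque

# Hypothetical lineage edges: each key is produced from the listed upstream entities.
UPSTREAMS: dict[str, list[str]] = {
    "dashboard:revenue": ["dataset:daily_revenue"],
    "dataset:daily_revenue": ["dataset:orders", "dataset:fx_rates"],
    "dataset:orders": ["dataset:raw_orders"],
    "dataset:fx_rates": [],
    "dataset:raw_orders": [],
}


def upstream_lineage(entity: str, edges: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk returning every transitive upstream of `entity`, nearest first."""
    seen = {entity}
    order: list[str] = []
    queue = deque([entity])
    while queue:
        current = queue.popleft()
        for parent in edges.get(current, []):
            if parent not in seen:  # guards against cycles and diamond-shaped lineage
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(upstream_lineage("dashboard:revenue", UPSTREAMS))
```

Reversing the edges gives the opposite question, which downstream assets are affected if a table changes, with the same traversal.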


Trusted Across the Industry

LinkedIn, Expedia Group, Saxo Bank, Grofers, Typeform, Peloton, SpotHero, Geotab, ThoughtWorks, Viasat, Klarna, Wolt, DFDS, BankSalad, Uphold

Managed DataHub

Acryl Data delivers an easy-to-consume DataHub platform for the enterprise.
