Services that integrate with DataHub
Airflow is an open-source data orchestration tool used for scheduling, monitoring, and managing complex data pipelines.
Athena is a serverless interactive query service that enables users to analyze data in Amazon S3 using standard SQL.
Azure AD is a cloud-based identity and access management tool that provides secure authentication and authorization for users and applications.
BigQuery is a cloud-based data warehousing and analytics tool that allows users to store, query, and analyze large datasets quickly and efficiently.
Business Glossary is a source provided by DataHub for ingesting glossary metadata: a comprehensive list of business terms and definitions used within an organization.
ClickHouse is an open-source column-oriented database management system designed for high-performance data processing and analytics.
Databricks is a cloud-based data processing and analytics platform that enables data scientists and engineers to collaborate and build data-driven applications.
dbt is a data transformation tool that enables analysts and engineers to transform data in their warehouses through a modular, SQL-based approach.
Delta Lake is an open-source data lake storage layer that provides ACID transactions, schema enforcement, and data versioning for big data workloads.
Demo Data is a source that provides sample datasets for demonstration and testing purposes.
Elasticsearch is a distributed, open-source search and analytics engine designed for handling large volumes of data.
Feast is an open-source feature store that enables teams to manage, store, and discover features for machine learning applications.
File Based Lineage is a source that ingests lineage relationships and dependencies between data assets as declared in a file.
Glue is a data integration service that allows users to extract, transform, and load data from various sources into a data warehouse.
Great Expectations is an open-source data validation and testing tool that helps data teams maintain data quality and integrity.
Hive is a data warehousing tool that facilitates querying and managing large datasets stored in the Hadoop Distributed File System (HDFS).
Iceberg is an open-source table format that allows users to manage and query large-scale analytic datasets from multiple distributed query engines.
JSON Schemas are declarative documents that define the structure, format, and validation rules for JSON data.
Kafka is a distributed streaming platform that allows for the processing and storage of large amounts of data in real time.
Kafka Connect is an open-source data integration tool that enables the transfer of data between Apache Kafka and other data systems.
LDAP (Lightweight Directory Access Protocol) is a protocol for accessing and managing distributed directory information services over an IP network.
Looker is a business intelligence and data analytics platform that allows users to explore, analyze, and share data insights in real time.
Metabase is an open-source business intelligence and data visualization tool that allows users to easily query and visualize their data.
Microsoft SQL Server is a relational database management system designed to store, manage, and retrieve data efficiently and securely.
Mode is a cloud-based data analysis and visualization platform that enables businesses to explore, analyze, and share data in a collaborative environment.
MongoDB is a NoSQL database that stores data in flexible, JSON-like documents, making it easy to store and retrieve data for modern applications.
MySQL is an open-source relational database management system that allows users to store, organize, and retrieve data efficiently.
NiFi is a data integration tool that allows users to automate the flow of data between systems and applications.
Okta is a cloud-based identity and access management tool that enables secure and seamless access to applications and data across multiple devices and platforms.
Oracle is a relational database management system that provides a comprehensive and integrated platform for managing and analyzing large amounts of data.
Postgres is an open-source relational database management system that provides a powerful tool for storing, managing, and analyzing large amounts of data.
PowerBI is a business analytics service by Microsoft that provides interactive visualizations and business intelligence capabilities with an interface simple enough for end users to create their own reports and dashboards.
Presto is an open-source distributed SQL query engine designed for fast and interactive analytics on large-scale data sets.
Presto on Hive allows users to query and analyze large datasets stored in Hive using SQL.
Protobuf Schemas define structured data so that it can be serialized in a compact and efficient manner.
Pulsar is a real-time data processing and messaging platform that enables high-performance data streaming and processing.
Redash is a data visualization and collaboration platform that allows users to connect and query multiple data sources and create interactive dashboards and visualizations.
Redshift is a cloud-based data warehousing tool that allows users to store and analyze large amounts of data in a scalable and cost-effective manner.
S3 Data Lake is a cloud-based data storage and management tool that allows users to store, manage, and analyze large amounts of data in a scalable and cost-effective manner.
SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Salesforce is a cloud-based customer relationship management (CRM) platform that helps businesses manage their sales, marketing, and customer service activities.
SAP HANA is an in-memory data platform that enables businesses to process large volumes of data in real time.
Snowflake is a cloud-based data warehousing platform that allows users to store, manage, and analyze large amounts of structured and semi-structured data.
Spark is a data processing tool that enables fast and efficient processing of large-scale data sets using distributed computing.
SQLAlchemy is a Python library that provides a set of high-level APIs for connecting to relational databases and performing SQL operations.
Superset is an open-source data exploration and visualization platform that allows users to create interactive dashboards and perform ad-hoc analysis on various data sources.
Tableau is a data visualization and business intelligence tool that helps users analyze and present data in a visually appealing and interactive way.
Trino is an open-source distributed SQL query engine designed to query large-scale data processing systems, including Hadoop, Cassandra, and relational databases.