Deploying Datahub with Kubernetes


This directory provides the Kubernetes Helm charts for deploying Datahub and its dependencies (Elasticsearch, Neo4j, MySQL, and Kafka) on a Kubernetes cluster.


  1. Set up a Kubernetes cluster
  2. Install the following tools:
    • kubectl to manage Kubernetes resources
    • helm to deploy the resources based on Helm charts. Note that only Helm 3 is supported.
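As a quick sanity check of the tool requirements above, the following sketch verifies that both tools are on the PATH and that the installed Helm is version 3. The function names helm_major and check_tools are hypothetical helpers, not part of this repository; the check relies on `helm version --short` printing a string such as v3.x.y.

```shell
# Extract the major version from a string such as "v3.12.0+gabc123".
helm_major() {
  local v="${1#v}"   # drop a leading "v" if present
  echo "${v%%.*}"    # keep everything before the first dot
}

# Hypothetical helper: confirm kubectl and Helm 3 are installed.
check_tools() {
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found" >&2; return 1; }
  command -v helm >/dev/null 2>&1 || { echo "helm not found" >&2; return 1; }
  local major
  major="$(helm_major "$(helm version --short)")"
  [ "$major" = "3" ] || { echo "Helm 3 is required" >&2; return 1; }
  echo "kubectl and Helm 3 are available"
}

# Usage: check_tools
```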


Datahub consists of 4 main components: GMS, MAE Consumer, MCE Consumer, and Frontend. The Kubernetes deployment for each component is defined as a subchart under the main Datahub Helm chart.

The main components are powered by 4 external dependencies:

  • Kafka
  • Local DB (MySQL, Postgres, MariaDB)
  • Search Index (Elasticsearch)
  • Graph Index (Supports only Neo4j)

The dependencies must be deployed before deploying Datahub. We created a separate chart for deploying the dependencies with example configuration. They could also be deployed separately on-prem or leveraged as managed services.


Assuming kubectl context points to the correct kubernetes cluster, first create kubernetes secrets that contain MySQL and Neo4j passwords.

kubectl create secret generic mysql-secrets --from-literal=mysql-root-password=datahub
kubectl create secret generic neo4j-secrets --from-literal=neo4j-password=datahub

The above commands set the passwords to "datahub" as an example. Change them to any password of your choice.
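For anything beyond a local trial, one option is to generate random passwords instead of typing a literal. The sketch below keeps the secret names and keys used above; gen_password and create_datahub_secrets are hypothetical helper names, and the 24-character alphanumeric format is an assumption chosen to avoid characters that might need escaping.

```shell
# Generate a 24-character alphanumeric password from /dev/urandom.
gen_password() {
  LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 24
}

# Hypothetical helper: create both secrets with generated passwords.
create_datahub_secrets() {
  kubectl create secret generic mysql-secrets \
    --from-literal=mysql-root-password="$(gen_password)"
  kubectl create secret generic neo4j-secrets \
    --from-literal=neo4j-password="$(gen_password)"
}

# Usage: create_datahub_secrets
```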

Second, deploy the dependencies by running the following:

(cd prerequisites && helm dep update)
helm install prerequisites prerequisites/

Note that after changing the configuration in the values.yaml file, you can run the following to redeploy only the dependencies affected by the change:

helm upgrade prerequisites prerequisites/
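The dependency update, install, and upgrade steps can also be folded into one repeatable command. This is a sketch, not the procedure the charts prescribe: deploy_chart is a hypothetical helper, and the 10 minute timeout is an arbitrary assumption. It uses `helm upgrade --install`, which installs the release if it does not exist yet and upgrades it otherwise, and `--wait`, which blocks until the resources report ready.

```shell
# Hypothetical helper. Arguments: release name, chart directory.
# Fetches subchart dependencies, then installs or upgrades the release.
deploy_chart() {
  helm dependency update "$2" &&
    helm upgrade --install "$1" "$2" --wait --timeout 10m
}

# Usage: deploy_chart prerequisites prerequisites/
```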

Run kubectl get pods to check whether all the pods for the dependencies are running. You should get a result similar to below.

elasticsearch-master-0 1/1 Running 0 62m
elasticsearch-master-1 1/1 Running 0 62m
elasticsearch-master-2 1/1 Running 0 62m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv 2/2 Running 1 63m
prerequisites-kafka-0 1/1 Running 2 62m
prerequisites-mysql-0 1/1 Running 1 62m
prerequisites-neo4j-community-0 1/1 Running 0 52m
prerequisites-zookeeper-0 1/1 Running 0 62m
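Rather than scanning the listing by eye, the output can be filtered for pods that still need attention. The sketch below assumes the column layout shown above (name, ready count, status, restarts, age) and treats a pod as healthy when it is Completed or when it is Running with all containers ready; not_ready_pods is a hypothetical helper name.

```shell
# Read `kubectl get pods --no-headers` output on stdin and print the name of
# every pod that is neither Completed nor Running with all containers ready.
not_ready_pods() {
  awk '{
    split($2, r, "/")                          # READY column, e.g. "1/1"
    if ($3 == "Completed") next                # finished jobs are fine
    if ($3 == "Running" && r[1] == r[2]) next  # fully ready pod
    print $1
  }'
}

# Usage: kubectl get pods --no-headers | not_ready_pods
```

An empty result means every listed pod is either running with all containers ready or has completed.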

Third, deploy Datahub by running the following:

helm install datahub datahub/

Values in values.yaml have been preset to point to the dependencies deployed using the prerequisites chart with release name "prerequisites". If you deployed the prerequisites chart under a different release name, update the quickstart-values.yaml file accordingly before installing.
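If the prerequisites were installed under another release name, one rough way to produce adjusted values is a text substitution on the release prefix. This sketch assumes the service hostnames in the values file begin with the release name followed by a hyphen, as the pod names such as prerequisites-mysql-0 suggest. The function retarget_release, the release name deps, and the output file my-values.yaml are all hypothetical, and the result should be reviewed by hand before use.

```shell
# Hypothetical helper. Arguments: old release name, new release name.
# Rewrites "<old>-" prefixes to "<new>-" in a values file read from stdin.
retarget_release() {
  sed "s/$1-/$2-/g"
}

# Usage (file and release names are examples):
#   retarget_release prerequisites deps < datahub/values.yaml > my-values.yaml
#   helm install datahub datahub/ -f my-values.yaml
```

Names without the release prefix, such as elasticsearch-master, are left unchanged by this substitution.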

Run kubectl get pods to check whether all the datahub pods are running. You should get a result similar to below.

datahub-datahub-frontend-84c58df9f7-5bgwx 1/1 Running 0 4m2s
datahub-datahub-gms-58b676f77c-c6pfx 1/1 Running 0 4m2s
datahub-datahub-mae-consumer-7b98bf65d-tjbwx 1/1 Running 0 4m3s
datahub-datahub-mce-consumer-8c57d8587-vjv9m 1/1 Running 0 4m2s
datahub-elasticsearch-setup-job-8dz6b 0/1 Completed 0 4m50s
datahub-kafka-setup-job-6blcj 0/1 Completed 0 4m40s
datahub-mysql-setup-job-b57kc 0/1 Completed 0 4m7s
elasticsearch-master-0 1/1 Running 0 97m
elasticsearch-master-1 1/1 Running 0 97m
elasticsearch-master-2 1/1 Running 0 97m
prerequisites-cp-schema-registry-cf79bfccf-kvjtv 2/2 Running 1 99m
prerequisites-kafka-0 1/1 Running 2 97m
prerequisites-mysql-0 1/1 Running 1 97m
prerequisites-neo4j-community-0 1/1 Running 0 88m
prerequisites-zookeeper-0 1/1 Running 0 97m

You can run the following to expose the frontend locally. Note, you can find the pod name using the command above. In this case, the datahub-frontend pod name was datahub-datahub-frontend-84c58df9f7-5bgwx.

kubectl port-forward <datahub-frontend pod name> 9002:9002

You should be able to access the frontend via http://localhost:9002.
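Looking up the pod name can be scripted as well. The sketch below picks the first pod whose name contains "datahub-frontend" and forwards port 9002 to it; frontend_pod and forward_frontend are hypothetical helper names, and the name match assumes the pod naming shown in the listing above.

```shell
# Print the first pod name containing "datahub-frontend" from
# `kubectl get pods --no-headers` output supplied on stdin.
frontend_pod() {
  awk '$1 ~ /datahub-frontend/ { print $1; exit }'
}

# Hypothetical helper: forward local port 9002 to the frontend pod.
forward_frontend() {
  pod="$(kubectl get pods --no-headers | frontend_pod)"
  [ -n "$pod" ] || { echo "no datahub-frontend pod found" >&2; return 1; }
  kubectl port-forward "$pod" 9002:9002
}

# Usage: forward_frontend
```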

Once you confirm that the pods are running well, you can set up an ingress for datahub-frontend to expose port 9002 publicly.

Other useful commands

helm uninstall datahub    Remove DataHub
helm ls                   List Helm releases
helm history datahub      Fetch the history of the datahub release