To deploy a new instance of DataHub, perform the following steps.
Launch the Docker Engine from the command line or the desktop app.
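The quickstart needs a running Docker daemon. Below is a minimal sketch for checking this from a script, assuming a POSIX-compatible shell. The name docker_is_running is a hypothetical helper. It relies only on 'docker info' exiting with a non-zero status when the client cannot reach the daemon.

```shell
# Hypothetical helper: succeeds only if the Docker client can reach the daemon.
docker_is_running() {
  # 'docker info' queries the daemon, so it fails when Docker Engine is not running
  # (or when the docker client itself is missing).
  if docker info >/dev/null 2>&1; then
    echo "Docker is running"
  else
    echo "Docker is not reachable; start Docker Engine or the desktop app first" >&2
    return 1
  fi
}
```

For example, 'docker_is_running && datahub docker quickstart' only starts the deployment once Docker is reachable.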
Install the DataHub CLI
a. Ensure you have Python 3.6+ installed and configured (check with 'python3 --version').
b. Run the following commands in your terminal:
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip uninstall datahub acryl-datahub || true  # sanity check - ok if it fails
python3 -m pip install --upgrade acryl-datahub
datahub version
If you see "command not found", try running CLI commands with the 'python3 -m' prefix instead:
python3 -m datahub version
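To fail early on an unsupported interpreter, the Python 3.6+ requirement can be checked before running the pip commands. This is a minimal sketch assuming a POSIX-compatible shell. The name check_python_version is a hypothetical helper. The version test uses Python's standard sys.version_info tuple.

```shell
# Hypothetical helper: verify that the given interpreter (default: python3) is 3.6 or newer.
check_python_version() {
  local py="${1:-python3}"
  if ! command -v "$py" >/dev/null 2>&1; then
    echo "error: $py was not found on PATH" >&2
    return 1
  fi
  # Tuples compare element by element, so e.g. (3, 10, 4, ...) >= (3, 6) is true.
  if "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'; then
    echo "ok: $("$py" --version 2>&1)"
  else
    echo "error: the DataHub CLI needs Python 3.6+, found $("$py" --version 2>&1)" >&2
    return 1
  fi
}
```

For example, 'check_python_version && python3 -m pip install --upgrade acryl-datahub' only installs the CLI when the interpreter is recent enough.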
To deploy DataHub, run the following CLI command from your terminal:
datahub docker quickstart
Upon completion of this step, you should be able to navigate to the DataHub UI at http://localhost:9002 in your browser. You can sign in using "datahub" as both the username and password.
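When the quickstart is run from a script, it can help to confirm that the UI actually answers before moving on. This is a minimal sketch assuming curl is installed. The helper wait_for_datahub_ui and its attempt and delay parameters are hypothetical; only the default URL http://localhost:9002 comes from the step above.

```shell
# Hypothetical helper: poll a URL until it responds, or give up after N attempts.
# Arguments (all optional): URL, number of attempts, seconds to wait between attempts.
wait_for_datahub_ui() {
  local url="${1:-http://localhost:9002}"
  local attempts="${2:-30}"
  local delay="${3:-5}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes HTTP 4xx/5xx responses count as failures; -s silences progress output.
    if curl -fs -o /dev/null --max-time 5 "$url"; then
      echo "DataHub UI is up at $url"
      return 0
    fi
    # Skip the pause after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "DataHub UI did not respond at $url after $attempts attempts" >&2
  return 1
}
```

For example, 'datahub docker quickstart && wait_for_datahub_ui' polls the default URL up to 30 times, 5 seconds apart.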
To ingest the sample metadata, run the following CLI command from your terminal:
datahub docker ingest-sample-data
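The deploy and sample-ingestion steps can also be chained so that ingestion is only attempted after a successful deployment. This is a minimal sketch. The function bootstrap_datahub and the DATAHUB_CMD override are hypothetical names; the two subcommands are the ones shown above.

```shell
# Hypothetical helper: deploy DataHub, then load the sample metadata, stopping on the first failure.
# DATAHUB_CMD lets you substitute e.g. "python3 -m datahub"; $cmd is left unquoted below
# on purpose so that a multi-word command is split into separate arguments.
bootstrap_datahub() {
  local cmd="${DATAHUB_CMD:-datahub}"
  $cmd docker quickstart || return 1
  $cmd docker ingest-sample-data || return 1
  echo "DataHub is deployed and the sample metadata is loaded"
}
```

For example, DATAHUB_CMD='python3 -m datahub' bootstrap_datahub runs both steps through the 'python3 -m' form.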
That's it! To start pushing your company's metadata into DataHub, take a look at the Metadata Ingestion Framework.
To clear all of DataHub's state (e.g. before ingesting your own metadata), use the following CLI command:
datahub docker nuke
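Because this command discards all of DataHub's state, a script may want to ask before running it. This is a minimal sketch assuming a POSIX-compatible shell. The function confirm_and_nuke and the DATAHUB_CMD override are hypothetical names.

```shell
# Hypothetical helper: ask for confirmation, then run 'datahub docker nuke'.
confirm_and_nuke() {
  local cmd="${DATAHUB_CMD:-datahub}"
  local answer
  # The prompt goes to stderr so that it does not mix with the command's output.
  printf 'This removes all DataHub state. Continue? [y/N] ' >&2
  # An empty answer or end of input falls through to the abort branch.
  read -r answer
  case "$answer" in
    y | Y | yes | YES) $cmd docker nuke ;;
    *) echo "Aborted; nothing was removed." >&2; return 1 ;;
  esac
}
```

Anything other than y or yes leaves DataHub untouched, so pressing Enter is the safe default.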
If running the datahub CLI produces "command not found" errors in your terminal, your system may be defaulting to an older version of Python. Try prefixing your datahub commands with 'python3 -m', for example:
python3 -m datahub docker quickstart
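To avoid editing every command by hand, a script can pick whichever invocation works. This is a minimal sketch assuming a POSIX-compatible shell. The function run_datahub is a hypothetical wrapper that uses the datahub entry point when it is on PATH and otherwise falls back to the 'python3 -m datahub' form shown above.

```shell
# Hypothetical wrapper: forward all arguments to the DataHub CLI.
run_datahub() {
  # 'command -v' reports whether 'datahub' resolves to an executable (or shell function).
  if command -v datahub >/dev/null 2>&1; then
    datahub "$@"
  else
    python3 -m datahub "$@"
  fi
}
```

For example, 'run_datahub docker quickstart' behaves the same whether or not the datahub entry point is on PATH.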
Miscellaneous Docker issues, such as conflicting containers and dangling volumes, can often be resolved by pruning your Docker state with the following command. Note that by default it removes all stopped containers, all networks not used by at least one container, all dangling images, and build cache. Adding -a also removes all unused images (not just dangling ones), and adding --volumes also removes unused volumes.
docker system prune