# advice-data-governance
a
Hey all, does anyone have experience / best practices with managing multiple environments? Currently we have 3 instances of DataHub (dev/test/prod), but syncing metadata gets messier with every new awesome feature release. I wrote a Python script that exports metadata such as tags, terms, owners, and the metadata assigned to the datasets, and imports the exported information into another environment. Now there is also manual lineage, views, etc. How do you manage this problem? Do you sync all environments? Do you only manage manual metadata in prod?
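(For reference, a minimal sketch of what such an export/import script could look like with the DataHub Python client; the server URLs, the URN list, and the choice of aspects are placeholders to adapt.)
```python
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import (
    GlobalTagsClass,
    GlossaryTermsClass,
    OwnershipClass,
)

# Read from one environment, write to another (placeholder URLs).
source = DataHubGraph(DatahubClientConfig(server="http://dev-datahub:8080"))
sink = DatahubRestEmitter(gms_server="http://test-datahub:8080")

# Datasets whose manually curated metadata should be copied (placeholder URN).
urns = ["urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"]

for urn in urns:
    for aspect_type in (OwnershipClass, GlobalTagsClass, GlossaryTermsClass):
        aspect = source.get_aspect(entity_urn=urn, aspect_type=aspect_type)
        if aspect is not None:
            # Re-emit the aspect against the target instance as an upsert.
            sink.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=aspect))
```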
b
Hey Benedikt, if you want exactly the same metadata in these environments, you can set up database backups on the prod instance (Postgres/MySQL and Neo4j/Elasticsearch) and restore the backups in the other environments.
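(A minimal sketch of the backup side of that approach, assuming MySQL as the metadata store and Elasticsearch as the search index; hostnames, credentials, and the snapshot repository name are placeholders, and restoring is the inverse on the target environment.)
```python
import subprocess

import requests

# Dump the primary metadata store (MySQL here; use pg_dump for Postgres).
subprocess.run(
    [
        "mysqldump",
        "--host=prod-mysql",
        "--user=datahub",
        "--password=<password>",  # placeholder credential
        "datahub",
        "--result-file=/backups/datahub.sql",
    ],
    check=True,
)

# Snapshot the search index via Elasticsearch's snapshot API
# (assumes a snapshot repository named "backups" is already registered).
requests.put(
    "http://prod-elasticsearch:9200/_snapshot/backups/datahub-snapshot",
    json={"indices": "*", "include_global_state": False},
).raise_for_status()
```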
b
+1!! This is easier for sure. However, I'd personally love to see a datahub source -> datahub sink in our ingestion framework that could handle this seamlessly (e.g. by running once each day). Since all documents are versioned internally, this is not very straightforward to do.
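(A hypothetical sketch of what that could look like with the existing programmatic `Pipeline` API; the `datahub` source type and its config are invented for illustration, since only the `datahub-rest` sink exists in the framework.)
```python
from datahub.ingestion.run.pipeline import Pipeline

# Hypothetical recipe: a "datahub" source does not exist in the ingestion
# framework today; it is sketched here to show how a daily
# datahub -> datahub sync could be wired up with the real Pipeline API.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "datahub",  # hypothetical source reading from prod
            "config": {"server": "http://prod-datahub:8080"},
        },
        "sink": {
            "type": "datahub-rest",  # existing sink writing to the target
            "config": {"server": "http://dev-datahub:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```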
p
If a DataHub-to-DataHub sync were possible, we could use it industry-wide for data contracts across teams/businesses.
b
Yes exactly
g
Hi @astonishing-byte-5433, is there any documentation on writing a Python script to export metadata? Thanks! I can only see that data in the DataHub DB.
a
Hey @bulky-electrician-72362, thanks for your tip! A full sync would not be the best solution, since there could be a delta in tables and meta-information between the environments. I suspect it would also sync artifacts like UI ingestion configs, which likewise differ between environments. I was wondering if we could just take selected rows from the MySQL dev DB and put them into the other DB without causing harm, or with only a small amount of transformation 😄

@gifted-knife-16120 You can check out these two links:
https://datahubproject.io/docs/metadata-ingestion/as-a-library/
https://github.com/datahub-project/datahub/tree/master/metadata-ingestion/examples/library
In this line, for example, you can see how to fetch meta-information from DataHub: https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/dataset_add_owner.py#L36

I think we will try to just maintain metadata in the live environment for a while and see if that works for us.
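(Building on those links, a small sketch of the read side: enumerate datasets and dump their ownership aspect to JSON as a starting point for an export script. The server URL is a placeholder, and `get_urns_by_filter` requires a reasonably recent client version.)
```python
import json

from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import OwnershipClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

export = {}
# Iterate over all dataset URNs known to this instance.
for urn in graph.get_urns_by_filter(entity_types=["dataset"]):
    ownership = graph.get_aspect(entity_urn=urn, aspect_type=OwnershipClass)
    if ownership is not None:
        export[urn] = ownership.to_obj()  # aspects serialize via .to_obj()

print(json.dumps(export, indent=2))
```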
a
@astonishing-byte-5433 you’ll need to restore indices on the URNs of the rows being placed into the new DB in order to have them show up