# getting-started
  • rapid-sundown-8805

    07/07/2021, 1:18 PM
    I have a question about this section from [the readme](https://datahubproject.io/docs/architecture/architecture):
    Federated Metadata Serving
    DataHub comes with a single metadata service (gms) as part of the open source repository. However, it also supports federated metadata services which can be owned and operated by different teams –– in fact that is how LinkedIn runs DataHub internally. The federated services communicate with the central search index and graph using Kafka, to support global search and discovery while still enabling decoupled ownership of metadata. This kind of architecture is very amenable for companies who are implementing data mesh.
    Do you have an example architecture for this kind of setup? What is it about having a central metadata repository that goes against data mesh principles? Is it the downstream integrations (mce events etc.)?
  • mammoth-bear-12532

    07/09/2021, 2:34 AM
    Hi folks! Quick announcement: helm charts are now officially available at helm.datahubproject.io! As part of the move, we have separated the charts into a new repo: https://github.com/acryldata/datahub-helm to make them easier to manage. If you have forked these charts and need help with merges, let us know! Please ⭐ the new artifacthub page (https://artifacthub.io/packages/search?repo=datahub) and the new github repo to share your ❤️ for the project 🙏
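    For anyone repointing an existing install at the new location, a minimal sketch (the release name `datahub` and default values are assumptions; adjust to your setup):

    ```shell
    # Register the new chart repository and upgrade/install from it
    helm repo add datahub https://helm.datahubproject.io/
    helm repo update
    helm upgrade --install datahub datahub/datahub
    ```

    If you forked the old in-repo charts, diff your fork against the new repo before upgrading.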
    🙌 2
  • steep-van-9393

    07/09/2021, 9:09 AM
    Is anyone else getting this error from datahub-gms when restarting the quickstart Docker containers?
  • ambitious-airline-8020

    07/09/2021, 11:38 AM
    The mentioned bug seems to have been around since 1 Jun: https://github.com/linkedin/datahub/issues/2639. I hit the same thing; it looks like a repeatable flake. I just added some additional info to the issue regarding the `mysql-setup` container logs - hope it helps
    👍 1
  • sticky-television-18623

    07/09/2021, 2:14 PM
    I am attempting to use Oracle for the GMS data store and I am running into a type conversion error with EbeanAspectV2$PrimaryKey.version when executing the query in EbeanAspectDao.getNextVersion. On the database side the version column is defined as NUMBER(19,0) which I believe is the correct mapping for java long. Any thoughts on how to resolve this?
  • rich-policeman-92383

    07/12/2021, 12:26 PM
    Hello guys, how can I check the version of DataHub components? For example, is there a CLI command or curl command to check the version of datahub-gms and the other components?
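    There is no single version command covering every component, but a few places are worth checking; a sketch (host/port are quickstart defaults, and the exact payload of `/config` varies by release):

    ```shell
    # GMS exposes a /config endpoint (the same one the upgrade qualification
    # step probes); depending on the release it includes build/version info.
    curl -s http://localhost:8080/config

    # Recent builds of the ingestion CLI report their own version.
    datahub version

    # Otherwise, the deployed image tags tell you what is running.
    docker ps --format '{{.Names}}\t{{.Image}}'
    ```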
  • crooked-toddler-8683

    07/12/2021, 8:38 PM
    Hello friends! Can someone help me with the installation? I am using Ubuntu. I've verified that my Docker works as expected. I followed all the steps on https://datahubproject.io/docs/quickstart/ up to the 4th, which is throwing the error...
  • rich-policeman-92383

    07/13/2021, 5:48 AM
    Hi guys, is it recommended to deploy 0.8.6 in production?
  • rapid-sundown-8805

    07/13/2021, 7:48 AM
    Hi again community, I have a question which I cannot find the answer to in the docs. Because of our ACL policies in Kafka, we would like to know whether read access on the MCE topic is enough for DataHub, or whether it needs write access to it too. Is it enough if it can read from MCE and write to MAE?
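    For what it's worth, the minimum grants follow from who produces and consumes each topic: GMS (or the standalone mce-consumer) reads MCE and writes MAE, but note it may also write back to a failed-MCE topic when processing fails, so read-only on MCE alone may not cover everything. A hedged sketch with `kafka-acls` (topic names are the defaults; the principal and consumer group are placeholders to replace with your own):

    ```shell
    # Consume the MCE topic
    kafka-acls --bootstrap-server broker:9092 --add \
      --allow-principal User:datahub \
      --operation Read --topic MetadataChangeEvent_v4 \
      --group '<your-mce-consumer-group>'

    # Produce the MAE topic
    kafka-acls --bootstrap-server broker:9092 --add \
      --allow-principal User:datahub \
      --operation Write --topic MetadataAuditEvent_v4

    # If failed MCEs are emitted, Write on the failed topic is also needed
    kafka-acls --bootstrap-server broker:9092 --add \
      --allow-principal User:datahub \
      --operation Write --topic FailedMetadataChangeEvent_v4
    ```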
  • ambitious-airline-8020

    07/13/2021, 8:29 AM
    Hi all. A question about the "No-code Metadata Model Additions" part of the Historical roadmap: I see that "No need to write any code (in GraphQL or UI) to visualize metadata" is not checked, yet it sits in the Historical section. Does that mean it was abandoned, or just delayed?
  • brief-lizard-77958

    07/13/2021, 8:46 AM
    [Solved] Running gradlew build in a freshly pulled and running DataHub on Ubuntu always results in the following error:
    ```
    Task :metadata-ingestion:installDev FAILED
    FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':metadata-ingestion:installDev'.
    Process 'command 'venv/bin/pip'' finished with non-zero exit value 1
    ```
    Has anyone encountered a similar problem? Edit: I had to install python-ldap separately, since it can't be installed the standard way on Ubuntu (https://stackoverflow.com/a/4768467/7615751)
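    For reference, the usual fix on Ubuntu is installing the native headers that python-ldap builds against (package list per the linked Stack Overflow answer; names may differ slightly across Ubuntu releases):

    ```shell
    sudo apt-get install -y build-essential python3-dev \
      libldap2-dev libsasl2-dev libssl-dev
    # then, inside the build's virtualenv:
    venv/bin/pip install python-ldap
    ```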
    ✅ 1
  • astonishing-yak-92682

    07/13/2021, 4:22 PM
    Can anyone please help me figure out why the workflow run for my PR https://github.com/linkedin/datahub/pull/2788/checks?check_run_id=3055771720 is showing "There are uncommitted changes", while I can't see any uncommitted changes in my branch and `git status --porcelain` is also clean?
  • curved-magazine-23582

    07/14/2021, 3:49 AM
    hello team. I am upgrading our instance to the latest version using the no-code migration guide, but I'm running into the issue below:
    ```
    Starting upgrade with id NoCodeDataMigration...
    Cleanup has not been requested.
    Skipping Step 1/7: RemoveAspectV2TableStep...
    Executing Step 2/7: GMSQualificationStep...
    Completed Step 2/7: GMSQualificationStep successfully.
    Executing Step 3/7: UpgradeQualificationStep...
    -- V1 table exists
    -- V1 table has 8011 rows
    -- V2 table exists
    -- V2 table has 2 rows
    -- Since V2 table has records, we will not proceed with the upgrade.
    -- If V2 table has significantly less rows, consider running the forced upgrade.
    Failed to qualify upgrade candidate. Aborting the upgrade...
    Step with id UpgradeQualificationStep requested an abort of the in-progress update. Aborting the upgrade...
    Upgrade NoCodeDataMigration completed with result ABORTED. Exiting...
    ```
    How do I run the recommended forced upgrade in this case?
  • jolly-honey-27198

    07/14/2021, 8:06 AM
    Hey, I wonder if there is any way to deploy DataHub offline, or without Docker?
  • acceptable-architect-70237

    07/14/2021, 4:45 PM
    Hi team, not sure whether this question has been asked before. What is the best practice for keeping track of sharded databases and their schemas, and presenting them? I might have seen some samples, but I'm not sure.
  • better-orange-49102

    07/15/2021, 6:21 AM
    I know the data quality RFC is still in development, but just wondering: is the integration of tools like Great Expectations supposed to work with the existing Python ingest framework? Meaning, the ingest script would run DQ scripts as part of its metadata scraping process. I just want to know if it will change the way we do metadata ingestion.
  • square-activity-64562

    07/15/2021, 5:41 PM
    When using OIDC to log in, if I search using my first name there is nothing in the search results. I thought my profile would be shown under Users.
  • square-activity-64562

    07/15/2021, 5:44 PM
    When using `global.datahub_standalone_consumers_enabled = true`, the consumers get deployed even if `datahub-mae-consumer.enabled = false`. With these property names it gets confusing what is supposed to happen. Should I keep
    ```
    global.datahub_standalone_consumers_enabled = false
    datahub-mae-consumer.enabled = true
    datahub-mce-consumer.enabled = true
    ```
    or should all three be set to true?
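    If standalone consumers are wanted, a values.yaml sketch that keeps the flags consistent (key layout follows the datahub-helm chart at the time; verify against your chart version):

    ```yaml
    global:
      datahub_standalone_consumers_enabled: true   # run MAE/MCE consumers as separate pods

    datahub-mae-consumer:
      enabled: true
    datahub-mce-consumer:
      enabled: true
    ```

    The rule of thumb would be to set all three consistently: either all true (standalone consumer pods) or all false (consumers embedded in GMS).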
  • square-activity-64562

    07/15/2021, 6:17 PM
    How does the `datahub ingest` command mentioned in https://datahubproject.io/docs/metadata-ingestion find DataHub's Kafka or REST endpoint? The use case is that I am thinking of running it via Jenkins for now. Jenkins will create a pod in the jenkins namespace of our K8s cluster, while DataHub is in the apps namespace. So I am not sure how to configure `datahub ingest` so that it knows the location of datahub-gms and the frontend.
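    Since the ingestion pod runs in a different namespace, the recipe's sink only needs the cluster-internal DNS name of GMS; a sketch (the service name, namespace, and port below are assumptions for a default helm install, following the `<release>-datahub-gms.<namespace>.svc.cluster.local` pattern):

    ```yaml
    source:
      type: mysql        # whichever source you ingest from
      config: {}
    sink:
      type: datahub-rest
      config:
        # cross-namespace service DNS: <release>-datahub-gms.<namespace>.svc.cluster.local
        server: "http://datahub-datahub-gms.apps.svc.cluster.local:8080"
    ```

    The frontend is not needed for ingestion; only GMS (REST) or Kafka (with the `datahub-kafka` sink) is.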
  • gifted-arm-43579

    07/16/2021, 6:55 AM
    Hi everyone, can I build DataHub on Windows?
  • ambitious-airline-8020

    07/16/2021, 8:27 AM
    Dear and favorite DataHub team! Could you please advise me on the best way to discuss this feature request: https://github.com/linkedin/datahub/issues/2871? (Support search for map fields, like customProperties from DatasetProperties)
  • clean-furniture-99495

    07/16/2021, 9:11 AM
    Hi there! I was wondering if DataHub supports JSON Schemas? We would like to load our Segment Tracking Plan into DataHub so we can improve visibility into all our frontend events across the company. If that's possible, I will proceed with a PR for a new Acryl plugin integrating Segment Protocols and DataHub.
    👀 2
  • square-activity-64562

    07/16/2021, 9:52 AM
    In the quickstart of v0.8.6 (on a local machine) I was able to add an owner to a dataset. But if I search by the owner name "aseem.bansal" there are no search results, even though the user shows up at http://localhost:9002/user/urn:li:corpuser:aseem.bansal/ownership.
  • square-activity-64562

    07/16/2021, 2:04 PM
    We have a use case I was hoping to solve once a metadata store / data discovery tool is up, and I wanted to understand whether this workflow can be done through DataHub. We have different eng teams (for different business themes, countries, etc.), each managing their own databases (mostly RDS), and the data team has access to read replicas of all of them. Any team can change the schema in its own databases, and ideally the data team would learn of schema changes as soon as they happen. With schema ingestion we will have a history of schema changes, viewable by going to each individual dataset. Is there a way to have a running history of schema changes in a single place (excluding the first ingestion, when we add these assets to DataHub)? This could be a good tool for the whole data team to stay up to date with the schema changes various teams are making.
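    One low-tech way to build such a running changelog outside the UI would be to diff successive versions of each dataset's schemaMetadata aspect yourself and publish the result (to Slack, a table, etc.). A minimal sketch of the diff step; the dicts below are illustrative stand-ins for the fieldPath/type pairs you would pull from DataHub:

```python
# Sketch: derive a "schema changelog" entry by diffing two versions of a
# dataset's schema, e.g. pulled from DataHub's schemaMetadata aspect history.
# The dicts map fieldPath -> nativeDataType.

def diff_schemas(old_fields, new_fields):
    """Return (added, removed, type_changed) field paths between two versions."""
    added = sorted(set(new_fields) - set(old_fields))
    removed = sorted(set(old_fields) - set(new_fields))
    type_changed = sorted(
        f for f in set(old_fields) & set(new_fields)
        if old_fields[f] != new_fields[f]
    )
    return added, removed, type_changed

v1 = {"id": "BIGINT", "email": "VARCHAR(255)", "created_at": "TIMESTAMP"}
v2 = {"id": "BIGINT", "email": "TEXT", "signup_source": "VARCHAR(64)"}

print(diff_schemas(v1, v2))  # → (['signup_source'], ['created_at'], ['email'])
```

    Running this on a schedule against each dataset's latest two schema versions gives a single feed of adds/drops/type changes across all teams.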
  • square-activity-64562

    07/16/2021, 2:06 PM
    If I wanted to understand DataHub's models and storage, which pages should I read other than https://datahubproject.io/docs/metadata-modeling/metadata-model? This will help me deploy and manage it more easily. E.g. currently I am stuck on the schema registry; I wish to understand where exactly it fits. That will hopefully help me understand where to look for errors, and maybe send a PR to fix things if I understand it well enough.
  • square-activity-64562

    07/16/2021, 6:31 PM
    The browsePaths aspect explanation at https://datahubproject.io/docs/metadata-modeling/metadata-model/ could use a better example, ideally one that is actually used in the UI.
  • curved-magazine-23582

    07/18/2021, 11:35 PM
    Hello team, after upgrading to the latest Docker images, I ingested some PowerBI objects through the GMS API, but browsing no longer works from the UI. Ingestion is successful, as I can reach these objects via search. I think I've tried ingestion both with and without BrowsePath. I don't see any browsing-related errors in the logs of the UI, GMS, or Elasticsearch. Where should I go next to figure this out? 🤔
    GMS logs:
    ```
    17:12:07.872 [qtp544724190-3515] INFO  c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
    17:12:07.875 [pool-9-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 3ms
    17:12:07.882 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
    17:12:08.359 [qtp544724190-3397] INFO  c.l.m.r.entity.EntityResource - BATCH GET [urn:li:corpuser:datahub]
    17:12:08.363 [pool-9-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 4ms
    ```
  • salmon-cricket-21860

    07/19/2021, 3:44 AM
    Hi, can I customize the topic name `DataHubUsageEvent_v1`? I was able to modify the other topic names, but failed to change this one even with the `DATAHUB_USAGE_EVENT_NAME` env variable. It seems `DataHubUsageEvent_v1` is automatically created when user activity occurs.
    ```
    DataHubUsageEvent_v1
    catalog-datahub-fmce
    catalog-datahub-mae
    catalog-datahub-mce
    catalog-datahub-usage # created by kafka-setup w/ `DATAHUB_USAGE_EVENT_NAME` ENV
    __consumer_offsets
    _schemas
    ```
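    One thing worth checking (an assumption, not a confirmed fix): the usage topic is produced by datahub-frontend, so renaming it likely requires the frontend's tracking-topic setting in addition to the kafka-setup variable; otherwise the frontend auto-creates the default name on first user activity. Roughly:

    ```
    # kafka-setup: controls only which topic gets pre-created
    DATAHUB_USAGE_EVENT_NAME=catalog-datahub-usage

    # datahub-frontend: controls where usage events are actually produced
    # (verify the exact variable name in your version's frontend env file)
    DATAHUB_TRACKING_TOPIC=catalog-datahub-usage
    ```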
  • square-activity-64562

    07/21/2021, 7:16 PM
    In what timezone are dates displayed in DataHub?
  • some-microphone-33485

    07/21/2021, 7:17 PM
    Hello, a question regarding password change: how do I change the default password for the user "datahub"? Thank you.
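    For reference, the default "datahub" login in the quickstart comes from the JAAS `user.props` file baked into datahub-frontend; overriding it with your own file is the usual approach. A sketch (the exact path and mount mechanism depend on your deployment and version):

    ```
    # user.props mounted into the datahub-frontend container,
    # replacing the built-in "datahub:datahub" entry:
    datahub:my-new-strong-password
    ```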