# getting-started
  • rapid-sundown-8805

    07/07/2021, 1:18 PM
    I have a question about this section from [the readme](https://datahubproject.io/docs/architecture/architecture):
    Federated Metadata Serving
    DataHub comes with a single metadata service (gms) as part of the open source repository. However, it also supports federated metadata services which can be owned and operated by different teams –– in fact that is how LinkedIn runs DataHub internally. The federated services communicate with the central search index and graph using Kafka, to support global search and discovery while still enabling decoupled ownership of metadata. This kind of architecture is very amenable for companies who are implementing data mesh.
    Do you have an example architecture for this kind of setup? What is it about having a central metadata repository that goes against data mesh principles? Is it the downstream integrations (mce events etc.)?
  • mammoth-bear-12532

    07/09/2021, 2:34 AM
    Hi folks! Quick announcement: helm charts are now officially available at helm.datahubproject.io! As part of the move, we have separated the charts into a new repo: https://github.com/acryldata/datahub-helm to make them easier to manage. If you have forked these charts and need help with merges, let us know! Please ⭐ the new artifacthub page (https://artifacthub.io/packages/search?repo=datahub) and the new github repo to share your ❤️ for the project 🙏
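    For anyone repointing an existing install at the new location, a minimal sketch (the release name `datahub` and default values are assumptions; adjust to your setup):

    ```shell
    # Register the new chart repository and upgrade/install from it
    helm repo add datahub https://helm.datahubproject.io/
    helm repo update
    helm upgrade --install datahub datahub/datahub
    ```

    If you forked the old in-repo charts, diff your fork against the new repo before upgrading.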
    🙌 2
  • steep-van-9393

    07/09/2021, 9:09 AM
    Is anyone else getting this error from datahub-gms when restarting the quickstart Docker containers?
  • ambitious-airline-8020

    07/09/2021, 11:38 AM
    The mentioned bug seems to have been around since 1 Jun: https://github.com/linkedin/datahub/issues/2639. I hit the same thing; it looks like a repeatable flake. I just added some additional info to the issue regarding the `mysql-setup` container logs - hope it helps
    👍 1
  • sticky-television-18623

    07/09/2021, 2:14 PM
    I am attempting to use Oracle for the GMS data store and I am running into a type conversion error with EbeanAspectV2$PrimaryKey.version when executing the query in EbeanAspectDao.getNextVersion. On the database side the version column is defined as NUMBER(19,0) which I believe is the correct mapping for java long. Any thoughts on how to resolve this?
  • rich-policeman-92383

    07/12/2021, 12:26 PM
    Hello guys, how can I check the version of DataHub components? For example, is there a CLI command or curl command to check the version of datahub-gms and the other components?
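    There is no single version command covering every component, but a few places are worth checking; a sketch (host/port are quickstart defaults, and the exact payload of `/config` varies by release):

    ```shell
    # GMS exposes a /config endpoint (the same one the upgrade qualification
    # step probes); depending on the release it includes build/version info.
    curl -s http://localhost:8080/config

    # Recent builds of the ingestion CLI report their own version.
    datahub version

    # Otherwise, the deployed image tags tell you what is running.
    docker ps --format '{{.Names}}\t{{.Image}}'
    ```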
  • crooked-toddler-8683

    07/12/2021, 8:38 PM
    Hello friends! Can someone help me with the installation? I am using Ubuntu. I've verified that my Docker works as expected. I followed all the steps on https://datahubproject.io/docs/quickstart/ up to the 4th, which is throwing the error...
  • rich-policeman-92383

    07/13/2021, 5:48 AM
    Hi guys, is it recommended to deploy 0.8.6 in production?
  • rapid-sundown-8805

    07/13/2021, 7:48 AM
    Hi again community, I have a question which I cannot find the answer to in the docs. Because of our ACL policies in Kafka, we would like to know whether read access on the MCE topic is enough for DataHub, or whether it needs write access to it too. Is it enough if it can read from MCE and write to MAE?
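    For what it's worth, the minimum grants follow from who produces and consumes each topic: GMS (or the standalone mce-consumer) reads MCE and writes MAE, but note it may also write back to a failed-MCE topic when processing fails, so read-only on MCE alone may not cover everything. A hedged sketch with `kafka-acls` (topic names are the defaults; the principal and consumer group are placeholders to replace with your own):

    ```shell
    # Consume the MCE topic
    kafka-acls --bootstrap-server broker:9092 --add \
      --allow-principal User:datahub \
      --operation Read --topic MetadataChangeEvent_v4 \
      --group '<your-mce-consumer-group>'

    # Produce the MAE topic
    kafka-acls --bootstrap-server broker:9092 --add \
      --allow-principal User:datahub \
      --operation Write --topic MetadataAuditEvent_v4

    # If failed MCEs are emitted, Write on the failed topic is also needed
    kafka-acls --bootstrap-server broker:9092 --add \
      --allow-principal User:datahub \
      --operation Write --topic FailedMetadataChangeEvent_v4
    ```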
  • ambitious-airline-8020

    07/13/2021, 8:29 AM
    Hi all. A question about the "No-code Metadata Model Additions" part of the Historical roadmap: I see that "No need to write any code (in GraphQL or UI) to visualize metadata" is not checked, yet it sits in the Historical section. Does that mean it was abandoned, or just delayed?
  • brief-lizard-77958

    07/13/2021, 8:46 AM
    [Solved] Running gradlew build in a freshly pulled and running DataHub on Ubuntu always results in the following error:
    ```
    Task :metadata-ingestion:installDev FAILED
    FAILURE: Build failed with an exception. * What went wrong: Execution failed for task ':metadata-ingestion:installDev'.
    Process 'command 'venv/bin/pip'' finished with non-zero exit value 1
    ```
    Has anyone encountered a similar problem? Edit: I had to install python-ldap separately, since it can't be installed the standard way on Ubuntu (https://stackoverflow.com/a/4768467/7615751)
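    For reference, the usual fix on Ubuntu is installing the native headers that python-ldap builds against (package list per the linked Stack Overflow answer; names may differ slightly across Ubuntu releases):

    ```shell
    sudo apt-get install -y build-essential python3-dev \
      libldap2-dev libsasl2-dev libssl-dev
    # then, inside the build's virtualenv:
    venv/bin/pip install python-ldap
    ```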
    ✅ 1
  • astonishing-yak-92682

    07/13/2021, 4:22 PM
    Can anyone please help me figure out why the workflow run for my PR https://github.com/linkedin/datahub/pull/2788/checks?check_run_id=3055771720 is showing "There are uncommitted changes", while I can't see any uncommitted changes in my branch and `git status --porcelain` is also clean?
  • curved-magazine-23582

    07/14/2021, 3:49 AM
    hello team. I am upgrading our instance to the latest version using the no-code migration guide, but I'm running into the issue below:
    ```
    Starting upgrade with id NoCodeDataMigration...
    Cleanup has not been requested.
    Skipping Step 1/7: RemoveAspectV2TableStep...
    Executing Step 2/7: GMSQualificationStep...
    Completed Step 2/7: GMSQualificationStep successfully.
    Executing Step 3/7: UpgradeQualificationStep...
    -- V1 table exists
    -- V1 table has 8011 rows
    -- V2 table exists
    -- V2 table has 2 rows
    -- Since V2 table has records, we will not proceed with the upgrade.
    -- If V2 table has significantly less rows, consider running the forced upgrade.
    Failed to qualify upgrade candidate. Aborting the upgrade...
    Step with id UpgradeQualificationStep requested an abort of the in-progress update. Aborting the upgrade...
    Upgrade NoCodeDataMigration completed with result ABORTED. Exiting...
    ```
    How do I run the recommended forced upgrade in this case?
  • jolly-honey-27198

    07/14/2021, 8:06 AM
    Hey, I wonder if there is any way to deploy DataHub offline, or without Docker?
  • acceptable-architect-70237

    07/14/2021, 4:45 PM
    Hi team, not sure whether this question has been asked before. What is the best practice for keeping track of sharded databases and their schemas, and presenting them? I might have seen some samples, but I'm not sure.
  • better-orange-49102

    07/15/2021, 6:21 AM
    I know the data quality RFC is still in development, but just wondering: is the integration of tools like Great Expectations supposed to work with the existing Python ingest framework? Meaning, the ingest script would run DQ scripts as part of its metadata scraping process. I just want to know if it will change the way we do metadata ingestion.
  • square-activity-64562

    07/15/2021, 5:41 PM
    When using OIDC to log in, if I search using my first name there is nothing in the search results. I thought my profile would be shown under Users.
  • square-activity-64562

    07/15/2021, 5:44 PM
    When using `global.datahub_standalone_consumers_enabled = true`, the consumers get deployed even if `datahub-mae-consumer.enabled = false`. With these property names it gets confusing what is supposed to happen. Should I keep
    ```
    global.datahub_standalone_consumers_enabled = false
    datahub-mae-consumer.enabled = true
    datahub-mce-consumer.enabled = true
    ```
    or should all three be set to true?
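    If standalone consumers are wanted, a values.yaml sketch that keeps the flags consistent (key layout follows the datahub-helm chart at the time; verify against your chart version):

    ```yaml
    global:
      datahub_standalone_consumers_enabled: true   # run MAE/MCE consumers as separate pods

    datahub-mae-consumer:
      enabled: true
    datahub-mce-consumer:
      enabled: true
    ```

    The rule of thumb would be to set all three consistently: either all true (standalone consumer pods) or all false (consumers embedded in GMS).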
  • square-activity-64562

    07/15/2021, 6:17 PM
    How does the `datahub ingest` command mentioned in https://datahubproject.io/docs/metadata-ingestion find DataHub's Kafka or REST endpoint? The use case is that I am thinking of running it via Jenkins for now. Jenkins will create a pod in the jenkins namespace of our K8s cluster, while DataHub is in the apps namespace. So I am not sure how to configure `datahub ingest` so that it knows the location of datahub-gms and the frontend.
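    Since the ingestion pod runs in a different namespace, the recipe's sink only needs the cluster-internal DNS name of GMS; a sketch (the service name, namespace, and port below are assumptions for a default helm install, following the `<release>-datahub-gms.<namespace>.svc.cluster.local` pattern):

    ```yaml
    source:
      type: mysql        # whichever source you ingest from
      config: {}
    sink:
      type: datahub-rest
      config:
        # cross-namespace service DNS: <release>-datahub-gms.<namespace>.svc.cluster.local
        server: "http://datahub-datahub-gms.apps.svc.cluster.local:8080"
    ```

    The frontend is not needed for ingestion; only GMS (REST) or Kafka (with the `datahub-kafka` sink) is.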
  • gifted-arm-43579

    07/16/2021, 6:55 AM
    Hi everyone, can I build DataHub on Windows?
  • ambitious-airline-8020

    07/16/2021, 8:27 AM
    Dear and favorite DataHub team! Could you please advise me on the best way to discuss this feature request: https://github.com/linkedin/datahub/issues/2871? (Support search for map fields, like customProperties from DatasetProperties)
  • clean-furniture-99495

    07/16/2021, 9:11 AM
    Hi there! I was wondering if DataHub supports JSON Schemas? We would like to load our Segment Tracking Plan into DataHub so we can improve visibility into all our frontend events across the company. If that's possible, I will proceed with a PR for a new Acryl plugin integrating Segment Protocols and DataHub.
    👀 2
  • square-activity-64562

    07/16/2021, 9:52 AM
    In the quickstart of v0.8.6 (on a local machine) I was able to add an owner to a dataset. But if I search by the owner name "aseem.bansal" there are no search results, even though the user shows up at http://localhost:9002/user/urn:li:corpuser:aseem.bansal/ownership.
  • square-activity-64562

    07/16/2021, 2:04 PM
    We have a use case I was hoping to solve once a metadata store / data discovery tool is up, and I wanted to understand whether this workflow can be done through DataHub. We have different eng teams (for different business themes, countries, etc.), each managing their own databases (mostly RDS), and the data team has access to read replicas of all of them. Any team can change the schema in its own databases, and ideally the data team would learn of schema changes as soon as they happen. With schema ingestion we will have a history of schema changes, viewable by going to each individual dataset. Is there a way to have a running history of schema changes in a single place (excluding the first ingestion, when we add these assets to DataHub)? This could be a good tool for the whole data team to stay up to date with the schema changes various teams are making.
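    One low-tech way to build such a running changelog outside the UI would be to diff successive versions of each dataset's schemaMetadata aspect yourself and publish the result (to Slack, a table, etc.). A minimal sketch of the diff step; the dicts below are illustrative stand-ins for the fieldPath/type pairs you would pull from DataHub:

```python
# Sketch: derive a "schema changelog" entry by diffing two versions of a
# dataset's schema, e.g. pulled from DataHub's schemaMetadata aspect history.
# The dicts map fieldPath -> nativeDataType.

def diff_schemas(old_fields, new_fields):
    """Return (added, removed, type_changed) field paths between two versions."""
    added = sorted(set(new_fields) - set(old_fields))
    removed = sorted(set(old_fields) - set(new_fields))
    type_changed = sorted(
        f for f in set(old_fields) & set(new_fields)
        if old_fields[f] != new_fields[f]
    )
    return added, removed, type_changed

v1 = {"id": "BIGINT", "email": "VARCHAR(255)", "created_at": "TIMESTAMP"}
v2 = {"id": "BIGINT", "email": "TEXT", "signup_source": "VARCHAR(64)"}

print(diff_schemas(v1, v2))  # → (['signup_source'], ['created_at'], ['email'])
```

    Running this on a schedule against each dataset's latest two schema versions gives a single feed of adds/drops/type changes across all teams.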
  • square-activity-64562

    07/16/2021, 2:06 PM
    If I wanted to understand DataHub's models and storage, which pages should I read other than https://datahubproject.io/docs/metadata-modeling/metadata-model? This will help me deploy and manage it more easily. E.g. currently I am stuck on the schema registry; I wish to understand where exactly it fits. That will hopefully help me understand where to look for errors, and maybe send a PR to fix things if I understand it well enough.
  • square-activity-64562

    07/16/2021, 6:31 PM
    The browsePaths aspect explanation at https://datahubproject.io/docs/metadata-modeling/metadata-model/ could use a better example, ideally one that is actually used in the UI.
  • curved-magazine-23582

    07/18/2021, 11:35 PM
    Hello team, after upgrading to the latest Docker images, I ingested some PowerBI objects through the GMS API, but browsing no longer works from the UI. Ingestion is successful, as I can reach these objects via search. I think I've tried ingestion both with and without BrowsePath. I don't see any browsing-related errors in the logs of the UI, GMS, or Elasticsearch. Where should I go next to figure this out? 🤔
    GMS logs:
    ```
    17:12:07.872 [qtp544724190-3515] INFO  c.l.m.r.entity.EntityResource - GET urn:li:corpuser:datahub
    17:12:07.875 [pool-9-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities/urn%3Ali%3Acorpuser%3Adatahub - get - 200 - 3ms
    17:12:07.882 [I/O dispatcher 1] INFO  c.l.m.k.e.ElasticsearchConnector - Successfully feeded bulk request. Number of events: 1 Took time ms: -1
    17:12:08.359 [qtp544724190-3397] INFO  c.l.m.r.entity.EntityResource - BATCH GET [urn:li:corpuser:datahub]
    17:12:08.363 [pool-9-thread-1] INFO  c.l.metadata.filter.LoggingFilter - GET /entities?ids=List(urn%3Ali%3Acorpuser%3Adatahub) - batchGet - 200 - 4ms
    ```
  • salmon-cricket-21860

    07/19/2021, 3:44 AM
    Hi, can I customize the topic name `DataHubUsageEvent_v1`? I was able to modify the other topic names, but failed to change this one even with the `DATAHUB_USAGE_EVENT_NAME` env variable. It seems `DataHubUsageEvent_v1` is automatically created when user activity occurs.
    ```
    DataHubUsageEvent_v1
    catalog-datahub-fmce
    catalog-datahub-mae
    catalog-datahub-mce
    catalog-datahub-usage # created by kafka-setup w/ `DATAHUB_USAGE_EVENT_NAME` ENV
    __consumer_offsets
    _schemas
    ```
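    One thing worth checking (an assumption, not a confirmed fix): the usage topic is produced by datahub-frontend, so renaming it likely requires the frontend's tracking-topic setting in addition to the kafka-setup variable; otherwise the frontend auto-creates the default name on first user activity. Roughly:

    ```
    # kafka-setup: controls only which topic gets pre-created
    DATAHUB_USAGE_EVENT_NAME=catalog-datahub-usage

    # datahub-frontend: controls where usage events are actually produced
    # (verify the exact variable name in your version's frontend env file)
    DATAHUB_TRACKING_TOPIC=catalog-datahub-usage
    ```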
  • square-activity-64562

    07/21/2021, 7:16 PM
    In what timezone are dates displayed in DataHub?
  • some-microphone-33485

    07/21/2021, 7:17 PM
    Hello, a question regarding password change: how do I change the default password for the user "datahub"? Thank you.
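    For reference, the default "datahub" login in the quickstart comes from the JAAS `user.props` file baked into datahub-frontend; overriding it with your own file is the usual approach. A sketch (the exact path and mount mechanism depend on your deployment and version):

    ```
    # user.props mounted into the datahub-frontend container,
    # replacing the built-in "datahub:datahub" entry:
    datahub:my-new-strong-password
    ```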