# getting-started
  • faint-hair-91313 (05/26/2021, 8:47 AM)
    Hi guys, I've seen that your Q2 roadmap includes improving the way new entities and relationships are added. Any idea when it's coming? Better put, do I need to spend some effort on it now, or could I wait a few weeks? Thanks a lot!
  • stale-jewelry-2440 (05/26/2021, 12:35 PM)
    Is it possible to use DBs other than MySQL for DataHub's internal storage? I already have a msql server in production; it would maybe be useful to use it without spawning a new DB.
  • acceptable-architect-70237 (05/26/2021, 3:55 PM)
    Hi team, do datasets have version support? I know MySQL stores the different versions of aspects.
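    For reference on how that versioning is stored, here is a minimal sketch of inspecting aspect versions in the quickstart MySQL container. It assumes the default quickstart credentials (user datahub / password datahub / database datahub), the pre-NoCode metadata_aspect table, and the sample urn shipped with the quickstart demo data; adjust all of these to your deployment.
    # list every stored version of each aspect for one entity
    docker exec -i mysql mysql -u datahub -pdatahub datahub -e "
      SELECT urn, aspect, version, createdon
      FROM metadata_aspect
      WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:hive,SampleHiveDataset,PROD)'
      ORDER BY aspect, version;"
    # version 0 holds the latest value; higher version numbers are the older copies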
  • mammoth-bear-12532 (05/27/2021, 4:30 AM)
    @here Reminder: this is happening in < 12 hrs!
    🙌 2
  • adventurous-air-87342 (05/27/2021, 5:16 PM)
    Overlaying a KV store on top of a relational store may lead to performance issues. I don't know if this is true, of course...
  • big-carpet-38439 (05/27/2021, 6:02 PM)
    Thanks to all who attended the Townhall! Here are the slides from the NoCode talk: https://docs.google.com/presentation/d/1_F2xy2agb7M4IFhAKy_ORte0w-7hBRfTCypxwWhr2kg/edit?usp=sharing
    🙌 4
    🚀 2
  • powerful-telephone-71997 (05/31/2021, 4:59 AM)
    Hi All, https://datahubproject.io/docs/quickstart -> I am following this document to install DataHub with Docker, but I am hitting docker-compose version issues on AWS Linux. What's the recommended version of Docker Compose? Also, step 3 says to clone “this” repo without a hyperlink; that needs to be fixed as well… thank you for any help…
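    A rough sketch for the docker-compose part of the question, assuming a standalone Compose v1 binary is acceptable on the AWS Linux host; the 1.29.2 release below is only an example, not an officially recommended version.
    docker-compose --version   # check what the AMI currently ships
    # install a newer standalone binary (pick the release you want; 1.29.2 is illustrative)
    sudo curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" \
      -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose
    docker-compose --version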
  • powerful-telephone-71997 (05/31/2021, 5:00 AM)
    @here
  • better-orange-49102 (05/31/2021, 7:14 AM)
    Just a suggestion: could we tag Docker images with DataHub release versions instead of relying on ${DATAHUB_VERSION:-latest}?
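    Until something like that lands, a possible workaround sketch is to pin the tag yourself through the DATAHUB_VERSION variable the compose files already read; the tag below is illustrative, so substitute a real release.
    export DATAHUB_VERSION=v0.8.1   # hypothetical pin; use the release you have validated
    docker-compose pull             # run from the directory containing the quickstart compose file
    docker-compose up -d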
  • handsome-airplane-62628 (06/02/2021, 9:02 PM)
    Is there a way to export data that was manually mapped/adjusted in DataHub if it's necessary to tear down a DataHub instance and rebuild? I.e., if someone manually updated column descriptions or added tags, we would need to export this data somehow so it can be re-loaded (assuming we need to tear down, delete the volumes, and rebuild DataHub).
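    Not an official procedure, but one crude sketch for preserving manual edits across a teardown is to dump DataHub's aspect table from the quickstart MySQL container and load it back after the rebuild. It assumes the default quickstart container name and credentials and the pre-NoCode metadata_aspect table; search indexes would still need to be repopulated separately.
    # dump the aspect table (manually edited descriptions, tags, etc. live there)
    docker exec mysql mysqldump -u datahub -pdatahub datahub metadata_aspect > metadata_aspect_backup.sql
    # after rebuilding the instance, load the dump back in
    docker exec -i mysql mysql -u datahub -pdatahub datahub < metadata_aspect_backup.sql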
  • icy-holiday-55016 (06/03/2021, 8:49 AM)
    @loud-island-88694 @mammoth-bear-12532 It was good to meet with you both yesterday; I found it very informative. Among other things, you mentioned you have a DataHub SaaS offering. A couple of questions:
    • Would we be able to add our own entities to it (via code or no-code)?
    • Would we be able to extend the system (e.g. if we wanted to make a change to the search functionality)?
  • crooked-leather-44416 (06/03/2021, 3:24 PM)
    Is there a link that describes how to install the datahub CLI tool mentioned here: https://datahubproject.io/docs/debugging/#how-can-i-confirm-if-all-docker-containers-are-running-as-expected-after-a-quickstart?
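    For reference, the CLI mentioned there is the acryl-datahub Python package; a minimal install sketch, assuming a reasonably recent Python 3 and pip are already available:
    python3 -m pip install --upgrade acryl-datahub
    datahub version   # confirm the CLI is installed and on the PATH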
  • crooked-toddler-8683 (06/03/2021, 6:54 PM)
    Good afternoon, friends. I am currently researching data catalog and metadata tools for my company. I can see that a lot of subscription-based (paid) software offers AI-driven data discovery and other "fancy" advanced features. Our goal is, however, much simpler: we need a tool that would let us give meaningful descriptions to databases and tables, assign owners to specific DBs/tables, and have an intuitive-looking list of our databases with basic metrics for each table, such as:
    • min value
    • max value
    • null count
    • outliers
    • other data quality metrics we may come up with
    Would that be possible to achieve with DataHub? How much time do you think it would take to deploy and configure the app? The reason I am asking such a basic question is that I've already tried the demo and it doesn't seem to show every single thing on my requirement list above. Also, I've tried Amundsen, but that's such a fresh product that there is no way it comes close to satisfying even a part of what we need.
  • high-hospital-85984 (06/04/2021, 8:17 PM)
    This has probably been discussed before, but I just ran into this issue again and thought I'd ask about it. The issue is that, at least to my understanding, the GMS doesn't trigger a MAE if the incoming MCE doesn't bring about an update to the aspects in the GMS database. We had some intermittent Kafka problems (updated the cluster) and some aspects got registered in the GMS, but the MAE failed due to connection problems. I had to manually go in, delete the relevant rows in the database, and then re-ingest the data in order to get the MAE to go through. Is it currently possible to tell the GMS to retrigger the MAEs? If not, would it be possible to write a short script to do that?
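    For anyone hitting the same situation, a very rough sketch of the manual workaround described above. Every value is illustrative (container name, credentials, table name, urn, and aspect name all depend on your deployment and version), so treat it as pseudocode and back up the database first.
    # delete the stored rows for the affected entity so the next ingestion registers as a
    # change and emits a MAE again (urn and aspect name below are made up)
    docker exec -i mysql mysql -u datahub -pdatahub datahub -e "
      DELETE FROM metadata_aspect
      WHERE urn = 'urn:li:dataset:(urn:li:dataPlatform:hive,my_table,PROD)'
        AND aspect = 'com.linkedin.dataset.UpstreamLineage';"
    # then re-run the ingestion recipe for that source
    datahub ingest -c my_recipe.yml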
  • best-balloon-56 (06/06/2021, 2:03 PM)
    Hey, is it possible to deploy DataHub without Kafka, using only the push API model?
  • rich-policeman-92383 (06/07/2021, 11:17 AM)
    Hello, can we use Oracle 19c as the backend for DataHub?
  • icy-holiday-55016 (06/08/2021, 8:41 AM)
    Hey folks, I'm very interested to see what the integration with DQ systems such as Great Expectations looks like. Are you still expecting to be able to release it before the end of June?
  • sticky-television-18623 (06/09/2021, 11:06 PM)
    I am trying to run the MAE consumer on K8s and it fails when it tries to check whether there is a dataplatformindex_v2 index on Elasticsearch. I am using basic auth to connect to ES. I can see the initial connection uses the basic auth and returns a 200, but then it appears not to use it when trying to verify the index. Is there a setting I should check?
  • better-orange-49102 (06/10/2021, 2:06 AM)
    Do we expect RBAC to be introduced soon? My colleagues are pondering whether we should look at hacking the codebase or hold off and wait a while longer.
  • chilly-house-99102 (06/10/2021, 2:03 PM)
    Any reference step-by-step guide to install DataHub on AWS?
  • clean-cpu-43303 (06/11/2021, 1:55 PM)
    Happy Friday! I am curious how others are using the tag feature for tables and fields. My company has not yet rolled out DataHub, but from previous experience using tags (e.g. in Jira) we thought that without proper governance the tagging feature could get messy very quickly. 🙏 cc @silly-dusk-92062
  • alert-balloon-36489 (06/12/2021, 7:20 PM)
    Hello everyone. Not sure if it's just me, but I attempted to download the repo and got this error:
    λ git clone https://github.com/linkedin/datahub.git
    Cloning into 'datahub'...
    remote: Enumerating objects: 85868, done.
    remote: Counting objects: 100% (8205/8205), done.
    remote: Compressing objects: 100% (1696/1696), done.
    remote: Total 85868 (delta 3797), reused 7725 (delta 3577), pack-reused 77663
    Receiving objects: 100% (85868/85868), 77.88 MiB | 3.23 MiB/s, done.
    Resolving deltas: 100% (44752/44752), done.
    error: invalid path 'metadata-models/src/main/pegasus/com/linkedin/metadata/aspect/ServiceAspect.pdl '
    fatal: unable to checkout working tree
    warning: Clone succeeded, but checkout failed.
    You can inspect what was checked out with 'git status'
    and retry with 'git restore --source=HEAD :/'
    Is anyone else getting the same error? Thanks!
  • little-france-72098 (06/14/2021, 3:04 PM)
    Hello, for our use case I would have to add two fabrics (test, preprod) to DataHub, to have consistent browsing paths/naming. I have found that one has to add these fabrics to an enum in various places and rebuild; has the new NoCode metadata update changed anything in this regard?
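    For others with the same need, a small sketch for locating the places to change, assuming the fabric values live in a PDL enum named FabricType (run from the root of the datahub repo; the enum name is an assumption to verify against your checkout):
    # find the enum definition and everything that references it
    grep -rln --include="*.pdl" "FabricType" .
    grep -rln "FabricType" . | grep -v build | head -n 20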
  • better-orange-49102 (06/15/2021, 1:51 AM)
    I would like to check if my understanding is correct: for the docker/dev.sh script, will it pull Docker images with the debug tag for the containers specified in docker-compose.dev.yml? I got confused because I went to Docker Hub and was unable to find any images when I searched by the debug tag. I just did a docker-compose pull and it returned errors for all the debug images.
  • glamorous-microphone-33484 (06/15/2021, 8:59 AM)
    Hello, are you able to share the roadmap for DataHub beyond June 2021?
  • gifted-bird-57147 (06/15/2021, 11:36 AM)
    Hi, it looks like the quickstart guide has just changed to use the datahub CLI as opposed to the quickstart.sh script? However, if I try to use the datahub CLI I get 'no such command docker'. datahub --version returns acryl-datahub 0.8.1.1, and datahub --help shows 'check', 'ingest' and 'version', but indeed no 'docker' command...
    👀 1
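    One possible fix sketch, assuming the installed CLI release simply predates the docker subcommand (another common culprit is a second Python environment on the PATH):
    python3 -m pip install --upgrade acryl-datahub
    datahub --version          # expect something newer than acryl-datahub 0.8.1.1
    datahub docker quickstart  # the subcommand the updated quickstart guide uses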
  • some-microphone-33485 (06/17/2021, 4:28 PM)
    Hello, I was setting up DataHub with these commands, and after setting up I am getting the following errors.
    python -m pip install --upgrade pip wheel setuptools
    python -m pip uninstall datahub acryl-datahub || true  # sanity check - ok if it fails
    python -m pip install --upgrade acryl-datahub
    When I check the version with datahub --version, it throws this error. Am I missing anything here?
    RuntimeError: no validator found for <class 'datahub.ingestion.source.kafka_connect.KafkaConnectLineage'>, see `arbitrary_types_allowed` in Config
  • green-football-43791 (06/17/2021, 8:09 PM)
    @here Curious to learn what went into building the Lineage feature in DataHub? Want to see where DataHub's Lineage is headed in the future? Check out the blog post I wrote about DataHub’s Lineage Explorer. Give it a read and share it around to spread the DataHub love! https://medium.com/datahub-project/data-in-context-lineage-explorer-in-datahub-a53a9a476dc4
    👍 6
    👏 2
    ❤️ 7
  • early-hydrogen-59749 (06/21/2021, 12:20 PM)
    @green-football-43791 During metadata ingestion we have observed duplicate column values for the same urn and aspect (except for the version and createdOn date) for many of the records in the aspect table. While analyzing the reason, we found that the source of ingestion was responsible for such entries. Can we have some kind of configurable check on the DataHub side for such duplicates before ingesting the data into MySQL?
  • cuddly-lunch-28022 (06/21/2021, 12:38 PM)
    Hello! Could you please tell me how to debug avro.schema.AvroException: ('Datum union type not in schema: %s', 'com.linkedin.pegasus2avro.dataset.DownstreamLineage')?