• c

    curved-crayon-1929

    1 year ago
    Hi, I am new to DataHub. After cloning the repository and following https://github.com/linkedin/datahub/blob/master/docs/quickstart.md, when I run
    ./docker/quickstart.sh
    it gets stuck as shown below and keeps repeating the same output. Can someone help me?
    12 replies
  • n

    nutritious-bird-77396

    1 year ago
    We are looking at a use case where data-profiling information such as count of events, max, min, etc. is pushed every few minutes for every dataset in the org. Has LinkedIn dealt with such a use case? What special considerations need to be taken care of in the architecture? For example: data-profiling info for 30,000 datasets pushed every 5 minutes.
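    For a rough sense of the load in that example, here is a back-of-the-envelope sketch. It assumes exactly one profiling event per dataset per push, which the message does not state; the function names are illustrative only.

```python
# Rough load estimate for the example above.
# Assumption (not from the original message): one profiling event
# per dataset per push interval.
DATASETS = 30_000
PUSH_INTERVAL_SECONDS = 5 * 60
SECONDS_PER_DAY = 24 * 60 * 60


def events_per_second(datasets: int, interval_seconds: int) -> float:
    """Average event rate if pushes are spread evenly over the interval."""
    return datasets / interval_seconds


def events_per_day(datasets: int, interval_seconds: int) -> int:
    """Total events per day given a fixed push interval."""
    pushes_per_day = SECONDS_PER_DAY // interval_seconds
    return datasets * pushes_per_day


if __name__ == "__main__":
    print(events_per_second(DATASETS, PUSH_INTERVAL_SECONDS))  # 100.0
    print(events_per_day(DATASETS, PUSH_INTERVAL_SECONDS))     # 8640000
```

    Under that assumption the average is about 100 events per second, or 8.64 million per day. If all datasets push at the same instant instead of being spread out, the peak rate would be much higher than the average.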
    12 replies
  • h

    high-hospital-85984

    1 year ago
    @clean-bear-94984 (or someone else): there has been some work on adding support for DataJobs and DataTasks: https://github.com/linkedin/datahub/pull/2008 but it seems like the feature is not fully implemented yet. Any plans on doing so? If not, mind if we pick up the work?
    12 replies
  • b

    billions-scientist-31934

    1 year ago
    Hi all. I've been spending some time digging into DataHub's backend and I had a quick question. I noticed that the MAEs have an internal Java representation that can be serialized into Avro, but no part of them seems to get put into any formal query intermediate representation (Calcite, for example). I thought that Pegasus was this, but it looks like Pegasus is just an object format to help decorate the REST layer. Does this mean that DataHub is meant to be strictly a federated metadata discovery tool, unlike a tool like Dremio, which is meant to be more like a federated query or execution engine? If so (apologies in advance if I overlooked something), is the long-term plan to collaborate with the Coral / Dali community to start to get the execution side? Since Coral only supports Hive view definitions, what is the interim plan to get things like pushdown optimization into queries before it supports more of the backends that DataHub currently supports? Or is DataHub meant to avoid query execution altogether and focus only on metadata queries?
    2 replies
  • m

    mammoth-bear-12532

    1 year ago
    @here News Alert! We've just published the project roadmap for the first half of 2021. Check it out here! https://datahubproject.io/docs/roadmap/
    5 replies
  • i

    incalculable-ocean-74010

    1 year ago
    Hello, does datahub provide operational metric endpoints like jmx metrics for Prometheus? Is there documentation on this?
    8 replies
  • s

    some-crayon-90964

    1 year ago
    Hey guys, I am reading this document, so I have a question. What is the difference between Entity and Snapshot, conceptually and technically? @fancy-advantage-41244 fyi
    5 replies
  • m

    mammoth-bear-12532

    1 year ago
    Some good news after all those build failures 🙂
    • SSO using OIDC is now in master! 🎉
    • Please take it for a spin and let @big-carpet-38439 know if you run into any issues.
    • We've tested it with Google SSO and Okta.
    • Docs here: https://datahubproject.io/docs/how/configure-oidc-react
    1 replies
  • g

    gentle-exabyte-43102

    1 year ago
    DatasetUrns look to be of the form
    urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},PROD)
    where platform seems to be an enum, something like hive, hdfs, kafka, mysql, etc. Is it possible to specify other values for platform? Can I supply whatever value I want? It seems like I can't; I'm getting Pegasus errors.
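    To make the URN shape above concrete, here is a minimal sketch that builds and parses a string of that form. The helpers `build_dataset_urn` and `parse_dataset_urn` are hypothetical and written only for illustration; they are not DataHub APIs, and they do not check whether a given platform value is accepted by the backend.

```python
import re
from typing import NamedTuple

# Pattern for the URN shape quoted in the message above.
# Hypothetical helper code for illustration; not part of DataHub.
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


class DatasetUrnParts(NamedTuple):
    platform: str
    name: str
    origin: str


def build_dataset_urn(platform: str, name: str, origin: str = "PROD") -> str:
    """Format the three parts into the URN shape from the message."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{origin})"


def parse_dataset_urn(urn: str) -> DatasetUrnParts:
    """Split a dataset URN string into platform, dataset name, and origin."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN of the expected form: {urn!r}")
    return DatasetUrnParts(*match.groups())


if __name__ == "__main__":
    urn = build_dataset_urn("hive", "db.table")
    print(urn)
    print(parse_dataset_urn(urn))
```

    This only shows the string format. Whether an arbitrary platform value is valid is decided server-side, which is where the Pegasus errors would come from.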
    5 replies
  • i

    incalculable-ocean-74010

    1 year ago
    Hello, is there a particular reason why Docker images are created directly from DataHub's source instead of relying on published artifacts, i.e., published JARs for GMS or published packages for Python? Right now, if I need to modify a particular image, I need to have the entire codebase available locally to make relatively minor changes.
    17 replies