# getting-started
  • w

    wonderful-quill-11255

    02/25/2021, 8:07 AM
    Hello. I'm looking into how to monitor the different components. The MCE and MAE consumers use the Spring Boot Actuator library, but I'm not seeing anything similar for the GMS and frontend. Do they have an equivalent that I'm missing? If not, would it make sense to use the same library for them as well?
  • i

    incalculable-ocean-74010

    02/26/2021, 5:19 PM
    Hello, does DataHub support entity versioning and history? For example, if a dataset that defines a SQL table changes over time (columns added, removed, or updated), can a user see the metadata as of a given point in time?
  • g

    gentle-exabyte-43102

    02/26/2021, 8:30 PM
    Hello there! Anyone seen this before?
    $ ./docker/quickstart.sh
    Pulling elasticsearch        ... done
    Pulling mysql                ... done
    Pulling elasticsearch-setup  ... done
    Pulling kibana               ... done
    Pulling neo4j                ... done
    Pulling zookeeper            ... done
    Pulling broker               ... done
    Pulling schema-registry      ... done
    Pulling schema-registry-ui   ... done
    Pulling kafka-setup          ... done
    Pulling datahub-mae-consumer ... done
    Pulling kafka-rest-proxy     ... done
    Pulling kafka-topics-ui      ... done
    Pulling datahub-gms          ... done
    Pulling datahub-mce-consumer ... done
    Pulling datahub-frontend     ... done
    Building elasticsearch-setup
    Sending build context to Docker daemon  27.78MB
    
    Step 1/10 : ARG APP_ENV=prod
    Step 2/10 : FROM jwilder/dockerize:0.6.1 AS base
     ---> 849596ab86ff
    Step 3/10 : RUN apk add --no-cache curl jq
     ---> Running in d6b9d0968be4
    fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz
    WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.6/main/x86_64/APKINDEX.tar.gz: temporary error (try again later)
    fetch http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz
    WARNING: Ignoring http://dl-cdn.alpinelinux.org/alpine/v3.6/community/x86_64/APKINDEX.tar.gz: temporary error (try again later)
    ERROR: unsatisfiable constraints:
      curl (missing):
        required by: world[curl]
      jq (missing):
        required by: world[jq]
    The command '/bin/sh -c apk add --no-cache curl jq' returned a non-zero code: 2
  • c

    curved-magazine-23582

    03/01/2021, 2:56 AM
    Will UserGroup be used as a mechanism for dataset access control? Or is something like that on the roadmap for DataHub?
  • b

    big-carpet-38439

    03/01/2021, 7:03 PM
    PSA: This Wednesday we will be hosting the inaugural React Office Hours sessions! 🎉 Feel free to stop in to ask questions or just hack on the app with @green-football-43791 and me! We will be conducting 2 sessions:
    • Morning: 8-10am PST
    • Afternoon: 3-5pm PST
    Both will be hosted at https://meet.google.com/rbr-vbsy-yuy?authuser=1 .
  • m

    mammoth-bear-12532

    03/02/2021, 3:38 AM
    We are aware that DataHub's build is broken right now due to a LinkedIn JFrog Artifactory issue. @microscopic-receptionist-23548 is looking into it.
  • a

    acceptable-architect-70237

    03/02/2021, 4:39 PM
    Hello team, a general question about data replay strategy. For example, in our case, we need to calculate a dataset's data quality, which is computed from the aspects of the dataset. Since all datasets are already in the datastores (MySQL, Neo4j and Elasticsearch), we need a way to pull the data and do the calculation. Right now we are pulling data from MySQL using a Python script. Do you have any suggestions?
  • i

    incalculable-ocean-74010

    03/02/2021, 5:29 PM
    Hello, does DataHub support deleting concrete entities? From https://github.com/linkedin/datahub/tree/master/gms I see get, search, update and list, but no delete.
  • n

    nutritious-bird-77396

    03/03/2021, 12:04 AM
    I am working on getting the PR out for the GraphQL MLModel query, and I am facing an issue in the MLModels client where the Snapshot aspects array is empty here: https://github.com/linkedin/datahub/blob/master/gms/impl/src/main/java/com/linkedin/metadata/resources/ml/MLModels.java#L121 Any clues on where the issue might be?
  • g

    gentle-exabyte-43102

    03/04/2021, 12:11 AM
    Fresh install of datahub: browsing to /browse/datasets I see "An error occurred. Please try again shortly." and in the console a request to api/v2/browse?type=dataset&count=100&start=0 returns a 400 with "Bad Request. type parameter can not be null".
  • c

    curved-crayon-1929

    03/04/2021, 7:28 AM
    Hi, I am new to DataHub. After cloning the repo as described in https://github.com/linkedin/datahub/blob/master/docs/quickstart.md, when I run
    ./docker/quickstart.sh
    it gets stuck as shown below and keeps repeating the same output. Can someone help me?
  • n

    nutritious-bird-77396

    03/04/2021, 10:15 PM
    We are looking at a use case where data-profiling information such as event counts, max, min, etc. is pushed every few minutes for every dataset in the org. Has LinkedIn dealt with such a use case? What special considerations should the architecture take into account? For example: data-profiling info for 30,000 datasets pushed every 5 minutes.
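
To put the numbers in that question in perspective, here is a rough back-of-the-envelope sketch. It assumes exactly one profiling event per dataset per interval, spread evenly over time; real traffic is likely burstier, and the function names are made up for illustration.

```python
def events_per_second(datasets: int, interval_minutes: int) -> float:
    """Average event rate if every dataset emits one event per interval."""
    return datasets / (interval_minutes * 60)


def events_per_day(datasets: int, interval_minutes: int) -> int:
    """Total events per day under the same one-event-per-interval assumption."""
    minutes_per_day = 24 * 60
    return datasets * minutes_per_day // interval_minutes


if __name__ == "__main__":
    # Figures taken from the message: 30,000 datasets, pushed every 5 minutes.
    print(events_per_second(30_000, 5))
    print(events_per_day(30_000, 5))
```

Under those assumptions the average works out to about 100 events per second, or 8.64 million events per day, before accounting for bursts or retries.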
  • h

    high-hospital-85984

    03/05/2021, 11:22 AM
    @clean-bear-94984 (or someone else): there has been some work on adding support for DataJobs and DataTasks (https://github.com/linkedin/datahub/pull/2008), but it seems the feature is not fully implemented yet. Are there any plans to finish it? If not, would you mind if we pick up the work?
  • b

    billions-scientist-31934

    03/06/2021, 1:26 PM
    Hi all. I've been spending some time digging into DataHub's backend and I had a quick question. I noticed that the MAEs have an internal Java representation that can be serialized into Avro, but no part of them seems to get put into any formal query intermediate representation (Calcite, for example). I thought that Pegasus was this, but it looks like Pegasus is just an object format to help decorate the REST layer. Does this mean that DataHub is meant to be strictly a federated metadata discovery tool, unlike a tool like Dremio, which is meant to be more of a federated query or execution engine? If so (apologies in advance if I overlooked something), is the long-term plan to collaborate with the Coral / Dali community to start to get the execution side? Since Coral only supports Hive view definitions, what is the interim plan to get things like pushdown optimization into queries before it supports more of the backends that DataHub currently supports? Or is DataHub meant to avoid query execution altogether and focus only on metadata queries?
  • m

    mammoth-bear-12532

    03/09/2021, 5:00 PM
    @here News Alert! We've just published the project roadmap for the first half of 2021. Check it out here! https://datahubproject.io/docs/roadmap/
    👍 8
    🥳 1
    🙌 1
  • i

    incalculable-ocean-74010

    03/10/2021, 9:17 PM
    Hello, does DataHub expose operational metrics endpoints, such as JMX metrics for Prometheus? Is there documentation on this?
  • s

    some-crayon-90964

    03/11/2021, 5:48 PM
    Hey guys, I am reading this document, so I have a question. What is the difference between Entity and Snapshot, conceptually and technically? @fancy-advantage-41244 fyi
  • m

    mammoth-bear-12532

    03/12/2021, 4:59 AM
    Some good news after all those build failures 🙂
    • SSO using OIDC is now in master! 🎉
    • Please take it for a spin and let @big-carpet-38439 know if you run into any issues.
    • We've tested it with Google SSO and Okta.
    • Docs here: https://datahubproject.io/docs/how/configure-oidc-react
    🎉 2
    🙌 1
  • g

    gentle-exabyte-43102

    03/12/2021, 7:49 PM
    DatasetUrns look to be of the form urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},PROD), where platform seems to be an enum with values like hive, hdfs, kafka, mysql, etc. Is it possible to specify other values for platform? Can I supply whatever value I want? It seems like I can't; I'm getting Pegasus errors.
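
For reference, the URN shape quoted in that message can be sketched as plain string handling. This is only an illustration: `make_dataset_urn` and `parse_dataset_urn` are hypothetical helpers, not DataHub APIs, and they do not check whether the server will accept a given platform value.

```python
import re

# Pattern for the form quoted in the message:
# urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},{env})
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def make_dataset_urn(platform: str, dataset_name: str, env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper, no server-side validation)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{dataset_name},{env})"


def parse_dataset_urn(urn: str) -> tuple:
    """Split a dataset URN into (platform, dataset_name, env), or raise ValueError."""
    match = _DATASET_URN.match(urn)
    if match is None:
        raise ValueError(f"not a dataset URN of the expected form: {urn!r}")
    return match.groups()


if __name__ == "__main__":
    urn = make_dataset_urn("hive", "db.table")
    print(urn)
    print(parse_dataset_urn(urn))
```

Building the string client-side only guarantees the syntax; whether a custom platform name is accepted still depends on what the backend has registered.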
  • i

    incalculable-ocean-74010

    03/15/2021, 4:40 PM
    Hello, is there a particular reason why Docker images are built directly from DataHub's source instead of relying on published artifacts (for example, published jars for GMS or published packages for Python)? Right now, if I need to modify a particular image, I need to have the entire codebase available locally to make relatively minor changes.
  • a

    astonishing-yak-92682

    03/15/2021, 4:46 PM
    Getting this error while trying to log in to the DataHub React application using the quickstart-react script
  • c

    curved-magazine-23582

    03/17/2021, 4:17 AM
    Is it possible to add AWS S3 to the list of dataPlatforms? Most of our datasets are in an AWS S3 lake.
  • w

    worried-flower-88750

    03/17/2021, 10:24 PM
    Hello everyone 👋 Is there a way to edit descriptions through the UI? Just curious
  • m

    mammoth-bear-12532

    03/19/2021, 2:32 AM
    Folks: an important announcement: We are officially on Elasticsearch-7 now! 🚀 Thanks to everyone who worked hard for this milestone: @microscopic-waitress-95820, @microscopic-receptionist-23548 and a cameo by @early-lamp-41924. There is a migration guide if you need it here: https://datahubproject.io/docs/advanced/es-7-upgrade. Happy searching!
  • a

    acoustic-printer-83045

    03/21/2021, 10:29 PM
    👋 Just wondering if anyone else is experiencing an Elasticsearch failure when running ./docker/quickstart.sh. When I try to fire up Elasticsearch I get this (snipped) log:
    elasticsearch             | {"type": "server", "timestamp": "2021-03-21T22:25:21,301Z", "level": "ERROR", "component": "o.e.b.ElasticsearchUncaughtExceptionHandler", "cluster.name": "docker-cluster", "node.name": "elasticsearch", "message": "uncaught exception in thread [main]", 
    elasticsearch             | "stacktrace": ["org.elasticsearch.bootstrap.StartupException: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
    elasticsearch             | "at org.elasticsearch.bootstrap.Elasticsearch.init(Elasticsearch.java:174) ~[elasticsearch-7.9.3.jar:7.9.3]",
    elasticsearch             | "at org.elasticsearch.bootstrap.Elasticsearch.execute(Elasticsearch.java:161) ~[elasticsearch-7.9.3.jar:7.9.3]",
    elasticsearch             | "at org.elasticsearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:86) ~[elasticsearch-7.9.3.jar:7.9.3]",
    elasticsearch             | "at org.elasticsearch.cli.Command.mainWithoutErrorHandling(Command.java:127) ~[elasticsearch-cli-7.9.3.jar:7.9.3]",
    elasticsearch             | "at org.elasticsearch.cli.Command.main(Command.java:90) ~[elasticsearch-cli-7.9.3.jar:7.9.3]",
    elasticsearch             | "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:126) ~[elasticsearch-7.9.3.jar:7.9.3]",
    elasticsearch             | "at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:92) ~[elasticsearch-7.9.3.jar:7.9.3]",
    elasticsearch             | "Caused by: java.lang.IllegalStateException: failed to obtain node locks, tried [[/usr/share/elasticsearch/data]] with lock id [0]; maybe these locations are not writable or multiple nodes were started without increasing [node.max_local_storage_nodes] (was [1])?",
    I don't think this is caused by resource contention but I could be wrong. Thanks!
  • h

    high-hospital-85984

    03/22/2021, 10:58 AM
    We just tried to update from 0.6.1 to 0.7.0 and suddenly the MCE consumer isn't consuming messages anymore. No errors in the log, and no configs have been changed. Any ideas as to what could be the issue?
  • i

    incalculable-ocean-74010

    03/23/2021, 2:29 PM
    Hello everyone. With the introduction of Elasticsearch 7, we no longer need to define mappings.json files, right? The Docker image for Elasticsearch setup still uses the mappings.json files. Is this now legacy?
  • s

    some-crayon-90964

    03/23/2021, 2:39 PM
    image.png
  • s

    some-crayon-90964

    03/23/2021, 2:39 PM
    At this point, we don't know what to do to fix this. Please advise.
    👀 1
  • m

    mammoth-bear-12532

    03/23/2021, 5:38 PM
    Hi folks, just wanted to let you know that we merged in the dbt source last night. Thanks to great work by @acoustic-printer-83045! Please give it a spin in your dbt environment and let us know how it works for you! (https://datahubproject.io/docs/metadata-ingestion#dbt-dbt)
    👍 1
    🙌 4