# getting-started
  • g

    glamorous-kite-95510

    06/21/2021, 1:39 PM
    Hi, our company is very interested in integrating DataHub into our system. Are there any prerequisites? What is the sequence of steps for deploying DataHub? How can we back up metadata? If we deploy with Docker, which part is persistent and needs to be backed up so metadata survives a Docker restart? Can you share your experience with deploying? We need a document on how to do that, and we would really appreciate your support. It would be nice if we could discuss this in detail. Here is my email address: phamminhsyhcmus@gmail.com
  • m

    mammoth-bear-12532

    06/21/2021, 3:36 PM
    @here 📣 DataHub now supports discovering highly popular tables and requires fewer resources to run! Join our June Townhall this Friday to hear more 🙂
    • When: June 25th at 9am US PT 🕘
    • Signup to get a calendar invite: here
    • Townhall Zoom: https://zoom.datahubproject.io
    Agenda:
    • Project Updates by Shirshanka: release notes, RBAC update, roadmap for H2 2021
    • Demo: Table Popularity powered by Query Activity by @gray-shoe-75895
    • Case Study: Business Glossary in production at Saxo Bank by @cuddly-spoon-1635 @silly-london-3535
    • Developer Session: Simplified Deployment for DataHub by @big-carpet-38439 @green-football-43791
    👍🏼 1
    👍 13
  • f

    flaky-king-99571

    06/23/2021, 5:20 AM
    Hey guys, I came to DataHub after trying to install Amundsen for a month. I hope this will be easier to install and configure than what I endured in the last couple of months.
    👋 1
    🤣 1
  • m

    mammoth-bear-12532

    06/23/2021, 6:47 AM
    @here We just published the May edition of our community newsletter yesterday. Thanks for all the contributions and the great momentum. Please share with your network and clap away 🙂 https://medium.com/datahub-project/linkedin-datahub-project-updates-ed98cdf913c1?source=friends_link&sk=9930ec5579299b155ea87c747683d1ad
    👏 3
    🙂 1
    🎉 9
  • d

    delightful-policeman-14573

    06/23/2021, 5:37 PM
    Hi all! Just wanted to say hi 👋, introduce myself and ask a few questions! Matthias from Arabesque AI here. We are looking into metadata stores and landed on DataHub, and it's been very enjoyable so far 🎉. A few initial questions (please point me to documentation where possible; my initial searches might have been failing or too superficial to find it myself, but I am happy to dig deeper myself before burdening more people 🙂):
    • Is there a way to ingest metadata from Google Cloud Storage or ElasticSearch?
    • I read somewhere about the possibility of adding ML models as well. Is it documented somewhere how to do so?
    • Lastly, data lineage: I've found docs on how to do so using Airflow, but is there a way to add this manually (for now), and how?
    We set it up internally on GCP, so I'd be happy to look into contributing docs/steps if that'd be useful!
    🙌 1
  • n

    nutritious-bird-77396

    06/23/2021, 7:41 PM
    Hi folks, I am looking to extend the GMS Client to get Datasets by a specific version in addition to the Urn, i.e. Dataset get(Dataset urn, int version) (https://github.com/linkedin/datahub/blob/master/gms/client/src/main/java/com/linkedin/dataset/client/Datasets.java#L62). Is this achievable? Could you highlight the high-level changes that might be needed to provide this ability?
  • g

    glamorous-kite-95510

    06/25/2021, 1:13 PM
    I got this error when I tried to install Elasticsearch on Kubernetes. My server has 8 GB of RAM. Could it be out of memory? How can I fix it? I am using your configuration on minikube.
  • m

    mammoth-bear-12532

    06/25/2021, 5:07 PM
    Thanks for the cool presentations today! Folks if you have questions about the protobuf schema integration with business glossary, ping @silly-london-3535 and @cuddly-spoon-1635
    👍 1
  • m

    mammoth-bear-12532

    06/25/2021, 5:10 PM
    I'll publish the roadmap for H2 2021 soon! please ping me for additions / collab opportunities 🙏
    🙌 3
  • a

    acceptable-architect-70237

    06/25/2021, 5:12 PM
    Hi Datahub team, what's the status of [no code modeling](https://datahubproject.io/docs/advanced/no-code-modeling)? I looked at the code, and the sample service entity is there. After I build & run, I tried the sample POST request:
    curl 'http://localhost:8080/entities?action=ingest' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '{
       "entity":{ 
          "value":{
             "com.linkedin.metadata.snapshot.ServiceSnapshot":{
                "urn": "urn:li:service:mydemoservice",
                "aspects":[
                   {
                      "com.linkedin.service.ServiceInfo":{
                         "description":"My demo service",
                         "owner": "urn:li:corpuser:user1"                     
                      }
                   },
                   {
                      "com.linkedin.common.BrowsePaths":{
                         "paths":[
                            "/my/custom/browse/path1",
                            "/my/custom/browse/path2"
                         ]
                      }
                   }
                ]
             }
          }
       }
    }'
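The same request can be sketched in Python, which makes the nesting of the payload easier to check than the inline JSON. This is a minimal sketch, not the project's client: the URL, the protocol header, and the snapshot/aspect names are copied from the curl command above, while the helper names (`build_service_snapshot`, `build_ingest_request`, `send`) and the `Content-Type` header are assumptions added for illustration.

```python
import json
import urllib.request

# URL and protocol header are taken from the curl command above.
GMS_INGEST_URL = "http://localhost:8080/entities?action=ingest"


def build_service_snapshot(urn, description, owner, browse_paths):
    """Build the ingest payload from the curl example (hypothetical helper)."""
    return {
        "entity": {
            "value": {
                "com.linkedin.metadata.snapshot.ServiceSnapshot": {
                    "urn": urn,
                    "aspects": [
                        {
                            "com.linkedin.service.ServiceInfo": {
                                "description": description,
                                "owner": owner,
                            }
                        },
                        {
                            "com.linkedin.common.BrowsePaths": {
                                "paths": list(browse_paths)
                            }
                        },
                    ],
                }
            }
        }
    }


def build_ingest_request(payload, url=GMS_INGEST_URL):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "X-RestLi-Protocol-Version": "2.0.0",
            # Assumption: curl --data does not send this header by default.
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Send the request and return the HTTP status code (needs a running GMS)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `send(build_ingest_request(build_service_snapshot("urn:li:service:mydemoservice", "My demo service", "urn:li:corpuser:user1", ["/my/custom/browse/path1"])))` would issue the request; whether the server accepts it depends on the service model actually being part of the running build.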
  • b

    big-carpet-38439

    06/25/2021, 5:33 PM
    It is purely illustrative (of course if you need this model you can add it)
  • f

    flaky-agent-21930

    06/25/2021, 6:25 PM
    Hey everyone, I just joined the townhall and I am extremely impressed by just about everything. I went straight ahead and tried to quickstart it but it seems to fail and I am not quite sure what the fix could be. I receive this error and I attached the log as requested. Any sort of help would be amazing.
    tmppdl6anqx.log
  • c

    clean-furniture-99495

    06/27/2021, 8:53 PM
    Hi there, I was wondering if APIs could be considered as a Data Source so they could be included in the Lineage of the Data Sets? It could be possible to extract all the metadata from the Swagger documentation 🤔
  • p

    powerful-telephone-71997

    06/28/2021, 5:25 AM
    Any way to ingest data from Tableau, Metabase and Redash and bring in the lineage? I would love to collaborate, but will need pointers to start with…
  • m

    mammoth-bear-12532

    06/28/2021, 6:35 AM
    @here Release 0.8.4 (as seen in Friday's town-hall) is now available. (https://github.com/linkedin/datahub/releases/tag/v0.8.4)
    Release Highlights 🎉
    • Dataset Popularity, Recent Queries powered by Usage logs (support for Snowflake, BigQuery)
    • Markdown descriptions and editing
    • New Integrations: Glue Jobs, Feast
    • Versioned API for metadata GETs
    • No neo4j requirement, Elastic for Graph
    • Docker image hardening
    • Improved logging
    • GCP Deployment Guide
    👍 6
  • l

    lively-judge-30357

    06/28/2021, 6:24 PM
    curious: why are you interested in moving off of Neo4j?
  • m

    mammoth-bear-12532

    06/28/2021, 7:52 PM
    @here If you missed the Friday townhall, or just want to share it with your colleagues and friends, the full version of the townhall is now up!

    https://www.youtube.com/watch?v=xUHOdDfdFpY

    🙏 2
    👍 3
  • s

    square-activity-64562

    06/30/2021, 8:55 AM
    I am doing datahub docker quickstart and it is pulling a bunch of things:
    Pulling elasticsearch          ... downloading (68.7%)
    Pulling elasticsearch-setup    ... done
    Pulling mysql                  ... pull complete
    Pulling datahub-gms            ... pull complete
    Pulling datahub-frontend-react ... done
    Pulling mysql-setup            ... 
    Pulling zookeeper              ... waiting
    Pulling broker                 ... 
    Pulling schema-registry        ... downloading (49.4%)
    Pulling kafka-setup            ...
    Are all of these hard dependencies of datahub?
  • s

    square-activity-64562

    06/30/2021, 12:22 PM
    How does datahub handle schema changes?
  • c

    curved-sandwich-81699

    06/30/2021, 6:21 PM
    Hi all, I am getting these errors when trying to run DataHub v0.8.4 from quickstart.sh:
    datahub-gms               | 2021/06/30 18:18:42 Problem with dial: dial tcp: lookup mysql on 127.0.0.11:53: no such host. Sleeping 1s
    mysql-setup               | 2021/06/30 18:18:43 Problem with dial: dial tcp: lookup mysql on 127.0.0.11:53: no such host. Sleeping 1s
  • f

    fresh-fish-73471

    07/01/2021, 3:46 PM
    We are trying to enable OIDC (AWS Cognito) authentication for dockerized DataHub. We followed the instructions at https://datahubproject.io/docs/how/configure-oidc-react/ and configured the following properties in datahub/docker/datahub-frontend/env/docker.env:
    Required OIDC configs
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=XXXXXXXX
    AUTH_OIDC_CLIENT_SECRET=
    #AUTH_OIDC_DISCOVERY_URI=https://xxxxxxxxxxxxxxxxxxxxxxxx/.well-known/openid-configuration
    AUTH_OIDC_DISCOVERY_URI=https://xxxxxxxxxxxx.xxxxxxxxxxxxxxxx/openid-configuration
    AUTH_OIDC_BASE_URL=https://XXXXXXXXX.com
    Optional OIDC configs
    AUTH_OIDC_USER_NAME_CLAIM=email
    AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)
    AUTH_OIDC_SCOPE=openid
    Uncomment to disable JAAS username / password authentication (enabled by default)
    AUTH_JAAS_ENABLED=false
    But for some reason the redirection is not happening; instead we are taken to the default DataHub login page when we hit the base URL. Please let us know if we are missing something. Thanks in advance.
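As a side illustration of the `AUTH_OIDC_USER_NAME_CLAIM_REGEX=([^@]+)` setting in the message above, the sketch below shows what that pattern captures from an email-style claim. It only demonstrates the regular expression itself; that the first capture group, matched from the start of the claim, becomes the username is an assumption here, and `extract_username` is a hypothetical helper, not a DataHub function.

```python
import re

# Pattern copied from the AUTH_OIDC_USER_NAME_CLAIM_REGEX value above.
USER_NAME_CLAIM_REGEX = r"([^@]+)"


def extract_username(claim_value, pattern=USER_NAME_CLAIM_REGEX):
    """Return the first capture group matched at the start of the claim.

    Hypothetical helper: how the frontend actually applies the regex
    is an assumption, only the pattern comes from the config.
    """
    match = re.match(pattern, claim_value)
    return match.group(1) if match else None
```

With this pattern, `extract_username("jdoe@example.com")` yields the part before the `@`, and a claim that starts with `@` produces no match.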
  • b

    better-orange-49102

    07/05/2021, 2:00 AM
    I see that the roadmap for Jul to Nov has been updated. Would having the UI show historic versions of schemas be part of the expected work as well?
  • a

    ambitious-airline-8020

    07/05/2021, 9:05 AM
    Hi all. Could you point me in the right direction? I want to write a Java application that uses DataHub classes (DAO, EntityClient, AspectClient) to work with DataHub data. My questions about that:
    1. Is there any example of such an application?
    2. How do we set up Maven dependencies correctly for that app (as only GMA is available in Artifactory)?
    3. How do we set up the required connections for that app (to the primary SQL DB and secondary ES) correctly, the DataHub way?
    Thanks!
  • b

    brief-lizard-77958

    07/05/2021, 1:03 PM
    I have an issue with the quickstart. I'm trying to start datahub on windows server 2016. I have docker fully running and docker-compose installed. Upon running "python -m datahub docker quickstart" I get several errors saying "ERROR: for kafka-setup image operating system "linux" cannot be used on this platform". Here is the full log of what I get upon running quickstart: https://pastebin.com/ffMVtVet. Trying to run with "docker-compose -p datahub up", I get the following error: "ERROR: no matching manifest for windows/amd64 10.0.14393 in the manifest list entries"
  • s

    square-activity-64562

    07/06/2021, 7:41 AM
    If I am reading this correctly https://datahubproject.io/docs/architecture/metadata-ingestion/ it is possible for me to remove the kafka dependency in ingestion if we decide to simply use the http push mechanism. We are not using kafka at the moment so don't want to introduce that currently, if possible.
  • s

    square-activity-64562

    07/06/2021, 7:41 AM
    Are the docs at https://datahubproject.io/docs/architecture/metadata-serving/ up-to-date? I read in recent changelogs that Neo4j is no longer required. I was wondering whether the Kafka stream shown here is required or not.
  • s

    square-activity-64562

    07/06/2021, 7:43 AM
    This page https://datahubproject.io/docs/what/mxe/#metadata-audit-event-mae can be reached via links present in https://datahubproject.io/docs/architecture/metadata-serving/ but is not present in the sidebar when looking at the docs page. Is this intentional?
  • d

    damp-oxygen-37726

    07/06/2021, 1:19 PM
    Hello, we are evaluating DataHub and trying to estimate what is required to implement and maintain it. We are now evaluating how to extend the metadata model. Since our reality is a bit different from what the model presents, this is a big issue for us: we are mainly focused on small data and work with in-house developed tools. So my questions refer to the extension of the model.
    1) Step number 6 of the documentation requires that the GraphQL & React models are extended to cover what was incorporated into the model in the previous steps. How is this an optional step? Are the new entities available for querying and in the user interface if it is not done?
    2) This is more of a conceptual question. Why is it necessary to redefine in GraphQL & React what was already defined in Pegasus? We see this as an error-prone task which could (should?) be automatic. Is there something we are missing? If not, is this already identified, and is it on the roadmap?
    3) Similarly, we don't understand why a rebuild is necessary after the model is extended (step 5). We are also concerned that incorporating new concepts unnecessarily affects the production system (we see it as analogous to restarting a DBMS after a table create). Again, are we missing something here?
    Thanks a lot. Regards, Agustín Mullin
  • b

    brief-lizard-77958

    07/06/2021, 1:46 PM
    When compiling datahub frontend with 'gradlew build' I always seem to get a failed build with the error like this: https://pastebin.com/TUjU5shD. I have datahub running and have JDK8. Any suggestion would be very appreciated.
  • f

    fresh-fish-73471

    07/07/2021, 9:57 AM
    Hi, we need to point DataHub to external Kafka, MySQL and Elasticsearch instances; currently we point to dockerized versions of them. Please let us know what changes we need to make in order to achieve this.