# getting-started
  • m

    mammoth-bear-12532

    03/24/2021, 2:48 PM
    @here We have had great responses so far! We will keep this open for another 3 days before sharing the results. Do get your votes in!
  • h

    high-hospital-85984

    03/24/2021, 3:01 PM
    To use simple authentication with Elasticsearch, can we simply set the host to something like "username:password@my-es-addr.com"?
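For context, HTTP Basic authentication (RFC 7617) sends credentials as an `Authorization: Basic <base64(username:password)>` header. The sketch below, using only the Python standard library, shows how a host string with embedded `username:password@` credentials can be split into a bare host and such a header value. `split_es_host` is a hypothetical helper, not part of DataHub, and whether DataHub's Elasticsearch host setting accepts embedded credentials at all is not confirmed here.

```python
import base64
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_es_host(host: str) -> Tuple[str, Optional[str]]:
    """Split a host string that may embed 'username:password@' credentials.

    Hypothetical helper: returns (host[:port] without credentials,
    Basic auth header value, or None when no credentials are present).
    """
    # urlsplit only recognizes the userinfo part when a scheme is present,
    # so prepend one (assumed https) if the caller passed a bare host.
    url = host if "://" in host else "https://" + host
    parts = urlsplit(url)

    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc += f":{parts.port}"

    if parts.username is None:
        return netloc, None

    # RFC 7617: base64-encode "username:password" and prefix it with "Basic ".
    token = f"{parts.username}:{parts.password or ''}".encode("utf-8")
    return netloc, "Basic " + base64.b64encode(token).decode("ascii")
```

For example, `split_es_host("username:password@my-es-addr.com")` returns `("my-es-addr.com", "Basic dXNlcm5hbWU6cGFzc3dvcmQ=")`. Keeping the header separate from the host is also roughly what the follow-up idea of a BasicCredentialsProvider in RestHighLevelClientFactory.java amounts to on the Java side.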
  • h

    high-hospital-85984

    03/25/2021, 7:14 AM
    @wonderful-quill-11255 We’re in a situation where we’d need to add BasicAuth-over-HTTPS functionality to RestHighLevelClientFactory.java. I think the idea would be to conditionally add a BasicCredentialsProvider (like here). Do you see any problems with this?
  • e

    early-hydrogen-59749

    03/25/2021, 2:17 PM
    Hello @mammoth-bear-12532 @green-football-43791 I was trying to use the latest tagging feature introduced in v7. I wanted to check whether tags are indexed at both the dataset level and the attribute level. I applied the same tag to both a dataset and an attribute, but when I searched with the tag filter, it returned just the dataset linked to it. This suggests the indexing happened only at the dataset level and not at the attribute level. Is my understanding correct?
  • c

    calm-sunset-28996

    03/25/2021, 3:07 PM
    I’m kind of struggling with the jetty-runner part of GMS. Is there a specific reason for using it rather than Jetty itself, given that jetty-runner seems to be deprecated? A quick solution is posted here (https://github.com/eclipse/jetty.project/issues/1905). I’m trying to add JSON logging for all components.
  • i

    incalculable-ocean-74010

    03/26/2021, 5:28 PM
    Hello (once more :D), a couple of auth + users questions. Does DataHub have to store users in its own database for authorization purposes, or can it defer that to systems like Active Directory? I.e.: can user X access DataHub? As a follow-up, does DataHub have a way to define which users can access what? I.e.: define a user group that can access a set of entity instances but not others: research data scientists can see research datasets but not production information meant for product teams.
  • m

    mammoth-bear-12532

    03/29/2021, 3:16 PM
    @here Zoom link for office hours happening now: https://zoom.us/j/94456584041?pwd=TVdZNGg1L0x1eFV0RTVadkJ5Szg0dz09
  • t

    thousands-tailor-5575

    03/30/2021, 3:30 PM
    Hi @big-carpet-38439! I am looking at your presentation right now (found here: https://medium.com/datahub-project/linkedin-datahub-project-updates-february-2021-edition-338d2c6021f0). You mentioned that there are currently no lineage visualisation options in the React app. Are there any plans for this to be added in the near future (e.g. something similar to dbt lineage graphs)?
  • c

    chilly-spring-43918

    03/31/2021, 12:26 PM
    Hi, I tried to install DataHub on Kubernetes using Helm, but it failed, only on the datahub-gms pod. Is it a typo, or is there another step needed before installing?
  • c

    chilly-spring-43918

    04/01/2021, 8:24 AM
    Hi, I keep getting this error in datahub-gms after successfully deploying to k8s. Any suggestions on what I should do?
    ERROR ContextLoader Context initialization failed
     org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dataProcessDAO' defined in com.linkedin.gms.factory.dataprocess.DataProcessDAOFactory: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.dao.BaseLocalDAO]: Factory method 'createInstance' threw exception; nested exception is java.lang.NullPointerException
            at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:656)
            at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:484)
            at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1338)
            at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1177)
            at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:557)
            at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517)
            at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323)
            at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222)
  • n

    nutritious-bird-77396

    04/02/2021, 4:55 PM
    @microscopic-receptionist-23548 Could you help me understand the reasoning behind splitting GMA into a separate repo?
  • m

    millions-engineer-56536

    04/06/2021, 5:18 PM
    Is there a concept of tag type/category? Not all tags are made equal… Some tags carry important information, some are calls to action (ex: Needs Ownership), some are meant to communicate a warning, and some are suggested and might need further approval/confirmation… Ideally, tags from different categories should be distinguishable in the UI (probably by different colors)… If there is a better channel to discuss this, please let me know.
  • b

    busy-accountant-26554

    04/09/2021, 7:50 AM
    Hi all, does anyone know if it is possible to search for dataset schema columns in the DataHub React GUI? And if so, what would be the correct syntax?
  • m

    mammoth-bear-12532

    04/14/2021, 1:57 AM
    @here I will be sharing my learnings and insights about DataHub and metadata this Thursday at a live event. Please share with everyone who would be interested 🙏 https://www.eventbrite.com/e/dc-thurs-on-datahub-w-shirshanka-das-acryl-data-tickets-146593935407
  • b

    big-carpet-38439

    04/20/2021, 3:49 PM
    Welcome @delightful-plumber-77060 @red-journalist-15118 @witty-agent-93707 @stale-nightfall-11938!
  • m

    mammoth-bear-12532

    04/21/2021, 7:33 PM
    Town-hall Card is here 🙂
  • m

    mammoth-bear-12532

    04/23/2021, 3:23 PM
    @here Town hall happening in 36 minutes! Zoom: https://zoom.datahubproject.io
  • a

    acoustic-printer-83045

    04/26/2021, 4:17 AM
    No rush on this, just wondering if anyone else is seeing that the quickstart DataHub built off master (c64196e8c) is unable to show ingested metadata. I have the sample data ingestion
    ./docker/ingestion/ingestion.sh
    returning what looks like success, after what appears to be a successful stand-up of DataHub on Docker, i.e.:
    ➜  datahub git:(enhance-dbt-ingestion) ✗ datahub check local-docker                                        
    The following issues were detected:
    - kafka-topics-ui is not running
    - schema-registry-ui is not running
    I don't think either of those is required to be running. When I run the ingest I get this:
  • a

    acoustic-printer-83045

    04/26/2021, 4:17 AM
    However, I'm unable to see anything in the DataHub UI, i.e. no datasets are present. I've run my dbt ingestion toolchain plus a Postgres ingestion setup I created for tweaking the dbt metadata ingest. Thanks a bunch!
  • w

    wide-rain-9038

    04/26/2021, 11:55 AM
    Hello, I'm trying to run the app locally using Docker, but it is stuck pulling images from the Docker repo. Does anyone else have the same experience?
    ➜ ./docker/quickstart.sh
    
    Pulling neo4j                  ... done
    Pulling zookeeper              ... done
    Pulling broker                 ... done
    Pulling schema-registry        ... done
    Pulling kafka-setup            ... done
    Pulling schema-registry-ui     ... done
    Pulling kafka-rest-proxy       ... done
    Pulling kafka-topics-ui        ... done
    Pulling elasticsearch          ... done
    Pulling elasticsearch-setup    ... done
    Pulling datahub-mae-consumer   ... waiting
    Pulling kibana                 ... done
    Pulling mysql                  ... done
    Pulling datahub-gms            ... waiting
    Pulling datahub-frontend-react ... waiting
    Pulling datahub-mce-consumer   ... waiting
    It's stuck on waiting forever 🙂
  • s

    stale-jewelry-2440

    04/26/2021, 2:32 PM
    Hi, do you know where I can find a basic tutorial? For example, I successfully added metadata from my SQL Server source, but I don't know how to delete the test metadata ingested by the "quickstart".
  • b

    better-orange-49102

    05/05/2021, 2:15 AM
    I'm using the quickstart with a custom MCE that I wrote. To delete a dataset that I've ingested via the REST API, is there any better/easier way than to:
    1. go into MySQL and delete the record
    2. go into the ES container and delete the document
    3. go into the neo4j container and delete the data
    Is there any other container that I missed?
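When cleaning up by hand, it helps to have the exact identifier to look for: DataHub identifies a dataset by a URN of the form `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. A minimal standalone sketch that builds it, assuming (not verified here) that the record in each of the stores listed above is keyed by that URN. `dataset_urn` is a hypothetical helper and `PROD` is an assumed default environment; the `acryl-datahub` package, if installed, ships a comparable `datahub.emitter.mce_builder.make_dataset_urn`.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN (standalone sketch, no datahub dependency)."""
    # Format: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical platform and dataset name, for illustration only.
urn = dataset_urn("mysql", "mydb.mytable")
```

The same string would then be the value to search for in MySQL, Elasticsearch, and neo4j.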
  • i

    icy-holiday-55016

    05/10/2021, 8:39 AM
    Hi folks, does DataHub allow an Airflow task to have multiple output Datasets? Using lineage_backend_demo.py produces the lineage graph shown in the screenshot below. If I modify that file to add two additional outlet datasets, the lineage graph remains the same when the context is centered on the run_data_task Task. If I switch the context to one of the new datasets I added (for example tableG), it shows the lineage from the originating dataset, omitting the Task. I'm trying to determine whether this is simply a bug or whether Tasks aren't intended to have multiple outlet Datasets. I'm inclined to think it's just a bug, as the outlet datasets value is an array, though it would be good to confirm. Thanks
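The behaviour the question expects can be stated as a small directed graph: every inlet dataset has an edge into the task, and the task has an edge to every outlet dataset, so adding outlets should add edges rather than leave the graph unchanged. A plain-Python sketch of that expectation only, not DataHub's implementation; `task_lineage_edges` is hypothetical, and the dataset names other than tableG are made up for illustration.

```python
from typing import List, Tuple


def task_lineage_edges(task: str, inlets: List[str], outlets: List[str]) -> List[Tuple[str, str]]:
    """Return directed (upstream, downstream) edges contributed by one task.

    Expected shape only: inlet -> task for every inlet,
    task -> outlet for every outlet.
    """
    edges = [(inlet, task) for inlet in inlets]
    edges += [(task, outlet) for outlet in outlets]
    return edges


# "run_data_task" and "tableG" come from the question; the rest are hypothetical.
edges = task_lineage_edges("run_data_task", ["tableA"], ["tableD", "tableE", "tableG"])
```

With three outlets the task contributes three downstream edges, so a graph centered on run_data_task that shows fewer would be consistent with the bug the question suspects.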
  • s

    some-cricket-23089

    05/11/2021, 5:41 AM
    1. Does the master branch have all the latest changes from the 0.7.1 tag?
    2. I was looking for a tutorial where I can read about the configuration for managing the hierarchy. Please let me know where I can find that; it would be a great help in my exploration.
  • i

    icy-holiday-55016

    05/17/2021, 12:33 PM
    Hi folks, is field-level lineage supported at the moment? I see there was an RFC for it, and there is code in the repo corresponding to the RFC, though the docs indicate it's still coming soon. https://datahubproject.io/docs/rfc/active/1841-lineage/field_level_lineage
  • m

    many-egg-4654

    05/18/2021, 6:28 AM
    Hi folks, I was trying to set up DataHub and succeeded (using Docker containers). Now I am trying to set up an Airflow lineage DAG in Airflow, which is also running in a container, on a different port. But when I add the DAG file mentioned here, I get a ModuleNotFoundError. Can anyone help me figure out what I am doing wrong that makes my DAG import fail, or is it some issue with the Python module used by DataHub? The DAG I'm trying to import: https://github.com/linkedin/datahub/blob/master/metadata-ingestion/src/datahub_provider/example_dags/mysql_sample_dag.py
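A `ModuleNotFoundError` on DAG import means the Python interpreter that Airflow uses could not find a module the DAG imports. A quick standard-library check, run with that same interpreter (for example inside the Airflow container), shows which names are missing. `missing_modules` is a hypothetical helper, and checking `datahub` and `datahub_provider` is an assumption based on the DAG living under `metadata-ingestion/src/datahub_provider`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names this interpreter cannot find."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # For dotted names, find_spec imports the parent package first;
            # a missing parent raises instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Calling `missing_modules(["datahub", "datahub_provider"])` from Airflow's environment and getting a non-empty list would point to the package not being installed there, rather than to a problem in the DAG file itself.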
  • m

    many-egg-4654

    05/18/2021, 8:46 AM
    A little conceptual clarification needed: does data lineage also show me the changes I made to a specific field of a database? For example, if I deleted/modified a field in Glue, will the Lineage section show me the modifications?
  • m

    mammoth-bear-12532

    05/18/2021, 7:38 PM
    @here Just a reminder that this is happening tomorrow!
  • m

    mammoth-bear-12532

    05/24/2021, 4:39 PM
    @here In case you missed it, the video for the event is now published!

    https://www.youtube.com/watch?v=fEILyoWVpBw

  • b

    broad-flag-97458

    05/25/2021, 5:31 PM
    Hi everyone, I’m trying to get an idea of the minimum production hardware requirements (disk, memory, cpu, etc.) for the four core components of DataHub (GMS, MAE Consumer, MCE Consumer, and Frontend). Does anyone have any insights/recommendations (or maybe point me to the doc where that’s outlined)?