# getting-started

    damp-minister-31834

    01/24/2022, 10:22 AM
    Hi all. Can the search box in the UI match exactly? For example, I want to search for the table named 'ab' without the table 'abc' also showing up. Is there a special syntax for that?
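To make the distinction in the question concrete, here is a minimal Python sketch of the two matching behaviours over a hypothetical list of table names. It does not call DataHub's search API. `substring_match` and `exact_match` are illustrative helpers, not DataHub functions.

```python
from typing import List


def substring_match(query: str, names: List[str]) -> List[str]:
    """Return every name containing the query (case-insensitive), like a fuzzy search box."""
    q = query.lower()
    return [n for n in names if q in n.lower()]


def exact_match(query: str, names: List[str]) -> List[str]:
    """Return only the names equal to the query (case-insensitive)."""
    q = query.lower()
    return [n for n in names if n.lower() == q]


# Hypothetical table names taken from the question.
tables = ["ab", "abc"]
print(substring_match("ab", tables))  # ['ab', 'abc']
print(exact_match("ab", tables))      # ['ab']
```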

    dazzling-appointment-34954

    01/24/2022, 11:00 AM
    Hi experts, I have a question about search behaviour: is there a way to make Business Glossary properties searchable through the UI search? Do you also have more information about the search UI somewhere (what is indexed, are there operators you can use, like “not”)? Thanks in advance for any help!

    prehistoric-dawn-23569

    01/24/2022, 2:05 PM
    Hello all. Does anyone know whether it's at all possible to use DataHub without either the Confluent Schema Registry or the AWS Schema Registry? I'm constrained such that I can use neither the Confluent Community License nor a third-party service.

    aloof-arm-38044

    01/24/2022, 5:30 PM
    Hi everyone. At the risk of this post being a bit long, I have a few questions for the community. I'm not sure if this is the accepted format, but here we go...
    ---
    Higher-level questions:
    1. Our company has a quite complex ontology that we would need to bring into DataHub to model our entire ecosystem. Initially this would mean bringing in 7-10 new entities as well as extending some of the existing DataHub entities. Based on the docs, this would lead us down the repo-forking path. Are there people extending DataHub right now as heavily as we plan to? Is this a relatively common thing, or is it too complex and treacherous a path for us to walk?
    2. Are there any patterns to follow for adopting DataHub? Do people build their own interfaces to abstract away the DataHub APIs and let DataHub do the heavy lifting under the hood, or anything like that?
    3. IDs in DataHub are human-readable URNs. If we were to expose our metadata to external organisations, the URNs might leak implementation details of DataHub and of our own ontology. Is there a way to obfuscate IDs or map them to synthetic IDs (e.g. UUIDs) in DataHub? This could apply to other fields in metadata objects as well. Are there any patterns to follow there?
    4. Does DataHub provide referential integrity? It seems that if you create a new entity with a relationship pointing to the URN of another entity that does not yet exist, DataHub creates an empty placeholder entity object with that not-yet-seen URN, as well as an empty node in Neo4j. So basically everything is eagerly created and append-only. Is my understanding correct? Is there a way to prevent someone from accidentally reusing a particular URN and overriding another team's metadata object?
    ---
    Lower-level questions:
    1. Looking at the GMS docs, it seems that the GraphQL API is the public API and the Rest.li APIs are considered internal. However, there doesn't seem to be full parity between them. For instance, the graph traversal API available at the /relationships endpoint does not seem to be supported in GraphQL. This is extremely important for our use cases. Is there a reason why it is not supported yet? Are there plans to support it in the future? Is there a best practice for how we can use the /relationships endpoint in the meantime?
    2. Trying out the default example for providing a custom metadata model, i.e. adding a custom aspect for data quality rules and seeing it automatically rendered in the UI as per this presentation, we've noticed some slightly surprising things:
    2.1 We can update a custom aspect via the DataHub CLI:
    datahub put --urn "urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)" --aspect customDataQualityRules --aspect-data data/dq_rule.json
    but NOT via a direct curl POST request, which returns a 400:
    {"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Parameters of method 'ingest' failed validation with error 'ERROR :: /entity/value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/0 :: \"com.mycompany.dq.DataQualityRules\" is not a member type of union ...
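The URN passed to `datahub put` in 2.1 follows the pattern `urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)`. The minimal Python sketch below assembles one from its parts. `dataset_urn` is a hypothetical helper written for illustration only; the DataHub Python package has its own URN builders, which are not used here. The name component is part of the identity, so `logging_events` (used in 2.1) and `logging_event` (used in 2.2) produce two distinct URNs.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in the format used in this thread (hypothetical helper)."""
    platform_urn = f"urn:li:dataPlatform:{platform}"
    return f"urn:li:dataset:({platform_urn},{name},{env})"


put_urn = dataset_urn("hive", "logging_events")  # URN used with `datahub put` in 2.1
get_urn = dataset_urn("hive", "logging_event")   # URN used in the curl GET in 2.2
print(put_urn)             # urn:li:dataset:(urn:li:dataPlatform:hive,logging_events,PROD)
print(put_urn == get_urn)  # False: the names differ by one character
```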
    2.2 Why does a custom aspect not appear in the response to a simple GET request on the resource that it has been added to? E.g.:
    curl 'http://localhost:8080/entities/urn:li:dataset:(urn:li:dataPlatform:hive,logging_event,PROD)'
    The above does NOT return the DataQualityRules aspect as part of the response, despite it showing correctly in the UI. You get:
    {
      "value": {
        "com.linkedin.metadata.snapshot.DatasetSnapshot": {
          "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_event,PROD)",
          "aspects": [
            {
              "com.linkedin.metadata.key.DatasetKey": {
                "origin": "PROD",
                "name": "logging_event",
                "platform": "urn:li:dataPlatform:hive"
              }
            }
          ]
        }
      }
    }
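To check programmatically which aspects a GET on /entities returns, a minimal Python sketch can percent-encode the URN for the URL path and walk a snapshot like the one above. It uses only the standard library and a hard-coded copy of that response. `entity_url` and `aspect_names` are hypothetical helpers. The nesting (`value` → snapshot class → `aspects` → single-key objects) is read off this one response, not taken from a documented contract.

```python
import json
from typing import List
from urllib.parse import quote


def entity_url(base: str, urn: str) -> str:
    """Percent-encode the URN (including ':', ',', '(' and ')') as a single path segment."""
    return f"{base}/entities/{quote(urn, safe='')}"


def aspect_names(response: dict) -> List[str]:
    """Return the fully qualified name of every aspect in each snapshot of the response."""
    names = []
    for snapshot in response["value"].values():
        for aspect in snapshot["aspects"]:
            # Each aspect is a single-key object: {"<aspect class name>": {...fields...}}
            names.extend(aspect.keys())
    return names


# Copy of the response from 2.2 above.
RESPONSE = json.loads("""
{"value": {"com.linkedin.metadata.snapshot.DatasetSnapshot": {
  "urn": "urn:li:dataset:(urn:li:dataPlatform:hive,logging_event,PROD)",
  "aspects": [{"com.linkedin.metadata.key.DatasetKey": {
    "origin": "PROD", "name": "logging_event", "platform": "urn:li:dataPlatform:hive"}}]}}}
""")

urn = "urn:li:dataset:(urn:li:dataPlatform:hive,logging_event,PROD)"
print(entity_url("http://localhost:8080", urn))
names = aspect_names(RESPONSE)
print(names)  # ['com.linkedin.metadata.key.DatasetKey']
print(any(n.endswith("DataQualityRules") for n in names))  # False
```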
    2.3 Adding a custom aspect to the CorpUser entity, following exactly the same steps as the default example, does NOT render it on the corp user page. Is there something special about how CorpUser entities are rendered in the UI that prevents custom aspects with the render spec enabled from being shown?
    _Note_: fully rebuilding the project and redeploying the GMS and frontend services did not fix the above.
    ---
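One generic pattern for higher-level question 3 is to expose a deterministic synthetic ID instead of the raw URN. The Python sketch below derives a UUIDv5 from each URN with the standard `uuid` module and keeps a reverse lookup table on the internal side. The namespace UUID and the `UrnObfuscator` class are hypothetical choices made for illustration. This is not a DataHub feature.

```python
import uuid
from typing import Dict

# Hypothetical private namespace; any fixed UUID works, but it should not be published.
NAMESPACE = uuid.UUID("6f1c2a4e-0d3b-4c5a-9e8f-1a2b3c4d5e6f")


class UrnObfuscator:
    """Maps internal DataHub URNs to stable synthetic UUIDs and back (illustrative only)."""

    def __init__(self, namespace: uuid.UUID = NAMESPACE) -> None:
        self._namespace = namespace
        self._reverse: Dict[str, str] = {}

    def to_public(self, urn: str) -> str:
        # uuid5 is deterministic: the same URN always yields the same UUID.
        public_id = str(uuid.uuid5(self._namespace, urn))
        self._reverse[public_id] = urn
        return public_id

    def to_internal(self, public_id: str) -> str:
        # Only IDs previously handed out can be resolved; raises KeyError otherwise.
        return self._reverse[public_id]


ob = UrnObfuscator()
pid = ob.to_public("urn:li:dataset:(urn:li:dataPlatform:hive,logging_event,PROD)")
print(pid)
print(ob.to_internal(pid))
```

UUIDv5 is derived from a SHA-1 hash, so the URN cannot be read back from the ID without the lookup table. However, anyone who knows the namespace can recompute the ID for a guessed URN, which is why the namespace is kept private in this sketch.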

    mysterious-lamp-91034

    01/24/2022, 5:41 PM
    Is there a relationship diagram of the datahub-gms database? I want to understand the data modeling in datahub-gms. Thanks!

    strong-engineer-23656

    01/25/2022, 10:21 AM
    Hello all! I have just started with the DataHub Quickstart and have some questions. Following https://datahubproject.io/docs/quickstart, I have installed docker, jq and docker-compose, but I have a problem at step 4, datahub docker quickstart. I am working on Ubuntu 18.04. Please check the attachment.

    brave-secretary-27487

    01/25/2022, 12:30 PM
    Hey all, I've got a question about the back-end server running on port 8000. I understand that it is used to ingest metadata into the system, but how is it secured? I followed the setup and deployed the application on GCP, but there are no security measures for the back-end: anybody with access to the IP could ingest data or create a new ingestion. What measures should I take to protect the endpoint, and what are the recommendations?

    dazzling-appointment-34954

    01/25/2022, 4:26 PM
    Hey guys, I've got another little starter question (thanks a lot for all the support and help so far!): I am struggling a little to figure out what kind of values I can ingest for certain metadata fields. Example: I am adding a custom DataPlatform and it requires a type (in my case this type will be “OTHERS”). What is the easiest way to figure out which values are actually supported/available? Is there central documentation somewhere, or a way I can query this information through the CLI?

    mysterious-lamp-91034

    01/25/2022, 11:47 PM
    I checked both the UI and the Graph API, and it looks like we don't support adding arbitrary properties to tables. For example:
    https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:datahub,Dataset,PROD)/Properties?is_lineage_mode=false
    I am curious: in which scenarios can we see the properties? Thanks!

    breezy-controller-54597

    01/26/2022, 1:14 AM
    It seems that we can give edit permissions to users and groups by setting policies, but can we give users and groups view permissions for a dataset or schema?

    happy-island-35913

    01/26/2022, 10:45 AM
    I'm new to DataHub and installed it locally, but I got this error message: Validation error of type FieldUndefined: Field 'platform' in type 'Dashboard' is undefined @ 'browse/entities/platform' (code undefined)

    most-boots-68766

    01/26/2022, 11:57 AM
    I wonder whether DataHub supports Spark 3.0 on AWS EMR.

    alert-beach-77662

    01/26/2022, 12:52 PM
    How do I set up a custom MSSQL instance with a different IP, username and password during the DataHub installation? By default, it creates the MSSQL database with datahub as the username and password. If we want to use our own MSSQL instance, where everything will be stored, how do we do that?

    worried-elephant-18735

    01/26/2022, 1:35 PM
    Hi everyone, I have been experimenting with DataHub for a few days now and have managed to ingest some data. Looking at what I have and what I want it to be, I have some questions.
    1. What is the usual setup for keeping DataHub up to date? I guess I can run "ingest" on a schedule, but as far as I can see, this will only take care of creating new resources, not deleting old ones or keeping track of renames and the like. Is the only option to delete changed or deleted resources by hand through the DataHub CLI?
    2. One of the challenges at my company right now is that KPIs are only partly documented, and where they are, there isn't a single place where all of them are defined. DataHub has the potential to make this incredibly easy to solve by loading the calculations and descriptions directly from dbt/Looker/whatever. Unfortunately, DataHub seems to file data objects only down to the granularity of a table/view/..., and provides more granular information only as properties of those, which makes the search function more or less useless for KPIs. Example: I search for "Conversion Rate". The search result set is the "Website Traffic" Explore. That is good, because the conversion rate is defined there as a KPI. But now the user has to go through the "data" of that Explore manually to find the conversion rate and its description. How awesome would it be to search for "Conversion Rate" and get something like: "Conversion Rate is a KPI defined like this, calculated like that, found in the Website Traffic Explore, and here are all of its dependencies." Is there a way to make that happen that I just haven't realized yet?
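For question 1, one way to reason about stale metadata is a set difference between what DataHub currently holds for a source and what the latest run of that source emitted. The Python sketch below is a minimal illustration under that assumption. Both URN sets are hypothetical inputs, and how they are fetched is left open. The resulting candidates would then be handed to whatever deletion mechanism is used, such as the CLI mentioned in the question.

```python
from typing import Iterable, Set


def stale_urns(in_datahub: Iterable[str], in_latest_run: Iterable[str]) -> Set[str]:
    """URNs present in DataHub but no longer emitted by the source (candidates for deletion)."""
    return set(in_datahub) - set(in_latest_run)


def new_urns(in_datahub: Iterable[str], in_latest_run: Iterable[str]) -> Set[str]:
    """URNs emitted by the latest run that DataHub has not seen before."""
    return set(in_latest_run) - set(in_datahub)


# Hypothetical example: table "orders" was renamed to "orders_v2" in the source.
existing = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,customers,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:hive,orders,PROD)",
}
latest = {
    "urn:li:dataset:(urn:li:dataPlatform:hive,customers,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:hive,orders_v2,PROD)",
}
print(stale_urns(existing, latest))  # {'urn:li:dataset:(urn:li:dataPlatform:hive,orders,PROD)'}
print(new_urns(existing, latest))    # {'urn:li:dataset:(urn:li:dataPlatform:hive,orders_v2,PROD)'}
```

A rename shows up as one stale URN plus one new URN. The diff alone cannot tell a rename from a delete-and-create, so linking the two would need extra information from the source.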

    full-leather-27343

    01/26/2022, 4:08 PM
    Hello, is there a way to filter by views and tables in the UI?

    miniature-television-17996

    01/27/2022, 7:43 AM
    Hello again, could you tell me how to debug a problem like this? How do I find the stdout output?

    aloof-father-61672

    01/27/2022, 1:08 PM
    Hi all. I'm new to DataHub and am trying to get it working locally on my M1 MacBook (for locally testing some custom ingestion):
    $ datahub docker quickstart --quickstart-compose-file docker/quickstart/docker-compose-without-neo4j-m1.quickstart.yml
    I get the following error:
    Unable to run quickstart - the following issues were detected:
    - datahub-gms is running but not healthy
    
    If you think something went wrong, please file an issue at https://github.com/linkedin/datahub/issues
    or send a message in our Slack https://slack.datahubproject.io/
    Be sure to attach the logs from /var/folders/zy/bj53_ssx3sg72s1rnqhrtpmh0000gp/T/tmpkpcdr_2l.log
    I've attached /var/folders/zy/bj53_ssx3sg72s1rnqhrtpmh0000gp/T/tmpkpcdr_2l.log; is anyone else having a similar issue?

    alert-beach-77662

    01/27/2022, 1:27 PM
    How do I delete a database from DataHub? I couldn't find any option for it in the UI.

    alert-beach-77662

    01/27/2022, 2:10 PM
    How do I get the list of URNs?

    mysterious-lamp-91034

    01/28/2022, 7:27 AM
    Hi, I am building a customized entity and then setting up a GraphQL API to query/mutate it. I want to understand how com.linkedin.schema.EditableSchemaMetadata is implemented, so that I can write a similar one for my entity: https://github.com/linkedin/datahub/blob/b32975d6dbb75ea696da54c120d9cade58360c5d/[…]/linkedin/datahub/graphql/resolvers/mutate/util/LabelUtils.java
    The context is that I am going to fork DataHub and create entities and aspects for internal use. If I want to mutate my entity, it looks like I have to call com.linkedin.datahub.graphql.resolvers.mutate.MutationUtils.persistAspect, whose 3rd parameter is a RecordTemplate, so I have to create a class that inherits from RecordTemplate. I am looking for example source code, like com.linkedin.schema.EditableSchemaMetadata. Thanks!

    loud-musician-49912

    01/28/2022, 2:28 PM
    Hi team, with the Spark DataHub lineage versions 0.8.23 and 0.8.24 we are receiving a NullPointerException from the DataHubSparkListener class. We are working with Spark 2.4.0, Scala 2.11.12 and Python 2.7.5. Can you please help? We get the same issue even with a sample PySpark word count when using the above.

    sparse-father-33469

    01/31/2022, 9:40 AM
    Hi all. I've run the DataHub demo environment on Docker, and it worked out pretty well. I have a question about the sample data: I see there's lineage data and statistical data on certain fields. Where would this data come from? Would DataHub somehow infer lineage, or is there some kind of API that you have to call? And where does DataHub get the statistical data on fields?

    cool-petabyte-45910

    01/31/2022, 3:08 PM
    Hello guys, I am new to DataHub. I've just managed to install it on our EC2 machine, and now I am trying to ingest some metadata from Snowflake. But I am facing this error when I run ingestion via the UI:
    'HTTPError: 401 Client Error: Unauthorized for url: http://<<MY IP>>:9002/api/gms/config\n'
    Is there an error in the configuration of the credentials for my Snowflake account, or is it something else? Following the guide, I installed the plugin for Snowflake access before running the ingestion.

    prehistoric-dawn-23569

    02/01/2022, 2:07 PM
    Hello. Hopefully a quick question. Is there a way to have a mix of unauthenticated and authenticated users? i.e. a public service for unauthenticated users, with a means of logging in to get additional rights? Our preferred authentication source would be JAAS with an LDAP back-end, but could be flexible. Thanks.

    abundant-receptionist-6114

    02/02/2022, 8:05 AM
    Hi, can we disable auth completely, as in the demo at https://demo.datahubproject.io/?

    alert-beach-77662

    02/02/2022, 9:09 AM
    Hi, can you explain what exactly we need to modify in this file? I'm trying to set up the custom MSSQL step: https://github.com/linkedin/datahub/blob/master/docker/quickstart/docker-compose-without-neo4j.quickstart.yml#L49 ( docker-compose-without-

    alert-beach-77662

    02/02/2022, 11:11 AM
    How do I completely uninstall DataHub?

    ambitious-exabyte-53451

    02/02/2022, 1:24 PM
    Hi all, I've set up DataHub on AWS as per the guide. I get the logon screen, but when I log on I get a white page and this error in the frontend logs: Caused by: com.linkedin.r2.message.rest.RestException: Received error 404 from server for URI http://datahub-datahub-gms:8080/dataJobs

    sticky-stone-98991

    02/02/2022, 3:34 PM
    Hello everyone. I was wondering whether there is a plan to add Databricks Delta Lake to the supported sources?

    dazzling-appointment-34954

    02/02/2022, 6:03 PM
    Hi guys, quick question about policies: Is there a way to create a policy based on a platform? (e.g. allow metadata changes on all Looker assets) Thanks in advance, you guys are awesome!
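To illustrate what a platform-based policy would need to evaluate, here is a minimal Python sketch. It extracts the platform from a dataset URN (using the format shown elsewhere in this thread) and grants an action only when that platform matches. `PlatformPolicy`, `platform_of` and the action name `edit_metadata` are hypothetical. This is not DataHub's policy engine, and the sketch only handles dataset URNs, not dashboard or chart URNs.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Matches dataset URNs of the form urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
DATASET_URN = re.compile(r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,]+)\)$")


def platform_of(urn: str) -> Optional[str]:
    """Return the platform of a dataset URN, or None if the URN is not a dataset URN."""
    m = DATASET_URN.match(urn)
    return m.group(1) if m else None


@dataclass(frozen=True)
class PlatformPolicy:
    """Hypothetical policy: allow the listed actions on every dataset of one platform."""
    platform: str
    actions: FrozenSet[str]

    def allows(self, action: str, urn: str) -> bool:
        return action in self.actions and platform_of(urn) == self.platform


policy = PlatformPolicy("looker", frozenset({"edit_metadata"}))
print(policy.allows("edit_metadata", "urn:li:dataset:(urn:li:dataPlatform:looker,model.explore,PROD)"))  # True
print(policy.allows("edit_metadata", "urn:li:dataset:(urn:li:dataPlatform:hive,logging_event,PROD)"))    # False
```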