# getting-started
  • m

    millions-waiter-49836

    02/02/2022, 8:57 PM
    Hi everyone, may I know if there are plans to support profiling for ingestion from Glue or S3 in the future? From what I see here, it seems profiling only supports SQLAlchemy-compatible data stores.
  • q

    quick-flower-31185

    02/03/2022, 2:09 AM
    Hi everyone, I am new to DataHub. Is there a hosted version of DataHub that I can pay to use, or is self-hosting my only option at the moment?
  • t

    tall-toddler-62007

    02/03/2022, 6:13 AM
    Hi everyone, does DataHub provide a list of all available APIs, like a Swagger UI page or something?
  • b

    better-solstice-54187

    02/03/2022, 12:36 PM
    Hi everyone. Is it possible to deploy DataHub to k8s with PostgreSQL and external Kafka, and without Neo4j? Especially the PostgreSQL part. (See the sketch below.)
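    Deploying without Neo4j means switching the graph service to Elasticsearch and pointing the chart at your own SQL database and Kafka. A minimal values.yaml sketch for the datahub-helm chart; exact keys can differ by chart version, and the PostgreSQL host, secret names, and broker/schema-registry addresses below are placeholders:
    global:
        graph_service_impl: elasticsearch   # no Neo4j required
        sql:
            datasource:
                host: "my-postgres:5432"
                url: "jdbc:postgresql://my-postgres:5432/datahub"
                driver: "org.postgresql.Driver"
                username: "datahub"
                password:
                    secretRef: postgres-secrets
                    secretKey: postgres-password
        kafka:
            bootstrap:
                server: "my-kafka:9092"              # external Kafka brokers
            schemaregistry:
                url: "http://my-schema-registry:8081"
    If you use the prerequisites chart, the bundled Neo4j (and any other services you replace with external ones) can be disabled there as well.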
  • c

    calm-river-44367

    02/03/2022, 1:59 PM
    Hey, I have been trying to add new users to DataHub using a user.props file. What I'm doing is: I add the name and password of my custom user to user.props exactly the way the datahub:datahub user is written there (a sample user.props is sketched below). Then, for the datahub-frontend-react part of the docker-compose file, I add:
    build:
      context: ../
      dockerfile: docker/datahub-frontend/Dockerfile
    image: linkedin/datahub-frontend-react:${DATAHUB_VERSION:-head}
    env_file: datahub-frontend/env/docker.env
    hostname: datahub-frontend-react
    container_name: datahub-frontend-react
    ports:
      - "9002:9002"
    depends_on:
      - datahub-gms
    volumes:
      - ./my-custom-dir/user.props:/datahub-frontend/conf/user.props
    with the paths to the user.props and env files wherever they are needed, and, as the docs say, I run the new docker-compose file. However, the custom users have not been added. Does anyone have any idea what I'm doing wrong or not doing? https://datahubproject.io/docs/how/auth/jaas#mount-a-custom-userprops-file-docker-compose
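    For reference, user.props is a plain properties file with one username:password entry per line; a small sketch (the custom entry is only an illustration):
        # one username:password entry per line
        datahub:datahub
        mycustomuser:change-me
    After editing it, the datahub-frontend-react container typically has to be recreated so the mounted file is picked up.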
  • s

    sticky-stone-98991

    02/03/2022, 3:18 PM
    Hi, I am running datahub locally using the quickstart. I am trying to connect to Databricks using Hive Source but my job keeps failing. Here is my yml:
    source:
        type: hive
        config:
            host_port: 'dbc-XXXXXXXX-XXXX.cloud.databricks.com:443'
            username: token
            password: dapicxxxxxxxxxxxxxxxx
            scheme: databricks+pyhive
            options:
                connect_args:
                    http_path: sql/1.0/endpoints/xxxxxxxxxxxxxxxx
    sink:
        type: datahub-rest
        config:
            server: 'http://127.0.0.1:8080'
    Error:
    "ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: /config (Caused by "
               "NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fac4f14cca0>: Failed to establish a new connection: [Errno 111] "
               "Connection refused'))\n",
               "2022-02-03 15:12:34.850410 [exec_id=f58fdd9d-3981-4399-9d68-07a774b790f4] INFO: Failed to execute 'datahub ingest'",
    From the error, the issue seems to be the sink and not the source, but I am too new to DataHub to make sense of it. (See the sketch below.)
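    The "Connection refused" on 127.0.0.1:8080 means the process running the ingestion cannot reach GMS at that address; the exec_id in the log suggests the recipe is being executed inside the UI/actions container, where 127.0.0.1 is the container itself rather than your host. A hedged sketch of the sink section, assuming the quickstart Docker network where GMS is reachable as datahub-gms:
    sink:
        type: datahub-rest
        config:
            # GMS address from inside the quickstart Docker network;
            # keep http://localhost:8080 when running `datahub ingest` directly on the host
            server: 'http://datahub-gms:8080'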
  • m

    mysterious-lamp-91034

    02/03/2022, 6:22 PM
    Do we have a code formatter config file? For example, a .prettierrc. The context is that I am using VS Code and Prettier to format my Java code. The build complains:
    [ant:checkstyle] [WARN] /home/zxu/code/schema_registry/datahub-graphql-core/src/main/java/com/linkedin/datahub/graphql/types/pin/mappers/PinFieldMapper.java:46: Line is longer than 160 characters (found 169). Try to keep lines under 120 characters. [LineLength]
    Thanks
  • m

    mysterious-lamp-91034

    02/03/2022, 8:06 PM
    Is it possible to ingest multiple aspects atomically at entity creation time? The context is that I am creating a customized entity, and the entity has multiple aspects that I want to create all at creation time. I am following the example in this line, where we use _entityClient.ingestProposal(). It looks like each proposal (MCP) includes only one aspect. Is there a way to create multiple aspects in an atomic fashion? Thanks
  • a

    ancient-author-86397

    02/03/2022, 8:18 PM
    Hello! I'm not sure if this question is better here or in #all-things-deployment - I read in the docs (https://datahubproject.io/docs/architecture/metadata-serving/#metadata-storage) that gms can use a key-value store like Couchbase rather than an RDBMS, is there more documentation on how to do this? Thanks!
  • a

    ancient-author-86397

    02/04/2022, 1:33 AM
    Hi! I'm reading into https://datahubproject.io/docs/metadata-modeling/extending-the-metadata-model and was wondering if I create a new Entity with its own Entity Key Aspect, do I also need to create a custom URN class or will it be generated automatically? This would be for the purpose of referencing it in the Entity Snapshot
  • m

    mysterious-portugal-30527

    02/04/2022, 3:10 PM
    Can anyone share a proper example?
  • r

    rich-winter-40155

    02/04/2022, 4:14 PM
    Hi, is there any way to look at the versioning of metadata for a table, dashboard, etc., i.e. to see all the changes to a table through the UI or APIs?
  • c

    calm-river-44367

    02/05/2022, 9:06 AM
    Is there any way to set policies for users in DataHub? For example, who can have access to a certain database, or allowing only some specific users to edit datasets. Also, how can I set my DataHub root user, i.e. the user that sets the policies and privileges for other users?
  • s

    salmon-area-51650

    02/07/2022, 4:07 PM
    Hi!! Is there documentation about the search engine? I’d like to know how to use wildcards, negated expressions, and regex. Thanks in advance!!
  • c

    cool-petabyte-45910

    02/08/2022, 1:02 PM
    Hello guys, I have one question about lineage and Snowflake. I am just testing this feature and it is super nice. One thing I am missing is visibility into the query which populates a particular table. Will that also be added? Such information is available in the query history, so I think it should be possible to match the query with the table. Then the lineage would provide a complete overview that also contains the logic behind each table.
  • r

    rich-magician-30655

    02/09/2022, 2:43 AM
    Hi, I have an issue with Google OAuth and DataHub. In the beginning the Google sign-in worked; recently it failed and redirected to an error with the message:
    Failed to perform post-authentication steps. Error message: com.linkedin.data.template.RequiredFieldNotPresentException: Field "value" is required but it is not present.
    Do you have any idea?
  • b

    bored-dress-52175

    02/09/2022, 7:37 AM
    Hello, I am new to DataHub. I am working with Redshift. I have installed Redshift, but when I run datahub check plugins in my terminal it shows that redshift is disabled. Is this error popping up because I haven’t granted access to the plugin (attached image)? And where do I have to run this command (ALTER USER...)? I tried it in the terminal but it didn’t work out. (See the sketch below.)
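    "Disabled" in datahub check plugins usually just means the source's Python extra is not installed; it is unrelated to database grants. A sketch, assuming a pip-based install:
        pip install 'acryl-datahub[redshift]'
        datahub check plugins
    The GRANT/ALTER USER statements from the docs are run against Redshift itself (e.g. in a SQL client or query editor), not in your shell.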
  • a

    able-crowd-80391

    02/09/2022, 11:05 AM
    Hi all, I'm getting started with DataHub, and I appreciate your patience with a basic question. I have DataHub deployed in an EKS cluster and I want to connect it to Kafka. As I understand it, I need to do this: https://datahubproject.io/docs/metadata-ingestion/source_docs/kafka/ However, I don't know the best way to use it with k8s. How can I install it? Should I use another container for that? Thanks! (See the sketch below.)
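    Ingestion runs as a process separate from the DataHub deployment itself, so on k8s it can live anywhere that can reach both the Kafka brokers and GMS, e.g. a one-off pod or a CronJob running the acryl-datahub CLI. A minimal recipe sketch; the broker, schema registry, and GMS addresses are placeholders (the in-cluster GMS service name depends on your Helm release):
    source:
        type: kafka
        config:
            connection:
                bootstrap: "my-kafka-broker:9092"
                schema_registry_url: "http://my-schema-registry:8081"
    sink:
        type: datahub-rest
        config:
            server: "http://datahub-datahub-gms:8080"   # in-cluster GMS service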
  • b

    bland-orange-13353

    02/10/2022, 5:31 AM
    This message was deleted.
  • w

    wonderful-author-3020

    02/10/2022, 10:11 AM
    Hello everyone, I'm looking at integrating our ML pipelines with DataHub and I'm currently researching creating ML models. I've created a bunch of them, but when I click on the "ML Models" link on the main page I see "No Entities". Apart from that, is there any way to link an existing ML model with an inference dataJob? I'm successful in connecting training jobs via mlModelProperties.trainingJobs, but mlModelProperties.downstreamJobs doesn't seem to work. The property is set but the relationship is not visualized anywhere 😞 Thank you for your support!
  • h

    high-toothbrush-90528

    02/10/2022, 11:07 AM
    Hi everybody. I want to deploy my solution using Kubernetes, but acryl-datahub-actions is not running. I am using the https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml and, in addition:
    acryl-datahub-actions:
      enabled: true
      image:
        repository: public.ecr.aws/datahub/acryl-datahub-actions
        tag: "v0.0.1-beta.8"
    ✅ 1
  • b

    bored-dress-52175

    02/10/2022, 6:52 PM
    This error pops up when I follow the steps given in the doc below: https://datahubproject.io/docs/docker/airflow/local_airflow
  • a

    alert-teacher-6920

    02/10/2022, 10:04 PM
    Good evening, I'm getting started writing a custom Java emitter. I got through the quick-start and Java emitter examples just fine. I'd also like to write things like schema, queries, and documentation, not just dataset properties. I've spent a few hours trying to guess what I should be sending, with no success. Are there any examples, documentation, or hosted JavaDocs that explain how to use the different aspects? I've tried using DatasetAspect and setting a SchemaMetadata, but ultimately got errors about not being able to infer an aspectName. Then I tried to set one and got more errors. Then I tried just sending the SchemaMetadata as the aspect, which compiles, but I keep getting error after error for that, and I'm currently stuck with the platformUrn needing exactly one union something. To be completely honest, I don't know what any of that means, and I'm not totally sure where in the docs to learn what I should be sending.
  • b

    brave-carpenter-25046

    02/11/2022, 11:38 AM
    Hi there, I am running DataHub locally per the quickstart guide. I executed the command above and it seems to have no errors, but why can't I see any metadata in the UI?
  • b

    brave-carpenter-25046

    02/11/2022, 11:38 AM
    image.png
  • h

    high-toothbrush-90528

    02/11/2022, 1:57 PM
    Hi everybody! Is there an option to delete domains? (See the sketch below.)
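    If the UI does not offer a delete option, the CLI's delete command can remove an entity by URN. A sketch, assuming your CLI version accepts domain URNs and that <your-domain-id> is replaced with the real id (--hard removes the entity entirely instead of soft-deleting it):
        datahub delete --urn "urn:li:domain:<your-domain-id>" --hard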
  • w

    wonderful-author-3020

    02/11/2022, 2:35 PM
    Hello 🙂 Is there any way to deserialize an object obtained from https://<base_url>:8080/entities/<urn> to a snapshot class from datahub.metadata.schema_classes?
  • s

    stocky-midnight-78204

    02/15/2022, 5:04 AM
    I am trying to integrate DataHub with Spark. I am able to get the task lineage, but the Spark job gets stuck and is always running. (See the sketch below.)
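    For reference, the Spark integration is wired up through a listener; a sketch of the relevant spark-defaults.conf (or --conf) entries to double-check against your setup, with the version and GMS address as placeholders:
        spark.jars.packages         io.acryl:datahub-spark-lineage:<version>
        spark.extraListeners        datahub.spark.DatahubSparkListener
        spark.datahub.rest.server   http://localhost:8080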
  • m

    mysterious-kitchen-97015

    02/15/2022, 10:29 AM
    Accessing the actual data: forgive me for asking this very basic question, but I'm new to the whole (meta)data world. My user story: as a data scientist, I've been browsing the DataHub UI and found a nice dataset I want to explore further, e.g. this one: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:s3,project%2F[…]%2Flogging_events_bckp,PROD)/Schema?is_lineage_mode=false . Is there a way to get access to the underlying data (i.e., a URL pointing to a path in the S3 bucket) right out of the DataHub UI? I'm thinking of a "download data" button which points me to the source. As I said in the beginning, I'm new to this world and don't yet understand how DataHub is intended to be used after the metadata has been ingested and I can browse it via the UI/API.
  • b

    better-spoon-77762

    02/16/2022, 2:34 AM
    Hello, is there a way to enable SSL on frontend/web-react? By default it runs on port 9002.