# troubleshoot

    hallowed-gpu-49827

    02/10/2022, 12:33 PM
    Hello folks, I’m getting 503 on all GraphQL requests with this message:
    upstream connect error or disconnect/reset before headers. reset reason: connection termination
    The frontend logs are:
    datahub-frontend 12:27:41 [application-akka.actor.default-dispatcher-14] WARN  akka.actor.ActorSystemImpl - Explicitly set HTTP header 'Content-Type: text/plain' is ignored, explicit `Content-Type` header is not allowed. Set `HttpResponse.entity.contentType` instead.
    datahub-frontend 12:27:41 [application-akka.actor.default-dispatcher-14] WARN  akka.actor.ActorSystemImpl - Explicitly set HTTP header 'Content-Length: 95' is ignored, explicit `Content-Length` header is not allowed. Use the appropriate HttpEntity subtype.

    gifted-queen-61023

    02/10/2022, 3:20 PM
    Hey guys 👋 Hope you're doing well. I was adding a new entity, Reports, to DataHub. I was able to add it (with more obstacles than I was hoping for), and I'm able to successfully ingest metadata from the command line with a `curl` POST operation. When I define an MCE and try to use `datahub ingest -c`, the following error arises: `AttributeError: module 'avro.schema' has no attribute 'AvroException'`. Not sure how to solve it. Appreciate any help. Thanks in advance 🙂

    numerous-eve-42142

    02/10/2022, 6:00 PM
    Hi everyone! I am trying to configure Airflow to use DataHub as a lineage backend, but in my scenario we can't change airflow.cfg directly because we are using the official Airflow Helm chart; I can only configure it via environment variables. I found the `AIRFLOW__LINEAGE__BACKEND` variable, but I can't find how to configure the `datahub_kwargs` via environment variables... https://datahubproject.io/docs/lineage/airflow/# Secondly, to include DAG information in DataHub, after configuring Airflow, is it just necessary to start ingestions (as in the sample file) from the database? Thanks for all your help!
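For context, Airflow maps any `[section] key` pair in airflow.cfg to an environment variable named `AIRFLOW__{SECTION}__{KEY}`, so `datahub_kwargs` in the `[lineage]` section should be settable the same way. A sketch, assuming the acryl-datahub lineage backend from the linked docs (the connection id and kwargs values are illustrative):

```shell
# Sketch: airflow.cfg [lineage] backend  ->  AIRFLOW__LINEAGE__BACKEND
export AIRFLOW__LINEAGE__BACKEND="datahub_provider.lineage.datahub.DatahubLineageBackend"
# airflow.cfg [lineage] datahub_kwargs  ->  AIRFLOW__LINEAGE__DATAHUB_KWARGS,
# passed as a JSON string (values here are examples, not recommendations)
export AIRFLOW__LINEAGE__DATAHUB_KWARGS='{"datahub_conn_id": "datahub_rest_default", "capture_ownership_info": true}'
```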

    nutritious-bird-77396

    02/10/2022, 6:40 PM
    I would like to enable UI-based ingestion, released as part of `0.8.26`, internally. As I understand it, a new service, `datahub-actions`, has been added to execute the DataHub CLI and push events to Kafka. I see only the Docker env file, but the Docker scripts or the project itself are missing... The reason I ask is that I want to add the MSK IAM Auth JAR to the project in order for it to communicate with MSK. Any help would be great. Thanks!

    strong-iron-17184

    02/10/2022, 6:46 PM
    Hello, I have a problem when running docker-compose up: I end up with an unhealthy container.

    adorable-sandwich-55776

    02/10/2022, 7:46 PM
    Hi, question regarding `datahub delete`: I have seen that the data was deleted when I run `datahub delete --urn 'urn:li:dataset:(urn:li:dataPlatform:kafka,-l,PROD)' --hard`, but the command line interface hangs and does not exit. Is this a bug?
    $ datahub delete --urn 'urn:li:dataset:(urn:li:dataPlatform:kafka,-l,PROD)' --hard
    This will permanently delete data from DataHub. Do you want to continue? [y/N]: y
    [2022-02-10 19:42:37,697] INFO     {datahub.cli.delete_cli:126} - DataHub configured with <http://localhost:8080>
    
    (... nothing else is printed, but it does not return)

    ambitious-guitar-89068

    02/11/2022, 4:49 AM
    Faced an issue with Tableau Ingestion: https://github.com/linkedin/datahub/issues/4119

    narrow-bird-99605

    02/11/2022, 10:52 AM
    Hi, when I try to add an owner for a domain, I get the following error: `Failed to add owner: Unauthorized to perform this action. Please contact your DataHub administrator.` I am performing this action on the test cluster and I am using the root user `datahub`. How can I find out what permission is missing and/or get more detailed logs of the issue? Thanks in advance!

    damp-queen-61493

    02/11/2022, 7:54 PM
    Hi, sorry for the newbie question. I'm trying to extend the graph with a new entity based on the documentation. Where is `gms.graphql`? Has this file been replaced by `entity.graphql`?

    mysterious-lamp-73086

    02/11/2022, 10:28 PM
    Hi! I'm trying to use UI ingestion from MongoDB, but I get the error "ConfigurationError: mongodb is disabled; try running: pip install 'acryl-datahub[mongodb]'", even though acryl-datahub[mongodb] is installed (and in the log I see the message "INFO: stdout=Requirement already satisfied: pip in").

    damp-minister-31834

    02/12/2022, 2:29 AM
    Hi, all! The new data container feature came out, but I can't find any related documentation or API guidance. Can anyone tell me how to create a container?

    modern-artist-55754

    02/13/2022, 1:58 PM
    Hi all, I'm testing out DataHub for our data discoverability platform now. I'm having an issue with duplicated datasets, as I'm ingesting from both Snowflake and dbt. I need to ingest from both because dbt will provide proper lineage and Snowflake will give some usage history on the tables. Is there any way to fix this issue?

    few-air-56117

    02/14/2022, 8:44 AM
    Hi guys, I tried to ingest bigquery-usage (for stats). This is the recipe:
    source:
      type: bigquery-usage
      config:
        # Coordinates
        projects:
          - <project_1>
          - <project_2>
    sink:
      type: "datahub-rest"
      config:
        server: <ip>
    But I got this error (for a table which is refreshed every 30 minutes):
    'failed to match table read event with job; try increasing `query_log_delay` or `max_query_duration`'
    Do you have any idea?
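As the error itself suggests, the two knobs it names live in the source config of the recipe; a sketch of where they would go (the values shown are purely illustrative, not recommendations):

```yaml
source:
  type: bigquery-usage
  config:
    projects:
      - <project_1>
    # Give frequently refreshed tables more slack when matching
    # read events to their query jobs (illustrative values only).
    query_log_delay: 500      # how many log entries to look ahead
    max_query_duration: 30    # minutes of padding on the time window
sink:
  type: "datahub-rest"
  config:
    server: <ip>
```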

    blue-boots-43993

    02/14/2022, 9:59 AM
    Hi all, not sure if this is sqlalchemy/pyodbc specific, but I will ask anyway... We have an MSSQL Server, and in one of the dbs we have a situation like this:
    info_cache = {('get_schema_names', (), ()): ['name', 'name.other', 'name.other.dbo' ....]
    As you can see, there are some schema names that contain dots. Usually they would be written as `[Database].[name.other.dbo].[TableName].[Column]`, right? Well, I cannot seem to figure out how to force sqlalchemy (or the ingestion script) to treat `'name.other'` as a schema rather than as a database-schema combination. I tried with the schema_pattern.allow properties, but with no luck.
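One thing worth noting about `schema_pattern.allow`: DataHub's allow/deny entries are regular expressions, so an unescaped dot matches any character, which makes patterns for dotted schema names behave unexpectedly. A minimal illustration with plain `re`, independent of the ingestion framework:

```python
import re

# An unescaped '.' in an allow pattern is a wildcard, so "name.other"
# also matches schema names like "nameXother".
loose = re.compile("name.other")
# Escaping the dot and anchoring the pattern matches only the literal name.
strict = re.compile(r"^name\.other$")

print(bool(loose.search("nameXother")))   # True: unintended match
print(bool(strict.search("nameXother")))  # False
print(bool(strict.search("name.other")))  # True: intended match
```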

    boundless-student-48844

    02/14/2022, 11:41 AM
    Hey team, what’s the currently advised approach to adding a new entity to the metadata model? Can we leverage the no-code UI for new entities? I am currently following this guide, and I am stuck on what to change in the `datahub-graphql-core` repo (Step 8). This doc seems to be outdated: https://datahubproject.io/docs/datahub-graphql-core/#near-term

    cuddly-engine-66252

    02/14/2022, 12:44 PM
    Hello everyone, I'm trying to launch via datahub/docker/quickstart/quickstart.sh, but some services try to start on ports that are already in use. So I changed docker-compose-without-neo4j.quickstart.yml: namely, all the GMS_PORT env variables, and datahub-gms: ports: - 6667:8080. After that, datahub-gms starts successfully and appears in the list of Docker containers. But after a while, errors of this format begin to be spammed:
    datahub-actions_1         | 2022/02/14 11:04:50 Problem with request: Get "<http://datahub-gms:6667/health>": dial tcp: lookup datahub-gms on 127.0.0.11:53: server misbehaving. Sleeping 1s
    Also, after these errors start, I no longer see datahub-gms among the active Docker containers (I attach the container log):
    (final err from log - Failed to send HTTP request to endpoint: <http://schema-registry:6668/subjects/MetadataChangeLog_Versioned_v1-value/versions>
    java.net.ConnectException: Connection refused (Connection refused))
    What can be done about this?
    gms.log
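For context, a docker-compose mapping like `6667:8080` only changes the host-side port; other containers on the compose network still reach GMS at `datahub-gms:8080`, so also pointing GMS_PORT / internal URLs at 6667 makes those lookups fail, which matches the `datahub-gms:6667/health` error above. A sketch of freeing the host port without touching the internal one (assumption: the goal is only to avoid a host-port clash):

```yaml
# Sketch: remap only the HOST side of the port mapping.
datahub-gms:
  ports:
    - "6667:8080"   # host port 6667 -> container port 8080
# Leave GMS_PORT and internal URLs (e.g. http://datahub-gms:8080) unchanged:
# inter-container traffic never goes through the host mapping.
```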

    prehistoric-room-17640

    02/14/2022, 2:09 PM
    Hi all, I'm a newbie to DataHub, so forgive the question. We have DataHub deployed to K8s, and I'm able to ingest and see the metadata in the browser (when I explicitly click on the table name); however, I can't seem to search for anything. DataHub just reports no metadata when I search for tables that do exist. I've included some screenshots here: (through browsing)

    alert-teacher-6920

    02/14/2022, 5:28 PM
    I’ve been looking over the Dataset schema field types here: https://demo.datahubproject.io/dataset/urn:li:dataset:(urn:li:dataPlatform:datahub,Dataset,PROD)/Schema, and I’m wondering, via the Java emitter API, how to make Struct types, specifically how to include what fields the struct has and what their types are. I don’t see a StructType class. Do I use Record? And is there any example of how to specify the sub-fields’ types?

    plain-farmer-27314

    02/14/2022, 6:56 PM
    Hey all - we are trying to update to the latest version (0.8.26) and are getting the following when updating through helm:
    Error: secret "datahub-encryption-secrets" not found
    Is this a new secret we need to add? I double-checked the Deploying with Kubernetes docs and didn't see it mentioned.

    numerous-guitar-35145

    02/14/2022, 6:58 PM
    Hello, I'm integrating DataHub into our data environment at RD Station, but I'm facing questions and issues from some users; can you help me? First we tried installing with the "datahub docker quickstart" approach, but then we wanted to use Okta for authentication, following "https://datahubproject.io/docs/how/auth/sso/configure-oidc-react-okta". I didn't find the files, so I pulled the repository from git and it worked. Now I'm trying to upgrade DataHub as a whole to the latest version using "https://datahubproject.io/docs/docker/datahub-upgrade". It didn't work; I'm getting the error "Problem with dial: dial tcp: lookup broker on 127.0.0.11:53: server misbehaving" and I'm confused about which DataHub image is running: the quickstart one or the cloned code? Another question: how can we update the DataHub version without losing the metadata we inserted in the previous version?

    bland-barista-59197

    02/14/2022, 10:23 PM
    Hi everyone, can I add multiple tags using GraphQL, like the following?
    mutation addTag {
        addTag(input: { tagUrn: "urn:li:tag:NewTag", resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" }),
    addTag(input: { tagUrn: "urn:li:tag:NewTag1", resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
    }
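For what it's worth, GraphQL requires each field in a selection set to have a unique response key, so invoking the same mutation field twice needs aliases; a sketch of the aliased form, using the same tag and resource URNs as above:

```graphql
mutation addTags {
  first: addTag(input: { tagUrn: "urn:li:tag:NewTag", resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
  second: addTag(input: { tagUrn: "urn:li:tag:NewTag1", resourceUrn: "urn:li:dataFlow:(airflow,dag_abc,PROD)" })
}
```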

    few-air-56117

    02/15/2022, 7:40 AM
    Hi guys, I'm trying to make a GraphQL call (the token is from the DataHub UI, Settings → Access Tokens):
    curl -X POST '<link>/api/v2/graphql' \
    --header 'Authorization: Bearer <token_from_datahub>' \
    --header 'Content-Type: application/json' \
    --data-raw '{"query":"{\n  me {\n    corpUser {\n        username\n    }\n  }\n}","variables":{}}'
    But it returns 401 :(

    modern-monitor-81461

    02/15/2022, 3:50 PM
    I am trying to debug an issue with the `groups` claim in Azure OIDC and I can't figure out where my error is coming from. My Azure OIDC integration works until I enable `groups` to be present in an ID token. When the `groups` claim is present, I get a `502 Bad Gateway` and I can't log in to DataHub. I have looked at both frontend and gms logs (info & debug) and I can't see what would be causing a `502`. If you look at the attached screenshot, the `GET https://<dh_server>/authenticate?redirect_uri=%2F` returns a `303` to `https://login.microsoftonline.com/{tenant id}/oauth2/v2.0/authorize?response_type=code&redirect_uri=https://<dh_server>%2Fcallback%2Foidc`, which returns a `302` to the expected callback URL `https://<dh_server>/callback/oidc?code=0...`, which causes a `502`... (`dh_server` is the DataHub server). The authentication is a success since I can find the user with its profile. I added extra debug statements in `auth.sso.oidc.OidcCallbackLogic` and all looks good. I thought that class would be the one handling the OIDC callback, but it looks like I'm wrong. In order to debug further, can someone tell me which class is handling the `https://<dh_server>/callback/oidc?code=0...` request? @big-carpet-38439 probably knows this, but I think he is on vacation 🥳. Anyone else?

    broad-thailand-41358

    02/15/2022, 5:48 PM
    Hi all, I'm trying to get an instance running via Linux on a Chromebook, and I'm getting the following error:
    (base) scottlam@penguin:~$ datahub docker quickstart
    Unable to run quickstart:
    - Docker doesn't seem to be running. Did you start it?
    (base) scottlam@penguin:~$

    fresh-river-19527

    02/15/2022, 7:18 PM
    Hi all, I'm giving the tool a try again and it looks amazing now; congrats to the team. I was trying the new profiling feature for BigQuery, but I detected a couple of errors:
    1. I'm getting this error for many of the table profile runs: `Profiling exception No BigQuery dataset specified. Use bigquery_temp_table batch_kwarg or a specify a default dataset in engine url`. For some others it works. I am, of course, passing the `bigquery_temp_table_schema` setting. I've tried with `max_workers: 1`, but it still fails in some cases.
    2. The other one has been reported already. It looks like profiling fails for columns with a REPEATED (ARRAY) type.
    3. A third one looks like it comes from GE itself. It looks like GE does not work with the GEOGRAPHY type.

    wooden-football-7175

    02/15/2022, 9:38 PM
    Ingest | Redshift-usage | Web UI 🧵 Error ingesting redshift-usage through the web UI.

    alert-teacher-6920

    02/16/2022, 2:11 PM
    I was reading this thread: https://datahubspace.slack.com/archives/CV2UXSE9L/p1626400847242700, but I think the link posted at the top of that thread, to a usage stats section on some page, is broken. I'm wondering, if I use a Java emitter for a custom platform, can I provide sample queries and usage stats? Or is that only something Snowflake and BigQuery support, and not something that can be done for custom platforms with custom emitter logic?

    cool-painting-92220

    02/16/2022, 7:26 PM
    Hi there! I've taken a look around the community Slack and the DataHub documentation and didn't seem to find anything on it, so I wanted to check here - is there a good way of connecting to AzureML for metadata ingestion, or is that something projected to be added to the selection of compatible sources in the future?

    modern-artist-55754

    02/16/2022, 11:20 PM
    Hi there, I am trying to understand whether lineage is supported out of the box for Athena, i.e. without using a custom emitter?

    bland-barista-59197

    02/17/2022, 12:19 AM
    Hi Team, I’m exploring the Rest.li API and the GraphQL API. How do I secure these APIs?
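For reference, DataHub ships a "metadata service authentication" flag that makes both APIs require a token; a sketch of enabling it in a docker/env-file style deployment (flag name per the DataHub docs; exact placement depends on your deployment):

```shell
# Sketch: require auth on the Rest.li and GraphQL endpoints.
# Set on BOTH the datahub-gms and datahub-frontend containers.
export METADATA_SERVICE_AUTH_ENABLED=true

# Callers then pass a personal access token generated in the UI
# (Settings -> Access Tokens):
#   curl ... -H 'Authorization: Bearer <token>'
```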