# troubleshoot
  • adamant-pharmacist-61996

    08/18/2021, 8:32 AM
    Hi All, We're just trying to configure our datahub instance to work with our production kafka cluster, and we're seeing some configuration issues when we try to change the topic names so they fit with our naming convention. Specifically, changes to the MetadataChangeProposal_v1 topic don't seem to be respected in the gms process. Is this a known problem? I've tried poking through the code but can't immediately see the source
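    A minimal sketch of the kind of override involved, assuming the GMS and consumer containers read topic names from environment variables like the ones below (the exact variable names differ between releases and are an assumption here -- check your version's docker env files):
    # hypothetical environment overrides for GMS / mce-consumer / mae-consumer
    METADATA_CHANGE_PROPOSAL_TOPIC_NAME=myorg.MetadataChangeProposal_v1
    FAILED_METADATA_CHANGE_PROPOSAL_TOPIC_NAME=myorg.FailedMetadataChangeProposal_v1
    Every container that produces or consumes the topic generally needs the same override, not just the gms process.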
  • curved-jordan-15657

    08/19/2021, 1:47 PM
    Hello! I’m trying to enable sql profiling for our redshift cluster but i’m having an issue. Has anyone encountered such an error?
    ProgrammingError: (psycopg2.errors.InsufficientPrivilege) permission denied for relation campaign_retention_order_segment
    
    [SQL: CREATE TEMPORARY TABLE "ge_tmp_61606ece" AS SELECT * 
    FROM dev.campaign_retention_order_segment 
     LIMIT 10]
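    Profiling creates temporary tables (the ge_tmp_* table in the error above), so the ingestion user needs TEMP on the database as well as SELECT on the profiled tables. A minimal sketch of the grants, assuming a user named datahub_ingest and the dev schema from the error; adjust names to your cluster:
    GRANT TEMP ON DATABASE your_database TO datahub_ingest;
    GRANT USAGE ON SCHEMA dev TO datahub_ingest;
    GRANT SELECT ON ALL TABLES IN SCHEMA dev TO datahub_ingest;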
  • blue-megabyte-68048

    08/20/2021, 1:58 PM
    I'm trying to get datahub v0.8.10 talking to Elastic via SSL, but it doesn't want to pick up the certs I've configured. I've got ELASTICSEARCH_USE_SSL=true as well as all the various ELASTICSEARCH_SSL_* env vars specified by the docker env file. Any suggestions or troubleshooting I can do? I have confirmed that the certs work to connect to ES.
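    For reference, a sketch of the SSL variables the GMS image reads, assuming a JKS truststore; the paths are examples, and the truststore file also has to be mounted into the container so GMS can open it:
    ELASTICSEARCH_USE_SSL=true
    ELASTICSEARCH_SSL_PROTOCOL=TLSv1.2
    ELASTICSEARCH_SSL_TRUSTSTORE_FILE=/mnt/certs/truststore.jks
    ELASTICSEARCH_SSL_TRUSTSTORE_TYPE=JKS
    ELASTICSEARCH_SSL_TRUSTSTORE_PASSWORD=changeit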
  • handsome-football-66174

    08/20/2021, 5:38 PM
    Getting the following error when trying to set up Airflow with DataHub, following https://datahubproject.io/docs/metadata-ingestion/#lineage-with-airflow
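    For anyone comparing against that guide, a sketch of the lineage backend setup it describes, assuming the acryl-datahub[airflow] package and a REST connection named datahub_rest_default (adjust the host to your deployment):
    # airflow.cfg
    [lineage]
    backend = datahub_provider.lineage.datahub.DatahubLineageBackend
    datahub_kwargs = {"datahub_conn_id": "datahub_rest_default"}
    # register the connection the backend points at
    airflow connections add datahub_rest_default --conn-type datahub_rest --conn-host http://datahub-gms:8080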
  • rhythmic-london-44496

    08/23/2021, 10:59 AM
    Hey, I am trying to run the latest datahub stack (v0.8.10) on Kubernetes using https://github.com/acryldata/datahub-helm/tree/master/charts/datahub But both GMS and upgrade-job pods have trouble running -> they both log errors showing that the main process cannot find the entity-registry.yml file, e.g.:
    10:46:49.006 [main] ERROR o.s.web.context.ContextLoader - Context initialization failed
    org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'metadataAuditEventsProcessor' defined in URL [jar:file:/tmp/jetty-0_0_0_0-8080-war_war-_-any-1391980781119123614.dir/webapp/WEB-INF/lib/mae-consumer.jar!/com/linkedin/metadata/kafka/MetadataAuditEventsProcessor.class]: Uns
    atisfied dependency expressed through constructor parameter 1; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'searchServiceFactory': Unsatisfied dependency expressed through field 'elasticSearchService'; nested exception is org.springframework.beans
    .factory.UnsatisfiedDependencyException: Error creating bean with name 'elasticSearchServiceFactory': Unsatisfied dependency expressed through field 'entityRegistry'; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'entityRegistryFactory': Unsatisfied
     dependency expressed through field 'configEntityRegistry'; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'configEntityRegistry' defined in com.linkedin.gms.factory.entityregistry.ConfigEntityRegistryFactory: Bean instantiation via factory method failed; nes
    ted exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.models.registry.ConfigEntityRegistry]: Factory method 'getInstance' threw exception; nested exception is java.io.FileNotFoundException: ../../metadata-models/src/main/resources/entity-registry.yml (No su
    ch file or directory)
            at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:798)
    (...)
    Caused by: 
    org.springframework.beans.BeanInstantiationException: Failed to instantiate [com.linkedin.metadata.models.registry.ConfigEntityRegistry]: Factory method 'getInstance' threw exception; nested exception is java.io.FileNotFoundException: ../../metadata-models/src/main/resources/entity-registry.yml (No such file or direct
    ory)
    Are the published images broken, and should we maintain our own builds?
  • handsome-belgium-11927

    08/24/2021, 11:16 AM
    Hi, I'm trying to run the latest version of datahub (pulled right now). It looks like python -m datahub docker quickstart is no longer working; help says that there is no "docker" command anymore. Though running docker/quickstart.sh works well. Is it a bug? Docs at https://datahubproject.io/docs/quickstart have not been modified yet.
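    For reference, the CLI path the quickstart docs describe, a sketch assuming a current acryl-datahub release (if the installed CLI predates the docker subcommand, that would explain the missing command):
    python3 -m pip install --upgrade acryl-datahub
    datahub version              # confirm the upgraded CLI is the one on PATH
    datahub docker quickstart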
  • adamant-pharmacist-61996

    08/24/2021, 11:10 PM
    hi there, I'm just looking at adding profiling stages to some of our metadata ingestion and some of the calculations are pretty expensive and might not add too much value for us. Is there a way to configure which profiling steps will be applied?
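    A sketch of the recipe-level knobs this would involve, assuming the profiling section of the SQL-based sources; the individual flag names below are assumptions to verify against the source docs for your version:
    source:
      type: redshift
      config:
        # ...connection details...
        profiling:
          enabled: true
          # hypothetical per-metric switches -- confirm the exact names for your release
          include_field_mean_value: false
          include_field_median_value: false
          include_field_quantiles: false
          include_field_histogram: false
        profile_pattern:
          allow:
            - "dev\\.important_table"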
  • bland-salesmen-77140

    08/25/2021, 8:25 AM
    Hi, I'm looking for the simplest way to create a Business Glossary for a Datahub POC. I was thinking about using some api call, but in the documentation I'm missing an example for creating a Business Glossary Term like there is for datasets, groups and so on: https://github.com/linkedin/datahub/blob/2c5edd88abfeafa4400b2601f7debc5cde5a1bfb/metadata-service/README.md, do you guys know if this is an option for now?
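    A sketch of what that could look like against the /entities endpoint used in that README, assuming the GlossaryTermSnapshot / GlossaryTermInfo model names (worth checking against the snapshot definitions in your version):
    curl 'http://localhost:8080/entities?action=ingest' -X POST \
      -H 'X-RestLi-Protocol-Version: 2.0.0' \
      --data '{
        "entity": {
          "value": {
            "com.linkedin.metadata.snapshot.GlossaryTermSnapshot": {
              "urn": "urn:li:glossaryTerm:SavingsAccount",
              "aspects": [
                { "com.linkedin.glossary.GlossaryTermInfo": {
                    "definition": "A customer savings account",
                    "termSource": "INTERNAL" } }
              ]
            }
          }
        }
      }'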
  • square-activity-64562

    08/25/2021, 11:54 AM
    On v0.8.11, on the group pages I am not seeing pagination for datasets. Also, members are not being shown. Groups were not appearing in search results either. I did an indices restoration and groups started appearing in search results. But even after the indices restoration the pagination is not showing up for datasets and the members list is empty. I checked in the database using
    SELECT * from metadata_aspect_v2 
    where aspect = 'corpGroupInfo'
    members are present for the groups. But they are not shown on the groups page
  • square-activity-64562

    08/25/2021, 5:18 PM
    I tried to create a dashboard in superset to monitor what all tags are present in datahub
    SELECT urn AS urn,
           count(*) AS count
    FROM
      (SELECT urn
       from metadata_aspect_v2
       where aspect = 'tagKey') AS expr_qry
    GROUP BY urn
    ORDER BY count DESC
    But it is missing some tags which I know are applied on datasets. What am I doing wrong?
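    One thing worth checking: tags applied to datasets live in each dataset's globalTags aspect, while tagKey rows may only exist for tags that were also ingested as their own entities, which could explain the missing ones. A sketch of a query over the applied tags instead, assuming a MySQL backend (the tag urns sit inside the JSON in the metadata column and still need to be unpacked, e.g. with JSON_EXTRACT or in the dashboard layer):
    SELECT urn, metadata
    FROM metadata_aspect_v2
    WHERE aspect = 'globalTags'
      AND version = 0;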
  • witty-actor-87329

    08/25/2021, 8:21 PM
    Hi, trying to run the no code upgrade following this doc. Getting the below error when running Step 1:
    Removing network datahub_network
    ERROR: error while removing network: network datahub_network id c2c6739a3ce536f9d1b091c0ba24df7d1584dbeb20dc1cd0b913d3 has active endpoints
    Should anything be done before running the command? Doing this on EC2. Thanks
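    A sketch of the usual cleanup, assuming containers from the previous quickstart are still attached to the network (run from the directory with the quickstart compose file; the project name "datahub" is an assumption):
    # see which containers still hold endpoints on the network
    docker network inspect datahub_network --format '{{range .Containers}}{{.Name}} {{end}}'
    # stop and remove the old stack first, then the network removal succeeds
    docker-compose -p datahub down
    docker network rm datahub_network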
  • clever-river-85776

    08/26/2021, 8:19 AM
    Hi. I just downloaded and started the quickstart for the first time, and I'm having trouble browsing the API docs. Is the GraphQL schema exposed on an endpoint? The docs suggest GraphQL is running on :8091, but that gives a 404. I see from the UI it's sending GraphQL requests to :9002/api/v2/graphql. I tried :9002/api/v2/graphql/schema (just guessing), but that didn't work.
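    The frontend endpoint should answer a standard GraphQL introspection query if you pass an authenticated session; a sketch, assuming you copy the PLAY_SESSION cookie from a logged-in browser session:
    curl -s 'http://localhost:9002/api/v2/graphql' \
      -H 'Content-Type: application/json' \
      -b 'PLAY_SESSION=<cookie value from your browser>' \
      --data '{"query": "{ __schema { queryType { name } types { name } } }"}'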
  • square-activity-64562

    08/26/2021, 8:23 AM
    Bug in Lineages v0.8.11 I have a dataset which is the output of a dataJobInputOutput as shown in screenshot 1. This dataset has 6 views on top of it which I have tried to add using the upstreamLineage aspect as shown in screenshot two, as per the example at https://github.com/linkedin/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_rest.py. On the UI it shows 1 upstream to the airflow task as expected. But on the downstream side it is showing a single downstream dependency. It should be showing 6 as per the upstreamLineage aspects in which it is present in the database.
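    For comparison, a sketch of how the linked example emits one upstreamLineage per view (helper names from the datahub Python package; platform and table names are placeholders), which should produce six separate downstream edges on the base dataset:
    import datahub.emitter.mce_builder as builder
    from datahub.emitter.rest_emitter import DatahubRestEmitter

    emitter = DatahubRestEmitter("http://localhost:8080")
    base = builder.make_dataset_urn("redshift", "dev.base_table")

    # each view declares the base dataset as its upstream; six such MCEs -> six downstreams
    for view in ["dev.view_1", "dev.view_2", "dev.view_3"]:
        lineage_mce = builder.make_lineage_mce(
            upstream_urns=[base],
            downstream_urn=builder.make_dataset_urn("redshift", view),
        )
        emitter.emit_mce(lineage_mce)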
  • square-activity-64562

    08/26/2021, 9:45 AM
    Bug in the UI on the dataset schema page, v0.8.11. Any field which has another field as a prefix or suffix is being shown as part of the field with the shorter name. These are 2 different fields and should be shown separately, as this is misleading.
  • wonderful-quill-11255

    08/26/2021, 2:15 PM
    Hi. I'm trying to upgrade our datahub but I'm seeing an error that the gms doesn't find an elasticsearch index. Previously we have run the docker/elasticsearch-setup/create-indices.sh script to update indices, but after 2840 that script only creates the usage analytics indices. I'm wondering, how are the indices created these days? @early-lamp-41924 Perhaps you know?
  • fresh-carpet-31048

    08/26/2021, 2:18 PM
    Good morning! I wanted to ask a couple clarifying questions -- does any curl call to an ingestion method require a snapshot? Are all pdls representative of an entity? And also a follow up question if pdls do represent entities -- I thought I was told once that a urn isn't necessary for pdls, but based on documentation it seems that all entities need a urn, which makes me think that pdls should all have a urn as well, right?
  • handsome-belgium-11927

    08/27/2021, 2:26 PM
    Hello everyone! Is there any information on how to ingest DatasetProfileClass? Previously I was ingesting everything using information from schema_classes.py, but I can't see a way to ingest Profiles.
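    A sketch of one way to do it, assuming profiles are emitted as a metadata change proposal for the datasetProfile aspect (class and helper names from the Python SDK's schema_classes and emitter modules; the dataset urn is a placeholder):
    import time

    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import ChangeTypeClass, DatasetProfileClass

    # a minimal profile: timestamp plus table-level counts
    profile = DatasetProfileClass(
        timestampMillis=int(time.time() * 1000),
        rowCount=1500,
        columnCount=42,
    )

    mcp = MetadataChangeProposalWrapper(
        entityType="dataset",
        changeType=ChangeTypeClass.UPSERT,
        entityUrn="urn:li:dataset:(urn:li:dataPlatform:hive,example_db.example_table,PROD)",
        aspectName="datasetProfile",
        aspect=profile,
    )

    DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)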
  • gentle-father-80172

    08/27/2021, 7:56 PM
    Hi I am having issues integrating Datahub with my company's Okta deployment. I am currently getting this error when I try to hit the homepage.
    Identity Provider: Unknown
    Error Code: invalid_request
    Description: The 'redirect_uri' parameter must be a Login redirect URI in the client app settings: <https://admin.settings/example>
    Not sure how to troubleshoot this... My IT department says they don't see any Okta requests from Datahub. No logs of my request are appearing in docker logs -f datahub-frontend-react either... Config in thread below:
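    For reference, a sketch of the frontend settings this error usually traces back to, using the documented AUTH_OIDC_* variables; the Okta app then has to whitelist ${AUTH_OIDC_BASE_URL}/callback/oidc as a Login redirect URI (values below are placeholders):
    AUTH_OIDC_ENABLED=true
    AUTH_OIDC_CLIENT_ID=<okta client id>
    AUTH_OIDC_CLIENT_SECRET=<okta client secret>
    AUTH_OIDC_DISCOVERY_URI=https://<your-okta-domain>/.well-known/openid-configuration
    AUTH_OIDC_BASE_URL=https://<public url of datahub-frontend>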
  • some-microphone-33485

    08/30/2021, 3:20 AM
    Hello Team. We have an issue with the datahub instance that is configured in EKS. The instance only shows the default image for the datasets. The dashboards and charts are showing their logos correctly. For example it shows like this instead of the logo of redshift and looker. In my local docker environment it is showing all the images. Not sure if anyone has faced this issue.
  • some-microphone-33485

    08/30/2021, 6:51 PM
    Hello Team, back with another question. https://github.com/acryldata/datahub-helm/issues/25 Do we have any workaround for this issue that was raised? We are currently stuck on our SSO implementation because of it. cc: @crooked-market-47728
  • fresh-carpet-31048

    08/30/2021, 8:27 PM
    hey! was wondering if anyone has advice for handling incompatible snapshot changes? I tried these methods for a quick fix but neither worked out for me.
  • adamant-pharmacist-61996

    08/30/2021, 11:30 PM
    hey everyone - potentially a dumb question.. what's the best way to set the logging levels in the ui and gms containers? I'd like to be able to set the level to debug
  • handsome-football-66174

    08/31/2021, 3:20 PM
    Facing this issue - using docker-compose to run Datahub, with a few modifications to the GMS connection details. Also, how do I ingest the metadata? (The docker containers are running.)
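    On the ingestion half of the question, a minimal recipe sketch run with the CLI against the GMS container; the source type and credentials are placeholders for whatever system you are pulling metadata from:
    # recipe.yml
    source:
      type: postgres                      # placeholder -- use the connector for your database
      config:
        host_port: localhost:5432
        database: mydb
        username: user
        password: pass
    sink:
      type: datahub-rest
      config:
        server: http://localhost:8080     # the GMS endpoint exposed by docker-compose
    # then run: datahub ingest -c recipe.yml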
  • curved-jordan-15657

    08/31/2021, 9:02 PM
    Hi guys! I'm trying to do sql profiling for redshift. First I tried it with limit:10 and it was ok. But then I tried it without limit:10 and I had an error for a specific table which has 25M rows and 370 columns. The error is:
    KafkaException: KafkaError{code=MSG_SIZE_TOO_LARGE,val=10,str="Unable to produce message: Broker: Message size too large"}
    I did some research about it and people say that I need to increase some of the properties like message.max.bytes, max.request.size, etc. on the broker, producer and consumer sides. I updated the server.properties, consumer.properties and producer.properties files inside the k8s kafka pod, but I couldn't solve the issue. Can anybody help me with Kafka and k8s? Note: I think I need to restart the kafka broker to apply the server.properties changes somehow, but I don't know how.
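    Editing the .properties files inside a running pod doesn't take effect until the broker restarts, but the relevant limits can usually be raised as dynamic configs instead. A sketch, assuming the oversized messages go to the MetadataChangeEvent topic (topic name and size are examples):
    # raise the per-topic limit; applies without a broker restart
    kafka-configs.sh --bootstrap-server <broker>:9092 --alter \
      --entity-type topics --entity-name MetadataChangeEvent_v4 \
      --add-config max.message.bytes=16777216
    # the producing side also needs a matching max.request.size, e.g. via the
    # producer_config of the datahub-kafka sink in the ingestion recipe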
  • high-hospital-85984

    09/03/2021, 6:38 AM
    While trying to migrate the graph backend to ES by running the restore indices job I get hit with this error:
  • square-activity-64562

    09/07/2021, 7:02 AM
    I was reading through https://datahubproject.io/docs/policies/ and noticed it does not mention how to set the default user with admin rights in case we are using OIDC. What property can be used to set this user when using OIDC?
  • square-activity-64562

    09/07/2021, 7:06 AM
    When the policies are rolled out I think we would need to use tokens to access GMS. I could not find any docs about generating a token. Can someone please point me to the correct doc? Just wondering what identity would this token used for automated ingestion be associated with. Usually we use service accounts but we don't have service accounts in datahub.
  • calm-sunset-28996

    09/07/2021, 8:54 AM
    Hey all, I'm having a failure on Datahub. It's shown like the attached picture in the UI. When I check the logs it says the following:
    exception: java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to retrieve entities of type Dataset
    Caused by: java.lang.RuntimeException: Failed to batch load Datasets
    Caused by: com.linkedin.r2.RemoteInvocationException: com.linkedin.r2.RemoteInvocationException: Failed to get response from server for URI <https://datahub-gms.net:443/entities>
    	at com.linkedin.restli.internal.client.ExceptionUtil.wrapThrowable(ExceptionUtil.java:135)
    Caused by: io.netty.handler.codec.TooLongFrameException: Response entity too large: HttpObjectAggregator$AggregatedFullHttpResponse(decodeResult: success, version: HTTP/1.1, content: CompositeByteBuf(ridx: 0, widx: 2096929, cap: 2096929, components=335))
    So the entities are too large, causing the lookup to fail. To give a bit of context: this is only happening with really specific searches, where it has to retrieve multiple datasets which have a huge amount of columns (1000k+). Then it times out. If I search for these individual entities it's fine, the same when I go to their respective pages. Any idea on how to fix this? I'm looking for some netty settings atm like
    maxResponseKB
    which I could potentially set. It's probably the same error as https://github.com/linkedin/datahub/issues/3106
  • square-activity-64562

    09/07/2021, 9:52 AM
    I tried using the delete API to delete users I had created earlier before the delete API was released as per https://datahubproject.io/docs/how/delete-metadata/#delete-by-urn
    $ datahub init
    Configure which datahub instance to connect to
    Enter your DataHub host [<http://localhost:8080>]: <http://datahub-datahub-gms.apps.svc.cluster.local:8080>
    Enter your DataHub access token (Supports env vars via `{VAR_NAME}` syntax) []:
    Written to /home/datahub/.datahubenv
    $ datahub delete --urn "urn:li:corpuser:aseem.bansal"
    This will permanently delete data from DataHub. Do you want to continue? [y/N]: y
    Successfully deleted urn:li:corpuser:aseem.bansal. 0 rows deleted
    It says
    0 rows deleted
    I have noticed that the delete API works for anything that was created after the delete API was released, but does not seem to work for things created before that version
  • high-hospital-85984

    09/07/2021, 10:10 AM
    FYI, I'm hitting "error in Flask-OpenID setup command: use_2to3 is invalid." when installing dev dependencies for metadata-ingestion. Most likely cause is this: https://setuptools.readthedocs.io/en/stable/history.html#breaking-changes
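    The usual workaround, a sketch: use_2to3 was removed in setuptools 58, so pinning below that before installing the dev extras lets the affected dependency (Flask-OpenID) build:
    pip install "setuptools<58"
    pip install -e ".[dev]"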