# troubleshoot
  • q

    quick-restaurant-75578

    07/29/2021, 5:10 AM
    What is the recommended way to deploy DataHub in a PROD environment? Is docker compose on a single EC2 machine the way to go (if on AWS)?
  • s

    square-activity-64562

    07/29/2021, 8:14 AM
    I am trying to add a new source https://github.com/linkedin/datahub/pull/2985 but some errors are occurring. Unit tests passed. I have followed the doc https://datahubproject.io/docs/metadata-ingestion/adding-source but I am not sure why the errors are happening. Can anyone please check what I am missing?
  • s

    square-activity-64562

    07/29/2021, 9:29 AM
    How do I remove datasets that I accidentally imported in datahub so they don't show up in the UI or search results?
  • s

    square-activity-64562

    07/29/2021, 11:23 AM
    any ideas what the mapping should be for these data type errors during ingestion from a postgres database?
    - unable to map type Geography(geometry_type='POINT', srid=4326, from_text='ST_GeogFromText'
    - unable to map type UUID() to metadata schema
    - unable to map type OID() to metadata schema
    - unable to map type Geography(geometry_type='MULTIPOLYGON', srid=4326, from_text='ST_GeogFromText', name='geography') to metadata schema
    - unable to map type Geometry(from_text='ST_GeomFromEWKT', name='geometry') to metadata schema
    - unable to map type INET() to metadata schema
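Before deciding on a mapping, it can help to pull the type names out of warnings like the ones above and tally them. This is only a minimal sketch, assuming the warnings are available as plain strings in the format shown; `extract_type_name` is a hypothetical helper and not part of DataHub.

```python
import re
from collections import Counter

# Captures the type name that follows "unable to map type", e.g. "UUID" or "Geometry".
TYPE_NAME = re.compile(r"unable to map type (\w+)\(")


def extract_type_name(message):
    """Return the type name from one warning line, or None if the line does not match."""
    match = TYPE_NAME.search(message)
    return match.group(1) if match else None


# Warning lines copied from the message above (complete lines only).
warnings = [
    "unable to map type UUID() to metadata schema",
    "unable to map type OID() to metadata schema",
    "unable to map type Geometry(from_text='ST_GeomFromEWKT', name='geometry') to metadata schema",
    "unable to map type INET() to metadata schema",
]

# Tally how often each unmapped type appears.
counts = Counter(extract_type_name(w) for w in warnings)
print(sorted(counts))
```

The resulting list of distinct type names is a starting point for working out which source types still need a mapping.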
  • p

    polite-flower-25924

    07/29/2021, 9:28 PM
    Here is a noob question. How can I add a user to DataHub? I installed DataHub in my Kubernetes cluster and logged in to the DataHub web application with the default username/password. However, I didn’t see an admin console or anything similar for adding, listing, or removing users.
  • s

    square-activity-64562

    07/30/2021, 7:36 AM
    Is anyone using DataHub lineage with Airflow in Google Composer? I have installed `acryl-datahub[airflow]==0.8.6.4` as a package, but it is not showing the connection type in the Airflow UI.
  • s

    square-activity-64562

    08/03/2021, 8:35 AM
    I was playing around with lineage. Now I ended up with this graph. I would like to remove the dataset to dataset lineage. I am not sure how to do that. UpstreamLineageClass does not seem to have a removed status. https://github.com/linkedin/datahub/blob/352a0abf8d7e4dd5d5664a8c7cdf3d77bf6f1c51/metadata-ingestion/src/datahub/metadata/schema_classes.py#L3274
  • s

    square-activity-64562

    08/04/2021, 4:52 AM
    v0.8.7
    ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.NullPointerException: Cannot set field lastObserved of com.linkedin.mxe.SystemMetadata to null
    This field should be optional as per this file https://github.com/linkedin/datahub/blob/aa253f5b3b6c92dc919a0037008ec54c23a50a95/[…]ata-models/src/main/pegasus/com/linkedin/mxe/SystemMetadata.pdl Am I incorrect?
  • s

    square-activity-64562

    08/04/2021, 7:11 AM
    v0.8.7 profiling feature: We have some tables with 0 rows. With profiling enabled for all tables, we get errors.
  • s

    square-activity-64562

    08/04/2021, 2:18 PM
    This happened again today https://datahubspace.slack.com/archives/CV2UXSE9L/p1627363984297000
  • s

    square-activity-64562

    08/05/2021, 9:14 AM
    I am trying to make some changes in a branch. I was running these commands: https://datahubproject.io/docs/metadata-ingestion/developing#testing
    metadata-ingestion ❯ mypy .
    
    setup.py: error: Duplicate module named "setup" (also at "./examples/transforms/setup.py")
    setup.py: note: Are you missing an __init__.py? Alternatively, consider using --exclude to avoid checking one of them.
    Found 1 error in 1 file (errors prevented further checking)
    Are these commands out of date? Is there a way to filter this folder out?
  • m

    microscopic-musician-99632

    08/05/2021, 11:08 AM
    In the process of trying out datahub following https://datahubproject.io/docs/quickstart I seem to be encountering the issue in https://github.com/linkedin/datahub/issues/3023 . Any suggestions would be helpful. regards.
  • s

    square-activity-64562

    08/05/2021, 11:56 AM
    I ran the profiler on a table with 8M rows in postgres and I got these errors from great expectations for 12 out of 14 columns.
    ERROR    {great_expectations.profile.basic_dataset_profiler:87} - Failed to get cardinality of column COLUMN_NAME - continuing...
  • g

    gorgeous-fountain-3070

    08/05/2021, 12:57 PM
    Hi, here is a deployment question I can't seem to figure out. I followed the steps to deploy datahub in my kubernetes cluster. Everything has successfully loaded except for datahub-mae-consumer whose status shows error. Any ideas on how to fix it?
  • h

    handsome-football-66174

    08/06/2021, 3:58 PM
    Hi, I am trying to run DataHub with Docker on an AWS EC2 instance. When I try to ingest the mysql_to_datahub recipe, I get the following-
  • o

    orange-airplane-6566

    08/06/2021, 10:26 PM
    I'm working on upgrading from DataHub 0.8.1 to 0.8.8. As part of that upgrade, I'd like to drop Neo4j. I'm using the helm chart at https://github.com/acryldata/datahub-helm. Is it correct to assume that if I want lineage to work after running this upgrade, I'll need to run the `RestoreIndices` upgrade manually after `helm upgrade`? (details in thread)
  • b

    bland-easter-53873

    08/09/2021, 7:19 AM
    Hi, I am trying to connect DataHub to Snowflake. The default (trial Snowflake) works fine, but when I connect to my project instance, it does not scan any tables/views in the databases. Am I missing a configuration pattern?
  • g

    gentle-father-80172

    08/09/2021, 7:09 PM
    Hi everyone! Trying a FRESH build from source and getting `ElasticSearchGraphServiceTest` errors. I'm on macOS 11.5.1. Any ideas?
  • f

    faint-painting-38451

    08/09/2021, 7:39 PM
    Hi, I have DataHub set up locally, but I noticed that the .pdl files aren't being recognized by IntelliJ. I looked through the docs for a plugin but couldn't find a link to one, though it seems like one exists. Does anyone have a link to it? Also, I did set up the project using the generated .ipr file, so I think the setup was correct.
  • o

    orange-airplane-6566

    08/10/2021, 4:15 PM
    I'm working on upgrading from 0.8.1 to 0.8.8 today, using the helm chart at https://github.com/acryldata/datahub-helm. I was surprised to learn this morning that file-based basic authentication is now the default behavior for the DataHub frontend (https://github.com/linkedin/datahub/pull/2818). For version 0.8.8, is it possible to disable auth entirely in `datahub-frontend`?
  • b

    billions-tent-29367

    08/10/2021, 8:46 PM
    Hello! I'm working on extending the metadata models, but I'm having an issue trying to load data. I created .pdl files in `metadata-models-ext/com/example/` for my model's key, entity and snapshot. `./gradlew build` picks up the models and generates code. When I create a work unit and attempt ingestion using `datahub ingest -c`, I get the error `AvroException: ('Datum union type not in schema: %s', 'com.example.TeamSnapshot')`. Any tips on what this error actually means?
  • s

    square-activity-64562

    08/12/2021, 11:01 AM
    In the stats tab for some datasets, viewing profiling history for the past 1 day shows the 2 runs that I did, but the view for the past 1 week is empty. Not sure what is happening here.
  • s

    square-activity-64562

    08/13/2021, 6:55 AM
    When ingesting from the `glue` source, is the partition key not ingested? I just noticed the partition key is missing from the DataHub schema, and there were no errors/warnings related to it during ingestion either.
  • s

    square-activity-64562

    08/13/2021, 12:20 PM
    I was looking at the results of this query and I noticed `urn:li:principal:UNKNOWN` is `createdby` for all rows according to this table.
    SELECT
      createdby,
      COUNT(*)
    FROM metadata_aspect_v2
    GROUP BY
      createdby
    When people are logged into datahub and they are making edits shouldn't their urn be present here? Or am I misunderstanding this schema?
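The shape of that query's result can be reproduced against an in-memory SQLite table. This is only a sketch under stated assumptions: the table below keeps three columns of `metadata_aspect_v2` for brevity, and every row is made up for illustration, so it says nothing about why `urn:li:principal:UNKNOWN` is recorded in a real deployment.

```python
import sqlite3

# In-memory stand-in for the real table; simplified columns, hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata_aspect_v2 (urn TEXT, aspect TEXT, createdby TEXT)")
conn.executemany(
    "INSERT INTO metadata_aspect_v2 (urn, aspect, createdby) VALUES (?, ?, ?)",
    [
        ("urn:li:dataset:example-1", "schemaMetadata", "urn:li:principal:UNKNOWN"),
        ("urn:li:dataset:example-2", "schemaMetadata", "urn:li:principal:UNKNOWN"),
        ("urn:li:dataset:example-1", "ownership", "urn:li:corpuser:datahub"),
    ],
)

# Same grouping as the query in the message, with an ORDER BY for stable output.
result = conn.execute(
    "SELECT createdby, COUNT(*) FROM metadata_aspect_v2 "
    "GROUP BY createdby ORDER BY createdby"
).fetchall()

for created_by, row_count in result:
    print(created_by, row_count)
```

If user edits were attributed to individual actors, they would appear here as additional `createdby` groups next to the `UNKNOWN` principal.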
  • b

    bland-orange-95847

    08/16/2021, 9:06 AM
    Hi, I am manually ingesting data with SchemaMetadata, and when I try to fetch it I get `Exception while fetching data (/browse/entities) : java.lang.RuntimeException: Failed to retrieve entities of type Dataset` on the frontend (GMS stack trace in thread). I create the data similarly to the glue source connector, but I am unsure which field is NULL and how to debug it. If I push the record without SchemaMetadata it works, so something is going wrong with that aspect. Maybe someone can help me identify the issue? Thanks in advance 🙂
  • a

    able-park-49455

    08/16/2021, 9:09 AM
    Hi all, how can I print debug level logs on datahub ingestion container? Should I set an env or pass an argument? Thanks
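For the case where ingestion is driven from Python in the same process, debug output can be turned on with the standard `logging` module. This is a minimal sketch, assuming the relevant loggers live under the name `datahub` (an assumption here, as is the helper name `enable_debug_logging`); it does not cover how the ingestion container itself is configured.

```python
import logging


def enable_debug_logging(logger_name="datahub"):
    """Attach a stderr handler to the root logger and lower one logger tree to DEBUG."""
    # basicConfig only adds a handler if the root logger has none yet.
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s - %(message)s")
    logger = logging.getLogger(logger_name)
    # Child loggers such as "datahub.ingestion" inherit this level.
    logger.setLevel(logging.DEBUG)
    return logger


logger = enable_debug_logging()
logger.debug("debug logging is on for %s", logger.name)
```

Setting the level on the named logger rather than on the root keeps third-party libraries at their default verbosity.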
  • a

    adamant-city-60979

    08/16/2021, 3:22 PM
    Hi folks, I am trying to ingest table metadata from Athena. It has loaded the first DB’s tables, but the tables from the DB I specified in the YAML file did not get loaded into DataHub. It’s showing me the error below while trying to load.
  • w

    witty-actor-87329

    08/17/2021, 5:01 PM
    Hi all, we are trying to ingest metadata from the AWS Glue catalog, Looker and LookML files and build lineage in DataHub. But DataHub is able to pick up lineage information only from the Looker sources, and it's not linking that lineage to the Glue source. Is there something we can do to link the two and make the lineage complete?
  • s

    square-activity-64562

    08/18/2021, 4:26 AM
    If the UI keeps loading and the network tab shows that `graphql` is the request taking time, does that mean either `gms` or `elasticsearch` is slow?
  • m

    modern-nail-74015

    08/18/2021, 5:34 AM
    Hi, I use the programatic_pipeline file to ingest data, but an error occurred:
    failed to write record with workunit information_schema.ST_SPATIAL_REFERENCE_SYSTEMS with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: ERROR :: /value/com.linkedin.metadata.snapshot.DatasetSnapshot/aspects/1/com.linkedin.schema.SchemaMetadata/fields/0/isPartOfKey :: unrecognized field found but not allowed\nERROR ::