# ingestion
  • flaky-soccer-57765

    08/31/2022, 8:16 AM
    image.png
  • melodic-beach-18239

    09/05/2022, 3:39 AM
    Hey, I get an error when I ingest Superset
  • green-lion-58215

    09/06/2022, 4:36 PM
    reposting
  • ancient-apartment-23316

    09/15/2022, 6:55 PM
    @hundreds-photographer-13496 Hi,
    did you make any changes to the recipe, or is it the same recipe?
    The recipe looks exactly like my comment above, only I removed all my own names. @better-orange-49102 Hi, I’ve removed the
    @
    and/or
    @sss.com
    and have the same error
    'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
                   'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                            'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class '
                                          'com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" '
                                          'is invalid\n'
                                          '\n'
                                          '\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)',
                            'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: '
                                       '"Provided urn urn:li:corpuser:" is invalid\n',
                            'status': '422'}}],
    [2022-09-15 21:21:42,926] ERROR    {datahub.ingestion.run.pipeline:53} -  failed to write record with workunit datasetUsageStatistics-1663113600000-for-urn:li:dataset:(urn:li:dataPlatform:snowflake,qwer.eeee.sadfasdf,DEV) with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)', 'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n', 'status': 422}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)', 'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n', 'status': 422}
    @hundreds-photographer-13496 I’ve used the
    email_domain
    and I have the same error
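    The 422 above means GMS rejected an empty corpuser URN: the literal `urn:li:corpuser:` with no id after the colon, which a usage source produces when a user record has an empty name or email. A minimal sketch of the failure and a guard against it; `make_corpuser_urn` is a hypothetical helper for illustration, not DataHub's actual code:

    ```python
    def make_corpuser_urn(username: str) -> str:
        """Build a corpuser URN, refusing empty usernames.

        GMS rejects the bare urn "urn:li:corpuser:" with HTTP 422
        ("Provided urn urn:li:corpuser: is invalid"), which is exactly
        what an empty username would produce.
        """
        username = username.strip()
        if not username:
            raise ValueError(
                "empty username would yield the invalid urn 'urn:li:corpuser:'"
            )
        return f"urn:li:corpuser:{username}"
    ```

    If the source only knows users by email, an empty local part (or stripping the domain from a blank address) can produce exactly this empty URN, so the fix is usually in the upstream user data rather than the recipe.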
  • green-lion-58215

    09/26/2022, 4:17 PM
    re-posting
  • creamy-tent-10151

    09/27/2022, 5:09 PM
    Looking at athena.py, it doesn't look like any coarse lineage is being built automatically
  • bland-balloon-48379

    09/27/2022, 5:42 PM
    Hey, wanted to see if there were any updates on this thread. I'm currently running into the same issue for one of our databases. I also tried setting a higher value for search.max_buckets in elasticsearch.yml via the ES helm chart, but this causes the ES cluster to fail to start up. Has anyone else run into this? We're using DataHub
    v0.8.44
    and Elasticsearch
    7.16.2
    .
  • able-evening-90828

    09/27/2022, 11:37 PM
    Has anyone noticed scheduled UI ingestion not running as expected? I configured a few scheduled ingestions to run at the same time every day, and I just noticed that all of them skipped 9/23.
  • orange-flag-48535

    10/03/2022, 4:49 AM
    Bumping this up. This feature will simplify my Datahub integration task significantly if I can get it to work. Thanks.
  • careful-engine-38533

    10/06/2022, 10:04 AM
    Hi, I see empty values in a dataset (screenshot attached). Is there a way I can clean them?
  • future-hair-23690

    10/07/2022, 10:28 AM
    Hi guys, I'm experiencing the same issue as mentioned in the linked thread, namely that my profiling does not start. I am using MSSQL (pyodbc) on CLI version 0.8.45.2. Does anybody have an idea what might be wrong? There is no error or debug message; just nothing happens related to profiling. My config:
    source:
      type: mssql
      config:
        password: ---------
        database: sandbox_validation
        host_port: 'az-uk-mssql-accept-01.logex.cloud:1433'
        username: ------
        use_odbc: 'true'
        uri_args:
          driver: 'ODBC Driver 17 for SQL Server'
          Encrypt: 'Yes'
          TrustServerCertificate: 'Yes'
          ssl: 'True'
        env: STG
        profiling:
          enabled: true
          limit: 10000
          report_dropped_profiles: false
          profile_table_level_only: false
    
          include_field_null_count: true
          include_field_min_value: true
          include_field_max_value: true
          include_field_mean_value: true
          include_field_median_value: true
          include_field_stddev_value: true
          include_field_quantiles: true
          include_field_distinct_value_frequencies: true
          include_field_sample_values: true
          turn_off_expensive_profiling_metrics: false
          include_field_histogram: true
          catch_exceptions: false
          max_workers: 4
          query_combiner_enabled: true
          max_number_of_fields_to_profile: 100
          profile_if_updated_since_days: null
          partition_profiling_enabled: false
        schema_pattern:
          deny:
            - DS\\oleksii
            - ds*
            - Logex*
          allow:
            - dbo.*
            - dbo
    cheers!
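    One thing worth checking in the recipe above: assuming DataHub's AllowDenyPattern semantics (entries are Python regexes matched from the start of the string, with deny taking precedence over allow), the deny entry `ds*` means "d followed by zero or more s", so it also matches `dbo` and silently excludes the allowed schema. A small sketch of that behavior, with names taken from the recipe:

    ```python
    import re

    # Patterns from the recipe above; note these are regexes, not shell globs.
    deny = [r"DS\\oleksii", r"ds*", r"Logex*"]
    allow = [r"dbo.*", r"dbo"]

    def allowed(name: str) -> bool:
        # Assumed semantics: deny takes precedence; re.match anchors at the start.
        if any(re.match(p, name) for p in deny):
            return False
        return any(re.match(p, name) for p in allow)

    # The regex "ds*" matches the bare "d" prefix of "dbo", so the deny
    # list rejects the very schema the allow list was meant to keep.
    ```

    The intended globs were probably `ds.*` and `Logex.*`; with `ds*` in deny, everything under `dbo` is filtered out before profiling ever runs.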
  • dazzling-judge-80093

    10/28/2022, 1:26 PM
    maybe this is your issue? -> https://github.com/apache/airflow/issues/14896
  • dazzling-judge-80093

    11/04/2022, 8:50 AM
    exactly, the profiler is almost always Great Expectations under the hood
  • nutritious-salesclerk-57675

    11/07/2022, 3:30 AM
    Good day everyone. May I please follow up on this issue?
  • ancient-policeman-73437

    11/11/2022, 5:19 PM
    Hi @dazzling-judge-80093, @plain-farmer-27314 @green-football-43791, could you please share whether you were able to find a solution for this issue? I have 0.9.2 and face the same case. Only some chart explore links are missing.
  • microscopic-mechanic-13766

    11/14/2022, 9:11 AM
    And do you know, by any chance, the command (or its format) to build it? I know the main classes are
    metadata-integration/java/spark-lineage/src/main/java/datahub/spark/DatahubSparkListener.java
    and
    metadata-integration/java/spark-lineage/src/main/java/datahub/spark/DatasetExtractor.java
    , but I have also seen some other dependencies that I have no idea where they come from, like
    darwin.x86_64
    Edit: I have also noticed that the code inside the classes mentioned (which are in the project's code) is not quite the same as what can be found inside the same classes in the jar. Do you also know the reason behind this?
  • dazzling-judge-80093

    11/14/2022, 1:13 PM
    https://datahubproject.io/docs/metadata-ingestion/#basic-usage-of-cli-for-ingestion
    👍 1
  • rich-state-73859

    11/16/2022, 6:57 PM
    I checked the deps in
    datahub-protobuf-0.9.2.jar
    and didn’t find
    datahub/shaded/org/apache/http/ssl/TrustStrategy
    , while it exists in
    datahub-protobuf-0.8.45.jar
    .
  • rich-state-73859

    11/17/2022, 5:57 PM
    Any updates? 🤔
  • aloof-art-29270

    11/19/2022, 12:47 AM
    Hi, I’m actually facing the same issue when using csv-enricher! Can someone help me with this error?
  • quiet-school-18370

    11/22/2022, 8:14 AM
    I am creating an Airflow DAG to ingest all the data of the LookML repos. Can anyone help me?
  • refined-energy-76018

    11/22/2022, 10:03 AM
    @gray-shoe-75895 I've noticed recently that this doesn't seem to be working for the Datahub Airflow Plugin. Can you confirm? I'm running
    acryl-datahub-airflow-plugin=0.9.2.3
    and I have
    enabled
    set to
    false
    in both the .cfg file and the running configuration, but I'm still seeing metadata emitted to DataHub.
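    For reference, the plugin is meant to be disabled via the `[datahub]` section of `airflow.cfg` (or the matching `AIRFLOW__DATAHUB__ENABLED` environment variable); a sketch, assuming the documented plugin configuration:

    ```ini
    # airflow.cfg -- disable the DataHub Airflow plugin's automatic lineage emission
    [datahub]
    enabled = False
    ```

    If metadata is still emitted with this set, checking whether `AIRFLOW__DATAHUB__ENABLED` overrides the file, and that the scheduler was restarted after the change, would be the first things to rule out.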
  • lemon-musician-50603

    11/22/2022, 6:23 PM
    Hi @gray-shoe-75895, sure. Let me try this option first then.
  • astonishing-answer-96712

    11/23/2022, 5:51 PM
    Hi @full-chef-85630, could you elaborate on what you mean by “directly?” You mean within the DataHub telemetry/analytics tab, or to an external source?
  • ancient-policeman-73437

    11/28/2022, 3:27 PM
    Hi @dazzling-judge-80093, do you have any news here?
  • fresh-rocket-98009

    12/05/2022, 10:35 AM
    I tried to bash into the container terminal and run ingestion from it, and it worked. But I'm receiving a strange permission error from BigQuery even though I granted all the necessary permissions:
    "Access Denied: Table TABLE.INFORMATION_SCHEMA.TABLES: User does not have permission to query TABLE.INFORMATION_SCHEMA.TABLES, or perhaps it does not exist in location US.
    Does anyone know how to fix this?
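    Two common causes for the error above: the service account lacks `bigquery.tables.get`/`bigquery.tables.list` on that project, or the dataset lives outside the default US location, so the metadata query runs in the wrong region (hence "or perhaps it does not exist in location US"). BigQuery's `INFORMATION_SCHEMA` views are location-scoped; a small sketch of building a region-qualified query, where the helper name and region are illustrative:

    ```python
    def info_schema_tables_query(project: str, dataset: str, region: str = "") -> str:
        """Query INFORMATION_SCHEMA.TABLES, optionally qualified by region.

        If the dataset is not in the default US location, qualifying with
        its region (e.g. "eu") avoids "does not exist in location US".
        """
        if region:
            return (
                "SELECT table_name FROM "
                f"`{project}`.`region-{region}`.INFORMATION_SCHEMA.TABLES"
            )
        return (
            "SELECT table_name FROM "
            f"`{project}`.{dataset}.INFORMATION_SCHEMA.TABLES"
        )
    ```

    Running the generated statement by hand in the BigQuery console with the same service account is a quick way to separate a permissions problem from a location problem.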
  • great-account-95406

    12/08/2022, 5:53 AM
    Hi guys, I updated to
    v0.9.3
    and have the same problem. Could you tell me what I'm doing wrong? I'm using a Helm chart and just ran a helm upgrade to
    0.2.116
    . I also tried removing the release and deploying it from scratch, but nothing changed. If I start the UI ingestion it succeeds, and then a new one starts with a CLI prefix. I appreciate any help. Thanks!
  • plain-cricket-83456

    12/14/2022, 9:53 AM
    Thank you for your reply, great! I am looking forward to a connection-test feature for the generic database sources; right now there is no friendly way to know whether the parameters are correct until the ingestion is executed. May I ask if there are any plans to add this feature in the near future?
  • incalculable-queen-1487

    12/15/2022, 2:59 AM
    Do you know when it is planned?
  • tall-father-13753

    12/16/2022, 11:27 AM
    presto-on-hive works nicely, but this way I can only get metadata, and we would like to have profiling data as well. So, given that we are unable to change the backtick behaviour in our Hive, what other options are left for us? Manual profiling and ingestion using the emitter SDK?