# ingestion
a
Hello, I have an error while ingesting from Snowflake:
'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
               'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                        'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class '
                                      'com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" '
                                      'is invalid\n'
                                      '\n'
                                      '\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)',
                        'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: '
                                   '"Provided urn urn:li:corpuser:" is invalid\n',
                        'status': '422'}}],
I used the search here and found that I must use the transformers block. What should I add?
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser" #like this?
b
It needs to be:
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser:your_user_id_here"
click on your user avatar in the web UI and see the URN in the URL if you're unsure
a
@better-orange-49102 Hi! Thanks for your reply! And what <your_user_id> should I use? In my recipe I have a specialIngestUser. The owner of the data seems to be OneMoreAnotherUser. And I have myOwnSnowflakeUser. So there are 3 users. I thought I could set the data owner later, after ingesting.
b
You could specify all 3 owners in the recipe. The issue is that if you set additional owners via the UI, the next ingestion run will override whatever values are there, unless the transformer queries for the existing information. (I don't have an example of that querying transformer right now, though.)
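As a rough sketch of listing all three owners in the recipe, the transformer block could look like the one below. The user IDs are just the placeholder names from the question above, so they are assumptions; replace them with the IDs that actually appear in your DataHub user URNs.

```yaml
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        # Placeholder IDs taken from the question; substitute your real user IDs.
        - "urn:li:corpuser:specialIngestUser"
        - "urn:li:corpuser:OneMoreAnotherUser"
        - "urn:li:corpuser:myOwnSnowflakeUser"
```

Each entry has to be a complete URN with a non-empty ID after `urn:li:corpuser:`.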
a
@better-orange-49102 I've set the user ID and I get the same errors:
[2022-09-06 12:29:01,890] ERROR    {datahub.ingestion.run.pipeline:53} -  failed to write record with workunit datasetUsageStatistics-1662336000000-for-urn:li:dataset:(urn:li:dataPlatform:snowflake,asdfasdf.asdasd.qerfqerfqref,DEV) with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)', 'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n', 'status': 422}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n\n\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)', 'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" is invalid\n', 'status': 422}


               'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                        'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class '
                                      'com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: "Provided urn urn:li:corpuser:" '
                                      'is invalid\n'
                                      '\n'
                                      '\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)',
                        'message': 'Failed to validate record with class com.linkedin.dataset.DatasetUsageStatistics: ERROR :: /userCounts/0/user :: '
                                   '"Provided urn urn:li:corpuser:" is invalid\n',
                        'status': '422'}},
b
"Provided urn urn:li:corpuser:" is invalid
How are you specifying the user in the recipe? It should look something like urn:li:corpuser:datahub
a
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser:Username"
I have Okta connected to DataHub, so I have urn:li:corpuser:Firstname.Lastname@myorg.com
b
I can't see why this is happening. Could you try writing it to the file sink first, look at what is being generated, and see if there is an ownership aspect with an invalid user URN?
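For reference, a minimal sketch of switching the recipe to the file sink: keep the source and transformers sections as they are and replace only the sink. The output path here is a hypothetical example, so pick whatever location suits you.

```yaml
sink:
  type: "file"
  config:
    # Hypothetical output path; every emitted record is written to this JSON file.
    filename: "./snowflake_ingestion_output.json"
```

After the run, searching the output file for the string `urn:li:corpuser:` should show which aspects carry a user URN with an empty ID.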
a
ok, I’ll do it with the file sink and come back
h
@ancient-apartment-23316 which source are you using, snowflake-usage or snowflake-beta?
a
@hundreds-photographer-13496 snowflake-beta
I found this after ingesting to the file sink:
{
    "entityType": "dataset",
    "entityUrn": "urn:li:dataset:(urn:li:oijopij:snowflake,kjnkjnkljnkjnlkj,DEV)",
    "changeType": "UPSERT",
    "aspectName": "ownership",
    "aspect": {
        "value": "{\"owners\": [{\"owner\": \"urn:li:corpuser:firstname.lastname@myorg.com\", \"type\": \"DATAOWNER\"}], \"lastModified\": {\"time\": 0, \"actor\": \"urn:li:corpuser:unknown\"}}",
        "contentType": "application/json"
    },
    "systemMetadata": {
        "lastObserved": 1662458866167,
        "runId": "snowflake-beta-2022_09_06-13_00_09"
    }
},
The actor is unknown: "lastModified": {"time": 0, "actor": "urn:li:corpuser:unknown"}
Is that a problem?
@better-orange-49102 @hundreds-photographer-13496
I have an error while ingesting into DataHub on the same table
source:
  type: "snowflake-beta"
  config:
    env: DEV
    account_id: "qwe"
    warehouse: "asd"
    database_pattern:
      allow:
        - "zxc"
    schema_pattern:
      allow:
        - "www"

    username: "DATAHUB_USER"
    password: "qwerty"
    role: "DATAHUB_DEV"
transformers:
  - type: "simple_add_dataset_ownership"
    config:
      owner_urns:
        - "urn:li:corpuser:qwe.asd@sss.com"

sink:
  type: "datahub-rest"
  config:
    server: "http://dkafjgnakjfngkajfgnakjfg.amazonaws.com:8080"
Here is my recipe.
There is a new error, and it's about the actor:
{'error': 'Unable to emit metadata to DataHub GMS',
               'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
                        'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:422]: Failed to validate record with class '
                                      'com.linkedin.common.Operation: ERROR :: /actor :: "Provided urn urn:li:corpuser:" is invalid\n'
                                      '\n'
                                      '\tat com.linkedin.metadata.resources.entity.AspectResource.lambda$ingestProposal$3(AspectResource.java:142)',
                        'message': 'Failed to validate record with class com.linkedin.common.Operation: ERROR :: /actor :: "Provided urn '
                                   'urn:li:corpuser:" is invalid\n',
                        'status': '422'}},
h
Did you make any changes to the recipe, or is this with the same recipe?
b
I wonder if URNs accept an @ symbol. As an experiment, if you remove the @sss.com and ingest the file, would the error go away? You can roll back the ingestion later.
h
From the error log, it looks like the problem is not with the ownership aspect but with the datasetUsageStatistics or operation aspect, for which the transformer block may not work. @ancient-apartment-23316, can you check if it works after specifying email_domain: sss.com in the recipe?
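A sketch of where that setting would go, assuming the snowflake-beta source accepts an email_domain option as suggested here. The other values are copied from the recipe posted earlier in the thread, with the remaining settings left out for brevity.

```yaml
source:
  type: "snowflake-beta"
  config:
    env: DEV
    account_id: "qwe"
    warehouse: "asd"
    # Assumed option from the suggestion above, set to the domain of the user emails.
    email_domain: "sss.com"
```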
Hey! Could you try using the file sink? What does the datasetUsageStatistics aspect look like? Would you be able to share a sample from it that has the URN as "urn:li:corpuser:"? https://datahubspace.slack.com/archives/CUMUWQU66/p1662457543565879?thread_ts=1662450421.081049&cid=CUMUWQU66
Hey bumping again - did you get time to check this ?
a
Hi, I'm sorry I disappeared for a while. This task was in the backlog. I will return to it as soon as I resolve the issue with connecting Okta to DataHub.