I am trying tableau ingestion. I'm getting the fol...
# troubleshoot
n
I am trying Tableau ingestion and getting the following errors, and I'm not sure why. Below is the log from running ingestion with the debug option turned on. I hope you can help me understand what this log means.
datahub --debug ingest -c tableau-recipe.yml
m
Try setting `page_size: 1`. Tableau cannot return more than 20,000 nodes (i.e. pieces of information).
n
@modern-artist-55754 Thank you for answering! I set page_size to 1 and ran the ingestion again, and the following errors showed up. NODE_LIMIT still occurs in some cases, so it seems the upper limit needs to be raised separately in Tableau. I wonder whether there is separate exception handling for the remaining errors.
{'message': "
                                   '"Cannot return null for non-nullable type: \'TableauUser\' within parent \'PublishedDatasource\' '
                                   '(/publishedDatasourcesConnection/nodes[261]/owner)", \'path\': [\'publishedDatasourcesConnection\', \'nodes\', '
                                   "261, 'owner'], 'errorType': 'DataFetchingException', 'locations': None, 'extensions': None}

...

{'message': "
                                   '"Cannot return null for non-nullable type: \'RemoteType\' within parent \'Column\' '
                                   '(/publishedDatasourcesConnection/nodes[153]/upstreamTables[0]/columns[7]/remoteType)", \'path\': '
                                   "['publishedDatasourcesConnection', 'nodes', 153, 'upstreamTables', 0, 'columns', 7, 'remoteType'], 'errorType': "
                                   '\'DataFetchingException\', \'locations\': None, \'extensions\': None}

...

{'message': 'Showing partial results. The request exceeded the "
                                   "20000 node limit. Use pagination, additional filtering, or both in the query to adjust results.', 'extensions': "
                                   "{'severity': 'WARNING', 'code': 'NODE_LIMIT_EXCEEDED', 'properties': {'nodeLimit': 20000}}}

...

Cause: ERROR :: /upstreams/0/dataset :: "Provided urn '
                                      'urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0wthk33.\'한게임,윈조이 채널$\',TEST)" is '
                                      'invalid: Failed to convert urn to entity key: urns parts and key fields do not have same length\n
m
@narrow-apple-60403 It actually looks like your problem can be fully fixed in the upcoming release: @hundreds-photographer-13496 put up a PR to set page_size for PublishedDatasources and CustomSQLTables.
@narrow-apple-60403 You got a 502 response from your Tableau server, so it is likely temporarily down. You may want to retry later.
n
@modern-artist-55754 I wonder if the 502 error is related to TIME_LIMIT. Several other error messages also appear, and I can't find an explanation for why they occur. Is there a way to find out?
{
  "source": {
    "type": "tableau",
    "report": {
      "events_produced": "105273",
      "events_produced_per_sec": "85",
      "event_ids": [
        "urn:li:dataset:(urn:li:dataPlatform:redshift,boa.saycom.f_webgame_kpi,PROD)",
        "urn:li:dashboard:(tableau,359f9fb8-158f-a76b-073d-60a27d885785)",
        "urn:li:dataset:(urn:li:dataPlatform:greenplum,boa.saymob.v_mob_mkt_ecpi_retention_w,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:bigquery,nbs-bl-st.us_olap.f_bl_st_free_lumena_state_w,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:external,clipboard_20190703t134121leaf.clipboard_20190703t134121.txt,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_1wkjawr0ml70a316q8xl61edcgqv.url,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:redshift,boa.nwmob.v_brd_pne_terra_monitoring_d,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:external,superstore kr v201701.xlsx.주문,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:bigquery,solitaire-global.data_analytic.f_seven_data,PROD)",
        "urn:li:dataset:(urn:li:dataPlatform:bigquery,gostop2018-kr.gostop2018_info_analysis.f_smg_version_d,PROD)",
        "... sampled of 105273 total elements"
      ],
      "warnings": {},
      "failures": {},
      "start_time": "2022-09-07 09:52:24.288013",
      "running_time_in_seconds": "1238"
    }
  },
  "sink": {
    "type": "datahub-rest",
    "report": {
      "total_records_written": "112612",
      "records_written_per_second": "90",
      "warnings": [
        {
          "warning": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        }
      ],
      "failures": [
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        {
          "error": "Unable to emit metadata to DataHub GMS",
          "info": {
            "exceptionClass": "com.linkedin.restli.server.RestLiServiceException",
            "stackTrace": "com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)",
            "message": "java.lang.RuntimeException: java.lang.reflect.InvocationTargetException",
            "status": 500
          }
        },
        "... sampled of 221 total elements"
      ],
      "start_time": "2022-09-07 09:52:23.133451",
      "current_time": "2022-09-07 10:13:02.763866",
      "total_duration_in_seconds": "1239.63",
      "gms_version": "v0.8.44",
      "pending_requests": "0"
    }
  }
}
m
Oh sorry, my bad. I don’t think it’s the Tableau server; it’s your DataHub GMS that isn’t responding.
n
@modern-artist-55754 I'm looking at the log, and I think a comma (',') in the dataset name might be causing the error. What do you think?
1.
[2022-09-07 09:58:03,373] ERROR    {datahub.ingestion.run.pipeline:54} -  failed to write record with workunit tableau-urn:li:dataset:(urn:li:dataPlatform:tableau,1c8d4600-1139-58b1-8dcc-7ccd847b6c86,PROD)-upstreamLineage with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Invalid urn format for aspect: {upstreams=[{type=TRANSFORMED, auditStamp={actor=urn:li:corpuser:unknown, time=0}, dataset=urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0wthk33.\'한게임,윈조이 채널$\',PROD)}]} for entity: urn:li:dataset:(urn:li:dataPlatform:tableau,1c8d4600-1139-58b1-8dcc-7ccd847b6c86,PROD)\n Cause: ERROR :: /upstreams/0/dataset :: "Provided urn urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0wthk33.\'한게임,윈조이 채널$\',PROD)" is invalid: Failed to convert urn to entity key: urns parts and key fields do not have same length\n', 'message': 'Invalid urn format for aspect: {upstreams=[{type=TRANSFORMED, auditStamp={actor=urn:li:corpuser:unknown, time=0}, dataset=urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0w', 'status': 400}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Invalid urn format for aspect: {upstreams=[{type=TRANSFORMED, auditStamp={actor=urn:li:corpuser:unknown, time=0}, dataset=urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0wthk33.\'한게임,윈조이 채널$\',PROD)}]} for entity: urn:li:dataset:(urn:li:dataPlatform:tableau,1c8d4600-1139-58b1-8dcc-7ccd847b6c86,PROD)\n Cause: ERROR :: /upstreams/0/dataset :: "Provided urn urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0wthk33.\'한게임,윈조이 채널$\',PROD)" is invalid: Failed to convert urn to entity key: urns parts and key fields do not have 
same length\n', 'message': 'Invalid urn format for aspect: {upstreams=[{type=TRANSFORMED, auditStamp={actor=urn:li:corpuser:unknown, time=0}, dataset=urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0w', 'status': 400}
2.
[2022-09-07 09:58:05,438] ERROR    {datahub.ingestion.run.pipeline:54} -  failed to write record with workunit urn:li:dataset:(urn:li:dataPlatform:google-sheets,temp_11mbwvr1miay631gnh47f0wthk33.'한게임,윈조이 채널$',PROD) with ('Unable to emit metadata to DataHub GMS', {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: java.lang.reflect.InvocationTargetException', 'status': 500}) and info {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException', 'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:500]: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:42)\n\tat com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:50)', 'message': 'java.lang.RuntimeException: java.lang.reflect.InvocationTargetException', 'status': 500}
m
Ah right, I see. One solution is to write a transformer that removes the `,` before the record is sent to GMS. `,` is a reserved character in the `urn`.
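To make the "urns parts and key fields do not have same length" message more concrete, here is a minimal Python sketch. It is only a simplified model of comma-based URN parsing, not DataHub's actual parser; `split_dataset_urn` and `EXPECTED_KEY_FIELDS` are hypothetical names used for illustration. A dataset URN is expected to carry three key fields (platform, name, environment), so a comma inside the name produces one part too many.

```python
from typing import List

# Assumption: a dataset key has three fields (platform, name, environment).
EXPECTED_KEY_FIELDS = 3


def split_dataset_urn(urn: str) -> List[str]:
    """Naively split the tuple part of a dataset URN on commas."""
    prefix = "urn:li:dataset:("
    if not (urn.startswith(prefix) and urn.endswith(")")):
        raise ValueError("not a dataset URN: " + urn)
    return urn[len(prefix):-1].split(",")


# One URN from the ingestion report that was accepted, and the one that failed.
ok = "urn:li:dataset:(urn:li:dataPlatform:redshift,boa.saycom.f_webgame_kpi,PROD)"
bad = (
    "urn:li:dataset:(urn:li:dataPlatform:google-sheets,"
    "temp_11mbwvr1miay631gnh47f0wthk33.'한게임,윈조이 채널$',PROD)"
)

for urn in (ok, bad):
    parts = split_dataset_urn(urn)
    status = "valid" if len(parts) == EXPECTED_KEY_FIELDS else "invalid"
    print(len(parts), status)
```

The second URN splits into four parts because of the comma inside the sheet name, which matches the length mismatch reported in the log.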
n
@modern-artist-55754 Is there any transformer example for changing the dataset name?
m
And you can put it in your recipe like this:
transformers:
  - type: dnap_datahub_ingest.mce.transformers.sql_source.case_conversion.SqlSourceCaseTransformer
n
@modern-artist-55754 Oh, Thank you!! I love you😍 Then can I use that code as it is now in my case?
m
You have to change that a bit, because that transformer converts the `urn` to lower or upper case. You have to update it so that it removes the `,` or applies `url_encode` to it. There are a few reserved characters, I think: `:`, `,`, `(`, `)`.
n
Ok, I'll try. Thank you.
@modern-artist-55754 Hi, may I know where `dnap_datahub_ingest` is?
m
@narrow-apple-60403 That’s our own package; you have to put the transformer in your own package.
n
@modern-artist-55754 Thank you for answering. I want to ask one thing: I don't want to rename the dataset. You recommended `url_encode`; can I simply URL-encode the name?
m
I think you can do it, but I haven’t tried it yet. What I did is simply replace `:`, `,`, `(` and `)` with an empty string.
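The two options discussed here can be sketched in plain Python as follows. This is a hedged illustration only: `strip_reserved`, `url_encode`, and `sanitize_dataset_urn` are hypothetical helpers, not part of DataHub or of the transformer mentioned above. The sketch assumes that the platform and environment parts never contain a comma, so everything between the first and last comma is treated as the dataset name. Whether GMS accepts a percent-encoded name has not been verified in this thread.

```python
import re
from urllib import parse

# Characters named in this thread as reserved inside a URN.
RESERVED = re.compile(r"[,:()]")
PREFIX = "urn:li:dataset:("


def strip_reserved(name: str) -> str:
    """Drop reserved characters (this changes the dataset name)."""
    return RESERVED.sub("", name)


def url_encode(name: str) -> str:
    """Percent-encode the name so the original can be recovered later."""
    return parse.quote(name, safe="")


def sanitize_dataset_urn(urn: str, fix=strip_reserved) -> str:
    """Apply `fix` to the name part of a dataset URN only."""
    if not (urn.startswith(PREFIX) and urn.endswith(")")):
        return urn  # not a dataset URN, leave it untouched
    parts = urn[len(PREFIX):-1].split(",")
    if len(parts) < 3:
        return urn
    # Assumption: first part is the platform, last part is the environment.
    platform, env = parts[0], parts[-1]
    name = ",".join(parts[1:-1])
    return f"{PREFIX}{platform},{fix(name)},{env})"


bad = (
    "urn:li:dataset:(urn:li:dataPlatform:google-sheets,"
    "temp_11mbwvr1miay631gnh47f0wthk33.'한게임,윈조이 채널$',PROD)"
)
stripped = sanitize_dataset_urn(bad, strip_reserved)
encoded = sanitize_dataset_urn(bad, url_encode)
print(stripped)
print(encoded)
```

Stripping is simpler but loses information, while percent-encoding keeps the original name recoverable with `urllib.parse.unquote` at the cost of a less readable URN.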
n
@modern-artist-55754 Hi, I tried the ingestion, but it doesn't work properly. Could you please check whether my settings are correct?
...
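from urllib import parse
import re

def url_encode(s: str) -> str:
    # Percent-encode reserved characters such as "," and "(" instead of dropping them.
    return parse.quote(s)

def replace_reserved(s: str) -> str:
    # Replace the URN-reserved characters , : ( ) with a space.
    return re.sub('[,:()]', ' ', s)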

__CASE_FUNCTIONS_DISPATCHER__ = {"url_encode": url_encode, "replace_reserved": replace_reserved}
...
transformers:
  - type: "transformer.case_conversion_transformer.SqlSourceCaseTransformer"
    config:
      platforms: ['tableau']
      case: replace_reserved