important-afternoon-19755
05/12/2023, 2:56 PM
I emitted DatasetUsageStatisticsClass aspects. I can see the Queries tab and it works well in a test (I emitted to about 30 urns). But after I emit DatasetUsageStatisticsClass to about 4k urns, when I click a data source, after about 10 seconds of loading I get the error "An unknown error occurred. (code 500)" and the page looks like the picture I attached, even for data sources where I haven't opened the Queries tab.
Is there a limit to how many Queries tabs I can have open?
Also, I set the max length of each query I emitted to the Queries tab to 10000. Is there a limit to the length of each query?

lively-cat-88289
05/12/2023, 2:57 PM

delightful-ram-75848
05/16/2023, 2:15 AM

important-afternoon-19755
05/16/2023, 2:23 AM
I ran datahub docker check; the output is "No issues detected". Below is the log of datahub-gms.

delightful-ram-75848
05/16/2023, 2:25 AM
.","caused_by":{"type":"illegal_state_exception","reason":"unexpected docvalues type NONE for field 'topSqlQueries' (expected one of [SORTED, SORTED_SET]). Re-index with correct docvalues type."}}}}]},"status":500}
important-afternoon-19755
05/16/2023, 2:27 AM
def _emit_to_datahub(queries, query_count: int, db_name: str, tb_name: str):
    if queries and db_name and tb_name:
        top_sql_queries = [
            trim_query(
                query,
                budget_per_query=10000,
            )
            for query in queries
        ]
        usageStats = DatasetUsageStatisticsClass(
            timestampMillis=get_sys_time(),
            eventGranularity=TimeWindowSizeClass(unit=CalendarIntervalClass.DAY, multiple=1),
            totalSqlQueries=query_count,
            topSqlQueries=top_sql_queries,
        )
        mcp = MetadataChangeProposalWrapper(
            entityType="dataset",
            aspectName="datasetUsageStatistics",
            changeType=ChangeTypeClass.UPSERT,
            entityUrn=f'urn:li:dataset:(urn:li:dataPlatform:glue,{db_name}.{tb_name},PROD)',
            aspect=usageStats,
        )
        # Emit metadata
        emitter.emit(item=mcp)
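[Editor's note] The errors later in this thread show Elasticsearch rejecting documents whose topSqlQueries field exceeds 32766 UTF-8 bytes in total, so trimming each query to 10000 with trim_query is not enough on its own once several queries are emitted together. A minimal sketch of a helper that additionally caps the whole list to a total byte budget (the helper name and budget default are illustrative, not from the original code):

```python
from typing import List


def cap_total_bytes(queries: List[str], budget: int = 32766) -> List[str]:
    """Keep only as many queries as fit within `budget` UTF-8 bytes in total.

    Elasticsearch rejects keyword terms whose UTF-8 encoding exceeds
    32766 bytes, so the summed size of the list is kept strictly below it.
    """
    kept: List[str] = []
    used = 0
    for query in queries:
        size = len(query.encode("utf-8"))
        if used + size >= budget:
            break  # adding this query would blow the budget; stop here
        kept.append(query)
        used += size
    return kept


# Example: three 15000-byte queries; only the first two fit under 32766.
queries = ["a" * 15000, "b" * 15000, "c" * 15000]
print(len(cap_total_bytes(queries)))  # 2
```

Such a cap could be applied to top_sql_queries before constructing DatasetUsageStatisticsClass; the JSON serialization adds quoting and escaping overhead on top of the raw query bytes, so a somewhat smaller budget would be safer in practice.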
delightful-ram-75848
05/16/2023, 2:55 AM

important-afternoon-19755
05/16/2023, 6:47 AM

delightful-ram-75848
05/16/2023, 10:52 PM

bland-lighter-26751
08/16/2023, 4:32 PM

worried-laptop-98985
08/24/2023, 11:03 AM

worried-laptop-98985
08/24/2023, 11:06 AM
{"error":{"root_cause":[{"type":"exception","reason":"java.util.concurrent.ExecutionException: java.lang.IllegalStateException: unexpected docvalues type NONE for field 'topSqlQueries' (expected one of [SORTED, SORTED_SET])
important-afternoon-19755
08/24/2023, 11:08 AM

important-afternoon-19755
08/24/2023, 11:09 AM
2023-05-12 14:25:12,148 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener - Failed to feed bulk request. Number of events: 21 Took time ms: -1 Message: failure in bulk execution:
[7]: index [dataset_datasetusagestatisticsaspect_v1], type [_doc], id [25e23835a2de64beff172907fc73c967], message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="topSqlQueries" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 34, 83, 69, 76, 69, 67, 84, 92, 110, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 116, 97]...', original message: bytes can be at most 32766 in length; got 33502]]; nested: ElasticsearchException[Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 33502]];]
[8]: index [dataset_datasetusagestatisticsaspect_v1], type [_doc], id [5b70b999469bf8232731c1fb3a7e8a9d], message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="topSqlQueries" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 34, 83, 69, 76, 69, 67, 84, 92, 110, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 116, 97]...', original message: bytes can be at most 32766 in length; got 51237]]; nested: ElasticsearchException[Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 51237]];]
[9]: index [dataset_datasetusagestatisticsaspect_v1], type [_doc], id [5ab06f02d23bfc976a63bdccd605d465], message [ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Document contains at least one immense term in field="topSqlQueries" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[91, 34, 83, 69, 76, 69, 67, 84, 92, 110, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 116, 97]...', original message: bytes can be at most 32766 in length; got 56755]]; nested: ElasticsearchException[Elasticsearch exception [type=max_bytes_length_exceeded_exception, reason=bytes can be at most 32766 in length; got 56755]];]
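[Editor's note] The "immense term" prefix in these logs is a list of raw byte values; decoding it suggests (my reading, not confirmed in the thread) that the rejected term is the JSON-serialized query array as a whole, not any single query, which is why the per-query 10000 budget did not prevent totals like 33502 bytes:

```python
# First bytes of the immense term reported in the GMS log above.
prefix = [91, 34, 83, 69, 76, 69, 67, 84, 92, 110, 32, 32]

# Decodes to '["SELECT\n  ' (literal backslash-n): the opening of a
# JSON array of query strings, i.e. the whole list indexed as one term.
print(bytes(prefix).decode("utf-8"))
```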
important-afternoon-19755
08/24/2023, 11:18 AM
DatasetUsageStatisticsClass(
    timestampMillis=get_sys_time(),
    eventGranularity=TimeWindowSizeClass(unit=CalendarIntervalClass.DAY, multiple=1),
    totalSqlQueries=query_count,
    topSqlQueries=top_sql_queries,
)
important-afternoon-19755
08/24/2023, 11:21 AM
# top_sql_queries: List[str]
sum(len(top_sql_query.encode("utf-8")) for top_sql_query in top_sql_queries) < 32766
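[Editor's note] The check above counts UTF-8 bytes rather than characters, which matters because the Elasticsearch limit in the error messages (32766) is a byte limit: non-ASCII characters in query text occupy more than one byte each. A small illustration:

```python
# len() counts characters, but the Elasticsearch limit is in bytes;
# non-ASCII query text can take several bytes per character.
query = "SELECT 'é' AS col"
print(len(query))                  # 17 characters
print(len(query.encode("utf-8")))  # 18 bytes ('é' encodes to 2 bytes)
```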
important-afternoon-19755
08/24/2023, 11:29 AM
So the total UTF-8 size of topSqlQueries is kept under 32766 bytes.

worried-laptop-98985
08/24/2023, 11:30 AM

worried-laptop-98985
08/24/2023, 11:33 AM

important-afternoon-19755
08/24/2023, 11:44 AM

worried-laptop-98985
08/24/2023, 11:46 AM

worried-laptop-98985
08/25/2023, 2:18 PM

bland-lighter-26751
09/06/2023, 3:00 PM

bland-lighter-26751
09/06/2023, 4:13 PM

worried-laptop-98985
09/07/2023, 3:07 PM