hallowed-machine-2603
06/06/2022, 7:35 AMrich-policeman-92383
06/06/2022, 8:55 AMchilly-elephant-51826
06/06/2022, 9:39 AMbumpy-activity-74405
06/06/2022, 9:53 AMakka.actor.ActorSystemImpl - Illegal request, responding with status '431 Request Header Fields Too Large': HTTP header value exceeds the configured limit of 8192 characters
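One hedged way to exercise that limit is to send a request whose header value exceeds 8192 characters and check for the 431; the endpoint, port, and header name below are assumptions, not from the thread:
import requests

# Any header value longer than 8192 characters should trip the limit
# reported in the log above. URL and header name are hypothetical.
oversized = "x" * 10000
resp = requests.get("http://localhost:8080/health",  # hypothetical GMS endpoint
                    headers={"X-Test-Header": oversized})
print(resp.status_code)  # expected: 431 until the limit is raised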
Not sure how to test this (perhaps something like the sketch above), but I think this PR should enable users to solve the issue.
swift-breakfast-25077
06/06/2022, 10:40 AMDATAHUB_DEBUG to true in order to enable debug logging for DataHubValidationAction??
high-hospital-85984
06/06/2022, 10:58 AMthankful-magazine-50386
06/06/2022, 11:30 AMcurl -X GET --location "<http://192.168.25.133:9200/_cat/indices?v=&pretty=>"
And here is the result.
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open dataset_datasetprofileaspect_v1 hv4oWE6YSUSJdbcMBLU_DA 1 1 0 0 208b 208b
yellow open datajobindex_v2 1S-T1Y5jQziyqQJe61DYGA 1 1 0 0 208b 208b
yellow open datahubexecutionrequestindex_v2 QIRguaTMRnCrR1FnjLt_pg 1 1 0 0 208b 208b
yellow open datahubsecretindex_v2 7ifKdzCVRfSY5fVgDyGlwA 1 1 0 0 208b 208b
yellow open mlmodelindex_v2 JQOJjEpzSUWka9Cz4HNdRA 1 1 0 0 208b 208b
yellow open dataflowindex_v2 TsEzAXk8RNK4i4EaPwpXsw 1 1 0 0 208b 208b
yellow open mlmodelgroupindex_v2 TsxHmhO6SDauKjuLAt5LkA 1 1 0 0 208b 208b
yellow open datahubpolicyindex_v2 V4vqL__2RTuYWpZUeyjVxg 1 1 5 0 10.9kb 10.9kb
yellow open assertionindex_v2 oe7-AhYMTZyYYCHAwzzLuQ 1 1 0 0 208b 208b
yellow open corpuserindex_v2 ERzx5nyjSRq9rStuV8v4JA 1 1 0 0 208b 208b
yellow open dataprocessindex_v2 jM3LMbXoTl-nCNeArvZyTA 1 1 0 0 208b 208b
yellow open chartindex_v2 1NC6UgJBSBWzRG-K7MdpOA 1 1 0 0 208b 208b
yellow open tagindex_v2 kCEw1rUhTN2tn-FQG4_7Ng 1 1 0 0 208b 208b
yellow open mlmodeldeploymentindex_v2 KaMP2khtSRKk3wCt2gxp-Q 1 1 0 0 208b 208b
yellow open datajob_datahubingestioncheckpointaspect_v1 1majN12nTCqADYYm8L62wQ 1 1 0 0 208b 208b
yellow open dataplatforminstanceindex_v2 jhg10SLCSUa6YUaDRJLBow 1 1 0 0 208b 208b
yellow open dashboardindex_v2 msE7SGasQgmmedPIb2ZBMw 1 1 0 0 208b 208b
yellow open assertion_assertionruneventaspect_v1 0LUTkuT8Rc6Cwcn1cqWPrw 1 1 0 0 208b 208b
yellow open telemetryindex_v2 7ZPOU6smSvGNuj_Wk4c2NA 1 1 0 0 208b 208b
yellow open datasetindex_v2 c68bjkNARQGah2SPO8qfXQ 1 1 109 2 205.9kb 205.9kb
yellow open mlfeatureindex_v2 Bx-cGds6S--LP7iHv9lYRA 1 1 0 0 208b 208b
yellow open datajob_datahubingestionrunsummaryaspect_v1 8EQMHCrCQTmw7Ez3qW9w_w 1 1 0 0 208b 208b
yellow open dataplatformindex_v2 fvzlZxDATlK22B_1wQJBCw 1 1 0 0 208b 208b
yellow open dataprocessinstanceindex_v2 GWILAn0GSCCsiYAeXsn27w 1 1 0 0 208b 208b
yellow open glossarynodeindex_v2 9ip6m8sjQtuZh38eFGZriQ 1 1 0 0 208b 208b
yellow open datahubingestionsourceindex_v2 ys5TcFZNQIeCmcAX8V0HLQ 1 1 0 0 208b 208b
yellow open datahubretentionindex_v2 9g_c0oXaQ6CivDjVccoWdA 1 1 0 0 208b 208b
yellow open graph_service_v1 sVrfS2KkQ5yRxnafSMPMzQ 1 1 112 0 28kb 28kb
yellow open dataprocessinstance_dataprocessinstanceruneventaspect_v1 NuJ6YSr8Re22tq6NFF_dlQ 1 1 0 0 208b 208b
yellow open system_metadata_service_v1 jRmFnMePTnCgWYk6hthcIQ 1 1 908 5 102.4kb 102.4kb
yellow open dataset_operationaspect_v1 OVTWSBX6Q7eTwJhmZ-H1jA 1 1 0 0 208b 208b
yellow open datahubaccesstokenindex_v2 Tq_HSa-3QnePbcEa0YKmvQ 1 1 0 0 208b 208b
yellow open containerindex_v2 gMq55jCbSmCsgxdcLfl8qQ 1 1 4 0 11.5kb 11.5kb
yellow open schemafieldindex_v2 QN2r3l3xROO-1Cvr_pIG-w 1 1 0 0 208b 208b
yellow open domainindex_v2 KMF7IFgeTvGZSAmymuOpmA 1 1 0 0 208b 208b
yellow open testindex_v2 oosgqWP4SI6xg3mhzkLJcQ 1 1 0 0 208b 208b
yellow open mlfeaturetableindex_v2 8CRhft62SyyqJlTHFEe0GA 1 1 0 0 208b 208b
yellow open notebookindex_v2 4ixEGqMUQrKL3D9P6-Bx6w 1 1 0 0 208b 208b
yellow open glossarytermindex_v2 0GdVW9OWTi2hujJ6yOI1Ow 1 1 0 0 208b 208b
yellow open mlprimarykeyindex_v2 EsDswNx_TriyDGB5qDQkCg 1 1 0 0 208b 208b
yellow open .ds-datahub_usage_event-2022.06.06-000001 9ESN2DuARruqm1ECipf3sQ 1 1 278 0 217.6kb 217.6kb
yellow open corpgroupindex_v2 JLS6yZtRQPmIxQvyOhy7TA 1 1 0 0 208b 208b
yellow open dataset_datasetusagestatisticsaspect_v1 k1HmML_1SDiEivM5BiOHmA 1 1 0 0 208b 208b
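All of those indices are yellow, which on a single-node Elasticsearch usually just means the replica shards have nowhere to be assigned, rather than anything being broken. A hedged sketch of checking, and (on a disposable dev cluster) clearing it, using the address from the curl above:
import requests

base = "http://192.168.25.133:9200"  # address from the curl example above

# Cluster-level health; "yellow" here means unassigned replica shards.
print(requests.get(f"{base}/_cluster/health").json()["status"])

# On a one-node dev cluster, dropping replicas to 0 removes the
# unassignable replica shards and turns the indices green.
resp = requests.put(f"{base}/_all/_settings",
                    json={"index": {"number_of_replicas": 0}})
print(resp.status_code, resp.text)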
I'm completely at a loss for what to do next. Could you kindly provide any suggestions?
astonishing-guitar-79208
06/06/2022, 1:30 PMproperties instead of tabular does not work. The doc mentions that auto rendering of the custom aspect works for DataJob. Can someone help here?
creamy-van-28626
06/06/2022, 6:05 PMbulky-controller-34643
06/07/2022, 3:36 AMastonishing-yak-92682
06/07/2022, 8:11 AMfew-air-56117
06/07/2022, 1:11 PMsilly-morning-41994
06/07/2022, 2:00 PMabundant-painter-6
06/07/2022, 2:33 PMfew-air-56117
06/07/2022, 6:53 PMmodern-laptop-12942
06/07/2022, 8:41 PMnumerous-account-62719
06/08/2022, 4:52 AMfew-air-56117
06/08/2022, 7:55 AMgreat-beard-50720
06/08/2022, 11:07 AM...
# Imports this snippet needs (the "..." above elides the top of the file);
# m_athena and m_ctxt are local helper modules, not shown here.
import great_expectations as ge
from great_expectations.core.batch import BatchRequest
from great_expectations.core.expectation_suite import ExpectationSuite
from great_expectations.exceptions import DataContextError

def get_suite(context: ge.DataContext, suite_name: str) -> ExpectationSuite:
    # Load the suite if it already exists, otherwise create an empty one.
    try:
        suite = context.get_expectation_suite(expectation_suite_name=suite_name)
        print(f'Loaded ExpectationSuite "{suite.expectation_suite_name}" containing {len(suite.expectations)} expectations.')
    except DataContextError:
        suite = context.create_expectation_suite(expectation_suite_name=suite_name)
        print(f'Created ExpectationSuite "{suite.expectation_suite_name}".')
    return suite

context_name = 'athena_backoffice_dev'
default_bucket = 'mosaic-backoffice'
conn_string = m_athena.get_athena_conn_string(context_name)
context_config = m_ctxt.get_context_config(context_name, conn_string, default_bucket)
context = m_ctxt.get_context(context_config)

expectation_suite_name = 'demo_suite'
expectation_suite = get_suite(context, expectation_suite_name)

batch_request = {
    'datasource_name': 'athena_backoffice_dev',
    'data_connector_name': 'default_inferred_data_connector_name',
    'data_asset_name': 'borrowing_base.accruals',
    'limit': 1000
}
validator = context.get_validator(
    batch_request=BatchRequest(**batch_request),
    expectation_suite_name=expectation_suite_name
)

column_names = [f'"{column_name}"' for column_name in validator.columns()]
print(f"Columns: {', '.join(column_names)}.")
h = validator.head(n_rows=5, fetch_all=False)
print(h)
When I run that, the following error is returned:
(venv) C:\Dev\great-expectations>python main.py
Loaded ExpectationSuite "demo_suite" containing 0 expectations.
Calculating Metrics: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 8.67it/s]
Columns: "trade_date", "trade_strategy", "invoice_id", "trade_id", "internal_legal_entity", "counter_party", "product", "instrument_type", "buy_sell", "native_mtm", "settle_currency", "mtm_usd", "notional_amount", "quantity", "uom", "asset_value_usd", "due_date", "days_past_due", "rating", "tier", "overrides", "enhancement_method", "is_past_due", "source_system", "is_excluded", "exclusion_reason", "hartree_id", "legal_name", "governing_law", "country", "counterparty_name", "lending_facility", "valuation_date", "run_id".
Calculating Metrics: 0%| | 0/1 [00:01<?, ?it/s]
Exceptions
{('table.head', 'batch_id=89c03fce198c1b7a63893b895140eb28', '04166707abe073177c1dd922d3584468'): {'metric_configuration': {
"metric_name": "table.head",
"metric_domain_kwargs": {
"batch_id": "89c03fce198c1b7a63893b895140eb28"
},
"metric_domain_kwargs_id": "batch_id=89c03fce198c1b7a63893b895140eb28",
"metric_value_kwargs": {
"n_rows": 5,
"fetch_all": false
},
"metric_value_kwargs_id": "04166707abe073177c1dd922d3584468",
"id": [
"table.head",
"batch_id=89c03fce198c1b7a63893b895140eb28",
"04166707abe073177c1dd922d3584468"
]
}, 'num_failures': 3, 'exception_info': {{'exception_traceback': 'Traceback (most recent call last):\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\great_expectations\\execution_engine\\execution_engine.py", line 387, in resolve_metrics\n **metric_provider_kwargs\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\great_expectations\\expectations\\metrics\\metric_provider.py", line 34, in inner_func\n return metric_fn(*args, **kwargs)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\great_expectations\\expectations\\metrics\\table_metrics\\table_head.py", line 132, in _sqlalchemy\n compile_kwargs={"literal_binds": True},\n File "<string>", line 1, in <lambda>\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\elements.py", line 481, in compile\n return self._compiler(dialect, bind=bind, **kw)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\elements.py", line 487, in _compiler\n return dialect.statement_compiler(dialect, self, **kw)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\compiler.py", line 592, in __init__\n Compiled.__init__(self, dialect, statement, **kwargs)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\compiler.py", line 322, in __init__\n self.string = self.process(self.statement, **compile_kwargs)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\compiler.py", line 352, in process\n return obj._compiler_dispatch(self, **kwargs)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\visitors.py", line 96, in _compiler_dispatch\n return meth(self, **kw)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\compiler.py", line 2202, in visit_select\n text, select, inner_columns, froms, byfrom, kwargs\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\sqlalchemy\\sql\\compiler.py", line 2320, in _compose_select_body\n text += self.limit_clause(select, **kwargs)\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\pyathena\\sqlalchemy_athena.py", line 90, in limit_clause\n if limit_clause is not None and select._simple_int_clause(limit_clause):\nAttributeError: \'Select\' object has no attribute \'_simple_int_clause\'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\great_expectations\\validator\\validator.py", line 1291, in resolve_validation_graph\n runtime_configuration=runtime_configuration,\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\great_expectations\\validator\\validator.py", line 2202, in _resolve_metrics\n runtime_configuration=runtime_configuration,\n File "C:\\Dev\\great-expectations\\venv\\lib\\site-packages\\great_expectations\\execution_engine\\execution_engine.py", line 391, in resolve_metrics\n message=str(e), failed_metrics=(metric_to_resolve,)\ngreat_expectations.exceptions.exceptions.MetricResolutionError: \'Select\' object has no attribute \'_simple_int_clause\'\n', 'exception_message': "'Select' object has no attribute '_simple_int_clause'", 'raised_exception': True}}}}
occurred while resolving metrics.
Traceback (most recent call last):
  File "main.py", line 81, in <module>
    h = validator.head(n_rows=5, fetch_all=False)
  File "C:\Dev\great-expectations\venv\lib\site-packages\great_expectations\validator\validator.py", line 2145, in head
    "fetch_all": fetch_all,
  File "C:\Dev\great-expectations\venv\lib\site-packages\great_expectations\validator\validator.py", line 891, in get_metric
    return self.get_metrics(metrics={metric.metric_name: metric})[
  File "C:\Dev\great-expectations\venv\lib\site-packages\great_expectations\validator\validator.py", line 858, in get_metrics
    for metric_configuration in metrics.values()
  File "C:\Dev\great-expectations\venv\lib\site-packages\great_expectations\validator\validator.py", line 858, in <dictcomp>
    for metric_configuration in metrics.values()
KeyError: ('table.head', 'batch_id=89c03fce198c1b7a63893b895140eb28', '04166707abe073177c1dd922d3584468')
That is very similar to what I am seeing here: https://github.com/apache/superset/issues/20168, but I am not sure how that helps me.
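The pyathena limit_clause in the traceback calls Select._simple_int_clause, which only exists in SQLAlchemy 1.4 and newer, so this smells like a pyathena/SQLAlchemy version mismatch; a hedged sketch of how to check (assuming pyathena exposes __version__):
import sqlalchemy
import pyathena

# Select._simple_int_clause was added in SQLAlchemy 1.4; a pyathena
# release that calls it needs sqlalchemy>=1.4 installed alongside it.
print("sqlalchemy:", sqlalchemy.__version__)
print("pyathena:", pyathena.__version__)
print("has _simple_int_clause:",
      hasattr(sqlalchemy.sql.Select, "_simple_int_clause"))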
Has anyone else seen this before?
brave-businessperson-3969
06/08/2022, 12:27 PMplain-napkin-77279
06/08/2022, 2:38 PMambitious-rose-93174
06/08/2022, 3:02 PM[2022-06-08 16:19:17,123] WARNING {great_expectations.dataset.sqlalchemy_dataset:1814} - No recognized sqlalchemy types in type_list for current dialect.
Later, I get the following non-fatal exceptions (there are many of them; I'm assuming one for each 'problematic' field):
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/marco/tmp/datahub/.env/lib/python3.9/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 246, in _sa_execute_fake
    handled, result = self._handle_execute(conn, query, args, kwargs)
  File "/home/marco/tmp/datahub/.env/lib/python3.9/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 211, in _handle_execute
    if not self.is_single_row_query_method(query):
  File "/home/marco/tmp/datahub/.env/lib/python3.9/site-packages/datahub/ingestion/source/ge_data_profiler.py", line 218, in _is_single_row_query_method
    query_columns = get_query_columns(query)
  File "/home/marco/tmp/datahub/.env/lib/python3.9/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 114, in get_query_columns
    return list(query.columns)
AttributeError: 'str' object has no attribute 'columns'
[2022-06-08 16:39:36,731] ERROR {datahub.utilities.sqlalchemy_query_combiner:250} - Failed to execute query normally, using fallback: SELECT field
FROM XXXX.YYYY
WHERE 1 = 1 AND field IS NOT NULL
AND ROWNUM <= 20
Traceback (most recent call last):
  File "/home/marco/tmp/datahub/.env/lib/python3.9/site-packages/datahub/utilities/sqlalchemy_query_combiner.py", line 111, in get_query_columns
    inner_columns = list(query.inner_columns)
AttributeError: 'str' object has no attribute 'inner_columns'
Eventually the profiling ends and the Stats tab in the UI is only partially populated. All the numeric fields have these stats set to null: min, max, mean, median, std dev.
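A hedged illustration of what those tracebacks are saying: datahub's query combiner introspects SQLAlchemy Select objects, but here it is handed a raw SQL string, which has neither .columns nor .inner_columns, so it logs the error and falls back to executing the query directly:
import sqlalchemy as sa

# A toy table standing in for the XXXX.YYYY relation in the log above.
meta = sa.MetaData()
t = sa.Table("YYYY", meta, sa.Column("field", sa.String), schema="XXXX")

select_obj = sa.select([t.c.field])  # a Select object (1.x-style select)
raw_sql = "SELECT field FROM XXXX.YYYY WHERE field IS NOT NULL"  # plain str

print(list(select_obj.inner_columns))     # works on a Select
print(hasattr(raw_sql, "inner_columns"))  # False -> the fallback path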
Any idea what the issue is here?
red-accountant-48681
06/08/2022, 5:41 PMnumerous-account-62719
06/09/2022, 11:32 AMfresh-napkin-5247
06/09/2022, 11:55 AMawsdatacatalog.test-tableau-datasets.test_datasets
and has zero schema information, but is upstream of various Tableau datasources and charts.
• The table picked up by the Athena ingestor is named only test_datasets
and has schema information and is downstream of the correct Glue table.
How could I tell Datahub that they are both the same table and that it should join the metadata from both entities? Thank you 🙂
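As far as I know there is no way to merge two entities like this, but a hedged partial workaround is to emit an upstream-lineage aspect yourself so the two at least get connected; the URNs and the GMS address below are guesses based on the names in this thread, and the exact emitter API may differ by version:
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    DatasetLineageTypeClass,
    UpstreamClass,
    UpstreamLineageClass,
)

# Both URNs are guesses from the table names mentioned above.
glue_urn = make_dataset_urn("glue", "test-tableau-datasets.test_datasets")
athena_urn = make_dataset_urn("athena", "test_datasets")

lineage = UpstreamLineageClass(
    upstreams=[UpstreamClass(dataset=glue_urn, type=DatasetLineageTypeClass.COPY)]
)
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=athena_urn,
    aspectName="upstreamLineage",
    aspect=lineage,
)
DatahubRestEmitter("http://localhost:8080").emit_mcp(mcp)  # hypothetical GMS address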
Datahub version: acryl-datahub, version 0.8.36
most-plumber-32123
06/09/2022, 2:11 PM
source:
  type: datahub-business-glossary
  config:
    # Coordinates
    file: "C://Users//Mani//docker//datahub//business_glossary.yaml"
sink:
  type: datahub-rest
  config:
    server: 'http://localhost:9002/api/gms'
    token: <token>
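On keeping that glossary file in S3 rather than on a local path (the question just below): I am not sure the datahub-business-glossary source reads s3:// paths directly in this version, so a hedged workaround is to pull the shared file down right before running the recipe; the bucket and key names here are hypothetical:
import boto3

# Hypothetical bucket/key: download the shared glossary so the recipe's
# local "file" path works the same for every team member.
s3 = boto3.client("s3")
s3.download_file(
    "my-team-bucket",                  # hypothetical S3 bucket
    "datahub/business_glossary.yaml",  # hypothetical object key
    "business_glossary.yaml",          # local path referenced by the recipe
)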
Here, for file I am using my local path, but I want to move it off the local path to cloud storage like S3, so other team members can access and update it if needed. Is it possible to use a file stored in S3? Can someone help me here?
salmon-football-11785
06/09/2022, 2:16 PMlemon-hydrogen-83671
06/09/2022, 2:35 PMdatahub-upgrade docker image for restoring indices using postgres? When I set my driver to org.postgresql.Driver I get complaints 😞
alert-teacher-6920
06/09/2022, 2:41 PMbroad-battery-31188
06/09/2022, 2:53 PM