late-arm-1146
12/13/2022, 11:27 PM
refined-energy-76018
12/13/2022, 11:31 PM
better-orange-49102
12/14/2022, 7:01 AM
plain-cricket-83456
12/14/2022, 7:11 AM
late-book-30206
12/14/2022, 8:50 AM
'[2022-12-14 08:46:01,730] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit coffreoDirectory.coffreoDirectory\n'
'[2022-12-14 08:46:02,131] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit coffreoDirectory.coffreoDirectoryDump\n'
'[2022-12-14 08:46:02,422] INFO {datahub.ingestion.run.pipeline:103} - sink wrote workunit coffreoDirectory.coffreoDirectory_logs\n'
'[2022-12-14 08:46:02,423] INFO {datahub.cli.ingest_cli:137} - Finished metadata ingestion\n'
'\n'
'Source (mongodb) report:\n'
"{'workunits_produced': '3',\n"
" 'workunit_ids': ['coffreoDirectory.coffreoDirectory', 'coffreoDirectory.coffreoDirectoryDump', "
"'coffreoDirectory.coffreoDirectory_logs'],\n"
" 'warnings': {},\n"
" 'failures': {},\n"
" 'cli_version': '0.8.43',\n"
" 'cli_entry_location': '/tmp/datahub/ingest/venv-3c4dd982-80a2-42a3-b7fc-9e0bbd0f5c06/lib/python3.9/site-packages/datahub/__init__.py',\n"
" 'py_version': '3.9.9 (main, Dec 21 2021, 10:03:34) \\n[GCC 10.2.1 20210110]',\n"
" 'py_exec_path': '/tmp/datahub/ingest/venv-3c4dd982-80a2-42a3-b7fc-9e0bbd0f5c06/bin/python3',\n"
" 'os_details': 'Linux-4.15.0-194-generic-x86_64-with-glibc2.31',\n"
" 'filtered': ['CoffreoPdfClient',\n"
" 'app',\n"
" 'auth',\n"
" 'booking',\n"
" 'cmde',\n"
" 'coffreo-entities',\n"
" 'coffreoEvents',\n"
" 'deliveries',\n"
" 'delivery',\n"
" 'documentsapi',\n"
" 'documentsapi-vms',\n"
" 'entitySettings',\n"
" 'eventStore',\n"
" 'events',\n"
" 'hosting',\n"
" 'integrationCME',\n"
" 'log',\n"
" 'messagers',\n"
" 'metadata',\n"
" 'mission',\n"
" 'notifications',\n"
" 'oauth2_server',\n"
" 'randstadB2B',\n"
" 'randstadBPE',\n"
" 'standard',\n"
" 'startpeople',\n"
" 'sunnyCerts',\n"
" 'timesheet',\n"
" 'versioning']}\n"
'Sink (datahub-rest) report:\n'
"{'records_written': '3',\n"
" 'warnings': [],\n"
" 'failures': [],\n"
" 'downstream_start_time': '2022-12-14 08:46:00.594476',\n"
" 'downstream_end_time': '2022-12-14 08:46:02.422375',\n"
" 'downstream_total_latency_in_seconds': '1.827899',\n"
" 'gms_version': 'v0.8.43'}\n"
'\n'
'Pipeline finished successfully producing 3 workunits\n',
"2022-12-14 08:46:02.955894 [exec_id=3c4dd982-80a2-42a3-b7fc-9e0bbd0f5c06] INFO: Successfully executed 'datahub ingest'"]}
Execution finished successfully!
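(The long 'filtered' list in the report above is simply the set of MongoDB databases the source skipped because they did not match its database_pattern. A minimal recipe sketch that would produce a report like this; the connection string and the allowed database are illustrative, not taken from the thread:)

source:
  type: mongodb
  config:
    connect_uri: "mongodb://localhost:27017"  # illustrative connection string
    database_pattern:
      allow:
        - "coffreoDirectory"  # only this database is ingested; everything else shows up under 'filtered'
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"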
wonderful-vegetable-45135
12/14/2022, 1:23 PM
swift-dream-78272
12/14/2022, 2:38 PM
host_port: $HOST:$PORT
....
Is that even possible? So far all my experiments fail with the error invalid literal for int() with base 10
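(A minimal sketch of how this is usually written, assuming the recipe supports ${VAR}-style environment variable expansion; the source type and variable names here are placeholders:)

source:
  type: mysql  # placeholder source type
  config:
    host_port: "${HOST}:${PORT}"  # ${VAR} with braces is expanded from the environment
    username: "${DB_USER}"
    password: "${DB_PASSWORD}"
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"

(If the variables are not set in the environment the CLI runs in, the un-substituted text may be kept as-is, and the port part then fails to parse as an integer, which would match the error above.)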
kind-vr-18537
12/14/2022, 4:31 PM
plain-controller-95961
12/14/2022, 5:50 PM
lively-dusk-19162
12/14/2022, 7:04 PM
refined-energy-76018
12/14/2022, 10:30 PM
incalculable-queen-1487
12/15/2022, 2:46 AM
full-chef-85630
12/15/2022, 5:49 AM
"usageStats": {
  "aggregations": {
    "uniqueUserCount": 1,
    "users": [
      {
        "user": {
          "username": "content-backend-prod"
        },
        "userEmail": "xxx.gserviceaccount.com",
        "count": 165
      }
    ],
    "totalSqlQueries": 0
  }
},
modern-answer-65441
12/15/2022, 7:08 AM
01:02:16.024 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 1 Took time ms: -1 Message: failure in bulk execution:
[0]: index [datahubsecretindex_v2], type [_doc], id [urn%3Ali%3AdataHubSecret%3ASECRET_CREATED_BY_SSEITLER_SSO_2], message [[datahubsecretindex_v2/-OUJuv_aSvWNBB2-NiVKfg][[datahubsecretindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubSecret%3ASECRET_CREATED_BY_SSEITLER_SSO_2]: document missing]]
7:06:44.882 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:25 - Failed to feed bulk request. Number of events: 2 Took time ms: -1 Message: failure in bulk execution:
[0]: index [corpgroupindex_v2], type [_doc], id [urn%3Ali%3AcorpGroup%3Ac377308d-1b2a-42e9-a407-6595a4520962], message [[corpgroupindex_v2/E8CJttR6RwKYFEFXdblmTg][[corpgroupindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AcorpGroup%3Ac377308d-1b2a-42e9-a407-6595a4520962]: document missing]]]
[1]: index [corpgroupindex_v2], type [_doc], id [urn%3Ali%3AcorpGroup%3Ac377308d-1b2a-42e9-a407-6595a4520962], message [[corpgroupindex_v2/E8CJttR6RwKYFEFXdblmTg][[corpgroupindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AcorpGroup%3Ac377308d-1b2a-42e9-a407-6595a4520962]: document missing]]]
This happens when we do any ingestion from any data source.
Can someone help debug this?
I used this link: https://github.com/datahub-project/datahub/issues/6689, which says I need to set MAE_CONSUMER_ENABLED to true.
However, it didn't help.
Any suggestions, please?
magnificent-lock-58916
12/15/2022, 9:45 AM
tall-father-13753
12/15/2022, 10:10 AM
urn:li:dataFlow:(drone,<SOME-ID>,gold)
I’ve found this doc: https://datahubproject.io/docs/how/delete-metadata/ and, according to it, used the command:
$ datahub delete --env gold --entity_type dataFlow --platform drone --hard -f
But it didn’t worked. I got:
[2022-12-15 10:06:21,679] INFO {datahub.cli.delete_cli:287} - datahub configured with <http://localhost:8080>
[2022-12-15 10:06:22,075] INFO {datahub.cli.delete_cli:323} - Filter matched dataFlow entities of drone. Sample: []
No urns to delete. Maybe you want to change entity_type=dataFlow or platform=drone to be something different?
Took 1.007 seconds to hard delete 0 versioned rows and 0 timeseries aspect rows for 0 entities.
Am I doing something wrong?
salmon-angle-92685
12/15/2022, 10:53 AM
future-iron-16086
12/15/2022, 12:45 PM
'sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00942: table or view does not exist\n'
'[SQL: SELECT username FROM dba_users ORDER BY username]\n'
'(Background on this error at: <http://sqlalche.me/e/13/4xp6>)\n'
'[2022-12-15 12:25:46,603] ERROR {datahub.entrypoints:195} - Command failed: \n'
'\t(cx_Oracle.DatabaseError) ORA-00942: table or view does not exist\n'
'[SQL: SELECT username FROM dba_users ORDER BY username]\n'
'(Background on this error at: <http://sqlalche.me/e/13/4xp6>) due to \n'
"\t\t'ORA-00942: table or view does not exist'.\n"
Any help to solve this?
colossal-sandwich-50049
12/15/2022, 3:18 PM
When platform=delta a folder structure is created (i.e. the dataset is seen under the folder mlprod in the UI); this is the desired behavior. When platform=s3 this behavior does not happen; the dataset is not created under a folder.
Can someone advise on why the above happens? Ideally I would like the same behavior on platform=s3 as on platform=delta.
Note: I have isolated the cause of this behavior to the platform (i.e. FabricType, the aspect type, etc. do not factor in).
String platform = "s3";  // different behavior depending on whether this value is s3 or delta
DataPlatformUrn defaultDataPlatformUrn = new DataPlatformUrn(platform);
DatasetUrn defaultDatasetUrn = new DatasetUrn(
    defaultDataPlatformUrn,
    "mlprod." + name,
    FabricType.DEV
);
cc: @great-toddler-2251
curved-apple-55756
12/15/2022, 4:27 PM
datahub ingest -c file.yaml
failed. Here is the error: sqlalchemy.exc.DBAPIError: (pyodbc.Error) ('01000', "[01000] [unixODBC][Driver Manager]Can't open lib 'IBM DB2 ODBC DRIVER' : file not found
My DataHub is installed on Debian. Have you ever met this kind of issue? Do you know how to solve it? Any advice would be appreciated!
breezy-rainbow-26977
12/15/2022, 7:12 PM
Failed to create ingestion source!: Failed to fetch
Unsure if it's a networking error or an authentication error. Are there any logs I can check or something similar?
acceptable-account-83031
12/15/2022, 7:41 PM
ambitious-notebook-45027
12/16/2022, 3:24 AM
bland-room-3136
12/16/2022, 4:30 AM
few-tent-75240
12/16/2022, 12:52 PM
wonderful-hair-89448
12/16/2022, 1:16 PM
gifted-bird-57147
12/16/2022, 2:36 PM
'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'message': '413 Client Error: Payload Too Large for url: '
'<https://noord-holland.acryl.io/api/gms/aspects?action=ingestProposal>',
'id': 'urn:li:dataset:(urn:li:dataPlatform:mssql,GPBUAPP.areaaldata.BEHEERGRENZEN_V,PROD)'}}]
I suspect it has to do with the profile of a column of type varbinary(MAX)... It would be really useful to have this feature request implemented: https://feature-requests.datahubproject.io/p/sql-profile-allow-deny-by-data-type so I can exclude those columns from profiling...
acceptable-account-83031
12/16/2022, 3:38 PM
wonderful-hair-89448
12/16/2022, 4:12 PM
ambitious-shoe-92590
12/16/2022, 9:32 PM
...
path_specs:
- include: 's3://bucket-name/path/table/{partition_key[0]}={partition[0]}/*.parquet'
...
and this is producing `n` number of "datasets" under `table`, based on how many `partition_key`s there are.
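(In case it is useful, a sketch of the path_spec form that I believe collapses the partition folders into a single dataset, using the {table} placeholder; the bucket and folder names are placeholders:)

path_specs:
  - include: 's3://bucket-name/path/{table}/{partition_key[0]}={partition[0]}/*.parquet'
    # {table} marks the folder level that should become one dataset;
    # the partition folders underneath are then folded into that dataset
    # instead of each producing its own.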