powerful-shampoo-81990
05/16/2023, 2:49 PM
cuddly-butcher-39945
05/16/2023, 2:59 PM
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': '39e4de59-eb3e-4739-b024-c51ea5c76fbe',
'infos': ['2023-05-16 01:00:00.066911 INFO: Starting execution for task with name=RUN_INGEST',
'2023-05-16 01:00:00.067371 INFO: Caught exception EXECUTING task_id=39e4de59-eb3e-4739-b024-c51ea5c76fbe, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 112, in execute_task\n'
' task_event_loop = asyncio.new_event_loop()\n'
' File "/usr/local/lib/python3.10/asyncio/events.py", line 783, in new_event_loop\n'
' return get_event_loop_policy().new_event_loop()\n'
' File "/usr/local/lib/python3.10/asyncio/events.py", line 673, in new_event_loop\n'
' return self._loop_factory()\n'
' File "/usr/local/lib/python3.10/asyncio/unix_events.py", line 64, in __init__\n'
' super().__init__(selector)\n'
' File "/usr/local/lib/python3.10/asyncio/selector_events.py", line 53, in __init__\n'
' selector = selectors.DefaultSelector()\n'
' File "/usr/local/lib/python3.10/selectors.py", line 350, in __init__\n'
'OSError: [Errno 24] Too many open files\n'],
'errors': []}
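The traceback above ends in `OSError: [Errno 24] Too many open files`, which means the process exhausted its soft RLIMIT_NOFILE. A minimal sketch (not from the thread) of inspecting and raising that limit from Python; on Unix the soft limit can be raised to the hard cap without root:

```python
import resource

# The soft limit is what the ingestion process actually hits when it leaks
# file handles; the hard limit is the ceiling an unprivileged process may
# raise the soft limit to.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft} hard={hard}")

# Raise the soft limit to the hard cap (no privileges needed in this
# direction). Equivalent to `ulimit -n <hard>` in the launching shell.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
```

This only buys headroom; if the source genuinely leaks handles, the limit will eventually be hit again, which is why the thread below tries to force single-threaded ingestion instead.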
~~~~ Ingestion Logs ~~~~
My ingestion configuration...
source:
  type: dbt-cloud
  config:
    max_threads: 1
    metadata_endpoint: 'https://my-metadata-cloud-.com/graphql'
    project_id: '3'
    job_id: '82'
    target_platform: snowflake
    stateful_ingestion:
      enabled: true
    account_id: '9999'
    token: MYDBTToken
Failed to configure the source (dbt-cloud): 1 validation error for DBTCloudConfig
max_threads
extra fields not permitted (type=value_error.extra)
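The "extra fields not permitted" error above is pydantic's standard rejection of a key the config model does not declare. A minimal illustrative reproduction (StrictConfig is a made-up stand-in, not DataHub's real DBTCloudConfig):

```python
# Illustrative only: a pydantic model configured to forbid extra fields,
# reproducing the class of error above when an undeclared key is passed.
from pydantic import BaseModel, ValidationError

class StrictConfig(BaseModel):
    account_id: str

    class Config:
        extra = "forbid"  # unknown keys become validation errors

try:
    StrictConfig(account_id="9999", max_threads=1)
except ValidationError as err:
    print(err)  # the error output names the rejected field, max_threads
```

In other words, the error indicates that this source's config model simply does not declare `max_threads`, so the key has to be removed (or support added to the source) rather than spelled differently.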
Whenever I don't specify max_threads, the ingestion works, but I keep hitting a file handle leak (the error shown above). I added "max_threads: 1", but this config does not work either.
@gray-shoe-75895 @big-carpet-38439, we looked at this last week for an on-prem dbt fix (essentially telling DataHub to run the dbt ingestion single-threaded), but this is not working for dbt cloud.
Any help would be appreciated!
Thanks!
glamorous-easter-30119
05/16/2023, 3:28 PM
gentle-camera-33498
05/16/2023, 4:55 PM
steep-alligator-93593
05/16/2023, 10:31 PM
The node was low on resource: ephemeral-storage. Container datahub-gms was using 8220940Ki, which exceeds its request of 0.
The pod had condition: [DiskPressure].
Any ideas on where GMS stores its data, and why this could be happening?
better-spoon-77762
05/16/2023, 10:50 PM
important-night-50346
05/17/2023, 3:15 AM
searchAcrossEntities queries failing for entity counts <10000?
The issue below has been around for a long time: it was present in 0.9.5 and is still there in 0.10.2.
I ingested exactly 5718 entities into DataHub running in quickstart mode and ran a very simple query, which emulates :
query getAllEntities {
  searchAcrossEntities(
    input: {
      start: 4000
      count: 10
      query: ""
    }
  ) {
    start
    count
    total
    searchResults {
      entity {
        urn
      }
    }
  }
}
It fails with
{
  "servlet": "apiServlet",
  "message": "Service Unavailable",
  "url": "/api/graphql",
  "status": "503"
}
It does not look to be an issue with the 10k entity limit in Elasticsearch; more likely it is caused by some sort of timeout: it always runs for 30s and then fails. I also tried to run a query with start=0 and count=5718, and it also fails after 30s, which makes me think it is a timeout issue.
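[Editor's sketch, not from the thread: one workaround for a fixed ~30s gateway timeout is to page through searchAcrossEntities in small windows, so each request returns quickly. GRAPHQL_URL, the token, and PAGE_SIZE below are assumptions to adapt to your deployment.]

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8080/api/graphql"  # assumed endpoint
PAGE_SIZE = 100

QUERY = """
query getAllEntities($start: Int!, $count: Int!) {
  searchAcrossEntities(input: { start: $start, count: $count, query: "" }) {
    total
    searchResults { entity { urn } }
  }
}
"""

def page_windows(total, page_size=PAGE_SIZE):
    """Yield (start, count) windows covering `total` results."""
    for start in range(0, total, page_size):
        yield start, min(page_size, total - start)

def fetch_page(start, count, token):
    """POST one page of the query; a network call, shown for shape only."""
    body = json.dumps({"query": QUERY,
                       "variables": {"start": start, "count": count}}).encode()
    req = urllib.request.Request(
        GRAPHQL_URL, data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

# For the 5718 entities mentioned above, the windows end with a short page:
print(list(page_windows(5718))[-1])  # (5700, 18)
```

This does not fix the underlying timeout, but it keeps individual requests small enough to succeed while the root cause is investigated.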
The very same searchAcrossEntities query is used in the UI and causes an error there as well. Any thoughts or suggestions on how to fix it?
purple-balloon-66501
05/17/2023, 10:01 AM
stale-architect-93411
05/17/2023, 12:58 PM
NullPointerException. On #integration-databricks-datahub I found a specific jar for Databricks: https://datahubspace.slack.com/archives/C033H1QJ28Y/p1646937756282179 but it's more than a year old.
With this jar my workflow succeeds and I find data in DataHub; however, the lineage is empty, there are no references to input/output data (more info in thread).
Does anyone manage to get lineage with Spark on Databricks?
helpfull-shoe-73099
05/17/2023, 1:54 PM
quiet-television-68466
05/17/2023, 2:33 PM
stale-traffic-76901
05/17/2023, 7:38 PM
stale-traffic-76901
05/17/2023, 7:38 PM
powerful-planet-87080
05/17/2023, 8:12 PM
prehistoric-farmer-31305
05/17/2023, 10:21 PM
I see an "An unknown error occurred. (code 500)" error message in the UI (DataHub v0.10.2) after ingesting dbt (via the acryl CLI). I am able to see the datasets, but I cannot drill down to get table definitions etc. Is something corrupted in ES or the MySQL db?
numerous-account-62719
05/18/2023, 5:13 AM
astonishing-father-13229
05/18/2023, 5:49 AM
nutritious-photographer-79168
05/18/2023, 8:12 AM
tall-butcher-30509
05/18/2023, 8:31 AM
{query : searchAcrossEntities(input: {query: "<our custom property>=<value>"}) { searchResults { entity { ... on Dataset { urn } } } } }
But when I try to send the same query via axios, I get an error from DataHub.
JSON:
const requestData = {
  "query": `searchAcrossEntities(input: {query: "<our custom property>=${value}"}) { searchResults { entity { ... on Dataset { urn } } } }`
}
Error:
[{"message":"Invalid Syntax : offending token 'searchAcrossEntities' at line 1 column 1","locations":[{"line":1,"column":1}],"extensions":{"classification":"InvalidSyntax"}}]
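[Editor's note: the "offending token 'searchAcrossEntities' at line 1 column 1" error is a plain GraphQL syntax issue. The "query" string sent over HTTP must be a complete GraphQL document, i.e. start with "{" (or a keyword like "query"), not with a bare field name. A sketch of a correctly wrapped payload; "myProp" and the value are placeholders:]

```python
import json

value = "some-value"  # placeholder for the custom property value

# A complete anonymous-operation document: note the OUTER braces around
# searchAcrossEntities, which the axios template literal above is missing.
query_doc = (
    '{ searchAcrossEntities(input: {query: "myProp=%s"}) '
    '{ searchResults { entity { ... on Dataset { urn } } } } }' % value
)
payload = json.dumps({"query": query_doc})
print(query_doc.startswith("{"))  # True
```

The same fix applies directly in the axios code: wrap the template literal as `` `{ searchAcrossEntities(...) { ... } }` `` so the outer braces are included.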
bland-gigabyte-28270
05/18/2023, 9:46 AM
datahub-gms 2023-05-18 09:42:42,216 [I/O dispatcher 1] ERROR c.l.m.s.e.update.BulkListener:44 - Failed to feed bulk request. Number of events: 6 Took time ms: -1 Message: failure in bulk execution:
datahub-gms [1]: index [datahubstepstateindex_v2], type [_doc], id [urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-filters], message [[datahubstepstateindex_v2/VMzXqnXeSnWqzq_cS6B4XA][[datahubstepstateindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-filters]: document missing]]]
datahub-gms [4]: index [datahubstepstateindex_v2], type [_doc], id [urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-advanced-search], message [[datahubstepstateindex_v2/VMzXqnXeSnWqzq_cS6B4XA][[datahubstepstateindex_v2][0]] ElasticsearchException[Elasticsearch exception [type=document_missing_exception, reason=[_doc][urn%3Ali%3AdataHubStepState%3Aurn%3Ali%3Acorpuser%3Adatahub-search-results-advanced-search]: document missing]]]
Version: 0.10.2
Helmchart version: datahub-0.2.164
Note that I have to disable datahubUpgrade since it keeps checking the health of datahub-gms and failing.
magnificent-honey-40185
05/18/2023, 3:30 PM
magnificent-honey-40185
05/18/2023, 3:31 PM
bland-gigabyte-28270
05/19/2023, 1:12 AM
Permission Denied on one of the jobs (Snowflake), can someone help?
No ~/.datahubenv file found, generating one for you...
PermissionError: [Errno 13] Permission denied: '/.datahubenv'
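[Editor's sketch of the likely cause, not a confirmed diagnosis: the path `/.datahubenv` suggests `~` expanded to the filesystem root because HOME is unset or `/` in the job's environment, so the CLI tried to write at `/` and got EACCES. Pointing HOME at a writable directory before the job runs is the usual fix; `/tmp/datahub-home` below is an arbitrary assumed location.]

```python
import os

# With HOME unset (common in minimal containers/CI runners), "~" can
# resolve to "/". Setting HOME to a writable directory makes
# "~/.datahubenv" land somewhere the process may actually create files.
os.environ["HOME"] = "/tmp/datahub-home"
os.makedirs(os.environ["HOME"], exist_ok=True)
print(os.path.expanduser("~/.datahubenv"))  # /tmp/datahub-home/.datahubenv
```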
bland-gigabyte-28270
05/19/2023, 2:05 AM
Test Connection: using the password as plain text works, but the built-in secret storage fails. Is this feature still supported?
fierce-electrician-85924
05/19/2023, 5:42 AM
restoreIndices job under the datahub-upgrade image. But we wanted to make sure that this issue gets detected earlier with the help of some metrics. Does DataHub release any metric for such an issue?
I know DataHub publishes Prometheus metrics, I read about it here. But do we have any wiki where a description of useful metrics and how they can help is available?
clever-motherboard-6054
05/19/2023, 8:25 AM
bumpy-engineer-7375
05/19/2023, 1:39 PM
chilly-boots-22585
05/20/2023, 12:51 PM
*command*: helm show values datahub/datahub-prerequisites > datahub-prerequisites-original.yaml
Error: INSTALLATION FAILED: failed pre-install: timed out waiting for the condition
When I check logs for the datahub-elasticsearch-setup pod:
kubectl logs -f datahub-elasticsearch-setup-job-cv4qq
2023/05/20 12:45:48 Waiting for: https://datahub-es:dglk4904hzD_4@https//search-datahub-starburst-es-m2riu6mafdqbow3ca.eu-west-1.es.amazonaws.com:443
2023/05/20 12:45:48 Problem with request: Get https://datahub-es:dglk4904hgh_4@https//search-datahub-starburst-es-m2rih5h4w4ppfdqbow3ca.eu-west-1.es.amazonaws.com:443: dial tcp: lookup https on 10.100.0.10:53: no such host. Sleeping 1s
chilly-boots-22585
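[Editor's sketch of why the log says "lookup https ... no such host": the configured Elasticsearch URL embeds a second scheme after the credentials, so the URL parser treats the literal string "https" as the hostname and DNS resolution fails. The URL below is illustrative; the real endpoint and credentials are as redacted above.]

```python
from urllib.parse import urlsplit

# Malformed: a second "https//" after the credentials. The netloc ends at
# the first "/", so everything after "@" up to it ("https") is the host.
bad = "https://datahub-es:secret@https//search-example.es.amazonaws.com:443"
print(urlsplit(bad).hostname)  # https

# Well-formed: scheme once, then credentials, host, and port.
good = "https://datahub-es:secret@search-example.es.amazonaws.com:443"
print(urlsplit(good).hostname)  # search-example.es.amazonaws.com
```

So the fix is in the Helm values/env var for the ES host: it should carry the scheme exactly once, with `:` before the port.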
05/20/2023, 12:52 PM
rough-lamp-22858
05/21/2023, 6:06 AM