shy-ability-95880
06/16/2022, 9:12 AM
chilly-elephant-51826
06/16/2022, 9:46 AM
salmon-area-51650
06/16/2022, 9:48 AM
dbt ingestion: I was running the ingestion and the cron job seems OK, but I cannot see the dbt platform in the UI.
Attaching the output log of the ingestion job. This is the configuration:
source:
  type: "dbt"
  config:
    # Coordinates
    manifest_path: "s3://bucket_name/manifest.json"
    catalog_path: "s3://bucket_name/catalog.json"
    sources_path: "s3://bucket_name/sources.json"
    aws_connection:
      aws_region: "eu-west-2"
    # Options
    target_platform: "snowflake"
    load_schemas: True
    env: STG
    node_name_pattern:
      allow:
        - ".*branch_activities.*"
      deny:
        - ".*test.*"
sink:
  type: "datahub-rest"
  config:
    server: "http://datahub-datahub-gms:8080"
Any clue?
Thanks!
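(A quick sanity check, assuming your CLI has the get command and is pointed at the same GMS; the model name in the urn below is made up:)
# run `datahub init` first if GMS is not at localhost:8080
datahub get --urn "urn:li:dataset:(urn:li:dataPlatform:dbt,my_model,STG)"
If nothing comes back, the entities were never written; if they do come back, it is more likely a UI/browse question than an ingestion one.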
damp-minister-31834
06/16/2022, 10:58 AM
About the /relationships API: the official demo is
curl --location --request GET --header 'X-RestLi-Protocol-Version: 2.0.0' 'http://localhost:8080/relationships?direction=OUTGOING&urn=urn%3Ali%3Achart%3Acustomers&types=List(OwnedBy)'
But if the value of the urn contains "(", it throws the error "`org.neo4j.driver.exceptions.ClientException: Invalid input '('`". The urns of datasets and dataJobs both contain '(' and ')'. Is this an issue?
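(For what it's worth, a minimal sketch that percent-encodes the whole urn, parentheses included, before calling /relationships, the same way the demo encodes the colons; the dataset urn and relationship type below are made up:)
import urllib.parse
import requests

# hypothetical dataset urn containing '(' and ')'
urn = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)"
encoded = urllib.parse.quote(urn, safe="")  # '(' and ')' become %28 / %29

url = (
    "http://localhost:8080/relationships"
    f"?direction=OUTGOING&urn={encoded}&types=List(DownstreamOf)"
)
resp = requests.get(url, headers={"X-RestLi-Protocol-Version": "2.0.0"})
print(resp.status_code, resp.text)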
ripe-electrician-13049
06/17/2022, 3:07 AM
rich-policeman-92383
06/17/2022, 6:02 AM
curved-truck-53235
06/17/2022, 6:39 AM
high-hospital-85984
06/17/2022, 9:33 AM
dataHubPolicy)
3. Run the index repopulation job
4. Start the ingestion again.
Does this sound like a reasonable plan?
breezy-portugal-43538
06/17/2022, 11:07 AM
great_expectations command. Is it possible, though, to update an already existing urn when I already have the results stored in a file from great_expectations? I'm talking about some nice curl command I could run, passing the JSON with the contents of the great_expectations results. Would that be possible?
As always, thank you a lot for the help :)
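(A rough sketch of updating an existing urn from Python rather than raw curl; the urn and property are placeholders, and mapping the actual great_expectations result JSON onto DataHub's assertion aspects needs more than this:)
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import ChangeTypeClass, DatasetPropertiesClass

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# hypothetical existing dataset urn
urn = "urn:li:dataset:(urn:li:dataPlatform:postgres,db.schema.table,PROD)"

# upsert one aspect on the existing entity
mcp = MetadataChangeProposalWrapper(
    entityType="dataset",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn=urn,
    aspectName="datasetProperties",
    aspect=DatasetPropertiesClass(customProperties={"last_ge_run": "2022-06-17"}),
)
emitter.emit(mcp)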
brave-tomato-16287
06/17/2022, 11:13 AM
The redshift-usage ingestion fails with:
[2022-06-17 03:11:32,443] INFO {datahub.cli.ingest_cli:97} - DataHub CLI version: 0.8.36
[2022-06-17 03:11:36,611] INFO {datahub.cli.ingest_cli:113} - Starting metadata ingestion
/usr/local/bin/run_ingest.sh: line 26:   320 Killed   ( python3 -m datahub ingest -c "$4/$1.yml" )
2022-06-17 03:14:45.501901 [exec_id=8ccd646c-fc06-4bfd-a7af-cc622cc5be81] INFO: Failed to execute 'datahub ingest'
2022-06-17 03:14:45.505219 [exec_id=8ccd646c-fc06-4bfd-a7af-cc622cc5be81] INFO: Caught exception EXECUTING task_id=8ccd646c-fc06-4bfd-a7af-cc622cc5be81, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task
    self.event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 89, in run_until_complete
    return f.result()
  File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result
    raise self._exception
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute
    raise TaskError("Failed to execute 'datahub ingest'")
acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'
Execution finished with errors.
Duration: about 300 s.
Can anybody suggest what we should do?
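(The bare "Killed" on line 26 of run_ingest.sh usually means the kernel OOM-killed the ingestion subprocess. If this is UI-based ingestion running in the actions pod, one thing to try is raising its memory in the helm values; the key names follow the standard datahub chart and the sizes are only examples:)
# values.yaml fragment (example sizes)
acryl-datahub-actions:
  resources:
    requests:
      memory: "1Gi"
    limits:
      memory: "4Gi"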
lemon-nail-49127
06/17/2022, 6:32 PM
brave-tomato-16287
06/20/2022, 9:27 AM
{'tableau-metadata': ["Connection: customSQLTablesConnection Error: [{'message': 'Showing partial results. The request exceeded the 20000 node limit. Use pagination, additional filtering, or both in the query to adjust results.', 'extensions': {'severity': 'WARNING', 'code': 'NODE_LIMIT_EXCEEDED', 'properties': {'nodeLimit': 20000}}}]"]}
Do we have a solution for this?
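(If I remember right, the tableau source exposes a page_size option for exactly this pagination; a recipe fragment, with an arbitrary example value:)
source:
  type: tableau
  config:
    # ... connection settings ...
    page_size: 5   # example value; smaller pages to stay under the 20000 node limit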
swift-breakfast-25077
06/20/2022, 9:43 AM
better-orange-49102
06/20/2022, 10:54 AM
- name: Upload image locally for testing (if not publishing)
  uses: ishworkh/docker-image-artifact-upload@v1
  if: ${{ needs.setup.outputs.publish != 'true' }}
  with:
    image: ${{ steps.docker_meta.outputs.tags }}
Whenever I merge to the master branch, it mysteriously fails all the containers with the cryptic message:
Run ishworkh/docker-image-artifact-upload@v1
Error: RangeError [ERR_CHILD_PROCESS_STDIO_MAXBUFFER]: stdout maxBuffer length exceeded
swift-breakfast-25077
06/20/2022, 12:30 PM
swift-breakfast-25077
06/20/2022, 12:51 PM
curved-crayon-1929
06/20/2022, 4:27 PM
gentle-camera-33498
06/20/2022, 5:12 PM
[2022-06-20 16:55:51,002] INFO {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.38
[2022-06-20 16:55:54,017] INFO {datahub.cli.ingest_cli:115} - Starting metadata ingestion
[2022-06-20 16:55:54,017] INFO {datahub.ingestion.source.sql.bigquery:367} - Populating lineage info via GCP audit logs
[2022-06-20 16:55:54,021] ERROR {datahub.ingestion.source.sql.bigquery:505} - lineage-gcp-logs => Error was ('Failed to load service account credentials from /tmp/tmp01jy7wi8', ValueError('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [_OpenSSLErrorWithText(code=503841036, lib=60, reason=524556, reason_text=b'error:1E08010C:DECODER routines::unsupported')]))
Is this related to the creation of the encryption key?
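(One way to narrow it down, assuming the credential file is a service-account JSON key: load the same file with google-auth locally. If this raises the same "Could not deserialize key data" error, the key content itself is malformed, for example mangled newlines in the private key, rather than anything in the recipe. The path is a placeholder:)
from google.oauth2 import service_account

# placeholder path to the same key material the recipe uses
creds = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
print(creds.service_account_email)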
quick-megabyte-61846
06/20/2022, 6:18 PM
acryl-datahub[dbt]==0.8.38.1rc1
While trying to ingest a test that isn't in test_name_to_assertion_map:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/dbt.py#L742
In the UI I'm getting this (screens below).
While digging in the code base I found this:
elif kw_args.get("column_name"):  # <- in this case
    logic=node.compiled_sql if node.compiled_sql else node.raw_sql,
From my observation, this logic kind of breaks the assertion UI.
PS. Huge thanks @mammoth-bear-12532 for this integration
bitter-dusk-52400
06/21/2022, 5:31 AM
few-air-56117
06/21/2022, 9:46 AM
millions-notebook-72121
06/21/2022, 12:39 PM
delightful-sugar-63810
06/21/2022, 2:50 PM
ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES: I cannot see any change in behaviour even when I change its value. Can it be because of this space here?
This environment variable lets you adjust how long GMS waits for the validation of the reindexing tasks it triggers on ES during DataHub version upgrades. (See flow: 1 -> 2 -> 3)
We are passing this variable the same way we pass the others on helm, via extra env vars. I can confirm it is passed to the container by describing the deployment on Kubernetes.
Still, we cannot observe its effect on the pod. I have tried to investigate and see if I'm doing something silly, but it seems like I'm not 😬. One possible cause is an extra space in the yaml file where the env var is read into the application.yml file here, but I couldn't reproduce the fall-back-to-default behaviour with that space added locally in a different project. There is nothing online about how Spring behaves when its variable substitution is used with an additional space.
Can you also take a look? I know it is a very loose question but we are kind of stuck now 😄 Should I just open a PR? Am I doing something totally wrong? Could there be a Spring bean initialization issue causing @Value to not work? (I'm far away from the Spring ecosystem.)
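(To illustrate the suspicion about the space, with a made-up property name standing in for the real application.yml key; the only difference is the space after the colon before the default value:)
# with the extra space (suspected problem)
numRetries: ${ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES: 3}
# without the space (usual Spring placeholder form)
numRetries: ${ELASTICSEARCH_INDEX_BUILDER_NUM_RETRIES:3}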
helpful-painting-48754
06/22/2022, 8:27 AM
'Profiling - Unable to get column cardinality'
I tried to ignore the columns with this error, but other columns would then pop up with the same error.
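(Rather than ignoring columns one by one, it might be possible to switch off the distinct-count metrics for the whole run, since cardinality is derived from them; a recipe fragment, assuming I have the option name right:)
source:
  type: <your source>
  config:
    # ... connection settings ...
    profiling:
      enabled: true
      include_field_distinct_count: false   # skips the per-column cardinality computation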
quick-megabyte-61846
06/22/2022, 8:48 AM
source_ref and source_url are not reflected in the UI.
Source yaml:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/bootstrap_data/business_glossary.yml
Recipe yaml:
source:
  type: datahub-business-glossary
  config:
    file: ./business_glossary/business_glossary.yml
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
version: 0.8.38
PS. I added another screen with the desired effect
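(For reference, the shape of those fields on a term, mirroring the linked business_glossary.yml example; the term name and URL below are made up:)
nodes:
  - name: Classification
    description: A set of terms related to Data Classification
    terms:
      - name: Sensitive
        description: Sensitive Data
        term_source: "EXTERNAL"
        source_ref: FIBO
        source_url: "https://example.com/definition/sensitive"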
abundant-receptionist-6114
06/22/2022, 11:59 AM
calm-dinner-63735
06/22/2022, 12:18 PM
calm-dinner-63735
06/22/2022, 12:18 PM
witty-butcher-82399
06/22/2022, 12:28 PM
Status(removed=True) and the second event for the Ownership aspect (nothing about the status in the second upsert).
• The dataset is wrongly shown in the UI as a valid dataset (not soft-deleted). We have also checked the backend and the dataset has Status(removed=False).
So if the issue is not during the ingestion, it must be the backend deciding to enable the dataset again for some reason.
Looking for something supporting our assumption, we found this in the source code: https://github.com/datahub-project/datahub/blob/8c8f1b987a0c9fc29f4005aa8d132ad2550f3f05/metadata-io/src/main/java/com/linkedin/metadata/entity/EntityService.java#L1097 I could be wrong, but it looks like in some cases the backend decides to set the removal flag to false. It is as if it re-enables the dataset because other aspects are being updated. If that's true, and while it could make sense in some cases, it makes our simple use case misbehave. WDYT? Could that be the root cause of the issue?
06/22/2022, 12:39 PM
[2022-06-22 12:28:32,603] INFO {datahub.cli.ingest_cli:99} - DataHub CLI version: 0.8.38
1 validation error for DBTConfig
test_results_path
  extra fields not permitted (type=value_error.extra)
Should we wait for the next update?
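(That pydantic "extra fields not permitted" error usually means the installed CLI predates the option, so upgrading and re-checking the version may be enough, assuming test_results_path ships in a release newer than 0.8.38:)
pip install --upgrade 'acryl-datahub[dbt]'
datahub version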