rapid-crowd-46218
04/26/2023, 7:16 AM
I set emit_s3_lineage=true and glue_s3_lineage_direction=upstream, but the lineage does not appear in the UI. However, if glue_s3_lineage_direction=downstream is specified, the lineage is visible in the UI. What could be the reason for this? There are no errors in the CLI ingest report, and after ingestion there is 'upstreamLineage' in the source (glue) report.
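For reference, a minimal glue recipe sketch using the two options from the question (the region and sink address are placeholders, not values from the thread):
source:
  type: glue
  config:
    aws_region: us-east-1                  # placeholder region
    emit_s3_lineage: true
    glue_s3_lineage_direction: upstream    # the setting that did not show lineage in the UI
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"        # placeholder GMS address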
thousands-yacht-8284
04/26/2023, 7:23 AM
late-furniture-56629
04/26/2023, 10:17 AMgifted-market-81341
04/26/2023, 11:54 AMwitty-butcher-82399
04/26/2023, 12:12 PM
The ASYNC_INGEST_DEFAULT feature caught our attention.
https://datahubspace.slack.com/archives/CV2UXSE9L/p1681587580923939?thread_ts=1681215034.637799&cid=CV2UXSE9L
https://datahubspace.slack.com/archives/CV2UXSE9L/p1681588160609639?thread_ts=1681215034.637799&cid=CV2UXSE9L
I have a couple of questions:
• Is this flag exposed in the GMS API? As a user of the GMS API, I would like to process some of my requests in async mode.
• Assuming the async scenario, and in case authorization is enabled for the system, is the event authorized before being sent to the async queues? Or will those events be unauthorized?
Thanks!
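For reference, the flag discussed in those threads appears to be the GMS-side ASYNC_INGEST_DEFAULT setting; a docker-compose style sketch of enabling it follows (the service name is illustrative, and whether the GMS API also exposes it per request, and how authorization interacts with the queue, are exactly the open questions above):
services:
  datahub-gms:
    environment:
      - ASYNC_INGEST_DEFAULT=true   # as I understand it, ingest proposals then default to the asynchronous (queued) path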
billions-baker-82097
04/26/2023, 3:21 PM
lively-dusk-19162
04/26/2023, 3:33 PMhelpful-tent-87247
04/26/2023, 4:41 PMfierce-restaurant-41034
04/27/2023, 8:36 AM
insert into x values('foo')
I want to see the insert command.
Thanks
numerous-refrigerator-15664
04/27/2023, 10:03 AM
I get ERROR {datahub.entrypoints:192} - Command failed: Cannot open config file presto-on-hive.dhub.yaml when I try datahub ingest -c presto-on-hive.dhub.yaml.
According to some threads in Slack, the reason seems to be that my DataHub Docker container cannot read the YAML file in the host directory, but I haven't found the answer yet.
So my questions are:
1. Which container should be able to read my YAML file? datahub-gms?
2. Should I mount my host directory into the Docker container? This page says "For docker, we set docker-compose to mount ${HOME}/.datahub directory to /etc/datahub directory within the GMS containers": https://datahubproject.io/docs/plugins/#plugin-installation, but it seems this is not up to date.
Thank you in advance!
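When datahub ingest is run from a CLI inside a container, it is that container, not datahub-gms, that has to be able to read the recipe file. A hedged docker-compose style sketch of bind-mounting the host directory (the service name and paths are only examples):
services:
  datahub-actions:                      # whichever container you run `datahub ingest` in
    volumes:
      - ${HOME}/recipes:/etc/recipes    # host directory -> path visible inside the container
The recipe would then be referenced by its in-container path, e.g. datahub ingest -c /etc/recipes/presto-on-hive.dhub.yaml.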
incalculable-processor-75603
04/27/2023, 11:03 AM
I opened a PR to add ability to preserve dbt table identifier casing, but the Vercel bot reports that the deployment has failed. PR here: https://github.com/datahub-project/datahub/pull/7854
I also tested the new code in my local environment and it works, so I don't know why this is happening or when the PR will be ready to merge.
So what should I do next? Please help me.
Thanks for your advice!
fresh-dusk-60832
04/27/2023, 12:44 PM
source:
  type: athena
  config:
    aws_region: us-east-1
    work_group: primary
    include_views: true
    include_tables: true
    catalog_name: dynamodb
    database: default
    query_result_location: 's3://xxx/xxx/'
and this catalog + database are reading data from my DynamoDB using the Athena Connector (Lambda).
If I configure my recipe to grab the metadata from the default catalog (awsdatacatalog), it works perfectly.
Any clue? Maybe the Athena connector only works with Data Source Type = AWS Glue Data Catalog?
rich-policeman-92383
04/27/2023, 1:08 PMbland-orange-13353
04/27/2023, 1:37 PMadamant-honey-44884
04/27/2023, 4:54 PMclever-magician-79463
04/27/2023, 5:05 PMable-evening-90828
04/27/2023, 5:52 PMpostgres
database when listing the databases.
So we would change the following line:
engine = create_engine(url, **self.config.options)
to something like below:
engine = create_engine(self.config.get_sql_alchemy_url(database="postgres"), **self.config.options)
If there is no objection, we will send a PR out to address this.
@hundreds-photographer-13496 @gray-shoe-75895 @famous-waitress-64616
flat-painter-78331
04/28/2023, 12:54 AMelegant-salesmen-99143
04/28/2023, 12:30 PM
Can someone explain platform_instance in the Kafka Connect docs?
We have a working Kafka Connect connection, but I wanted to enable stateful ingestion on it, and I can't without specifying a platform instance, and I'm not sure what that is.
We're using Confluent Kafka; is that it? Should I write something like platform_instance: confluent?
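For reference, platform_instance is just a logical name you choose to distinguish this particular Connect cluster from any others you might ingest; it is not a fixed value like "confluent". A minimal recipe sketch (the URI, names, and sink address are placeholders, not verified settings):
pipeline_name: kafka-connect-prod          # stateful ingestion needs a stable pipeline name
source:
  type: kafka-connect
  config:
    connect_uri: "http://localhost:8083"   # placeholder Connect REST endpoint
    platform_instance: confluent-prod      # any identifier you pick for this cluster
    stateful_ingestion:
      enabled: true
sink:
  type: datahub-rest
  config:
    server: "http://localhost:8080"        # placeholder GMS address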
important-bear-9390
04/28/2023, 4:18 PM
ERROR DatasetExtractor: class org.apache.spark.sql.catalyst.plans.logical.Aggregate is not supported yet.)
Any tips on what I could do to solve this?
bright-waitress-5179
04/28/2023, 6:39 PMacryl-datahub, version 0.10.2.2
'failures': [{'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:400]: Cannot parse request entity\n'
'\tat com.linkedin.restli.server.RestLiServiceException.fromThrowable(RestLiServiceException.java:315)\n'
'\tat com.linkedin.restli.server.BaseRestLiServer.buildPreRoutingError(BaseRestLiServer.java:202)',
'message': 'Cannot parse request entity',
'status': 400,
'id': 'urn:li:dataset:(urn:li:dataPlatform:snowflake,segment_prod.core_mobile_production.appointment_save,PROD)'}}]
bright-waitress-5179
04/28/2023, 6:57 PMacryl-datahub, version 0.10.2.2
File "/tmp/datahub/ingest/venv-looker-0.10.2/lib/python3.10/site-packages/sqllineage/__init__.py", line 24, in _patch_updating_lateral_view_lexeme
if regex("LATERAL VIEW EXPLODE(col)"):
TypeError: 'str' object is not callable
purple-salesmen-12745
04/29/2023, 6:59 PMrich-policeman-92383
05/01/2023, 8:09 AM
[2023-04-30 18:24:05,254] ERROR {datahub.utilities.sqlalchemy_query_combiner:403} - Failed to execute queue using combiner: (trino.exceptions.TrinoQueryError) TrinoQueryError(type=INSUFFICIENT_RESOURCES, name=EXCEEDED_TIME_LIMIT, message="Query exceeded the maximum execution time limit of 10.00m"
["Profiling exception (trino.exceptions.OperationalError) error 404: b'Query not found'\n(Background on this error at: https://sqlalche.me/e/14/e3q8)"]
Recipe yaml:
source:
  type: "trino"
  config:
    host_port: ip:port
    database: hive_2
    username: tr
    password:
    schema_pattern:
      deny:
        - .*information_schema.*
      allow:
        - B
        - A
    table_pattern:
      allow:
        - hive_2.A.table1
        - hive_2.B.table2
    profiling:
      enabled: True
    profile_pattern:
      allow:
        - hive_2.A.table1
        - hive_2.B.table2
transformers:
  - type: "simple_add_dataset_tags"
    config:
      tag_urns:
        - "urn:li:tag:1_0_prod_datalake"
pipeline_name: "trino_hive_prod_to_datahub_prod"
datahub_api:
  server: "https://gms:8080"
  token:
sink:
  type: "datahub-rest"
  config:
    server: "https://gms:8080"
    token:
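The 10-minute cap in that error comes from the Trino server itself (its query.max-execution-time limit), so raising it is a Trino-side change; on the recipe side, a hedged sketch of profiling options that keep each profiling query smaller (option names as I recall them from the SQL profiling config; double-check against your CLI version):
profiling:
  enabled: True
  profile_table_level_only: true              # only row/column counts, skip per-column statistics
  turn_off_expensive_profiling_metrics: true
  query_combiner_enabled: false               # issue smaller individual queries instead of one big combined query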
bitter-evening-61050
05/01/2023, 9:47 AMelegant-nightfall-29115
05/01/2023, 11:24 PM
My sink points at /datahub/datahub-gms/resources/policies.json, however the ingestion-cron pod can't find that path. The recipe file looks like this:
source:
  type: file
  config:
    # Coordinates
    filename: ../policies.json
sink:
  type: file
  config:
    filename: /datahub/datahub-gms/resources/policies.json
I am trying to remove the MANAGE_INGESTION permission from all users so as to totally disable UI ingestion.
billions-baker-82097
05/02/2023, 11:11 AMbland-orange-13353
05/02/2023, 12:01 PMpurple-printer-15193
05/02/2023, 3:34 PM
We query the snowflake.account_usage.tag_references table, but I don't see this table in the lineage. The snowflake.account_usage.tag_references table also never gets ingested by our Snowflake ingestion recipe. Lastly, when I try to ingest just the SNOWFLAKE database, I get an error like the one below:
"source": {
  "type": "snowflake",
  "report": {
    "events_produced": 0,
    "events_produced_per_sec": 0,
    "entities": {},
    "aspects": {},
    "warnings": {},
    "failures": {
      "permission-error": [
        "No tables/views found. Please check permissions."
      ]
    },
I can definitely see and query the snowflake.account_usage.tag_references table using the Snowflake UI, though, so I'm not sure if it's really a permission error at all.
Thanks.
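One thing worth checking: the ACCOUNT_USAGE views live in the shared SNOWFLAKE database, which a role can usually only read after being granted imported privileges on it, and the connector only scans databases allowed by its patterns. A hedged recipe fragment (pattern syntax as I recall it; verify against your connector version):
source:
  type: snowflake
  config:
    # ... existing connection settings ...
    database_pattern:
      allow:
        - "SNOWFLAKE"    # explicitly include the shared SNOWFLAKE database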
fierce-animal-98957
05/02/2023, 4:26 PM