worried-terabyte-81786
01/26/2022, 3:48 PM
plain-farmer-27314
01/26/2022, 4:11 PM
"urn":"urn:li:mlFeatureTable:(urn:li:dataPlatform:sagemaker,fraud-prediction)"
In fact, I even see "Feature Table" listed alongside other entities in the "Explore your metadata" section.
And then, what is in the urn? Is it the snapshot name, or the entity name?
Finally, specifically for mlFeatureTable, which entity does that actually belong to?
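For reference, a minimal sketch of how a URN with that shape is assembled, following the pattern visible in the example above (urn:li:mlFeatureTable:(urn:li:dataPlatform:<platform>,<table-name>)); "sagemaker" and "fraud-prediction" are just the values from that example:

# Sketch: build an MLFeatureTable URN by hand.
# The URN wraps the platform URN plus the feature table name.
platform = "sagemaker"
feature_table_name = "fraud-prediction"

platform_urn = f"urn:li:dataPlatform:{platform}"
feature_table_urn = f"urn:li:mlFeatureTable:({platform_urn},{feature_table_name})"
print(feature_table_urn)
# urn:li:mlFeatureTable:(urn:li:dataPlatform:sagemaker,fraud-prediction)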
crooked-market-47728
01/26/2022, 6:01 PM
Received 401 from https://vpc-datahub-rtixgcnstthsm6uosewxuafrpy.us-west-2.es.amazonaws.com:443. Sleeping 1s
I tested with both AWS OpenSearch 1.1 and Elasticsearch 7.10 and get the same error.
This is the relevant part of my values.yaml:
elasticsearch:
  host: vpc-datahub-rtixgcnstthsm6uosewxuafrpy.us-west-2.es.amazonaws.com
  port: "443"
  useSSL: "true"
  auth:
    username: root
    password:
      secretRef: elasticsearch-secrets
      secretName: elasticsearch-password
elasticsearchSetupJob:
  enabled: true
  image:
    repository: linkedin/datahub-elasticsearch-setup
    tag: "v0.8.23"
  extraEnvs:
    - name: USE_AWS_ELASTICSEARCH
      value: "true"
The EKS cluster where I'm trying to install DataHub is in the same VPC as the AWS Elasticsearch/OpenSearch cluster, and they share the same security group.
The secret has the correct value, and I confirmed from a browser that the domain is reachable.
Could anyone please help me figure out if I'm doing something wrong?
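A quick way to reproduce the setup job's request outside Kubernetes is a direct authenticated call against the domain; a sketch, assuming the endpoint from the values.yaml above (the username/password are placeholders for the master user configured on the domain):

# Sketch: hit the OpenSearch/Elasticsearch endpoint with the same basic-auth credentials.
# A 200 means the credentials work; a 401 here too points at the domain's
# fine-grained access control / master-user configuration rather than at the setup job.
import requests

host = "https://vpc-datahub-rtixgcnstthsm6uosewxuafrpy.us-west-2.es.amazonaws.com:443"
resp = requests.get(
    f"{host}/_cluster/health",
    auth=("root", "<password-from-elasticsearch-secrets>"),  # placeholder credentials
    timeout=10,
)
print(resp.status_code, resp.text)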
handsome-belgium-11927
01/26/2022, 8:51 PM
Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry 'urn:li:dataPlatform:clickhouse-dataPlatformKey-0' for key 'PRIMARY'\n\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:117)
Custom platforms are not being ingested either. This was working fine in previous releases (I'm on 0.8.23 at the moment).
Any idea how to fix that?
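For context, this is roughly how a custom platform is normally pushed to GMS from Python; a sketch only, assuming the 0.8.x-era emitter APIs (class and field names may differ between releases, and the GMS address is a placeholder), with the clickhouse values mirroring the error above:

# Sketch: upsert a dataPlatformInfo aspect for a custom platform via the REST emitter.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    ChangeTypeClass,
    DataPlatformInfoClass,
    PlatformTypeClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # placeholder GMS

mcp = MetadataChangeProposalWrapper(
    entityType="dataPlatform",
    changeType=ChangeTypeClass.UPSERT,
    entityUrn="urn:li:dataPlatform:clickhouse",
    aspectName="dataPlatformInfo",
    aspect=DataPlatformInfoClass(
        name="clickhouse",
        displayName="ClickHouse",
        type=PlatformTypeClass.OTHERS,
        datasetNameDelimiter=".",
    ),
)
emitter.emit_mcp(mcp)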
adorable-flower-19656
01/27/2022, 5:05 AM
dazzling-appointment-34954
01/27/2022, 9:23 AM
clean-crayon-15379
01/27/2022, 5:56 PM
cx_Oracle.DatabaseError: DPI-1037: column at array position 0 fetched with error 1406
Searching for DPI-1037 here shows this is a recurring issue, usually solved by excluding the affected views from ingestion. Unfortunately, that is not an option in our use case. The issue seems to be that SQLAlchemy lacks the cursor.outputtypehandler option when called by the ingest script. I already tried passing options to the SQLAlchemy create_engine call, but that does not seem feasible as-is. Do you have an idea how I could tweak the cursor invoked by pipeline.run()?
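For what it's worth, outside of the DataHub pipeline the usual cx_Oracle/SQLAlchemy pattern is to attach an output type handler to every new DBAPI connection via an engine event, rather than through create_engine options; a sketch only, with a placeholder connection string, cx_Oracle 8.x type constants, and a deliberately simplistic handler (hooking this into the engine that pipeline.run() builds internally is exactly the part that isn't exposed today):

# Sketch: set cursor.outputtypehandler on every raw cx_Oracle connection SQLAlchemy opens.
import cx_Oracle
from sqlalchemy import create_engine, event

def output_type_handler(cursor, name, default_type, size, precision, scale):
    # Fetch character columns leniently so problematic values don't abort the whole fetch.
    if default_type in (cx_Oracle.DB_TYPE_CHAR, cx_Oracle.DB_TYPE_VARCHAR, cx_Oracle.DB_TYPE_NVARCHAR):
        return cursor.var(default_type, size, arraysize=cursor.arraysize,
                          encoding_errors="replace")

engine = create_engine("oracle+cx_oracle://user:password@host:1521/?service_name=ORCL")  # placeholder

@event.listens_for(engine, "connect")
def _set_output_type_handler(dbapi_connection, connection_record):
    # Runs once per new DBAPI connection; applies to every cursor created from it.
    dbapi_connection.outputtypehandler = output_type_handler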
witty-butcher-82399
01/27/2022, 7:17 PM
'records_written': 220
• I'm running the ingestion command from a k8s cronjob. Since the ingestion finishes with failures, the pod fails and is retried. Does that mean we are republishing the events on every retry? Does that make sense?
miniature-television-17996
01/27/2022, 9:08 PM
javax.servlet.ServletException: org.springframework.web.util.NestedServletException: Request processing failed; nested exception is java.lang.UnsupportedOperationException: GraphQL gets not supported.
:8080
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat
Please help!!
doc:
https://datahubproject.io/docs/api/graphql/getting-started
miniature-television-17996
01/27/2022, 9:26 PM
acceptable-horse-58553
01/28/2022, 4:31 AM
acceptable-horse-58553
01/28/2022, 4:51 AM
square-machine-96318
01/28/2022, 6:23 AM
loud-musician-49912
01/28/2022, 2:26 PM
modern-monitor-81461
01/28/2022, 6:37 PM
datahub.configuration.common.OperationalError: ('Unable to emit metadata to DataHub GMS', {'message': "HTTPConnectionPool(host='datahub-datahub-gms.datahub.svc.cluster.local', port=8080): Max retries exceeded with url: /entities?action=ingest (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known'))"})
I now have the following questions:
1. Is it possible to disable that emitting from Airflow on-demand?
2. Is it possible to make Airflow return a warning instead of a failure when it can't reach GMS? In this case, my DAG's primary task completed; it's only "the reporting" to DataHub that failed... In other words, how can I break the dependency of Airflow on GMS...?
3. Is using Kafka as an endpoint the only way to break that dependency?
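Regarding question 2: when the metadata push happens from your own task code (rather than from the lineage backend), one way to degrade a GMS outage to a warning is simply to guard the emit; a rough sketch, with a placeholder GMS URL:

# Sketch: emit to GMS but only log a warning if it is unreachable,
# so the DAG task itself does not fail because of "the reporting" step.
import logging

from datahub.emitter.rest_emitter import DatahubRestEmitter

log = logging.getLogger(__name__)

def emit_safely(mcp) -> None:
    try:
        emitter = DatahubRestEmitter(
            gms_server="http://datahub-datahub-gms.datahub.svc.cluster.local:8080"  # placeholder
        )
        emitter.emit_mcp(mcp)
    except Exception:
        log.warning("Could not emit metadata to DataHub GMS; continuing.", exc_info=True)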
gorgeous-dinner-4055
01/28/2022, 8:34 PM
mammoth-lawyer-49919
01/31/2022, 3:18 AM
billions-receptionist-60247
01/31/2022, 10:12 AM
polite-flower-25924
01/31/2022, 1:57 PM
MSG_SIZE_TOO_LARGE
Is this a Kafka issue? We're facing this problem while ingesting data from Redshift into DataHub.
---- (full traceback above) ----
File "/usr/local/lib/python3.8/site-packages/datahub/entrypoints.py", line 102, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 174, in wrapper
res = func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 82, in run
pipeline.run()
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 167, in run
self.sink.write_record_async(record_envelope, callback)
File "/usr/local/lib/python3.8/site-packages/datahub/ingestion/sink/datahub_kafka.py", line 83, in write_record_async
self.emitter.emit_mcp_async(
File "/usr/local/lib/python3.8/site-packages/datahub/emitter/kafka_emitter.py", line 152, in emit_mcp_async
producer.produce(
File "/usr/local/lib/python3.8/site-packages/confluent_kafka/serializing_producer.py", line 176, in produce
super(SerializingProducer, self).produce(topic, value, key,
KafkaException: KafkaError{code=MSG_SIZE_TOO_LARGE,val=10,str="Unable to produce message: Broker: Message size too large"}
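MSG_SIZE_TOO_LARGE means the Kafka producer refused a message above its size limit (roughly 1 MB by default), which very wide Redshift schemas can trip. A sketch of raising the client-side limit on the datahub-kafka sink via producer_config; the option names are assumed from the kafka sink configuration, the bootstrap/schema-registry addresses and source credentials are placeholders, and the broker/topic side (message.max.bytes / max.message.bytes) usually has to be raised as well:

# Sketch: run an ingestion pipeline whose datahub-kafka sink allows larger messages.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "redshift",
            "config": {
                "host_port": "redshift-host:5439",   # placeholder
                "database": "dev",
                "username": "user",
                "password": "pass",
            },
        },
        "sink": {
            "type": "datahub-kafka",
            "config": {
                "connection": {
                    "bootstrap": "broker:9092",                      # placeholder
                    "schema_registry_url": "http://schema-registry:8081",
                    # Passed through to the underlying librdkafka producer.
                    "producer_config": {"message.max.bytes": 5242880},
                },
            },
        },
    }
)
pipeline.run()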
happy-island-35913
01/31/2022, 4:38 PM
ancient-apartment-23316
01/31/2022, 8:05 PM
{"exceptionClass":"com.linkedin.restli.server.RestLiServiceException","stackTrace":"com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]\n\tat
Also, while installing with helm install datahub datahub/datahub,
I see this error in the job pod/datahub-datahub-upgrade-job-****:
ERROR: Cannot connect to GMSat host datahub-datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.
java.net.ConnectException: Connection refused (Connection refused)
BUT after this, the same job runs one more time without errors:
pod/datahub-datahub-upgrade-job-*** 0/1 Completed 0 2m8s
pod/datahub-datahub-upgrade-job-*** 0/1 Error 0 3m6s
ancient-apartment-23316
02/01/2022, 12:06 PM
kubectl get ing
NAME CLASS HOSTS ADDRESS PORTS AGE
datahub-datahub-frontend <none> datahub.mydomain.com 80 2m37s
few-air-56117
02/01/2022, 3:55 PM
'Cannot access field colors on a value with type ARRAY<STRUCT<colors BOOL>> at '
DataHub tries to do select hide_product_relations.colors, but that is not possible without an UNNEST.
'[SQL: SELECT count(*) AS `element_count`, sum(CASE WHEN '(`hide_product_relations`.`colors` IN (NULL) OR `hide_product_relations`.`colors` '
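Until repeated/STRUCT columns are handled by the profiler, one workaround is to keep ingesting the table but skip profiling it via profile_pattern; a sketch only, with placeholder project/dataset/table names and option names assumed from the bigquery source config:

# Sketch: ingest BigQuery schema metadata but exclude the table whose
# ARRAY<STRUCT<...>> column (hide_product_relations) breaks profiling.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "bigquery",
            "config": {
                "project_id": "my-project",                      # placeholder
                "profiling": {"enabled": True},
                "profile_pattern": {
                    # Skip only the problematic table; everything else is still profiled.
                    "deny": ["my-project\\.my_dataset\\.my_table"]
                },
            },
        },
        "sink": {"type": "datahub-rest", "config": {"server": "http://localhost:8080"}},
    }
)
pipeline.run()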
green-intern-1667
02/01/2022, 4:59 PM
red-napkin-59945
02/01/2022, 5:51 PM
cool-painting-92220
02/01/2022, 6:04 PM
DatabaseError: (snowflake.connector.errors.DatabaseError) 390190 (08001): Failed to connect to DB: [our_account_id].snowflakecomputing.com:443, The specified authenticator is not accepted by your Snowflake account configuration. Please contact your local system administrator to get the correct URL to use.
(Background on this error at: http://sqlalche.me/e/13/4xp6)
We have MFA enabled for Snowflake through Duo Mobile, and I was wondering if my troubles have something to do with this. I used to be able to verify my Snowflake login for ingestion jobs through Duo, but I can no longer do that - I'm wondering if the ingestion process for DataHub changed slightly and I need a different auth flow. I tried to update my recipe by changing the authentication_type parameter from its default to "EXTERNAL_BROWSER_AUTHENTICATOR", and that changed the error a bit:
DatabaseError: (snowflake.connector.errors.DatabaseError) 390190 (08001): Failed to connect to DB: [our_account_id].snowflakecomputing.com:443, There was an error related to the SAML Identity Provider account parameter. Contact Snowflake support.
(Background on this error at: http://sqlalche.me/e/13/4xp6)
From some searching online about Snowflake authentication troubles, I tried to add my email domain to the end of my Snowflake username, and this yielded an extra notice:
DatabaseError: (snowflake.connector.errors.DatabaseError) 390190 (08001): Failed to connect to DB: [our_account_id].snowflakecomputing.com:443, There was an error related to the SAML Identity Provider account parameter. Contact Snowflake support.
(Background on this error at: http://sqlalche.me/e/13/4xp6)
Initiating login request with your identity provider. A browser window should have opened for you to complete the login. If you can't see it, check existing browser windows, or your OS settings. Press CTRL+C to abort and try again...
Any thoughts on what might be going on?
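One way to narrow this down is to take DataHub out of the equation and try the same account and authenticator directly with the Snowflake connector; a sketch, with placeholder account, user, warehouse, and role (if this also fails with the SAML Identity Provider error, the problem is on the Snowflake/SSO side rather than in the ingestion recipe):

# Sketch: test externalbrowser (SSO/Duo) authentication with the Snowflake connector directly.
import snowflake.connector

conn = snowflake.connector.connect(
    account="our_account_id",              # placeholder
    user="first.last@example.com",         # placeholder
    authenticator="externalbrowser",       # opens a browser window for SSO/Duo
    warehouse="MY_WH",                     # placeholder
    role="MY_ROLE",                        # placeholder
)
print(conn.cursor().execute("select current_version()").fetchone())
conn.close()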
handsome-football-66174
02/01/2022, 6:24 PM
cool-painting-92220
02/01/2022, 10:23 PM
Traceback (most recent call last):
File "/home/shivan/data-dict/env/bin/datahub", line 5, in <module>
from datahub.entrypoints import main
File "/home/shivan/data-dict/env/lib64/python3.6/site-packages/datahub/entrypoints.py", line 11, in <module>
from datahub.cli.delete_cli import delete
File "/home/shivan/data-dict/env/lib64/python3.6/site-packages/datahub/cli/delete_cli.py", line 17, in <module>
from datahub.telemetry import telemetry
File "/home/shivan/data-dict/env/lib64/python3.6/site-packages/datahub/telemetry/telemetry.py", line 138, in <module>
telemetry_instance = Telemetry()
File "/home/shivan/data-dict/env/lib64/python3.6/site-packages/datahub/telemetry/telemetry.py", line 37, in __init__
self.update_config()
File "/home/shivan/data-dict/env/lib64/python3.6/site-packages/datahub/telemetry/telemetry.py", line 50, in update_config
with open(CONFIG_FILE, "w") as f:
PermissionError: [Errno 13] Permission denied: '/home/shivan/.datahub/telemetry-config.json'
I reverted to version 0.8.23 due to MFA issues with my Snowflake ingestion, and this didn't seem to be a problem a month ago. Any ideas what could be the cause of this?
acoustic-wolf-70583
02/02/2022, 4:59 AM
rapid-leather-18827
02/02/2022, 5:07 PM