better-orange-49102
03/10/2022, 10:18 AM
brief-toothbrush-55766
03/10/2022, 10:43 AM
nutritious-bird-77396
03/10/2022, 4:15 PM
billowy-rocket-47022
03/10/2022, 5:36 PM
[9:06 AM] 22/03/10 09:05:18 ERROR Schema: Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@5c09afbc, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
shy-parrot-64120
03/10/2022, 6:47 PM
Is it possible to use yaml-anchors? Like this:
version: 1
lineage:
  - entity: &dataset
      name: report.payment_reconciliation
      type: dataset
      platform: postgres
      platform_instance: mvp
    upstream:
      - entity: &datajob
          name: report.load_payment_reconciliation
          type: datajob
          platform: postgres
          platform_instance: mvp
  - entity:
      <<: *datajob
      name: report.load_payment_reconciliation
    upstream:
      - entity:
          <<: *dataset
          name: core.payment
      - entity:
          <<: *dataset
          name: core.ph2_transaction
      - entity:
          <<: *dataset
          name: core.ph2_order
Afaik the answer is no. Do you have any plans to support something like this?
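For reference, a standard YAML parser that honors anchors and merge keys would expand, say, the core.payment entry above into roughly the following (explicit keys override merged ones); whether the lineage file's loader actually accepts anchors is the open question here:
- entity:
    name: core.payment
    type: dataset
    platform: postgres
    platform_instance: mvp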
better-orange-49102
03/11/2022, 2:54 AM
fierce-waiter-13795
03/11/2022, 7:01 AM
mysterious-australia-30101
03/11/2022, 9:48 AM
mysterious-nail-70388
03/11/2022, 9:51 AM
brief-toothbrush-55766
03/11/2022, 11:26 AM
careful-insurance-60247
03/13/2022, 2:55 PM
salmon-rose-54694
03/14/2022, 1:40 AM
green-pencil-45127
03/14/2022, 1:49 PM
Is there a way to ingest the tag or meta property inside our dbt documentation into DataHub? After reviewing the example recipe, it looks more like the command is to "Do X if Y is detected". While this makes sense for known tags (like PII), ideally we would send all tags from dbt to DataHub without any predefined knowledge or recipe. Any ideas on the syntax to do that?
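For context, a minimal sketch of the "Do X if Y is detected" pattern referred to above, using the dbt source's meta_mapping config; the meta key, match value, paths, and tag name are illustrative placeholders:
source:
  type: dbt
  config:
    manifest_path: ./target/manifest.json   # placeholder paths
    catalog_path: ./target/catalog.json
    meta_mapping:
      has_pii:                # placeholder dbt meta key
        match: "true"
        operation: "add_tag"
        config:
          tag: "pii"          # placeholder tag name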
plain-farmer-27314
03/15/2022, 3:18 PM
I'm not seeing Materialized view tables being ingested. I have include_views set to true and the dataset/table pattern in my allow config. Views are successfully picked up fwiw.
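For reference, a minimal sketch of the config being described; the source type, connection details, and allow pattern are placeholders since the platform isn't named above, and whether materialized views are covered depends on the connector:
source:
  type: postgres                 # placeholder; the platform is not named above
  config:
    host_port: "host:5432"       # placeholder connection details
    include_tables: true
    include_views: true
    table_pattern:
      allow:
        - "public\\..*"          # placeholder allow pattern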
handsome-football-66174
03/15/2022, 4:53 PM
[2022-03-15 15:59:16,848] {logging_mixin.py:104} INFO - Pipeline config is {'source': {'type': 'glue', 'config': {'env': 'PROD', 'aws_region': 'us-east-1', 'extract_transforms': 'false', 'table_pattern': {'allow': ['testdb.*'], 'ignoreCase': 'false'}}}, 'transformers': [{'type': 'simple_remove_dataset_ownership', 'config': {}}, {'type': 'simple_add_dataset_ownership', 'config': {'owner_urns': ['urn:li:corpuser:user1']}}, {'type': 'set_dataset_browse_path', 'config': {'path_templates': ['/Platform/PLATFORM/DATASET_PARTS']}}], 'sink': {'type': 'datahub-kafka', 'config': {'connection': {'bootstrap': 'bootstrapserver:9092', 'schema_registry_url': '<https://schemaregistryurl>'}}}}
[2022-03-15 16:05:46,022] {pipeline.py:85} ERROR - failed to write record with workunit testdb.person_era with KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"} and info {'error': KafkaError{code=_MSG_TIMED_OUT,val=-192,str="Local: Message timed out"}, 'msg': <cimpl.Message object at 0x7f0863603560>}
[2022-03-15 16:05:46,078] {taskinstance.py:1482} ERROR - Task failed with exception
Traceback (most recent call last):
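In case it helps with the MSG_TIMED_OUT above, a hedged sketch of raising the producer's delivery timeout on the datahub-kafka sink; producer_config is passed through to the underlying Kafka producer and message.timeout.ms is a standard librdkafka setting, but verify the exact keys against the sink docs:
sink:
  type: datahub-kafka
  config:
    connection:
      bootstrap: "bootstrapserver:9092"
      schema_registry_url: "https://schemaregistryurl"
      producer_config:
        "message.timeout.ms": 300000   # allow more time before the message times out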
gifted-queen-80042
03/15/2022, 5:52 PM
I have a question about the profiling.limit configuration for SQL profiling.
• Scenario 1: Without this config parameter, the profiling runs successfully.
• Scenario 2: However, upon setting this to, say, 20 rows, I run into an Operational Error:
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1044, "Access denied for user '<username>'@'%' to database '<database_name>'")
[SQL: CREATE TEMPORARY TABLE ge_temp_<temp_table> AS SELECT *
FROM <table_name>
LIMIT 20]
(Background on this error at: <http://sqlalche.me/e/13/e3q8>)
My question is more about how this parameter is implemented. Given that in both the scenarios above it runs a SELECT query, why does LIMIT result in an access-denied error, while without LIMIT there's no error?
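For reference, a minimal sketch of where this option sits in a recipe (connection details are placeholders). Judging from the SQL in the error above, the limit appears to be applied by materializing a temporary table, which needs extra privileges; that reading is inferred from the error rather than from documentation:
source:
  type: mysql
  config:
    host_port: "host:3306"   # placeholder connection details
    database: my_database    # placeholder
    profiling:
      enabled: true
      limit: 20              # the option under discussion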
lemon-terabyte-66903
03/15/2022, 7:28 PM
plain-farmer-27314
03/15/2022, 8:47 PM
prehistoric-optician-40107
03/16/2022, 11:36 AM
"ConnectionError: HTTPConnectionPool(host='localhost', port=8080): Max retries exceeded with url: /config (Caused by "
"NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb58e5d14f0>: Failed to establish a new connection: [Errno 111] "
"Connection refused'))\n",
"2022-03-16 11:30:30.325290 [exec_id=d287226a-592b-4029-879a-583a3cfa64eb] INFO: Failed to execute 'datahub ingest'",
'2022-03-16 11:30:30.325765 [exec_id=d287226a-592b-4029-879a-583a3cfa64eb] INFO: Caught exception EXECUTING '
'task_id=d287226a-592b-4029-879a-583a3cfa64eb, name=RUN_INGEST, stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/default_executor.py", line 119, in execute_task\n'
' self.event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.9/site-packages/nest_asyncio.py", line 81, in run_until_complete\n'
' return f.result()\n'
' File "/usr/local/lib/python3.9/asyncio/futures.py", line 201, in result\n'
' raise self._exception\n'
' File "/usr/local/lib/python3.9/asyncio/tasks.py", line 256, in __step\n'
' result = coro.send(None)\n'
' File "/usr/local/lib/python3.9/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 115, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"]}
Execution finished with errors.
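In case it's useful for the connection-refused error above, a minimal sketch of pointing the recipe's sink at the GMS address instead of localhost; the service name below is a placeholder for whatever address is reachable from where the ingestion actually runs:
sink:
  type: datahub-rest
  config:
    server: "http://datahub-gms:8080"   # placeholder; use the GMS address reachable from the executor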
brave-secretary-27487
03/16/2022, 12:59 PM
damp-queen-61493
03/16/2022, 1:00 PM
[2022-03-16, 12:39:38 UTC] {base.py:79} INFO - Using connection to: id: datahub_kafka_default. Host: prerequisites-kafka.datahub-prereqs-prod.svc.cluster.local:9092, Port: None, Schema: , Login: ***, Password: ***, extra: {'schema_registry_url': '<http://prerequisites-cp-schema-registry.datahub-prereqs-prod.svc.cluster.local:8081>'}
[2022-03-16, 12:39:38 UTC] {base.py:79} INFO - Using connection to: id: datahub_kafka_default. Host: prerequisites-kafka.datahub-prereqs-prod.svc.cluster.local:9092, Port: None, Schema: , Login: ***, Password: ***, extra: {'schema_registry_url': '<http://prerequisites-cp-schema-registry.datahub-prereqs-prod.svc.cluster.local:8081>'}
[2022-03-16, 12:39:38 UTC] {datahub.py:122} ERROR - 1 validation error for KafkaSinkConfig
schema_registry_url
extra fields not permitted (type=value_error.extra)
And without the extra field, this error:
[2022-03-16, 12:58:42 UTC] {base.py:79} INFO - Using connection to: id: datahub_kafka_default. Host: prerequisites-kafka.datahub-prereqs-prod.svc.cluster.local:9092, Port: None, Schema: , Login: ***, Password: ***, extra: {}
[2022-03-16, 12:58:42 UTC] {base.py:79} INFO - Using connection to: id: datahub_kafka_default. Host: prerequisites-kafka.datahub-prereqs-prod.svc.cluster.local:9092, Port: None, Schema: , Login: ***, Password: ***, extra: {}
[2022-03-16, 12:58:42 UTC] {datahub.py:122} ERROR - KafkaError{code=_VALUE_SERIALIZATION,val=-161,str="HTTPConnectionPool(host='localhost', port=8081): Max retries exceeded with url: /subjects/MetadataChangeEvent_v4-value/versions (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff97e544a50>: Failed to establish a new connection: [Errno 111] Connection refused'))"}
[2022-03-16, 12:58:42 UTC] {datahub.py:123} INFO - Supressing error because graceful_exceptions is set
So, what is the proper way to configure it?
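For reference, the KafkaSinkConfig shape implied by the errors above nests schema_registry_url under connection, matching the CLI pipeline config shown earlier in this channel; how the Airflow connection's extra field maps onto this is worth checking against the hook's docs rather than assuming:
sink:
  type: datahub-kafka
  config:
    connection:
      bootstrap: "prerequisites-kafka.datahub-prereqs-prod.svc.cluster.local:9092"
      schema_registry_url: "http://prerequisites-cp-schema-registry.datahub-prereqs-prod.svc.cluster.local:8081"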
eager-florist-67924
03/16/2022, 11:27 PM
MetadataChangeProposalWrapper mcpw = MetadataChangeProposalWrapper.builder()
        .entityType("dataflow")
        .entityUrn("urn:li:dataflow:(urn:li:dataPlatform:kafka,trace-pipeline,PROD)")
        .upsert()
        .aspect(new DataFlowInfo()
                .setName("Trace pipeline")
                .setDescription("Pipeline for trace service")
        )
        .build();
I am able to successfully emit it:
emitter.emit(mcpw, new Callback()
but then when executing the following GraphQL query:
{
  search(input: { type: DATA_FLOW, query: "*", start: 0, count: 10 }) {
    start
    count
    total
    searchResults {
      entity {
        urn
        type
        ... on DataFlow {
          cluster
        }
      }
    }
  }
}
I get the following error:
{
  "errors": [
    {
      "message": "The field at path '/search/searchResults[0]/entity' was declared as a non null type, but the code involved in retrieving data has wrongly returned a null value. The graphql specification requires that the parent field be set to null, or if that is non nullable that it bubble up null to its parent and so on. The non-nullable type is 'Entity' within parent type 'SearchResult'",
      "path": [
        "search",
        "searchResults",
        0,
        "entity"
      ],
      "extensions": {
        "classification": "NullValueInNonNullableField"
      }
    }
  ],
  "data": {
    "search": null
  }
}
So basically, what should such a dataflow entity look like? Did I miss some required fields? And how can I tell from the entities documentation which fields are optional and which are mandatory? thx
billowy-book-26360
03/17/2022, 1:20 AM
Has anyone run into ValueError: ('# Detailed Table Information', None, None) is not in list? I encounter this for all tables, but all database names are ingested fine.
stale-jewelry-2440
03/17/2022, 1:08 PM
[2022-03-17, 13:35:39 CET] {local_task_job.py:154} INFO - Task exited with return code Negsignal.SIGKILL
Note that the GE - Airflow part works fine, i.e. if I deactivate the action of sending stuff to DataHub, everything works.
I also set the logging level to debug, but nothing interesting is printed out.
Any hint?
miniature-hair-20451
03/17/2022, 2:47 PM
datahub ingest -c hive_2_datahub.yml
cat hive_2_datahub.yml
source:
  type: hive
  config:
    host_port: rnd-dwh-nn-002.msk.mts.ru:10010
    database: digital_dm
    username: aaplato9
    options.connect_args: 'KERBEROS'
sink:
  type: "console"
Error:
1 validation error for HiveConfig
options.auth
  extra fields not permitted (type=value_error.extra)
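In case it helps, a hedged sketch of how Kerberos options are usually nested for the hive source: options and connect_args as nested mappings rather than a dotted flat key; the auth mode and service name below are assumptions to check against the PyHive documentation:
source:
  type: hive
  config:
    host_port: rnd-dwh-nn-002.msk.mts.ru:10010
    database: digital_dm
    username: aaplato9
    options:
      connect_args:
        auth: KERBEROS               # assumption: PyHive's Kerberos auth mode
        kerberos_service_name: hive  # assumption: typical Hive service principal
sink:
  type: "console"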
high-family-71209
03/18/2022, 12:16 PM
swift-breakfast-25077
03/18/2022, 1:03 PM
green-pencil-45127
03/18/2022, 1:37 PM
Regarding sources.json, it seems like it might be a cloud-only feature. Can anyone confirm that this is the case?
thankful-glass-88027
03/18/2022, 3:41 PM
# python3 -m pip install 'acryl-datahub[sqlalchemy]'
# python3 -m pip install sqlalchemy-vertica-python
Build and ingest the YAML:
• vertica_ingest.yaml
source:
  type: sqlalchemy
  config:
    platform: vertica
    connect_uri: 'vertica+vertica_python://datahub_user:password@1.1.1.1:5433/verticadb'
sink:
  type: datahub-rest
  config:
    server: 'http://1.1.1.1:8080'
To ingest via CLI:
datahub ingest -c vertica_ingest.yaml
Could the Vertica dialect for SQLAlchemy be added to the official image :)?
adamant-laptop-28839
03/18/2022, 6:01 PM
source:
  type: mssql
  config:
    # uname, pas, port
    database: db_name
    database_alias: db_alias
but it doesn't use the database_alias: instead of db_alias.dbo.table it still uses the database name, db_name.dbo.table.
Can anyone help me fix this? Thank you!!