red-pizza-28006
11/10/2021, 3:57 PM
plain-farmer-27314
11/10/2021, 9:22 PM
wooden-gpu-7761
11/11/2021, 6:33 AM
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 1083, in emit
msg = self.format(record)
File "/usr/local/Cellar/python@3.9/3.9.7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 927, in format
return fmt.format(record)
File "/usr/local/Cellar/python@3.9/3.9.7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 663, in format
record.message = record.getMessage()
File "/usr/local/Cellar/python@3.9/3.9.7/Frameworks/Python.framework/Versions/3.9/lib/python3.9/logging/__init__.py", line 367, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
File "/Users/hyunmin/datahub-recipes/env/bin/datahub", line 8, in <module>
sys.exit(main())
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/datahub/entrypoints.py", line 93, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 58, in run
pipeline.run()
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 141, in run
for wu in self.source.get_workunits():
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/datahub/ingestion/source/sql/bigquery.py", line 207, in get_workunits
self._compute_big_query_lineage()
File "/Users/hyunmin/datahub-recipes/env/lib/python3.9/site-packages/datahub/ingestion/source/sql/bigquery.py", line 121, in _compute_big_query_lineage
logger.error(
Message: 'Error computing lineage information using GCP logs.'
Arguments: (ServiceUnavailable('POST <https://logging.googleapis.com/v2/entries:list?prettyPrint=false>: The service is currently unavailable.'),)
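The secondary TypeError above comes from the logging call itself: logger.error at bigquery.py line 121 is given the exception as a positional argument while the message string contains no % placeholder. A minimal repro of that failure mode (names here are illustrative, not DataHub's actual code):
import logging

logging.basicConfig()
logger = logging.getLogger("repro")

try:
    raise RuntimeError("The service is currently unavailable.")
except RuntimeError as e:
    # No %s in the message but an extra positional argument: when the handler
    # formats the record, msg % self.args raises "TypeError: not all arguments
    # converted during string formatting", exactly as in the traceback above.
    logger.error("Error computing lineage information using GCP logs.", e)
    # Either of these logs cleanly instead:
    logger.error("Error computing lineage information using GCP logs: %s", e)
    logger.error("Error computing lineage information using GCP logs.", exc_info=True)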
I’ve tried to relax the start_time, end_time, and max_query_duration constraints (down to almost 10-second intervals) but unfortunately still haven’t seen good results. It seems the project DataHub is querying is simply too large in terms of log volume, and GCP’s API times out internally while returning the logs (this was confirmed by GCP’s support team).
Would there be any options I could tweak or anything I’m missing? FYI I’ve tried to call the API manually via curl with smaller page sizes of about 10 and seen better results, but it seems like DataHub’s bigquery ingestion module uses a fixed page size of 1000.
Any ideas would be much appreciated!
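For reference, a smaller page size can be requested when calling the Cloud Logging API by hand, as described above. A rough sketch, where the project name and filter are placeholders and pageSize is the documented request field:
curl -X POST 'https://logging.googleapis.com/v2/entries:list' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceNames": ["projects/my-project"],
    "filter": "protoPayload.serviceName=\"bigquery.googleapis.com\"",
    "pageSize": 10
  }'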
rhythmic-sundown-12093
11/11/2021, 7:23 AM
orange-flag-48535
11/11/2021, 8:00 AM
nice-planet-17111
11/11/2021, 9:44 AM
red-pizza-28006
11/11/2021, 9:57 AM
KafkaException: KafkaError{code=_INVALID_ARG,val=-186,str="Failed to create consumer: Invalid sasl.kerberos.kinit.cmd value: Property not available: "sasl.kerberos.keytab""}
Anyone faced this before? For context, we use Confluent Kafka.
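For anyone hitting the same error: librdkafka's default sasl.kerberos.kinit.cmd expands %{sasl.kerberos.keytab}, so consumer creation fails when no keytab property is configured. A minimal sketch of the relevant settings (broker, principal, and keytab path are placeholders):
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "broker:9092",      # placeholder
    "group.id": "datahub-ingest",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "GSSAPI",
    "sasl.kerberos.service.name": "kafka",
    "sasl.kerberos.principal": "datahub@EXAMPLE.COM",  # placeholder
    # The default sasl.kerberos.kinit.cmd references %{sasl.kerberos.keytab},
    # so a keytab must be set (or kinit.cmd overridden) to avoid the
    # "Property not available" error above.
    "sasl.kerberos.keytab": "/etc/security/keytabs/datahub.keytab",
})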
nice-planet-17111
11/12/2021, 5:41 AM
curl '<http://localhost:8080/corpGroups?action=ingest>' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '{
"snapshot": {
"aspects": [
{
"com.linkedin.identity.CorpGroupInfo":{
"email": "",
"admins": ["urn:li:corpUser:test_user1"],
"members": ["urn:li:corpUser:test_user1", "urn:li:corpUser:test_user2"], "groups": []}}], "urn": "urn:li:corpGroup:dev"}}'
When I change /corpGroups?action=ingest to /entities?action=ingest, it says "Cannot parse request entity".
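The /entities endpoint expects a different envelope than the entity-specific resources: the snapshot goes under entity.value, keyed by its fully qualified snapshot type. A sketch following the documented GMS Rest.li format, reusing the aspect values above:
curl '<http://localhost:8080/entities?action=ingest>' -X POST -H 'X-RestLi-Protocol-Version:2.0.0' --data '{
  "entity": {
    "value": {
      "com.linkedin.metadata.snapshot.CorpGroupSnapshot": {
        "urn": "urn:li:corpGroup:dev",
        "aspects": [
          {
            "com.linkedin.identity.CorpGroupInfo": {
              "email": "",
              "admins": ["urn:li:corpUser:test_user1"],
              "members": ["urn:li:corpUser:test_user1", "urn:li:corpUser:test_user2"],
              "groups": []
            }
          }
        ]
      }
    }
  }
}'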
creamy-library-6587
11/12/2021, 7:40 AM
rough-tent-62538
11/12/2021, 7:50 AM
handsome-belgium-11927
11/12/2021, 10:56 AM
If I start DataHub with datahub docker quickstart, it is easily dropped by datahub docker nuke. But how do I get the same result if I started it with docker-compose up -d?
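For what it's worth, the usual counterpart to docker-compose up -d is docker-compose down; a minimal sketch, assuming the same compose file is still at hand:
# Run from the directory containing the compose file used for `up -d`:
docker-compose down     # stop and remove the containers and networks
docker-compose down -v  # additionally remove named volumes (closest to `nuke`)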
lively-jackal-83760
11/12/2021, 11:19 AM
brief-lizard-77958
11/15/2021, 12:58 PM
brief-toothbrush-55766
11/15/2021, 3:07 PM
RUN python3 -m pip install --upgrade pip wheel setuptools
RUN python3 -m pip install --upgrade acryl-datahub
RUN datahub version
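For context, a complete image around these lines might look like the following minimal sketch (the base image tag and entrypoint are assumptions):
FROM python:3.9-slim
RUN python3 -m pip install --upgrade pip wheel setuptools
RUN python3 -m pip install --upgrade acryl-datahub
RUN datahub version
ENTRYPOINT ["datahub"]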
brief-toothbrush-55766
11/15/2021, 10:10 PM
'error': 'Unable to emit metadata to DataHub GMS',
'info': {'exceptionClass': 'com.linkedin.restli.server.RestLiServiceException',
'message': "No root resource defined for path '/datasets'",
'stackTrace': 'com.linkedin.restli.server.RestLiServiceException [HTTP Status:404]: No root resource defined for path '
"'/datasets'\n"
billions-tent-29367
11/15/2021, 10:34 PM
=========================== short test summary info ============================
FAILED tests/unit/test_airflow.py::test_lineage_backend[airflow-1-10-x-decl]
FAILED tests/unit/test_airflow.py::test_lineage_backend[airflow-2-x-decl] - a...
========== 2 failed, 141 passed, 14 deselected, 2 warnings in 14.86s ===========
better-orange-49102
11/16/2021, 3:43 AM
polite-flower-25924
11/16/2021, 6:11 AM
victorious-dream-46349
11/16/2021, 11:03 AM
red-pizza-28006
11/16/2021, 1:47 PM
nice-planet-17111
11/17/2021, 5:27 AM
full-area-6720
11/17/2021, 6:10 AM
rhythmic-sundown-12093
11/17/2021, 6:16 AM
source:
  type: mysql
  config:
    env: "Stage"
    host_port: xxx
    database: yyy
    # Credentials
    username: zzzz
    password: 'xxxxxx'
    schema_pattern:
      allow: ["AAAA"]
sink:
  type: "datahub-rest"
  config:
    server: '<http://localhost:9003>'
log:
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.linkedin.metadata.dao.utils.RecordUtils.invokeProtectedMethod(RecordUtils.java:370)
at com.linkedin.metadata.dao.utils.RecordUtils.getRecordTemplateField(RecordUtils.java:289)
at com.linkedin.metadata.dao.utils.ModelUtils.getUrnFromSnapshot(ModelUtils.java:128)
at com.linkedin.metadata.entity.EntityService.ingestSnapshotUnion(EntityService.java:377)
at com.linkedin.metadata.entity.EntityService.ingestEntity(EntityService.java:312)
at com.linkedin.metadata.resources.entity.EntityResource.lambda$ingest$4(EntityResource.java:183)
at com.linkedin.metadata.restli.RestliUtil.toTask(RestliUtil.java:30)
... 81 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.linkedin.metadata.dao.utils.RecordUtils.invokeProtectedMethod(RecordUtils.java:368)
... 87 more
Caused by: com.linkedin.data.template.TemplateOutputCastException: Invalid URN syntax: Invalid URN Parameter: 'No enum constant com.linkedin.common.FabricType.Stage: urn:li:dataset:(urn:li:dataPlatform:mysql,AAAA.BBBBBB,Stage)
at com.linkedin.common.urn.DatasetUrn$1.coerceOutput(DatasetUrn.java:78)
at com.linkedin.common.urn.DatasetUrn$1.coerceOutput(DatasetUrn.java:69)
at com.linkedin.data.template.DataTemplateUtil.coerceOutput(DataTemplateUtil.java:954)
at com.linkedin.data.template.RecordTemplate.obtainCustomType(RecordTemplate.java:365)
... 91 more
Caused by: java.net.URISyntaxException: Invalid URN Parameter: 'No enum constant com.linkedin.common.FabricType.Stage: urn:li:dataset:(urn:li:dataPlatform:mysql,AAAA.BBBBBB,Stage)
at com.linkedin.common.urn.DatasetUrn.createFromUrn(DatasetUrn.java:55)
at com.linkedin.common.urn.DatasetUrn.createFromString(DatasetUrn.java:38)
at com.linkedin.common.urn.DatasetUrn$1.coerceOutput(DatasetUrn.java:76)
... 94 more
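The last frames show the real problem: env: "Stage" is not a valid FabricType constant, so the dataset URN cannot be parsed. A hedged fix is to use one of the enum's accepted values (PROD and DEV are safe choices):
source:
  type: mysql
  config:
    env: "DEV"  # must be a FabricType constant; "Stage" is not one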
nice-planet-17111
11/17/2021, 6:22 AM
prod/mysql/{schema_name}/{schema_name}.{table_name}
But this has some problems:
1. It does not include the instance name, which makes datasets hard to search.
2. When I ingest from multiple instances, all the schemas end up under the same mysql node.
Is there a way to set a path like prod/mysql/{instance_name}/{schema_name}/{table_name} or something like that?
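One possible workaround, assuming the connector version supports the database_alias option from the common SQL configs (verify against your version's docs), is to alias each instance in its own recipe:
source:
  type: mysql
  config:
    host_port: instance1.example.com:3306
    # Hypothetical: alias this instance so its datasets stop colliding
    # with datasets ingested from other MySQL instances.
    database_alias: "instance1"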
orange-flag-48535
11/17/2021, 7:28 AM
creamy-library-6587
11/17/2021, 9:52 AM
boundless-scientist-520
11/17/2021, 10:46 AM
source:
  type: "superset"
  config:
    username: xxxx
    password: xxxx
    provider: db
    connect_uri: <http://supersetxxxxxx.com>
    env: "DEV"
sink:
  type: "datahub-rest"
  config:
    server: "<http://datahub-datahub-gms.datahub.svc.cluster.local:8080>"
The ingestion was successful: I can see the charts, dashboards, and lineage in DataHub. But the datasets do not appear under the "Datasets" category, although I can see them in the lineage (attached image).
How can I see these datasets? Do I need any configuration in the recipe?
Thanks for any help.
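For context, the superset source emits chart and dashboard entities plus lineage edges that point at dataset URNs; the dataset pages themselves only appear once the underlying platform is ingested too. A hedged companion recipe, where the source type and coordinates are assumptions about the stack behind Superset:
source:
  type: postgres  # whichever database backs the Superset charts
  config:
    host_port: dbhost:5432
    database: analytics
    username: xxxx
    password: xxxx
    env: "DEV"  # must match the env in the superset recipe so the URNs line up
sink:
  type: "datahub-rest"
  config:
    server: "<http://datahub-datahub-gms.datahub.svc.cluster.local:8080>"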
better-orange-49102
11/18/2021, 2:02 AM
nutritious-train-7865
11/18/2021, 2:19 PM
source:
  type: trino
  config:
    # Coordinates
    host_port: xxxx
    database: xxxx
    # Credentials
    username: xxxx
    password: xxxx
sink:
  type: "file"
  config:
    filename: "./example_output_mces.json"
I was getting this error:
DBAPIError: (trino.exceptions.FailedToObtainAddedPrepareHeader)
[SQL: SELECT "table_name"
FROM "information_schema"."tables"
WHERE "table_schema" = ? and "table_type" != 'VIEW']
Can anybody help me with it?
clean-crayon-15379
11/18/2021, 6:13 PM