proud-baker-56489
07/14/2022, 9:48 AM
icy-portugal-26250
07/14/2022, 10:44 AM
/api/graphiql endpoint. A query returned a response about an hour ago, but now when rerunning the query I get a
{
  "errors": {
    "message": "Response.text: Body has already been consumed.",
    "stack": "graphQLFetcher/</</<@https://datahub.wolt.com/api/graphiql:57:33\n"
  }
}
Is there a way to fetch this response again?
quick-pizza-8906
07/14/2022, 2:06 PM
disable_dbt_node_creation set to True - I can see nice lineage between pre-ingested Snowflake tables, but on the main page where all platforms are shown I can see the dbt platform with a count of several thousand elements. If I click on this platform to see its entities I get an exception. After some examination of the MySQL database I could see there are objects with URNs like urn:li:assertion:2c8a2605354d9b924c0f1b5d9f0dffd5 with a dataPlatformInstance aspect having dbt as the platform but nothing as an instance (I believe the exception was coming from that aspect missing a platform instance).
2. If I run with disable_dbt_node_creation set to False - I can see lineage and dbt objects combined with Snowflake tables (very cool). It seems I still have the above assertions, but they don't cause problems on platform search anymore.
In either case, if I run the connector with stateful_ingestion enabled, the connector ingests data but then throws an exception ending with code like below:
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/state/sql_common_state.py", line 35, in _get_lightweight_repr
31 def _get_lightweight_repr(dataset_urn: str) -> str:
32 """Reduces the amount of text in the URNs for smaller state footprint."""
33 SEP = BaseSQLAlchemyCheckpointState._get_separator()
34 key = dataset_urn_to_key(dataset_urn)
--> 35 assert key is not None
36 return f"{key.platform}{SEP}{key.name}{SEP}{key.origin}"
..................................................
dataset_urn = 'urn:li:assertion:2c8aaaa5354d9b924c0f1b5c9f09bf75'
SEP = '||'
key = None
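The failure in the traceback above can be reproduced in isolation. The following is a minimal sketch, assuming a hypothetical `parse_dataset_urn` helper (a stand-in written for illustration, not DataHub's `dataset_urn_to_key`): a parser that only understands dataset URNs returns None for an assertion URN, so code that builds a state key has to skip or separately handle such URNs rather than assert on the result.

```python
import re
from typing import NamedTuple, Optional


class DatasetKey(NamedTuple):
    platform: str
    name: str
    origin: str


# Hypothetical stand-in for a dataset-URN parser; NOT DataHub's implementation.
# Dataset URNs look like:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<origin>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def parse_dataset_urn(urn: str) -> Optional[DatasetKey]:
    """Return the parsed key, or None if the URN is not a dataset URN."""
    match = _DATASET_URN.match(urn)
    if match is None:
        return None
    return DatasetKey(*match.groups())


def lightweight_repr(urn: str, sep: str = "||") -> Optional[str]:
    """Build a compact key for dataset URNs; return None for anything else."""
    key = parse_dataset_urn(urn)
    if key is None:
        # An assertion URN ends up here instead of tripping an assert.
        return None
    return f"{key.platform}{sep}{key.name}{sep}{key.origin}"
```

Whether skipping non-dataset URNs is the right behaviour for stateful ingestion is an open question; the sketch only shows why `key` is None for the URN in the traceback.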
Which makes me think the URN representation function fails for assertion objects, which are somehow being treated as datasets. Is anyone having similar problems?
best-lamp-53937
07/14/2022, 2:22 PM
prehistoric-yak-55672
07/14/2022, 8:41 PM
datahub docker quickstart
It returns the following error:
---- (full traceback above) ----
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\entrypoints.py", line 149, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1055, in main
rv = self.invoke(ctx)
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\click\core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\upgrade\upgrade.py", line 322, in wrapper
res = func(*args, **kwargs)
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\telemetry\telemetry.py", line 338, in wrapper
raise e
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\telemetry\telemetry.py", line 290, in wrapper
res = func(*args, **kwargs)
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\cli\docker.py", line 322, in quickstart
default_quickstart_compose_file = _get_default_quickstart_compose_file()
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\cli\docker.py", line 162, in _get_default_quickstart_compose_file
home = os.environ["HOME"]
File "c:\users\wohar\appdata\local\programs\python\python37-32\lib\os.py", line 681, in __getitem__
raise KeyError(key) from None
KeyError: 'HOME'
[2022-07-14 17:36:08,451] INFO {datahub.entrypoints:188} - DataHub CLI version: 0.8.40.3 at c:\users\wohar\appdata\local\programs\python\python37-32\lib\site-packages\datahub\__init__.py
[2022-07-14 17:36:08,451] INFO {datahub.entrypoints:191} - Python version: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:01:55) [MSC v.1900 32 bit (Intel)] at c:\users\wohar\appdata\local\programs\python\python37-32\python.exe on Windows-10-10.0.22000-SP0
[2022-07-14 17:36:08,451] INFO {datahub.entrypoints:193} - GMS config {}
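The KeyError at the bottom of the traceback comes from reading `os.environ["HOME"]`, a variable that Windows does not define by default. A minimal sketch of a lookup that tolerates this, using the hypothetical helper name `resolve_home` (an illustration, not part of the DataHub CLI):

```python
import os
from pathlib import Path


def resolve_home() -> str:
    """Return the user's home directory without assuming HOME is set."""
    # Prefer HOME when it exists and is non-empty (typical on Linux/macOS).
    home = os.environ.get("HOME")
    if home:
        return home
    # Otherwise let the standard library resolve the home directory;
    # on Windows this consults variables such as USERPROFILE.
    return str(Path.home())
```

On an affected CLI version, defining HOME (for example, with the value of USERPROFILE) before running the command should avoid the failing lookup; this is an inference from the traceback, not a confirmed fix.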
Does anyone know what might be happening?
flat-window-44654
07/14/2022, 10:51 PM
SearchAcrossEntities endpoint trying to return only results for DASHBOARDS and DATASETS. However, when I submit the following query (see 🧵) with both types, I only get back DATASETS, even though I know there are DASHBOARDS that match my search query. Could there be a bug in the API or am I missing something? 🤔
adamant-van-21355
07/15/2022, 7:48 AM
better-spoon-77762
07/15/2022, 8:05 PM
delightful-barista-90363
07/15/2022, 10:30 PM
DatahubSparkListener: java.lang.NullPointerException: Cannot invoke "java.util.Map.put(Object, Object)" because the return value of "java.util.Map.get(Object)" is null
I was wondering if I could get some assistance?
Stacktrace(s) in thread.
Thanks for the help in advance
most-nightfall-36645
07/18/2022, 8:51 AM
v0.8.41 my frontend and gms containers error with:
Error: secret "datahub-auth-secrets" not found
How do I create this secret from the datahub helm chart (e.g. which pod/container creates the secret)?
purple-analyst-83660
07/18/2022, 10:21 AM
agreeable-belgium-70840
07/18/2022, 10:27 AM
{data: {createGroup: "urn:li:corpGroup:4404d005-a2f6-491f-8b4d-931c7063ea0a"}, extensions: {}}
data: {createGroup: "urn:li:corpGroup:4404d005-a2f6-491f-8b4d-931c7063ea0a"}
createGroup: "urn:li:corpGroup:4404d005-a2f6-491f-8b4d-931c7063ea0a"
extensions: {}
Any ideas?
square-hair-99480
07/18/2022, 4:08 PM
faint-translator-23365
07/18/2022, 8:01 PM
rhythmic-stone-77840
07/19/2022, 12:42 AM
clean-tomato-22549
07/19/2022, 4:59 AM
icy-portugal-26250
07/19/2022, 7:19 AM
v0.8.41
witty-butcher-82399
07/19/2022, 12:52 PM
│ PermissionDenied: 403 request failed: the user does not have 'bigquery.readsessions.create' permission for 'projects/XXXXXXXX'
According to the docs, that permission is required only for lineage.
So I tried disabling table lineage with: include_table_lineage: False
However, I am still getting the same error. Is there another config setting for disabling table lineage, or is this a bug in the config field?
🧵
bland-orange-13353
07/19/2022, 4:04 PM
delightful-barista-90363
07/19/2022, 10:12 PM
hallowed-dog-79615
07/20/2022, 8:00 AM
microscopic-mechanic-13766
07/20/2022, 8:17 AM
uid=101(datahub) gid=101(datahub) groups=101(datahub) but in datahub-front-end it is uid=100(datahub) gid=101(datahub) groups=101(datahub).
Is this done on purpose or is it just a mistake?
Thanks in advance for the help!
steep-soccer-91284
07/20/2022, 9:24 AM
best-leather-7441
07/20/2022, 1:03 PM
lemon-engine-23512
07/20/2022, 2:00 PM
prehistoric-yak-55672
07/20/2022, 4:49 PM
shy-parrot-64120
07/20/2022, 5:55 PM
bland-orange-13353
07/20/2022, 6:06 PM
big-ocean-9800
07/20/2022, 6:12 PM
v0.8.38 and we have about 7k data assets loaded. We are seeing a pattern where loading the home page is extremely slow (on the order of 5-10 seconds).
I checked metrics around our DataHub infrastructure and everything was running at about 10-20% utilization. Our Elasticsearch cluster is at low utilization, its disks are less than 10% utilized, and I don’t see any IO throttling from our cloud provider. Same story with our Postgres instance.
I took a look at the calls that hang the longest on the home page, and the consistently slow call is the GraphQL call searchAcrossEntities. From a cursory look through the code, it seems to interact only with Elasticsearch.
I’m wondering if anyone has experienced similar behavior or has any troubleshooting tips. Is this expected performance with the number of assets we have? Are there any changes we can make to our Elasticsearch cluster to help alleviate these problems?
I looked through the Slack history in this channel and couldn’t find any similar messages (same with GitHub issues, both open and closed).
Please let me know if any more information would be helpful. Cheers!
ambitious-cartoon-15344
07/21/2022, 8:16 AM