full-computer-98125
06/15/2023, 4:34 PM
2023-06-15 15:02:33,033 [kafka-coordinator-heartbeat-thread | generic-mae-consumer-job-client] INFO o.a.k.c.c.i.AbstractCoordinator:979 - [Consumer clientId=consumer-generic-mae-consumer-job-client-5, groupId=generic-mae-consumer-job-client] Member consumer-generic-mae-consumer-job-client-5-b17bdbb2-720c-4813-9e33-6ad46574892c sending LeaveGroup request to coordinator "coordinator" (id: 2147483646 rack: null) due to consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records
Reading the docs, I see it may be possible to set this config via Spring Boot. I tried adding
- name: SPRING_KAFKA_PROPERTIES_CONSUMER_MAX_POLL_RECORDS
value: "10"
as the env var to configure it, but I receive:
2023-06-15 16:21:15,835 [main] WARN o.a.k.c.consumer.ConsumerConfig:355 - The configuration 'consumer.max.poll.records' was supplied but isn't a known config.
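That warning hints at the cause: keys under the spring.kafka.properties.* prefix are handed to the Kafka client verbatim, so the CONSUMER_ segment ends up inside the property name ("consumer.max.poll.records"), which Kafka does not recognize. A hedged sketch of two env-var forms that should bind correctly, assuming the consumer job honors standard Spring Boot relaxed binding:

```yaml
# Consumer-scoped Spring Boot property (binds to spring.kafka.consumer.max-poll-records)
- name: SPRING_KAFKA_CONSUMER_MAX_POLL_RECORDS
  value: "10"
# Or via the pass-through prefix; everything after the prefix goes to the client verbatim
- name: SPRING_KAFKA_PROPERTIES_MAX_POLL_RECORDS
  value: "10"
```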
Does anyone here know the proper way to adjust that config value? Spring reference here
adorable-lawyer-88494
06/16/2023, 6:22 AM
The :li-utils:compileMainGeneratedDataTemplateJava task failed.
I was thinking it is coming from Pegasus, so can anyone please tell me:
does the latest Pegasus version support Java 17? If yes, then which version?
melodic-lighter-39433
06/16/2023, 10:04 AM
lemon-yacht-62789
06/16/2023, 10:06 AM
We are on v0.10.1
and are having some difficulties setting up a Looker ingestion source. Our ingestion has started failing and I assumed this might be down to an outdated config, so I have tried setting up a new connection from scratch via the UI.
When entering the base URL, client id and secret I am able to validate the connection OK - all ticks are returned green.
However, when actually triggering the pipeline the following error appears in the log which seems to indicate it's the API version at issue:
PipelineInitError: Failed to configure the source (looker): Failed to connect/authenticate with looker - check your configuration: b'{"message":"API 3.x requests are prohibited. Request: POST /api/3.1/login","documentation_url":"https://cloud.google.com/looker/docs/"}'
Datahub v0.10.1
release notes indicate support for the v4 Looker API, so I'm wondering if it's perhaps the credentials 🤔 As in, these were originally generated for a v3 Looker connection, so my theory is I need to generate new credentials for the v4 API. I do not have admin access to Looker in our organisation, so I am unable to test this theory yet. I am curious if anyone has had any similar issues.
freezing-oxygen-20989
06/16/2023, 10:15 AM
datahub-elasticsearch-setup-job
, but then datahub-system-update-job
fails with the following error message:
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [HOSTNAME], URI [/datahubpolicyindex_v2/_search?typed_keys=true&max_concurrent_shard_requests=5&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&ignore_throttled=true&search_type=query_then_fetch&batched_reduce_size=512&ccs_minimize_roundtrips=true], status line [HTTP/1.1 404 Not Found]
DataHub version: v0.10.4
OpenSearch version: OpenSearch 2.5
Has anyone come across similar issues before?
Thanks
swift-dream-78272
06/16/2023, 12:14 PM
503
error code. My API query looks like the one below. I tried to paginate using start
and count
parameters, but at some point it also throws a 503
. Ideally I'd want to not pass a query parameter, but even if I narrow it down to the snowflake platform, I cannot get all dataset URNs.
{
searchAcrossEntities(
input: {types: DATASET, query: "snowflake", start: 0, count: 5000}
) {
start
count
total
searchResults {
entity {
urn
}
}
}
}
API error response:
{
"servlet": "apiServlet",
"message": "Service Unavailable",
"url": "/api/graphql",
"status": "503"
}
DataHub version: 0.9.6.1
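For what it's worth, count: 5000 in a single call is a plausible trigger for the 503, since GMS builds the whole page in one Elasticsearch request. A sketch of paginating with a much smaller page size — the endpoint URL, token handling, and page size below are assumptions, not DataHub-documented values:

```python
import json
import urllib.request

GRAPHQL_URL = "http://localhost:8080/api/graphql"  # placeholder GMS address
PAGE_SIZE = 100  # small pages are far less likely to time out than 5000

QUERY = """
query page($start: Int!, $count: Int!) {
  searchAcrossEntities(input: {types: DATASET, query: "snowflake", start: $start, count: $count}) {
    total
    searchResults { entity { urn } }
  }
}
"""

def page_bounds(total, page_size):
    """Yield (start, count) pairs that cover [0, total) in page_size steps."""
    for start in range(0, total, page_size):
        yield start, min(page_size, total - start)

def fetch_page(start, count, token=None):
    """POST one page of the search query to the GraphQL endpoint."""
    body = json.dumps({"query": QUERY, "variables": {"start": start, "count": count}}).encode()
    req = urllib.request.Request(GRAPHQL_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]["searchAcrossEntities"]
```

Note that start/count pagination may still hit Elasticsearch's default 10,000-result window on very large result sets; newer DataHub releases expose a scrollAcrossEntities query for that case, though I have not verified it is available in 0.9.6.1.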
handsome-football-66174
06/16/2023, 7:39 PM
org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
Caused by: javax.net.ssl.SSLHandshakeException: No subject alternative DNS name matching <schema-registry URL> found.
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[na:na]
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:353) ~[na:na]
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:296) ~[na:na]
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:291) ~[na:na]
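The root cause here is hostname verification: the certificate the schema registry presents contains no DNS subjectAltName matching the hostname in the configured URL. Roughly, the client applies matching rules like the simplified sketch below (illustrative only, not the actual JSSE implementation):

```python
def san_matches(hostname: str, san: str) -> bool:
    """Simplified DNS-SAN matching in the spirit of RFC 6125: a wildcard
    covers exactly one leftmost label; otherwise compare case-insensitively."""
    hostname, san = hostname.lower(), san.lower()
    if san.startswith("*."):
        return "." in hostname and hostname.split(".", 1)[1] == san[2:]
    return hostname == san

def cert_covers(hostname, dns_sans):
    """True if any DNS SAN in the certificate matches the hostname the client dialed."""
    return any(san_matches(hostname, s) for s in dns_sans)
```

The usual fix is to reissue the schema-registry certificate with the URL's hostname among its DNS SANs, or to point the client at a hostname that is already listed, rather than disabling hostname verification.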
flat-table-17463
06/17/2023, 5:34 PM
for tag in graph.get_urns_by_filter(entity_types=["dataset"], query="reserved"):
    print(tag)
outputs:
urn:li:dataset:(urn:li:dataPlatform:postgres,customerservice.public.customer_reserved,PROD)
urn:li:dataset:(urn:li:dataPlatform:postgres,accountservice.public.account_blocked,PROD)
urn:li:dataset:(urn:li:dataPlatform:postgres,accountservice.public.account_blocked_transaction,PROD)
Why does this result contain names matching "blocked"?
average-nail-72662
06/17/2023, 10:11 PM
better-sunset-65466
06/19/2023, 9:04 AM
jolly-tent-78213
06/19/2023, 12:25 PM
datahubpolicyindex_v2
(maybe it isn't the role of the job to create indexes). So the indexes listed here haven't been created in my ES. You can see the attached image as well.
• The second one is that the GMS pod is failing when querying ES. Querying the datahub_usage_event
index fails, preventing the pod from entering the READY state.
I added the logs of the GMS pod as well.
I would be happy to have some help with my issue.
incalculable-portugal-45517
06/19/2023, 11:59 PM"events_produced": "0",
with no assets ingested, using version 0.9.3
nutritious-salesclerk-57675
06/20/2023, 5:25 AM
[2023-06-20, 04:03:24 UTC] {pod_manager.py:197} INFO - ERROR:root:('Unable to get metadata from DataHub', {'message': '401 Client Error: Unauthorized for url: https://<url-to-gms>/aspects?action=getTimeseriesAspectValues'})
Is this something related to permissions? Can someone help understand the cause of this error?
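With metadata service authentication enabled, every call to GMS endpoints such as /aspects must carry a personal access token, so a 401 here usually means the Airflow side is emitting without one. A minimal sketch of the header shape GMS expects (the token value is a placeholder; where exactly it is configured depends on your emitter or Airflow connection setup):

```python
def auth_headers(token: str) -> dict:
    """Build the Authorization header DataHub's REST API expects for a
    personal access token; a 401 on /aspects usually means this header
    never reached GMS."""
    return {"Authorization": f"Bearer {token}"}

# Example: attach to whatever HTTP client the lineage backend uses
headers = auth_headers("<personal-access-token>")
```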
PS: I have token-based authentication enabled.
enough-football-92033
06/20/2023, 11:09 AM
datahub-ingestion-base
image build, I started to get the following error:
Package 'openjdk-11-jre-headless' has no installation candidate
Details:
#8 4.213 E: Package 'openjdk-11-jre-headless' has no installation candidate
#8 ERROR: executor failed running [/bin/sh -c apt-get update && apt-get install -y && apt-get install -y -qq make python3-ldap libldap2-dev libsasl2-dev libsasl2-modules libaio1 libsasl2-modules-gssapi-mit krb5-user wget zip unzip ldap-utils openjdk-11-jre-headless && python -m pip install --upgrade pip wheel setuptools==57.5.0 && python -m pip install --upgrade awscli && curl -Lk -o /root/librdkafka-${LIBRDKAFKA_VERSION}.tar.gz <https://github.com/edenhill/librdkafka/archive/v${LIBRDKAFKA_VERSION}.tar.gz> && tar -xzf /root/librdkafka-${LIBRDKAFKA_VERSION}.tar.gz -C /root && cd /root/librdkafka-${LIBRDKAFKA_VERSION} && ./configure --prefix /usr && make && make install && make clean && ./configure --clean && apt-get remove -y make]: exit code: 100
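Even with no changes on your side this can break, because the package comes from the base image's Debian release: newer Debian (bookworm) dropped openjdk-11-jre-headless in favor of OpenJDK 17, so an unpinned base tag that moved forward reproduces exactly this error. A hedged Dockerfile sketch of the two obvious fixes (the base image tags are assumptions, not the ones DataHub actually uses):

```dockerfile
# Option 1: pin the base image to a Debian release that still ships OpenJDK 11
FROM python:3.10-slim-bullseye
RUN apt-get update && apt-get install -y --no-install-recommends openjdk-11-jre-headless

# Option 2: stay on the newer release and install the JDK it ships instead
# FROM python:3.10-slim-bookworm
# RUN apt-get update && apt-get install -y --no-install-recommends openjdk-17-jre-headless
```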
There were no changes from my side in the code base. Can anyone help me resolve it?
dazzling-airport-31275
06/20/2023, 11:35 AM
salmon-exabyte-77928
06/20/2023, 12:35 PM
adorable-airline-30358
06/20/2023, 12:45 PM
Sorry, we are unable to find this entity in DataHub
which is expected.
better-sunset-65466
06/20/2023, 1:47 PMtransformers:
type: simple_add_dataset_domain
config:
replace_existing: true
domains:
- 'urn:li:domain:data_observatory'
I keep on getting this error:
~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': 'edebb278-baf5-4497-aac6-73d520af6af9',
'infos': ['2023-06-20 13:44:44.482415 INFO: Starting execution for task with name=RUN_INGEST',
"2023-06-20 13:44:48.557174 INFO: Failed to execute 'datahub ingest'",
'2023-06-20 13:44:48.557409 INFO: Caught exception EXECUTING task_id=edebb278-baf5-4497-aac6-73d520af6af9, name=RUN_INGEST, '
'stacktrace=Traceback (most recent call last):\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 122, in execute_task\n'
' task_event_loop.run_until_complete(task_future)\n'
' File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
' return future.result()\n'
' File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 231, in execute\n'
' raise TaskError("Failed to execute \'datahub ingest\'")\n'
"acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
'errors': []}
~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv setup time = 0
This version of datahub supports report-to functionality
datahub ingest run -c /tmp/datahub/ingest/edebb278-baf5-4497-aac6-73d520af6af9/recipe.yml --report-to /tmp/datahub/ingest/edebb278-baf5-4497-aac6-73d520af6af9/ingestion_report.json
[2023-06-20 13:44:46,739] INFO {datahub.cli.ingest_cli:173} - DataHub CLI version: 0.10.4
1 validation error for PipelineConfig
transformers
value is not a valid list (type=type_error.list)
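The validation error is pointing at the recipe shape rather than the transformer itself: transformers must be a YAML list, so each entry needs a leading dash and one extra level of indentation. A sketch of the corrected fragment (same values as above):

```yaml
transformers:
  - type: simple_add_dataset_domain
    config:
      replace_existing: true
      domains:
        - 'urn:li:domain:data_observatory'
```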
purple-forest-88570
06/21/2023, 4:02 AM
Test 1
I sent 100 search requests to GMS using Jmeter.
It took 4 seconds.
When monitoring the thread pool of ElasticSearch, max 2 active threads were observed.
Test 2
I sent 100 search requests to ElasticSearch using Jmeter. The search request was the same one sent by GMS in Test 1.
It took 0.2 seconds.
When monitoring the thread pool of ElasticSearch, max 15 active threads were observed.
Based on these results, it seems that GMS is only sending search requests to ElasticSearch 2 at a time. I also checked the connection between GMS and ElasticSearch using tcpdump and netstat, and found that they are only connected through 2 ports. Could you please provide any advice or suggestions regarding this issue? Thank you.
better-gigabyte-38217
06/21/2023, 6:29 AM
acoustic-quill-54426
06/21/2023, 10:12 AM
dataHubRetentionConfig
and dataHubRetentionKey
correctly created in the db, but after restarting the gms containers, we still have thousands of aspects that should have been deleted.
colossal-waitress-83487
06/21/2023, 10:49 AM
cuddly-dinner-641
06/21/2023, 12:55 PM
adamant-furniture-37835
06/21/2023, 3:04 PM
shy-dog-84302
06/21/2023, 3:10 PM
important-minister-98629
06/21/2023, 6:37 PM
worried-solstice-95319
06/21/2023, 8:31 PM
quaint-belgium-35390
06/22/2023, 2:48 AM
Validation
tab in datahub,
but there are some errors that make the DataHubValidationAction fail.
Errors:
Sql parser failed on {query} with daemonic processes are not allowed to have children
These are my requirements:
acryl-datahub[great-expectations]==0.10.3.2
acryl-datahub-airflow-plugin==0.10.3.2
great-expectations==0.15.41
airflow-provider-great-expectations==0.2.6
This is my action list:
"action_list": [
{
"name": "datahub_action",
"action": {
"module_name": "datahub.integrations.great_expectations.action",
"class_name": "DataHubValidationAction",
"server_url": "<http://host_IP:9002/>",
"parse_table_names_from_sql": True,
"retry_max_times": 1,
"graceful_exceptions": False,
"env": "STG",
},
},
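"daemonic processes are not allowed to have children" suggests the SQL table-name parser spawns a multiprocessing child inside an already-daemonic Airflow worker process, which Python forbids. A possible workaround (an assumption on my part, and it trades away query-level lineage) is to disable that parsing in the action config above:

```
"parse_table_names_from_sql": False,
```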
If any of you have encountered this issue, please help me solve it. Thank you.
helpful-student-10263
06/22/2023, 7:16 AM
as is)
...
<New id="httpConfig" class="org.eclipse.jetty.server.HttpConfiguration">
<Set name="requestHeaderSize"><Property name="jetty.httpConfig.requestHeaderSize" deprecated="jetty.request.header.size" default="16384" /></Set>
</New>
...
to be)
...
<New id="httpConfig" class="org.eclipse.jetty.server.HttpConfiguration">
<Set name="requestHeaderSize"><Property name="jetty.httpConfig.requestHeaderSize" deprecated="jetty.request.header.size" default="16384" /></Set>
<Set name="sendDateHeader"><Property name="jetty.httpConfig.sendDateHeader" deprecated="jetty.send.date.header" default="false" /></Set>
</New>
...
But how do I apply this using the helm chart?
ancient-policeman-73437
06/22/2023, 8:22 AM