agreeable-belgium-70840
03/09/2023, 9:35 AM
2023-03-09 09:29:44,122 [I/O dispatcher 1] INFO c.l.m.s.e.update.BulkListener:47 - Successfully fed bulk request. Number of events: 1 Took time ms: -1
2023-03-09 09:30:23,729 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at java.base/java.lang.Thread.run(Thread.java:829)
Any ideas?
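A quick way to narrow this error down is to check whether anything is actually serving on port 8080 and what GMS is doing in the meantime. A minimal sketch, assuming the quickstart container name datahub-gms and the default port mapping (both assumptions):
# is GMS serving yet? (/health is GMS's health endpoint)
curl -sf http://localhost:8080/health || echo "GMS is not serving on 8080 yet"
# what is GMS doing in the meantime?
docker logs --tail 100 datahub-gms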

big-carpet-38439
03/09/2023, 10:04 PM
The datahub-upgrade container will perform an upgrade on your system that will allow the rest of the system components to update!
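For reference, the invocation that later messages in this thread converge on is a one-off run of that container before starting GMS, for example for v0.10.0 (docker.env here is a placeholder for a file carrying your Kafka/Elasticsearch/database connection settings):
docker run --rm --env-file docker.env acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate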

aloof-laptop-71927
03/15/2023, 5:43 PM
2023-03-15 17:33:10,411 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
This is what I did:
1 - Ran elasticsearch-setup
2 - Ran kafka-setup
3 - Ran datahub-upgrade -u SystemUpdate
4 - Error when trying to start datahub-gms
PS: I have a self-hosted ES and Kafka.
Any ideas?
Thanks!
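For a manually managed deployment (no quickstart, no helm), the rough bootstrap order is the one listed above, with GMS started only after the SystemUpdate job succeeds. A sketch, with image names and the shared docker.env assumed from the standard DataHub compose files:
docker run --rm --env-file docker.env acryldata/datahub-elasticsearch-setup:v0.10.0
docker run --rm --env-file docker.env acryldata/datahub-kafka-setup:v0.10.0
docker run --rm --env-file docker.env acryldata/datahub-mysql-setup:v0.10.0
docker run --rm --env-file docker.env acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
# ...and only then start datahub-gms and the rest of the stack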

careful-garden-46928
03/17/2023, 3:20 PM
v0.10.0
In my case I was running an ingestion and the Kafka broker disk went to 100%.
After increasing the disk, the GMS container kept failing and restarting with this error.

careful-garden-46928
03/20/2023, 9:48 AM
ENABLE_PROMETHEUS: 'true',
DATAHUB_SERVER_TYPE: 'quickstart',
DATAHUB_TELEMETRY_ENABLED: 'true',
DATASET_ENABLE_SCSI: 'false',
EBEAN_DATASOURCE_HOST: `${rdsHost}`,
EBEAN_DATASOURCE_DRIVER: 'com.mysql.jdbc.Driver',
KAFKA_SCHEMAREGISTRY_URL: `http://${schemaRegistryHost}:8081`,
KAFKA_BOOTSTRAP_SERVER: `${kafkaBootstrapServer}`,
EBEAN_DATASOURCE_URL: `${ebeanDataSourceUrl}`,
ELASTICSEARCH_PORT: '443',
ELASTICSEARCH_USE_SSL: 'true',
GRAPH_SERVICE_IMPL: 'elasticsearch',
ENTITY_REGISTRY_CONFIG_PATH: '/datahub/datahub-gms/resources/entity-registry.yml',
MAE_CONSUMER_ENABLED: 'true',
MCE_CONSUMER_ENABLED: 'true',
PE_CONSUMER_ENABLED: 'true',
UI_INGESTION_ENABLED: 'true',
METADATA_SERVICE_AUTH_ENABLED: 'true',
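The standalone upgrade job needs the same connection settings. A sketch of a docker_upgrade.env that mirrors the values above; every hostname and credential below is a placeholder, and the exact variable set is an assumption based on the stock datahub-upgrade env file:
EBEAN_DATASOURCE_HOST=<rds-host>:3306
EBEAN_DATASOURCE_URL=jdbc:mysql://<rds-host>:3306/datahub
EBEAN_DATASOURCE_USERNAME=<user>
EBEAN_DATASOURCE_PASSWORD=<password>
EBEAN_DATASOURCE_DRIVER=com.mysql.jdbc.Driver
KAFKA_BOOTSTRAP_SERVER=<kafka-bootstrap>:9092
KAFKA_SCHEMAREGISTRY_URL=http://<schema-registry>:8081
ELASTICSEARCH_HOST=<es-endpoint>
ELASTICSEARCH_PORT=443
ELASTICSEARCH_USE_SSL=true
GRAPH_SERVICE_IMPL=elasticsearch
ENTITY_REGISTRY_CONFIG_PATH=/datahub/datahub-gms/resources/entity-registry.yml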

dazzling-yak-93039
03/20/2023, 7:30 PM
docker system prune -a

orange-night-91387
03/20/2023, 8:21 PM
head
something else is going on. If this is an environment with production data, then please note that with the v0.10.0 release a reindex will occur, and depending on the size of the data it can take several hours to resolve (this would be tens of millions of documents, a large-scale deployment).
Deleting all your Kafka topics would definitely cause a problem, and you would need to re-run kafka-setup and the upgrade job. The upgrade -> GMS communication is done through a Kafka message which lets GMS know the upgrade is finished.
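Concretely, recovery from deleted topics looks roughly like the following, assuming the service names from the standard quickstart compose file and an env file with your own connection settings:
# recreate the topics, including DataHubUpgradeHistory_v1
docker compose -f docker-compose.quickstart.yml up kafka-setup
# re-run the upgrade job so it re-publishes the message GMS waits for
docker run --rm --env-file docker.env acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
# restart GMS so it can consume that message
docker compose -f docker-compose.quickstart.yml restart datahub-gms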

careful-garden-46928
03/21/2023, 10:07 AM
docker system prune
because we are running datahub in AWS ECS. I assume that when the containers restart they could / would start on new hardware on the AWS side, with a clean volume and system.

careful-garden-46928
03/21/2023, 10:10 AM
0.10.0
I did delete the kafka topics to check if the tons of events in kafka would be causing some issue when the gms container was reinitialising. And afterwards I did execute the kafka-setup so all topics were back.
This procedure did not change the errors, so we still got the same error: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080

orange-night-91387
03/21/2023, 4:22 PM
> I did delete the kafka topics to check if the tons of events in kafka would be causing some issue when the gms container was reinitialising. And afterwards I did execute the kafka-setup so all topics were back.
Did you re-execute DataHub Upgrade though? Without doing this, GMS would not start. Since you completely cleared the topic data, the message GMS would be looking for, which got sent from the first Upgrade run, would not be there. This is probably your issue, since your upgrade was working prior, and it is a different root cause than what is probably happening to others in this thread.

full-dentist-68591
04/14/2023, 10:31 AM
docker run acryldata/datahub-upgrade:v0.10.0 -u SystemUpdate
I could help myself using the command found in "Restoring Search and Graph Indices from Local Database". I simply adapted the command as below. Be aware you might need to change the image version in datahub-upgrade.sh and update the docker.env used accordingly.
./docker/datahub-upgrade/datahub-upgrade.sh -u SystemUpdate
Hope this helps

fierce-monkey-46092
04/17/2023, 8:13 AM
./docker/datahub-upgrade/datahub-upgrade.sh -u SystemUpdate
I followed the above command, changing my docker.env file. The script ran successfully the first time. After I log into the frontend, the GMS version is not upgraded. When I run the script again, it gives me an error.
The error: Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED

careful-garden-46928
04/17/2023, 11:34 AM
v0.10.0 -> v0.10.2
I don't understand why the GMS is failing with the connection refused error 😕
2023-03-15 17:33:10,411 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080

brainy-tent-14503
04/18/2023, 8:19 PM
GMS will not fully start until the system-update job finishes, so it will simply wait. The required output from the system-update job is like:
2023-04-17 10:42:24 Executing Step 4/5: DataHubStartupStep...
2023-04-17 10:42:24 2023-04-17 15:42:24.582 INFO 1 --- [ main] c.l.d.u.s.e.steps.DataHubStartupStep : Initiating startup for version: v0.10.2-0
2023-04-17 10:42:24 Completed Step 4/5: DataHubStartupStep successfully.
When using quickstart this is all handled for you and there is no need to execute the datahub_upgrade.sh script. If you are managing docker manually, note that the referenced script does not necessarily align the version: it points to head, and you are likely intending to deploy a specific version such as v0.10.2.
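A quick way to confirm the job got that far is to grep its logs for the step shown above (the container name is a placeholder):
docker logs <datahub-upgrade-container> 2>&1 | grep DataHubStartupStep
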
careful-garden-46928
04/20/2023, 8:34 AM
v0.10.0 -> v0.10.2 (I tried to update to v0.10.1 before).
Error message while bringing up the v0.10.2 containers:
2023-03-15 17:33:10,411 [R2 Nio Event Loop-1-1] WARN c.l.r.t.h.c.c.ChannelPoolLifecycle:139 - Failed to create channel, remote=localhost/127.0.0.1:8080
io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: localhost/127.0.0.1:8080
Caused by: java.net.ConnectException: Connection refused
at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
...
After reading the code changes of release v0.10.2, I noticed some changes that seemed to indicate it was necessary to run the command docker run --rm --env-file docker_upgrade.env acryldata/datahub-upgrade:v0.10.2 -u SystemUpdate, which was confirmed by @brainy-tent-14503's message in this thread.
After successfully executing the SystemUpdate command, all errors disappeared and I was able to run v0.10.2.
Conclusion: it seems that it is always required to execute the SystemUpdate command before updating to any new version.
It wasn't clear to me and my team that the SystemUpdate command was required to be executed between minor version updates. There is no information regarding this in the release notes, nor in the official online documentation.
From version v0.9.X to v0.10.X there was a disclaimer informing that the SystemUpdate command was required. I would expect such a note in this recent release as well.

strong-twilight-1984
04/21/2023, 6:43 AM
I ran the -u SystemUpdate command "successfully", but I noticed in the logs:
INFO - 2023-04-20 13:20:22.761 ERROR 1 --- [main] c.l.m.dao.producer.KafkaHealthChecker : Failed to emit History Event for entity Event Version: v0.10.2-0
INFO -
INFO - org.apache.kafka.common.errors.TimeoutException: Topic DataHubUpgradeHistory_v1 not present in metadata after 60000 ms.
INFO -
INFO - 2023-04-20 13:20:22.762 INFO 1 --- [main] c.l.d.u.s.e.steps.DataHubStartupStep : Initiating startup for version: v0.10.2-0
INFO - Completed Step 4/5: DataHubStartupStep successfully.
Does this mean that the SystemUpdate message was never posted to the DataHubUpgradeHistory_v1 topic, and therefore never consumed by GMS?

aloof-gpu-11378
04/22/2023, 12:26 AM
The kafka-setup docker container creates the topics, including DataHubUpgradeHistory_v1. The helm chart performs these and other setup jobs before the system update. As far as I know there is no way to apply the chart against ECS, so there you will have to run all the containers manually in the order specified by the helm hooks, which also indicate when during an install/update to run and in what order. Examples: 1, 2. More information about this is available in the helm documentation.
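Whether the topic exists and whether the upgrade message actually landed in it can be checked with the standard Kafka CLI tools (the broker address is a placeholder):
kafka-topics.sh --bootstrap-server <broker>:9092 --list | grep DataHubUpgradeHistory_v1
kafka-console-consumer.sh --bootstrap-server <broker>:9092 --topic DataHubUpgradeHistory_v1 --from-beginning --max-messages 1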

strong-twilight-1984
05/04/2023, 6:19 AM
DataHubUpgradeHistory_v1
topic and GMS was able to consume the upgrade messages from it.
No more localhost connection refused errors.

strong-twilight-1984
05/04/2023, 6:20 AM
How did the SystemUpgrade step pass if it could not post to the DataHubUpgradeHistory_v1 topic? Surely it should have failed?

brainy-tent-14503
06/06/2023, 8:22 PM
dh-system-update
whereas this log looks like dh-nocode-migration logs

brainy-tent-14503
06/06/2023, 8:24 PM
If the pod is supposed to be running system-update, then something is being lost when executing the pod, likely the CLI arguments here which will run the system-update logic.
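On a Kubernetes deployment this can be verified by inspecting the args the upgrade pod was actually started with; they should include -u SystemUpdate (the pod name below is a placeholder):
kubectl get pod <datahub-system-update-pod> -o jsonpath='{.spec.containers[0].args}'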