# troubleshoot
g
Anyone upgraded to 0.8.28 and seeing the `mae-consumer` and `mce-consumer` failing? It looks like the springboot app starts fine, but /actuator/health returns 404. Reverting to 0.8.27 and they work fine.
g
were there any logs?
g
I tried comparing the working vs non-working logs and they seem to be the same. The pod's health check fails although from the logs it looks like everything is fine. There is even a message about 3 endpoints being created on the /actuator path: metrics, info, health.
```
curl localhost:9090/actuator/health
{"timestamp":"2022-03-09T01:35:10.978+00:00","status":404,"error":"Not Found","message":"Not Found","path":"/actuator/health"}
```
g
out of curiosity, are the pods functional?
e.g., if you run ingestion, will the entities be indexed by the mae consumer (even though it's failing the health check)?
or is it non-functional?
also, are you running your gms container with `MAE_CONSUMER_ENABLED=false`? If not, the gms container will run the mae consumer inside of it
that may be the problem 🤔
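(Roughly what that looks like at the container level; this env block is an illustrative sketch, and the image reference and names are assumptions, not the actual DataHub manifests.)

```yaml
# Kubernetes container spec excerpt - illustrative sketch only
containers:
  - name: datahub-gms
    image: linkedin/datahub-gms:v0.8.28   # image reference assumed for illustration
    env:
      # when the consumers run as standalone pods, the embedded ones in gms
      # are switched off via these flags
      - name: MAE_CONSUMER_ENABLED
        value: "false"
      - name: MCE_CONSUMER_ENABLED
        value: "false"
```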
g
The helm chart is configured with global.datahub_standalone_consumers_enabled = true and is not setting MCE/MAE_CONSUMER_ENABLED=true
I will test whether they are working tomorrow; they don't run very long though, because they get killed after a few minutes by the health check
They run for like 4 minutes or so but never reach ready
```
datahub-datahub-mce-consumer-7cc5475595-c9cwk      0/1     Running   5 (3m21s ago)    20m
```
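(The restart count lines up with kubelet probes hitting the health endpoint; an illustrative probe block is shown below - the timings are assumptions, not the chart's actual values.)

```yaml
# Illustrative readiness/liveness probes - timings are assumptions,
# not the chart's actual values
readinessProbe:
  httpGet:
    path: /actuator/health
    port: 9090
  initialDelaySeconds: 60
  periodSeconds: 30
livenessProbe:
  httpGet:
    path: /actuator/health
    port: 9090
  initialDelaySeconds: 60
  periodSeconds: 30
```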
e
yeah, global.datahub_standalone_consumers_enabled = true auto-sets the above env variables
It is very strange that both are failing and no real error msgs are coming out. We will also try some testing ourselves
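(A minimal values.yaml sketch of that flag; only the key itself is quoted from this thread, and the comment is an assumption about what the chart does with it.)

```yaml
# values.yaml - sketch; only the flag itself is quoted from the thread
global:
  # deploys mae/mce consumers as their own pods and is expected to set
  # MAE_CONSUMER_ENABLED / MCE_CONSUMER_ENABLED=false on the gms container
  datahub_standalone_consumers_enabled: true
```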
g
It's probably something with my environment; however, this is the first time I've run into this after running releases since 0.8.20
s
faced the same issue with `mae`/`mce` while migrating 0.8.27 -> 0.8.28. any resolution for it?
e
We have reproduced it as well. For some reason it seems like the health endpoints are not working. We will update you after further investigation
g
My theory is that it's perhaps related to the Spring libraries. Spring/Spring Boot libraries are like a house of cards. There was a change made here where a later version of Spring Boot was introduced. It might have broken something with the Spring Boot autoconfiguration of the endpoints. Not sure though.
e
We found the issue. While adding the openAPI servlet to GMS, we set the following, which got picked up by the consumer jobs as well. This made the actuator expose /openapi/actuator/health instead of /actuator/health 😞
```yaml
spring:
  mvc:
    servlet:
      path: /openapi
```
we are sending out a fix now. will create a new release afterwards
sorry about the issue and thanks for reporting it!!
cc @orange-night-91387 who is working on getting the fix out!
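(To make the failure mode concrete: with `spring.mvc.servlet.path: /openapi` picked up by the consumers, the same endpoints answer under the prefixed path, so probes hitting /actuator/health get a 404. Below is a sketch of a pre-0.8.29 workaround, assuming Spring's relaxed binding lets the property be overridden via an environment variable on the consumer containers; this is not the released fix. Pointing the probes at the prefixed path would be another option.)

```yaml
# Pre-0.8.29 workaround sketch for the standalone consumer pods
# (assumption, not the released fix): override the servlet path back
# to the default so the actuator answers on /actuator/health again.
env:
  - name: SPRING_MVC_SERVLET_PATH   # relaxed binding for spring.mvc.servlet.path
    value: "/"
```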
o
g
Hey @gentle-night-56466 , @shy-parrot-64120 - the fix has just been released in Datahub 0.8.29!
David - your Java 11 changes are in that release as well.
g
@green-football-43791 - I believe introducing the java 11 change is resulting in a timeout while the gradle 6 toolchain takes time to bootstrap the java 11 jdk. This doesn't seem to happen locally when the jdk11 compiler is already present. The failures look like this:
```
> Task :metadata-integration:java:datahub-client:test

datahub.client.rest.RestEmitterTest > testTimeoutOnGet FAILED
    org.mockserver.client.SocketCommunicationException at RestEmitterTest.java:318
```
g
I see.
would bumping this timeout resolve the issue?
if so, do you have a recommendation for the timeout duration?
g
or, in any case, something around the jdk11 change happened to increase it
g
I'm surprised this wouldn't have come up in CI 😕
g
It did run several times at least, but I see two cases of this today
My other PR and another one that ran today
The jdk 11 changes did not directly touch that module, so maybe I’m just being paranoid. That said, it seems to be happening today.
g
ok — cc @careful-pilot-86309 - I believe you contributed this test
Mugdha would you be able to take a look here?
g
Both these PRs ran today and failed with the same condition: PR1, PR2
g
thanks for the heads up David
s
verifying on our env
works like a charm - thanks a lot folks
c
I checked and I think mockserver is struggling to get a port to start on. I have raised a PR with the quick fix. @mammoth-bear-12532 Please review the PR. This is the same thing we did in the spark-lineage test