# all-things-deployment
c
Hey everyone, we are stuck in quite a deadlock while trying to upgrade our DataHub from v0.6 to v0.8.14. Here are the steps we have performed:
1. Took a backup of our existing MySQL db, launched a new MySQL container, and restored the dump into it.
2. Used these helm charts to install the upgraded version: https://github.com/acryldata/datahub-helm. We install all the prerequisites (except MySQL, because we are using the one we launched in step 1).
3. All the prerequisites get installed properly. Then, when we try to install DataHub via the helm chart, everything runs fine except `datahub-gms` and the `datahubUpgrade` job.

`datahub-gms` throws the following error:
```
javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect_v2' doesn't exist
```
and the `datahubUpgrade` job throws the following error:
```
ERROR: Cannot connect to GMS at host test-datahub-datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.
```
Now both errors seem to depend on each other, to me. I was wondering if we are missing any step in between, and does the `metadata_aspect_v2` table need to be created manually?
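(For reference, step 1 above is the standard mysqldump/restore flow. A minimal sketch, assuming the database is named `datahub` and the clone runs as a plain Docker container on a non-default host port; the container name, credentials, and port below are placeholders.)
```bash
# Dump the existing DataHub database (schema + data) from the old deployment.
mysqldump -h <old-mysql-host> -u datahub -p datahub > datahub-backup.sql

# Launch a throwaway MySQL container, exposing it on host port 3307.
docker run -d --name datahub-mysql-clone \
  -e MYSQL_ROOT_PASSWORD=<password> \
  -p 3307:3306 \
  mysql:5.7

# Recreate the database inside the clone and restore the dump into it.
mysql -h 127.0.0.1 -P 3307 -u root -p -e "CREATE DATABASE IF NOT EXISTS datahub;"
mysql -h 127.0.0.1 -P 3307 -u root -p datahub < datahub-backup.sql
```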
l
cc @early-lamp-41924
e
@big-carpet-38439 seems like with recent changes gms requires the new table. This makes it impossible to run datahub-upgrade. Any ideas?
The other thing. mysqlSetupJob should’ve created the new table before gms spawned. Can you check the logs of that one to make sure the table was created correctly?
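(A minimal way to pull those logs, assuming the release is named `test-datahub` as the GMS pod name in this thread suggests; the exact job name depends on the chart version, so it is easiest to list the jobs first. Host/port in the last command are placeholders.)
```bash
# Find the MySQL setup job created by the chart.
kubectl get jobs | grep -i mysql-setup

# Check its logs to confirm metadata_aspect_v2 was created.
kubectl logs job/<mysql-setup-job-name>

# Or verify directly against the database.
mysql -h <mysql-host> -P 3307 -u root -p \
  -e "SHOW TABLES FROM datahub LIKE 'metadata_aspect_v2';"
```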
b
To go directly to 0.8.14 is going to be tough, given the recent bootstrap changes
c
Hey, an update here. The main issue behind `metadata_aspect_v2` not being created was this: https://github.com/acryldata/datahub-helm/issues/35. I had actually created the MySQL clone on the same EC2 machine and given it a different port, but the chart was not able to pick up that port due to the above-mentioned issue.
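(For anyone hitting the same thing: the chart reads the SQL endpoint from the `global.sql.datasource` values, so pointing it at an external MySQL on a non-default port looks roughly like the sketch below. The value paths are taken from the chart's values.yaml around the time of this thread and may differ in other versions; host and port are placeholders, and the release/chart names assume the published chart was added under the repo alias `datahub`.)
```bash
# Hypothetical values override for an external MySQL listening on port 3307.
cat > external-mysql-values.yaml <<'EOF'
global:
  sql:
    datasource:
      host: "<ec2-host>:3307"
      hostForMysqlClient: "<ec2-host>"
      port: "3307"
      url: "jdbc:mysql://<ec2-host>:3307/datahub?verifyServerCertificate=false&useSSL=true"
EOF

# Apply it to the release (use the local chart path instead if you cloned the repo).
helm upgrade --install test-datahub datahub/datahub -f external-mysql-values.yaml
```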
Now the table has been created via the `mysqlSetupJob` included in the helm chart, but my GMS pod has been stuck for the last hour.
I have tried increasing the liveness and readiness probes and also increased the resources to the following values:
```yaml
datahub-gms:
  livenessProbe:
    initialDelaySeconds: 6000
    periodSeconds: 30
    failureThreshold: 8
  readinessProbe:
    initialDelaySeconds: 6000
    periodSeconds: 30
    failureThreshold: 8
  resources:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 1Gi
```
but the pod does not seem to budge; it is stuck in the following state:
```
test-datahub-datahub-gms-84587bf8c5-tbkh4                        0/1     Running            0          8m37s
```
e
Can you print out the logs for this pod?
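(The usual way to grab those, using the pod name from the status line above:)
```bash
# Recent log output from the GMS pod.
kubectl logs test-datahub-datahub-gms-84587bf8c5-tbkh4 --tail=200

# Events for the pod (probe failures, OOM kills, image pull problems, etc.).
kubectl describe pod test-datahub-datahub-gms-84587bf8c5-tbkh4
```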
b
We can also check MySQL to see if any of the bootstrap metadata was ingested
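(A quick way to do that check against the external MySQL clone; host/port are placeholders and the database is assumed to be named `datahub`.)
```bash
# Count rows and peek at a few entries in the new v2 aspect table.
mysql -h <ec2-host> -P 3307 -u root -p datahub \
  -e "SELECT COUNT(*) FROM metadata_aspect_v2;
      SELECT urn, aspect, version FROM metadata_aspect_v2 LIMIT 10;"
```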
c
Hey, another update on this. The issue was the liveness and readiness probe delays being set too high! Initially GMS was taking a long time to get ready because it had too little memory, so to give it enough time I increased the readiness and liveness probe delays, which turned out not to be required. Reduced the liveness and readiness probes back to normal values and the pod got ready. Thanks for your proactiveness in helping to debug this though 😄
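(In values terms this just means bringing `initialDelaySeconds` back down to something modest; a sketch, assuming the release name `test-datahub`, a chart repo aliased `datahub`, and 60 seconds as a reasonable delay rather than the chart's exact default.)
```bash
helm upgrade --reuse-values test-datahub datahub/datahub \
  --set datahub-gms.livenessProbe.initialDelaySeconds=60 \
  --set datahub-gms.readinessProbe.initialDelaySeconds=60
```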
e
Awesome! Though your upgrade job would have been a no-op. We found an issue in the triggering mechanism.
Can you check the logs of the completed upgrade job?
If you see a message saying it wasn't qualified for an upgrade, can you go to this yaml file in the helm chart (assuming you cloned the repo instead of using the published helm charts) https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/templates/datahub-upgrade/datahub-upgrade-job.yml#L61 and add
```yaml
- "-a"
- "force-upgrade"
```
to the args list?
👀 1
c
Yeah, forgot to mention it here. The upgrade job did not run the first time because the `mysqlSetupJob` created the `metadata_aspect_v2` table and populated it with 2 rows, so the upgrade got skipped. We figured it out, disabled the `mysqlSetupJob`, truncated the table, and performed the upgrade. Will explore the force-upgrade tomorrow as well. Logs and documentation are pretty neat. Thanks again 😄
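(In concrete terms that workaround looks roughly like the following; `mysqlSetupJob.enabled` is the chart value controlling the setup job in the charts discussed here, and the host and release names are placeholders, so verify the key against your chart version.)
```bash
# Clear the partially-populated table so the upgrade job has a clean target.
mysql -h <ec2-host> -P 3307 -u root -p datahub \
  -e "TRUNCATE TABLE metadata_aspect_v2;"

# Redeploy with the setup job disabled so it does not repopulate the table.
helm upgrade --reuse-values test-datahub datahub/datahub \
  --set mysqlSetupJob.enabled=false
```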
e
sorry about the trouble. we are working on a fix, but the above solution would be the easiest for you guys at this point
c
Right now our staging environment has been upgraded 🥳
No worries and thanks for the help again 🙌
e
awesome!!!!
let us know if you run into any issue tomorrow as you test further!!
c
sure