# all-things-deployment
c
Hey everyone, we are stuck in quite a deadlock while trying to upgrade our DataHub from v0.6 to v0.8.14. Here are the steps we have performed:
1. Took a backup of our existing MySQL db, launched a new MySQL container, and restored the dump into it.
2. Used these helm charts to install the upgraded version: https://github.com/acryldata/datahub-helm. We install all the prerequisites (except MySQL, because we are using the one we launched in step 1).
3. All the prerequisites get installed properly. Then, when we try to install DataHub via the helm chart, everything runs fine except `datahub-gms` and the `datahubUpgrade` job.

`datahub-gms` throws the following error:
```
javax.persistence.PersistenceException: Query threw SQLException:Table 'datahub.metadata_aspect_v2' doesn't exist
```
and the `datahubUpgrade` job throws the following error:
```
ERROR: Cannot connect to GMS at host test-datahub-datahub-gms port 8080. Make sure GMS is on the latest version and is running at that host before starting the migration.
```
Now both errors seem to depend on each other, to me. I was wondering if we are missing any step in between, and does the `metadata_aspect_v2` table need to be created manually?
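(For reference, step 1 above is the standard mysqldump/restore flow. A minimal sketch, assuming the database is named `datahub` and the clone runs as a plain Docker container on a non-default host port; the container name, credentials, and port below are placeholders.)
```bash
# Dump the existing DataHub database (schema + data) from the old deployment.
mysqldump -h <old-mysql-host> -u datahub -p datahub > datahub-backup.sql

# Launch a throwaway MySQL container, exposing it on host port 3307.
docker run -d --name datahub-mysql-clone \
  -e MYSQL_ROOT_PASSWORD=<password> \
  -p 3307:3306 \
  mysql:5.7

# Recreate the database inside the clone and restore the dump into it.
mysql -h 127.0.0.1 -P 3307 -u root -p -e "CREATE DATABASE IF NOT EXISTS datahub;"
mysql -h 127.0.0.1 -P 3307 -u root -p datahub < datahub-backup.sql
```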
l
cc @early-lamp-41924
e
@big-carpet-38439 seems like with recent changes gms requires the new table. This makes it impossible to run datahub-upgrade. Any ideas?
The other thing. mysqlSetupJob should’ve created the new table before gms spawned. Can you check the logs of that one to make sure the table was created correctly?
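(A minimal way to pull those logs, assuming the release is named `test-datahub` as the GMS pod name in this thread suggests; the exact job name depends on the chart version, so it is easiest to list the jobs first. Host/port in the last command are placeholders.)
```bash
# Find the MySQL setup job created by the chart.
kubectl get jobs | grep -i mysql-setup

# Check its logs to confirm metadata_aspect_v2 was created.
kubectl logs job/<mysql-setup-job-name>

# Or verify directly against the database.
mysql -h <mysql-host> -P 3307 -u root -p \
  -e "SHOW TABLES FROM datahub LIKE 'metadata_aspect_v2';"
```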
b
To go directly to 0.8.14 is going to be tough, given the recent bootstrap changes
c
Hey, an update here. The main issue behind `metadata_aspect_v2` not being created was this: https://github.com/acryldata/datahub-helm/issues/35. I had actually created the MySQL clone on the same EC2 machine and given it a different port, but the chart was not able to pick up that port due to the above-mentioned issue.
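(For anyone hitting the same thing: the chart reads the SQL endpoint from the `global.sql.datasource` values, so pointing it at an external MySQL on a non-default port looks roughly like the sketch below. The value paths are taken from the chart's values.yaml around the time of this thread and may differ in other versions; host and port are placeholders, and the release/chart names assume the published chart was added under the repo alias `datahub`.)
```bash
# Hypothetical values override for an external MySQL listening on port 3307.
cat > external-mysql-values.yaml <<'EOF'
global:
  sql:
    datasource:
      host: "<ec2-host>:3307"
      hostForMysqlClient: "<ec2-host>"
      port: "3307"
      url: "jdbc:mysql://<ec2-host>:3307/datahub?verifyServerCertificate=false&useSSL=true"
EOF

# Apply it to the release (use the local chart path instead if you cloned the repo).
helm upgrade --install test-datahub datahub/datahub -f external-mysql-values.yaml
```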
Now the table has been created via the `mysqlSetupJob` included in the helm chart, but my GMS pod has been stuck for the last hour.
I have tried increasing the liveness and readiness probes and also increased the resources to the following values:
```yaml
datahub-gms:
  livenessProbe:
    initialDelaySeconds: 6000
    periodSeconds: 30
    failureThreshold: 8
  readinessProbe:
    initialDelaySeconds: 6000
    periodSeconds: 30
    failureThreshold: 8
  resources:
    limits:
      cpu: 1
      memory: 2Gi
    requests:
      cpu: 500m
      memory: 1Gi
```
but the pod does not seem to budge; it is stuck in the following state:
```
test-datahub-datahub-gms-84587bf8c5-tbkh4                        0/1     Running            0          8m37s
```
e
Can you print out the logs for this pod?
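(The usual way to grab those, using the pod name from the status line above:)
```bash
# Recent log output from the GMS pod.
kubectl logs test-datahub-datahub-gms-84587bf8c5-tbkh4 --tail=200

# Events for the pod (probe failures, OOM kills, image pull problems, etc.).
kubectl describe pod test-datahub-datahub-gms-84587bf8c5-tbkh4
```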
b
We can also check MySQL to see if any of the bootstrap metadata was ingested
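(A quick way to do that check against the external MySQL clone; host/port are placeholders and the database is assumed to be named `datahub`.)
```bash
# Count rows and peek at a few entries in the new v2 aspect table.
mysql -h <ec2-host> -P 3307 -u root -p datahub \
  -e "SELECT COUNT(*) FROM metadata_aspect_v2;
      SELECT urn, aspect, version FROM metadata_aspect_v2 LIMIT 10;"
```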
c
Hey, another update on this. The issue was the liveness and readiness probe delays being set too high! Initially GMS was taking a long time to get ready because it had too little memory, so to give it enough time I increased the readiness and liveness probe delays, which turned out not to be required. Reduced the liveness and readiness probes back to normal values and the pod got ready. Thanks for your proactiveness in helping to debug this though 😄
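(In values terms this just means bringing `initialDelaySeconds` back down to something modest; a sketch, assuming the release name `test-datahub`, a chart repo aliased `datahub`, and 60 seconds as a reasonable delay rather than the chart's exact default.)
```bash
helm upgrade --reuse-values test-datahub datahub/datahub \
  --set datahub-gms.livenessProbe.initialDelaySeconds=60 \
  --set datahub-gms.readinessProbe.initialDelaySeconds=60
```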
e
Awesome! Though your upgrade job would have been a no-op. We found an issue in the triggering mechanism.
Can you check the logs of the completed upgrade job?
If you see a message saying it wasn't qualified for an upgrade, can you go to this yaml file in the helm chart (assuming you cloned the repo instead of using the published helm charts) https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/templates/datahub-upgrade/datahub-upgrade-job.yml#L61 and add
```yaml
- "-a"
- "force-upgrade"
```
to the args list?
👀 1
c
Yeah, forgot to mention it here. The upgrade job did not run the first time because the `mysqlSetupJob` created the `metadata_aspect_v2` table and populated it with 2 rows, so the upgrade got skipped. We figured it out, disabled the `mysqlSetupJob`, truncated the table, and performed the upgrade. Will explore the force-upgrade tomorrow as well. Logs and documentation are pretty neat. Thanks again 😄
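(In concrete terms that workaround looks roughly like the following; `mysqlSetupJob.enabled` is the chart value controlling the setup job in the charts discussed here, and the host and release names are placeholders, so verify the key against your chart version.)
```bash
# Clear the partially-populated table so the upgrade job has a clean target.
mysql -h <ec2-host> -P 3307 -u root -p datahub \
  -e "TRUNCATE TABLE metadata_aspect_v2;"

# Redeploy with the setup job disabled so it does not repopulate the table.
helm upgrade --reuse-values test-datahub datahub/datahub \
  --set mysqlSetupJob.enabled=false
```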
e
sorry about the trouble. we are working on a fix, but the above solution would be the easiest for you guys at this point
c
Right now our staging environment has been upgraded 🥳
No worries and thanks for the help again 🙌
e
awesome!!!!
let us know if you run into any issue tomorrow as you test further!!
c
sure