# ingestion
w
I’m noticing that the upgrade job to migrate data for the 0.7.1 to 0.8.0 changes is going really slowly. Is that expected, or should I be increasing the pod’s resources?
l
@big-carpet-38439 ^
w
As of right now we have 281433 rows in the legacy aspects table, and it hasn’t finished the first 1000 in 10 minutes
The resources the pod is asking for are:
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 300m
    memory: 256Mi
b
it hasn't finished the first 1000?
good lord
w
yeah
something is clearly wrong
b
yes
what’s the output say? Can you copy it here?
w
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.4.RELEASE)

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
Jun 07, 2021 7:23:06 PM org.neo4j.driver.internal.logging.JULogger info
INFO: Direct driver instance 727860268 created for server address datahub-neo4j-neo4j:7687
Starting upgrade with id NoCodeDataMigration...
Executing Step 1/8: RemoveAspectV2TableStep...
Completed Step 1/8: RemoveAspectV2TableStep successfully.
Executing Step 2/8: GMSQualificationStep...
Completed Step 2/8: GMSQualificationStep successfully.
Executing Step 3/8: MAEQualificationStep...
MAE Consumer is running and up to date. Proceeding with upgrade...
Completed Step 3/8: MAEQualificationStep successfully.
Executing Step 4/8: UpgradeQualificationStep...
-- V1 table exists
-- V1 table has 281433 rows
-- V2 table does not exist
Found qualified upgrade candidate. Proceeding with upgrade...
Completed Step 4/8: UpgradeQualificationStep successfully.
Executing Step 5/8: CreateAspectTableStep...
Completed Step 5/8: CreateAspectTableStep successfully.
Executing Step 6/8: IngestDataPlatformsStep...
Preparing to ingest DataPlatforms...
Found 23 DataPlatforms
Successfully ingested 23 DataPlatforms.
Completed Step 6/8: IngestDataPlatformsStep successfully.
Executing Step 7/8: DataMigrationStep...
Starting data migration...
Found 281433 rows in legacy aspects table
Reading rows 0 through 1000 from legacy aspects table.
I figured something was off when the 23 data platforms took several minutes
b
23 data platforms took several minutes. that's very strange
w
the memory usage is coasting at about 200MB but the CPU utilization is very low
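(For anyone reading later: a quick way to compare the pod’s actual consumption against those requests, assuming metrics-server is installed in the cluster; pod name and namespace are placeholders:)
# show current CPU (millicores) and memory usage for the upgrade job's pod
kubectl top pod <upgrade-job-pod-name> -n <namespace>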
here’s the mysql url:
"jdbc:mysql://${HOST_NAME_GOES_HERE}:3306/datahub?verifyServerCertificate=false&useSSL=true"
with the host name redacted
in case the ssl configuration is some kind of problem
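(A hedged aside: one way to rule out MySQL SSL as the problem is to connect with the mysql client directly and confirm a cipher is negotiated; host and user here are placeholders:)
# verify the TLS handshake succeeds and see which cipher was negotiated
mysql -h <mysql-host> -P 3306 -u <user> -p --ssl-mode=REQUIRED -e "SHOW STATUS LIKE 'Ssl_cipher';"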
I tried kicking off the job with significantly more resources and nothing much is changing
well I’m going to table trying to do this upgrade then
b
i will get back in just a bit
Here's the connection string we've been using:
jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
Can you confirm that the first 1000 never completed? It'd be useful to see the state of MySQL after some time has passed (specifically rows inside metadata_aspect_v2)
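(A minimal sketch of that check, with placeholder host and credentials; running it a few minutes apart shows whether rows are trickling in or the migration is fully stalled:)
# count migrated rows in the v2 table; repeat after a few minutes to gauge progress
mysql -h <mysql-host> -u <user> -p -e "SELECT COUNT(*) FROM datahub.metadata_aspect_v2;"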
Zachary, are you able to query MySQL while the ingest script is running? Hoping to get to the bottom of this one.
Alternatively, we can schedule some time to sync up over video?
w
For posterity, recording the outcome of our conversations. The root cause of this issue was misconfigured SSL certs for the upgrade job’s K8s pod, which caused the upgrade job to fail to write the necessary MAE events to Kafka. The retries on those failures then made the inserts into the MySQL database take much longer than anticipated, but the upgrade job never logged the underlying error directly. Investigating the MAE topic showed that the records for the upgrade job weren’t coming through. Shout out to @big-carpet-38439 and @early-lamp-41924 for working through this with me.
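(For reference, one way to check whether MAE events are actually landing on the topic is the standard Kafka console consumer; the broker address is a placeholder and the topic name assumes the default MetadataAuditEvent_v4:)
# tail the MAE topic; during a healthy migration, new events should appear as rows are migrated
kafka-console-consumer.sh --bootstrap-server <kafka-broker>:9092 --topic MetadataAuditEvent_v4 --from-beginning --max-messages 10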