# ingestion
w
I’m noticing that the upgrade job to migrate data for the 0.7.1 to 0.8.0 changes is going really slowly. Is that expected, or should I be increasing the pod’s resources?
l
@big-carpet-38439 ^
w
As of right now we have 281433 rows in the legacy aspects table, and it hasn’t finished the first 1000 in 10 minutes
The resources the pod is asking for are:
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 300m
    memory: 256Mi
b
it hasn't finished the first 1000?
good lord
w
yeah
something is clearly wrong
b
yes
what’s the output say? Can you copy it here?
w
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.1.4.RELEASE)

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
SLF4J: Failed to load class "org.slf4j.impl.StaticMDCBinder".
SLF4J: Defaulting to no-operation MDCAdapter implementation.
SLF4J: See http://www.slf4j.org/codes.html#no_static_mdc_binder for further details.
Jun 07, 2021 7:23:06 PM org.neo4j.driver.internal.logging.JULogger info
INFO: Direct driver instance 727860268 created for server address datahub-neo4j-neo4j:7687
Starting upgrade with id NoCodeDataMigration...
Executing Step 1/8: RemoveAspectV2TableStep...
Completed Step 1/8: RemoveAspectV2TableStep successfully.
Executing Step 2/8: GMSQualificationStep...
Completed Step 2/8: GMSQualificationStep successfully.
Executing Step 3/8: MAEQualificationStep...
MAE Consumer is running and up to date. Proceeding with upgrade...
Completed Step 3/8: MAEQualificationStep successfully.
Executing Step 4/8: UpgradeQualificationStep...
-- V1 table exists
-- V1 table has 281433 rows
-- V2 table does not exist
Found qualified upgrade candidate. Proceeding with upgrade...
Completed Step 4/8: UpgradeQualificationStep successfully.
Executing Step 5/8: CreateAspectTableStep...
Completed Step 5/8: CreateAspectTableStep successfully.
Executing Step 6/8: IngestDataPlatformsStep...
Preparing to ingest DataPlatforms...
Found 23 DataPlatforms
Successfully ingested 23 DataPlatforms.
Completed Step 6/8: IngestDataPlatformsStep successfully.
Executing Step 7/8: DataMigrationStep...
Starting data migration...
Found 281433 rows in legacy aspects table
Reading rows 0 through 1000 from legacy aspects table.
I figured something was off when the 23 data platforms took several minutes
b
23 data platforms took several minutes. that's very strange
w
the memory usage is coasting at about 200MB but the CPU utilization is very low
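(For anyone reading later: a quick way to compare the pod’s actual consumption against those requests, assuming metrics-server is installed in the cluster; pod name and namespace are placeholders:)
# show current CPU (millicores) and memory usage for the upgrade job's pod
kubectl top pod <upgrade-job-pod-name> -n <namespace>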
here’s the mysql url:
"jdbc:mysql://${HOST_NAME_GOES_HERE}:3306/datahub?verifyServerCertificate=false&useSSL=true"
with the host name redacted
in case the ssl configuration is some kind of problem
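(A hedged aside: one way to rule out MySQL SSL as the problem is to connect with the mysql client directly and confirm a cipher is negotiated; host and user here are placeholders:)
# verify the TLS handshake succeeds and see which cipher was negotiated
mysql -h <mysql-host> -P 3306 -u <user> -p --ssl-mode=REQUIRED -e "SHOW STATUS LIKE 'Ssl_cipher';"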
I tried kicking off the job with significantly more resources and nothing much is changing
well I’m going to table trying to do this upgrade then
b
i will get back in just a bit
Here's the connection string we've been using:
jdbc:mysql://mysql:3306/datahub?verifyServerCertificate=false&useSSL=true&useUnicode=yes&characterEncoding=UTF-8
Can you confirm that the first 1000 never completed? It'd be useful to see the state of MySQL after some time has passed (specifically rows inside metadata_aspect_v2)
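(A minimal sketch of that check, with placeholder host and credentials; running it a few minutes apart shows whether rows are trickling in or the migration is fully stalled:)
# count migrated rows in the v2 table; repeat after a few minutes to gauge progress
mysql -h <mysql-host> -u <user> -p -e "SELECT COUNT(*) FROM datahub.metadata_aspect_v2;"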
Zachary, are you able to query MySQL while the ingest script is running? Hoping to get to the bottom of this one.
Alternatively, we can schedule some time to sync up over video?
w
For posterity, recording the outcome of our conversations. The root cause of this issue was misconfigured SSL certs for the upgrade job’s K8s pod, which caused the upgrade job to fail to write the necessary MAE events to Kafka. The retries on those failures then made the inserts into the MySQL database take much longer than anticipated, but the upgrade job never logged the underlying error directly. Investigating the MAE topic showed that the records for the upgrade job weren’t coming through. Shout out to @big-carpet-38439 and @early-lamp-41924 for working through this with me.
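(For reference, one way to check whether MAE events are actually landing on the topic is the standard Kafka console consumer; the broker address is a placeholder and the topic name assumes the default MetadataAuditEvent_v4:)
# tail the MAE topic; during a healthy migration, new events should appear as rows are migrated
kafka-console-consumer.sh --bootstrap-server <kafka-broker>:9092 --topic MetadataAuditEvent_v4 --from-beginning --max-messages 10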