# all-things-deployment
b
Hello, we have DataHub deployed on EKS using the Helm chart. Recently we found that we are no longer able to deploy the DataHub prerequisites because Bitnami removed old versions of Kafka and MySQL. With some research we found that they have completely changed their retention policy and deleted everything older than 6 months: https://github.com/bitnami/charts/issues/10539 We upgraded Kafka and MySQL to the newest versions, but now we are getting errors when ingesting MySQL and Snowflake assets. The MySQL errors are simple syntax errors that are likely occurring due to the version change. Does anyone have a solution for deploying the prerequisites with the correct versions now that Bitnami has deleted them, or do we need to wait for the ingestion jobs to be updated for the latest versions of MySQL/Kafka?
e
Thanks for pointing them out! Do you mind posting the error msgs that you are getting with the newer version?
b
Hi @early-lamp-41924, for MySQL we get this repeated syntax error:
RUN_INGEST - {'errors': [],
 'exec_id': 'b806ae8f-e28b-411d-8132-fb1d0e4a2051',
 'infos': ['2022-06-06 09:52:43.760385 [exec_id=b806ae8f-e28b-411d-8132-fb1d0e4a2051] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-06-06 10:04:31.555017 [exec_id=b806ae8f-e28b-411d-8132-fb1d0e4a2051] INFO: stdout=  File '
           '"/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1316, in '
           '_execute_context\n'
           '    self._handle_dbapi_exception(\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1510, '
           'in _handle_dbapi_exception\n'
           '    util.raise_(\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 182, '
           'in raise_\n'
           '    raise exception\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, '
           'in _execute_context\n'
           '    self.dialect.do_execute(\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line '
           '608, in do_execute\n'
           '    cursor.execute(statement, parameters)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in '
           'execute\n'
           '    result = self._query(query)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in '
           '_query\n'
           '    conn.query(q)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 548, in '
           'query\n'
           '    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 775, in '
           '_read_query_result\n'
           '    result.read()\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 1156, in '
           'read\n'
           '    first_packet = self.connection._read_packet()\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 725, in '
           '_read_packet\n'
           '    packet.raise_for_error()\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in '
           'raise_for_error\n'
           '    err.raise_mysql_exception(self._data)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/err.py", line 143, in '
           'raise_mysql_exception\n'
           '    raise errorclass(errno, errval)\n'
           'sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that '
           "corresponds to your MySQL server version for the right syntax to use near 'gujopyvginjcfdfh AS \\n(SELECT count(*) AS element_count, "
           'sum(CASE WHEN (group_id \' at line 1")\n'
           '[SQL: WITH gujopyvginjcfdfh AS \n'
           '(SELECT count(*) AS element_count, sum(CASE WHEN (group_id IN (NULL) OR group_id IS NULL) THEN %(param_1)s ELSE %(param_2)s END) AS '
           'null_count \n'
           'FROM socialgoo.auth_group_permission_group), \n'
           'dbxisygefpjpbdju AS \n'
           '(SELECT count(*) AS element_count, sum(CASE WHEN (permission_group_id IN (NULL) OR permission_group_id IS NULL) THEN %(param_3)s ELSE '
           '%(param_4)s END) AS null_count \n'
           'FROM socialgoo.auth_group_permission_group)\n'
           ' SELECT gujopyvginjcfdfh.element_count, gujopyvginjcfdfh.null_count, dbxisygefpjpbdju.element_count, dbxisygefpjpbdju.null_count \n'
           'FROM gujopyvginjcfdfh, dbxisygefpjpbdju]\n'
           "[parameters: {'param_1': 1, 'param_2': 0, 'param_3': 1, 'param_4': 0}]\n"
           '(Background on this error at: <http://sqlalche.me/e/13/f405>)\n'
           '[2022-06-06 10:04:27,745] INFO     {datahub.ingestion.run.pipeline:84} - sink wrote workunit '
           'socialgoo.statistics_page_engagement_weekly-subtypes\n'
           '[2022-06-06 10:04:27,784] INFO     {datahub.ingestion.run.pipeline:84} - sink wrote workunit '
           'container-urn:li:container:0d4a1159906f2b10023c0cd579779d5b-to-urn:li:dataset:(urn:li:dataPlatform:mysql,socialgoo.statistics_page_insights,STG)\n'
           '[2022-06-06 10:04:27,797] INFO     {datahub.ingestion.source.ge_data_profiler:835} - Finished profiling socialgoo.auth_group; took 1.073 '
           'seconds\n'
           '[2022-06-06 10:04:27,815] INFO     {datahub.ingestion.run.pipeline:84} - sink wrote workunit socialgoo.statistics_page_insights\n'
           '[2022-06-06 10:04:27,860] INFO     {datahub.ingestion.run.pipeline:84} - sink wrote workunit '
           'socialgoo.statistics_page_insights-subtypes\n'
           '[2022-06-06 10:04:28,017] INFO     {datahub.ingestion.run.pipeline:84} - sink wrote workunit '
           'container-urn:li:container:0d4a1159906f2b10023c0cd579779d5b-to-urn:li:dataset:(urn:li:dataPlatform:mysql,socialgoo.statistics_page_locale,STG)\n'
           '[2022-06-06 10:04:28,032] INFO     {datahub.ingestion.source.ge_data_profiler:817} - Profiling socialgoo.auth_permission\n'
           '[2022-06-06 10:04:28,046] INFO     {datahub.ingestion.run.pipeline:84} - sink wrote workunit socialgoo.statistics_page_locale\n'
           '[2022-06-06 10:04:28,060] ERROR    {datahub.utilities.sqlalchemy_query_combiner:388} - Failed to execute queue using combiner: '
           '(pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server '
           "version for the right syntax to use near 'qdrqnnyilydcuqvw AS \\n(SELECT count(*) AS count_1 \\nFROM socialgoo.auth_permission' at line "
           '1")\n'
           '[SQL: WITH qdrqnnyilydcuqvw AS \n'
           '(SELECT count(*) AS count_1 \n'
           'FROM socialgoo.auth_permission)\n'
           ' SELECT qdrqnnyilydcuqvw.count_1 \n'
           'FROM qdrqnnyilydcuqvw]\n'
           '(Background on this error at: <http://sqlalche.me/e/13/f405>)\n'
           'Traceback (most recent call last):\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1276, '
           'in _execute_context\n'
           '    self.dialect.do_execute(\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line '
           '608, in do_execute\n'
           '    cursor.execute(statement, parameters)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/cursors.py", line 148, in '
           'execute\n'
           '    result = self._query(query)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/cursors.py", line 310, in '
           '_query\n'
           '    conn.query(q)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 548, in '
           'query\n'
           '    self._affected_rows = self._read_query_result(unbuffered=unbuffered)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 775, in '
           '_read_query_result\n'
           '    result.read()\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 1156, in '
           'read\n'
           '    first_packet = self.connection._read_packet()\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/connections.py", line 725, in '
           '_read_packet\n'
           '    packet.raise_for_error()\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/protocol.py", line 221, in '
           'raise_for_error\n'
           '    err.raise_mysql_exception(self._data)\n'
           '  File "/tmp/datahub/ingest/venv-b806ae8f-e28b-411d-8132-fb1d0e4a2051/lib/python3.9/site-packages/pymysql/err.py", line 143, in '
           'raise_mysql_exception\n'
           '    raise errorclass(errno, errval)\n'
           'pymysql.err.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server '
           "version for the right syntax to use near 'qdrqnnyilydcuqvw AS \\n(SELECT count(*) AS count_1 \\nFROM socialgoo.auth_permission' at line "
           '1")\n'
For Snowflake ingestion we are getting SQL compilation errors.
And for Snowflake usage ingestion we are getting "none is not an allowed value" errors:
~~~~ Execution Summary ~~~~

RUN_INGEST - {'errors': [],
 'exec_id': '47ca049d-76fa-4cc9-82d8-76c28e22797c',
 'infos': ['2022-06-06 13:00:00.147193 [exec_id=47ca049d-76fa-4cc9-82d8-76c28e22797c] INFO: Starting execution for task with name=RUN_INGEST',
           '2022-06-06 13:04:32.243512 [exec_id=47ca049d-76fa-4cc9-82d8-76c28e22797c] INFO: stdout=  none is not an allowed value '
           '(type=type_error.none.not_allowed)\n'
           '[2022-06-06 13:04:26,617] WARNING  {datahub.ingestion.source.usage.snowflake_usage:93} - usage => Failed to parse usage line '
           "{'query_start_time': datetime.datetime(2022, 6, 5, 5, 47, 11, 717000, tzinfo=datetime.timezone.utc), 'query_text': 'COPY INTO "
           'GOOGLE_AD.google_ads_settings_stage_temp (JSON_DATA, DATA, ACTION, CONTEXT, CREATED_AT, SERVICE, '
           'SERVICE_VERSION,source_file,source_file_row_number,load_at)         FROM (             SELECT $1,            $1:data,            '
           '$1:metadata:action,            $1:metadata:context,            $1:metadata:createdAt,            $1:metadata:service,            '
           '$1:metadata:serviceVersion,            metadata$filename ,            metadata$file_row_number,            '
           'current_timestamp()             FROM @google_ads_settings_stage         )         FILE_FORMAT = (TYPE=JSON)         force = '
           "false         ON_ERROR = CONTINUE', 'query_type': 'COPY', 'rows_inserted': 0, 'rows_updated': 0, 'rows_deleted': 0, "
           "'base_objects_accessed': [], 'direct_objects_accessed': [], 'user_name': 'METIGY_STAGING', 'first_name': None, 'last_name': None, "
           "'display_name': 'METIGY_STAGING', 'email': None, 'role_name': 'STAGING_ACCESS'}, 1 validation error for SnowflakeJoinedAccessEvent\n"
           'email\n'
           '  none is not an allowed value (type=type_error.none.not_allowed)\n'
           '[2022-06-06 13:04:26,618] WARNING  {datahub.ingestion.source.usage.snowflake_usage:93} - usage => Failed to parse usage line '
           '{\'query_start_time\': datetime.datetime(2022, 6, 5, 5, 33, 25, 89000, tzinfo=datetime.timezone.utc), \'query_text\': "UPDATE '
           'google_ad.GOOGLE_ADS_METRICS_STAGE_TEMP             SET PROCESSED_AT = current_timestamp(2)             WHERE ACTION IN '
           '(\'google_ad.GOOGLE_ADS_METRICS_STAGE_TEMP\', \'biddingStrategy_metrics\')                 AND PROCESSED_AT IS NULL", \'query_type\': '
           "'UPDATE', 'rows_inserted': 0, 'rows_updated': 0, 'rows_deleted': 0, 'base_objects_accessed': [{'columns': [{'columnId': 318487, "
           "'columnName': 'PROCESSED_AT'}, {'columnId': 318489, 'columnName': 'ACTION'}], 'objectDomain': 'Table', 'objectId': 294916, 'objectName': "
           "'REPORTING_PROD.GOOGLE_AD.GOOGLE_ADS_METRICS_STAGE_TEMP'}], 'direct_objects_accessed': [{'columns': [{'columnId': 318487, 'columnName': "
           "'PROCESSED_AT'}, {'columnId': 318489, 'columnName': 'ACTION'}], 'objectDomain': 'Table', 'objectId': 294916, 'objectName': "
           "'REPORTING_PROD.GOOGLE_AD.GOOGLE_ADS_METRICS_STAGE_TEMP'}], 'user_name': 'METIGY_PROD', 'first_name': None, 'last_name': None, "
           "'display_name': 'METIGY_PROD', 'email': None, 'role_name': 'PROD_ACCESS'}, 1 validation error for SnowflakeJoinedAccessEvent\n"
           'email\n'
           '  none is not an allowed value (type=type_error.none.not_allowed)\n'
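The "none is not an allowed value" message above is a pydantic (v1) validation error: the access-history rows in the log have email set to None, while the usage event model expects a non-null string for that field. Below is a minimal, hypothetical sketch (not the actual SnowflakeJoinedAccessEvent definition) that reproduces the same failure and shows how an Optional field would tolerate such rows:

from typing import Optional
from pydantic import BaseModel, ValidationError  # pydantic v1

class AccessEventSketch(BaseModel):
    # Hypothetical stand-in for SnowflakeJoinedAccessEvent: `email` is declared as a
    # required str, so a row with email=None fails validation.
    user_name: str
    email: str

class LenientAccessEventSketch(BaseModel):
    # Same shape, but email is optional, so rows for users without an email parse fine.
    user_name: str
    email: Optional[str] = None

row = {"user_name": "METIGY_STAGING", "email": None}

try:
    AccessEventSketch(**row)
except ValidationError as err:
    print(err)  # 1 validation error ... email: none is not an allowed value

print(LenientAccessEventSketch(**row))  # user_name='METIGY_STAGING' email=None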
I'm not an expert in Helm, but is there a way to get it running again with the old versions of Kafka and MySQL?
e
This is very weird
So you updated kafka and mysql
the python client for ingestion is failing before it even gets to the backend?
b
yes, we updated Kafka and MySQL to the latest versions in Bitnami
there are syntax errors for the MySQL ingestion, that's a versioning issue isn't it?
e
Are you ingesting MySQL sources from the MySQL DB on Kubernetes you just created, or is this an existing MySQL DB with your data?
b
existing mysql db with data
e
But that didn’t change right?
b
no it didn't
e
I would say the error above would be due to the upgrade only if it were erroring out on the DataHub server side
If it is erroring out on the ingestion side, it's most likely because the ingestion code is not working with your source for some reason, so we should debug this issue separately!
@square-activity-64562 if you have bandwidth, could you take a look at these errors?
b
we haven't updated our DataHub version, just the prereq MySQL and Kafka versions
e
is everything else working fine? like the UI or read API
b
yes
e
the prereqs are the persistence layer behind DataHub, so the ingestion client never talks directly to those prereq systems
if there were something wrong because of the prereq version, you would see failure logs in GMS
b
hmm
e
I suspect this doesn't have anything to do with the prereqs, but is some issue with the DataHub client
b
i see
e
I would try upgrading both server and cli to latest
and see if you still get the errors!
b
we were on 0.8.36; we went back to 0.8.33 and it fixed our Snowflake ingestions, it is just the MySQL one that is still broken
e
which version is your datahub cli?
datahub version
or is this on UI ingestion?
b
we configure our ingestions in the UI
e
Got it
So
on your values.yaml
b
it's 0.8.33
e
ahhhh
that must be it
can you try not setting this?
then it should use the defaults we set in the main values.yaml, which we update on each release
basically we want the CLI version to be the same as the server version you are trying to deploy
(which we make sure of in the values.yaml in the repo, but we can't change all the custom ones)
b
yeah it is the same
the external database we are connecting to is MySQL v5.7, which doesn't allow WITH statements
somehow we got this ingestion job to work before, but now it is using WITH statements and breaking. Any idea what we can do about this?
we worked out the issue. Basically we had it working and then we turned profiling on. It looks like profiling uses WITH statements, and since our external database is v5.7 it broke. Are there any plans to support profiling of MySQL v5.7 assets?
or MySQL versions older than v8 (the version where WITH statements were introduced)?
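For context, MySQL only gained support for WITH (common table expressions) in 8.0, which is why the profiler's combined queries shown in the earlier log fail on 5.7 with error 1064. Below is a minimal SQLAlchemy sketch, with a placeholder connection string, contrasting the CTE form with an equivalent derived-table form that 5.7 accepts; it illustrates the incompatibility rather than any actual DataHub code:

from sqlalchemy import create_engine, text

# Placeholder DSN; point this at the MySQL 5.7 source being profiled.
engine = create_engine("mysql+pymysql://user:password@mysql-host:3306/socialgoo")

# CTE form, similar to the profiler's generated query: fails on MySQL 5.7 with error 1064.
cte_query = text("""
    WITH t AS (SELECT count(*) AS count_1 FROM socialgoo.auth_permission)
    SELECT t.count_1 FROM t
""")

# Equivalent derived-table form: returns the same result and works on both 5.7 and 8.0.
subquery = text("""
    SELECT t.count_1
    FROM (SELECT count(*) AS count_1 FROM socialgoo.auth_permission) AS t
""")

with engine.connect() as conn:
    print(conn.execute(subquery).scalar())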
e
@square-activity-64562 did we change how mysql works?
So to clarify
your server version is on v0.8.36 and the defaultCliVersion is either not set or set to v0.8.36?
b
we're running 0.8.33
Snowflake and MySQL jobs broke when we moved to 0.8.36
we moved back and Snowflake was fixed but MySQL was still broken
at the same time as we upgraded to 0.8.36 we also turned profiling on, so we turned profiling off and now it's working again
s
I don't think there have been any changes in MySQL ingestion. The errors are related to profiling not working with MySQL 5.7; I don't think there are integration tests for that version
So you can work with 0.8.36, but you won't be able to do profiling for MySQL 5.7 for now
can you please open a GitHub issue for supporting profiling for MySQL 5.7?
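For anyone hitting the same limitation: the workaround described above, turning profiling off for the MySQL 5.7 source, is a flag in the source recipe (profiling.enabled: false), whether the recipe is managed from the UI or run as code. The sketch below uses the datahub library's programmatic pipeline API with placeholder hosts and credentials; adjust it to your own connection details:

from datahub.ingestion.run.pipeline import Pipeline

# The same recipe you would paste into the UI, expressed as a dict.
# Host names and credentials below are placeholders.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "mysql",
            "config": {
                "host_port": "mysql-host:3306",
                "username": "datahub",
                "password": "password",
                "profiling": {"enabled": False},  # skip profiling so no WITH queries hit MySQL 5.7
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://datahub-datahub-gms:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()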
e
Is this documented anywhere?