# ingestion
b
I have a problem when ingesting data from Glue. I get the following exception:
```
2022-09-21 12:33:03.932429 [exec_id=14acb269-e6af-4ca0-871b-684c02a11814] INFO: Caught exception EXECUTING task_id=14acb269-e6af-4ca0-871b-684c02a11814, name=RUN_INGEST, stacktrace=Traceback (most recent call last):
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 525, in readline
    line = await self.readuntil(sep)
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 620, in readuntil
    raise exceptions.LimitOverrunError(
asyncio.exceptions.LimitOverrunError: Separator is found, but chunk is longer than limit

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 123, in execute_task
    task_event_loop.run_until_complete(task_future)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 147, in execute
    await tasks.gather(_read_output_lines(), _report_progress(), _process_waiter())
  File "/usr/local/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 99, in _read_output_lines
    line_bytes = await ingest_process.stdout.readline()
  File "/usr/local/lib/python3.10/asyncio/streams.py", line 534, in readline
    raise ValueError(e.args[0])
ValueError: Separator is found, but chunk is longer than limit

Execution finished with errors.
```
And eventually the ingestion fails, even though it managed to ingest some of the data. My recipe looks like this:

```yaml
sink:
  type: datahub-rest
  config:
    server: 'http://datahub-datahub-gms:8080'
source:
  type: glue
  config:
    aws_region: us-east-1
    database_pattern:
      allow:
        - product_metrics
```

I don't see any other exceptions in the log. Does anyone know how to fix it?
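For context, `ValueError: Separator is found, but chunk is longer than limit` comes from asyncio's `StreamReader.readline()`: it is raised when a single line of subprocess output is longer than the reader's buffer limit (64 KiB by default). A minimal sketch reproducing the error outside DataHub (the tiny `limit` value is purely for illustration):

```python
import asyncio

async def trigger_overrun(limit: int = 16) -> str:
    """Feed one 'line' longer than `limit` into a StreamReader and return the error message."""
    reader = asyncio.StreamReader(limit=limit)
    reader.feed_data(b"x" * (limit * 4) + b"\n")  # one line, 4x over the limit
    reader.feed_eof()
    try:
        await reader.readline()
    except ValueError as e:  # readline() wraps LimitOverrunError in ValueError
        return str(e)
    return "no error"

print(asyncio.run(trigger_overrun()))
# -> Separator is found, but chunk is longer than limit
```

This is why very long log lines from the ingestion subprocess (e.g. profiler sample values) can kill the run even though part of the data was already ingested.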
c
I ran into the same problem with the mysql source. I modified the recipe to set the profiling option `include_field_sample_values` to false, and the ingestion task then succeeded.
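If it helps, that option lives under the source's `profiling` block; a sketch for a mysql recipe (the `host_port` value is a placeholder):

```yaml
source:
  type: mysql
  config:
    host_port: localhost:3306  # placeholder
    profiling:
      enabled: true
      include_field_sample_values: false  # skip sample values, which can produce very long log lines
```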
r
cc: @gray-shoe-75895
g
This bug should be fixed by upgrading to datahub actions 0.0.8
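For Helm-based deployments, this would mean bumping the actions image tag in your chart values; a hypothetical excerpt (the exact key layout depends on your chart version):

```yaml
# values.yaml (datahub helm chart) -- hypothetical excerpt
acryl-datahub-actions:
  image:
    tag: v0.0.8
```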
c
Thanks @gray-shoe-75895 @ripe-dress-87297! The problem was fixed after upgrading datahub actions to v0.0.8.