hundreds-airline-29192
05/26/2023, 7:55 AMbotocore.exceptions.PaginationError: Error during pagination: The same next token was received twice: {'Marker': 'dwh/dev/fact/fact_gross_profit/order_date_key_07%3D20230109/part-00018-f1470254-2c8b-4a23-aaad-0260cdca7054.c000.snappy.parquet'}
gentle-hamburger-31302
05/30/2023, 7:56 AMhundreds-airline-29192
05/30/2023, 7:56 AM[2023-05-27 09:05:28,029] ERROR {datahub.entrypoints:195} - Command failed: Error during pagination: The same next token was received twice: {'Marker': 'dwh/dev/fact/fact_gross_profit/order_date_key_07%3D20230109/part-00018-f1470254-2c8b-4a23-aaad-0260cdca7054.c000.snappy.parquet'}
Traceback (most recent call last):
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/entrypoints.py", line 182, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 379, in wrapper
    raise e
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 334, in wrapper
    res = func(*args, **kwargs)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 198, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 158, in run_func_check_upgrade
    ret = await the_one_future
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_async
    return await loop.run_in_executor(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 140, in run_pipeline_to_completion
    raise e
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 132, in run_pipeline_to_completion
    pipeline.run()
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 359, in run
    for wu in itertools.islice(
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/ingestion/source/gcs/gcs_source.py", line 156, in get_workunits
    yield from auto_workunit_reporter(
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/utilities/source_helpers.py", line 115, in auto_workunit_reporter
    for wu in stream:
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/utilities/source_helpers.py", line 42, in auto_status_aspect
    for wu in stream:
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/ingestion/source/s3/source.py", line 765, in get_workunits
    for file, timestamp, size in file_browser:
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/ingestion/source/s3/source.py", line 726, in s3_browser
    for obj in bucket.objects.filter(Prefix=prefix).page_size(PAGE_SIZE):
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/boto3/resources/collection.py", line 81, in __iter__
    for page in self.pages():
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/boto3/resources/collection.py", line 171, in pages
    for page in pages:
  File "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/botocore/paginate.py", line 327, in __iter__
    raise PaginationError(message=message)
botocore.exceptions.PaginationError: Error during pagination: The same next token was received twice: {'Marker': 'dwh/dev/fact/fact_gross_profit/order_date_key_07%3D20230109/part-00018-f1470254-2c8b-4a23-aaad-0260cdca7054.c000.snappy.parquet'}
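For readers following the traceback: the error is raised when a page comes back with the same continuation marker as the page before it, so the listing cannot advance. The sketch below models that check in plain Python. It is not botocore's code; `paginate`, `RepeatedTokenError`, `healthy_server` and `stuck_server` are hypothetical names used only for illustration, and the token values are made up.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (items, next_token)


class RepeatedTokenError(Exception):
    """Hypothetical stand-in for botocore.exceptions.PaginationError."""


def paginate(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[List[str]]:
    """Yield pages until the server stops returning a next token.

    Raises RepeatedTokenError if two consecutive pages return the same
    next token, because following it again would loop forever.
    """
    token: Optional[str] = None
    previous: Optional[str] = None
    while True:
        items, next_token = fetch_page(token)
        yield items
        if next_token is None:
            return
        if previous is not None and next_token == previous:
            raise RepeatedTokenError(
                f"The same next token was received twice: {next_token!r}"
            )
        previous = next_token
        token = next_token


def healthy_server(token: Optional[str]) -> Page:
    # Illustrative backend whose token advances on every call.
    pages: Dict[Optional[str], Page] = {None: (["a"], "t1"), "t1": (["b"], None)}
    return pages[token]


def stuck_server(token: Optional[str]) -> Page:
    # Illustrative backend that always hands back the same marker.
    return (["x"], "key%3D1")
```

With `healthy_server` the generator finishes after two pages; with `stuck_server` it yields two pages and then raises, which is the shape of the failure in the log above. Why the marker repeats for this particular bucket is not established by the traceback alone.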
hundreds-airline-29192
05/30/2023, 7:57 AMgentle-hamburger-31302
05/30/2023, 7:59 AMhundreds-airline-29192
05/30/2023, 8:00 AMgentle-hamburger-31302
05/30/2023, 8:03 AMhundreds-airline-29192
05/30/2023, 8:04 AMgentle-hamburger-31302
05/30/2023, 8:11 AMhundreds-airline-29192
05/30/2023, 8:16 AMaloof-gpu-11378
05/30/2023, 8:17 AMhundreds-airline-29192
05/31/2023, 6:55 AMaloof-gpu-11378
05/31/2023, 6:56 AMgentle-hamburger-31302
05/31/2023, 7:54 AMhundreds-airline-29192
05/31/2023, 7:54 AMgentle-hamburger-31302
05/31/2023, 7:54 AMaws and boto3 version
gentle-hamburger-31302
05/31/2023, 7:55 AMgentle-hamburger-31302
05/31/2023, 7:55 AMhundreds-airline-29192
05/31/2023, 7:55 AMhundreds-airline-29192
05/31/2023, 7:56 AMgentle-hamburger-31302
05/31/2023, 7:56 AMFile "/tmp/datahub/ingest/venv-gcs-0.10.2.3/lib/python3.10/site-packages/datahub/ingestion/source/s3/source.py", line 726, in s3_browser
    for obj in bucket.objects.filter(Prefix=prefix).page_size(PAGE_SIZE):
s3/source
gentle-hamburger-31302
05/31/2023, 7:57 AMgentle-hamburger-31302
05/31/2023, 7:57 AMhundreds-airline-29192
05/31/2023, 7:58 AMgentle-hamburger-31302
05/31/2023, 7:58 AMhundreds-airline-29192
05/31/2023, 8:01 AMgentle-hamburger-31302
05/31/2023, 8:02 AMsource:
  type: s3
  config:
    path_specs:
      - include: "s3://pansurg-curation-raw-open-data/*.*"
    aws_config:
      aws_region: xxxxxxx
      aws_profile: xxxxxxxxx
    env: "PROD"
    profiling:
      enabled: false
hundreds-airline-29192
05/31/2023, 8:03 AMhundreds-airline-29192
05/31/2023, 8:03 AMhundreds-airline-29192
05/31/2023, 8:03 AMfamous-florist-7218
05/31/2023, 1:43 PMhundreds-airline-29192
06/01/2023, 3:15 AMhundreds-airline-29192
06/01/2023, 3:16 AMhundreds-airline-29192
06/01/2023, 3:18 AMdazzling-judge-80093
06/01/2023, 6:52 AMhundreds-airline-29192
06/01/2023, 6:53 AMhundreds-airline-29192
06/05/2023, 8:26 AMaloof-gpu-11378
06/05/2023, 8:30 AMaws s3 command list files from this gcs bucket?
aloof-gpu-11378
06/05/2023, 8:30 AMhundreds-airline-29192
06/05/2023, 8:32 AMhundreds-airline-29192
06/05/2023, 8:54 AMgentle-hamburger-31302
06/05/2023, 8:57 AMaws s3 ls --recursive s3://<bucket-name>/
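A rough Python counterpart to the recursive listing command above, for checking whether the bucket can be listed end to end outside of DataHub. This is a sketch under assumptions: `list_keys` and `FakeClient` are hypothetical names, the function expects a boto3-style S3 client exposing `list_objects_v2`, and whether the GCS interoperability endpoint behaves identically to S3 for this call is not confirmed anywhere in this thread.

```python
from typing import Any, Dict, Iterator, Optional


def list_keys(client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under prefix, one list_objects_v2 page at a time.

    Stops with an error if the continuation token does not advance.
    """
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    seen_token: Optional[str] = None
    while True:
        page = client.list_objects_v2(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        if not page.get("IsTruncated"):
            return
        token = page.get("NextContinuationToken")
        if token is None or token == seen_token:
            raise RuntimeError(f"listing did not advance past token {token!r}")
        seen_token = token
        kwargs["ContinuationToken"] = token


class FakeClient:
    """Offline stand-in so the sketch runs without credentials."""

    def __init__(self, pages: Dict[Optional[str], Dict[str, Any]]) -> None:
        self._pages = pages

    def list_objects_v2(self, **kwargs: Any) -> Dict[str, Any]:
        return self._pages[kwargs.get("ContinuationToken")]


# Against a real bucket, the client would come from boto3 instead, e.g.
#   boto3.client("s3", endpoint_url=...)  # endpoint and credentials are assumptions
fake = FakeClient({
    None: {"Contents": [{"Key": "a"}], "IsTruncated": True,
           "NextContinuationToken": "t1"},
    "t1": {"Contents": [{"Key": "b"}], "IsTruncated": False},
})
keys = list(list_keys(fake, "example-bucket"))
```

Passing the client in as an argument keeps the loop testable offline and makes it easy to point the same code at different endpoints when comparing behaviour.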
hundreds-airline-29192
06/07/2023, 6:19 AMaloof-gpu-11378
06/07/2023, 6:30 AMhundreds-airline-29192
06/07/2023, 6:33 AMhundreds-airline-29192
06/07/2023, 6:33 AMhundreds-airline-29192
06/07/2023, 6:33 AMaloof-gpu-11378
06/07/2023, 6:35 AMdazzling-judge-80093
06/07/2023, 6:56 AMhundreds-airline-29192
06/07/2023, 7:00 AMhundreds-airline-29192
06/09/2023, 4:37 AM