bland-sundown-49496
09/09/2022, 7:24 PM
bulky-soccer-26729
09/09/2022, 8:31 PM
helpful-optician-78938
09/09/2022, 9:30 PM
source section of your yaml. Could you run in debug mode: datahub --debug ingest -c <your_recipe.yml> and share the logs?
bland-sundown-49496
09/11/2022, 1:37 AM
(base) HGOPU-MAC:datahub hgopu$ datahub --debug ingest -c s3-datahub.yaml
[2022-09-10 20:31:53,322] DEBUG {datahub.telemetry.telemetry:210} - Sending init Telemetry
[2022-09-10 20:31:53,656] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-10 20:31:53,839] INFO {datahub.cli.ingest_cli:183} - DataHub CLI version: 0.8.44.1
[2022-09-10 20:31:53,843] DEBUG {datahub.cli.ingest_cli:195} - Using config: {'source': {'type': 's3', 'config': {'platform': 's3', 'path_spec': {'include': 's3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset'}, 'aws_config': {'aws_access_key_id': 'XXX', 'aws_secret_access_key': 'XXXX', 'aws_region': 'us-east-1'}, 'env': 'PROD', 'profiling': {'enabled': False}}}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
[2022-09-10 20:31:53,894] DEBUG {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-10 20:31:53,894] DEBUG {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-10 20:31:53,894] DEBUG {datahub.ingestion.run.pipeline:174} - Sink type:datahub-rest,<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'> configured
[2022-09-10 20:31:53,894] INFO {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
[2022-09-10 20:31:53,905] DEBUG {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-10 20:31:53,906] DEBUG {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-10 20:31:53,906] DEBUG {datahub.ingestion.reporting.datahub_ingestion_run_summary_provider:120} - Ingestion source urn = urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5
[2022-09-10 20:31:53,907] DEBUG {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubIngestionSource", "entityUrn": "urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5", "changeType": "UPSERT", "aspectName": "dataHubIngestionSourceInfo", "aspect": {"value": "{\"name\": \"[CLI] s3\", \"type\": \"s3\", \"platform\": \"urn:li:dataPlatform:unknown\", \"config\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"s3\\\", \\\"config\\\": {\\\"platform\\\": \\\"s3\\\", \\\"path_spec\\\": {\\\"include\\\": \\\"s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset\\\"}, \\\"aws_config\\\": {\\\"aws_access_key_id\\\": \\\"XXXX\\\", \\\"aws_secret_access_key\\\": \\\"XXX\\\", \\\"aws_region\\\": \\\"us-east-1\\\"}, \\\"env\\\": \\\"PROD\\\", \\\"profiling\\\": {\\\"enabled\\\": false}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://localhost:8080\\\"}}}\", \"version\": \"0.8.44.1\", \"executorId\": \"__datahub_cli_\"}}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-10 20:31:53,930] DEBUG {datahub.ingestion.run.pipeline:269} - Reporter type:datahub,<class 'datahub.ingestion.reporting.datahub_ingestion_run_summary_provider.DatahubIngestionRunSummaryProvider'> configured.
[2022-09-10 20:31:54,229] INFO {numexpr.utils:159} - NumExpr defaulting to 8 threads.
[2022-09-10 20:31:54,501] ERROR {logger:26} - Please set env variable SPARK_VERSION
[2022-09-10 20:31:54,501] INFO {logger:27} - Using deequ: com.amazon.deequ:deequ:1.2.2-spark-3.0
[2022-09-10 20:31:54,814] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-10 20:31:55,180] DEBUG {datahub.entrypoints:168} - File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 196, in __init__
131 def __init__(
132 self,
133 config: PipelineConfig,
134 dry_run: bool = False,
135 preview_mode: bool = False,
136 preview_workunits: int = 10,
137 report_to: Optional[str] = None,
138 no_default_report: bool = False,
139 ):
(...)
192 self._record_initialization_failure(e, "Failed to create source")
193 return
194
195 try:
--> 196 self.source: Source = source_class.create(
197 self.config.source.dict().get("config", {}), self.ctx
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/s3/source.py", line 321, in create
319 @classmethod
320 def create(cls, config_dict, ctx):
--> 321 config = DataLakeSourceConfig.parse_obj(config_dict)
322
File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj
File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__
ValidationError: 1 validation error for DataLakeSourceConfig
path_spec -> __root__
file type specified () in path_spec.include is not in specified file types. Please select one from ['csv', 'tsv', 'json', 'parquet', 'avro'] or specify ".*" to allow all types (type=value_error)
The above exception was the direct cause of the following exception:
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 196, in run
112 def run(
113 ctx: click.Context,
114 config: str,
115 dry_run: bool,
116 preview: bool,
117 strict_warnings: bool,
118 preview_workunits: int,
119 suppress_error_logs: bool,
120 test_source_connection: bool,
121 report_to: str,
122 no_default_report: bool,
123 no_spinner: bool,
124 ) -> None:
(...)
192 _test_source_connection(report_to, pipeline_config)
193
194 try:
195 logger.debug(f"Using config: {pipeline_config}")
--> 196 pipeline = Pipeline.create(
197 pipeline_config,
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
306 def create(
307 cls,
308 config_dict: dict,
309 dry_run: bool = False,
310 preview_mode: bool = False,
311 preview_workunits: int = 10,
312 report_to: Optional[str] = None,
313 no_default_report: bool = False,
314 raw_config: Optional[dict] = None,
315 ) -> "Pipeline":
316 config = PipelineConfig.from_dict(config_dict, raw_config)
--> 317 return cls(
318 config,
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
131 def __init__(
132 self,
133 config: PipelineConfig,
134 dry_run: bool = False,
135 preview_mode: bool = False,
136 preview_workunits: int = 10,
137 report_to: Optional[str] = None,
138 no_default_report: bool = False,
139 ):
(...)
198 )
199 logger.debug(f"Source type:{source_type},{source_class} configured")
200 logger.info("Source configured successfully.")
201 except Exception as e:
--> 202 self._record_initialization_failure(
203 e, f"Failed to configure source ({source_type})"
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
128 def _record_initialization_failure(self, e: Exception, msg: str) -> None:
--> 129 raise PipelineInitError(msg) from e
---- (full traceback above) ----
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 196, in run
pipeline = Pipeline.create(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
return cls(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
self._record_initialization_failure(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
raise PipelineInitError(msg) from e
PipelineInitError: Failed to configure source (s3)
[2022-09-10 20:31:55,180] DEBUG {datahub.entrypoints:198} - DataHub CLI version: 0.8.44.1 at /Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/__init__.py
[2022-09-10 20:31:55,180] DEBUG {datahub.entrypoints:201} - Python version: 3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ] at /Users/hgopu/opt/anaconda3/bin/python3 on macOS-10.16-x86_64-i386-64bit
[2022-09-10 20:31:55,180] DEBUG {datahub.entrypoints:204} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.44', 'commit': '2115d5bf1dc4dcfd73dbff6d41aaa08a279b62c0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
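The ValidationError above rejects the `include` path because it ends in `dataset`, so the file type it sees is empty. Below is a minimal sketch of that kind of check, assuming the file type is read from the extension of the last path segment. `include_extension` and `is_accepted` are hypothetical names for illustration, not DataHub's actual implementation; only the list of file types and the `.*` wildcard come from the error message.

```python
import os

# File types listed in the error message above.
SUPPORTED_TYPES = ["csv", "tsv", "json", "parquet", "avro"]


def include_extension(include: str) -> str:
    """Return the extension of the last path segment, without the dot."""
    last_segment = include.rstrip("/").rsplit("/", 1)[-1]
    return os.path.splitext(last_segment)[1].lstrip(".")


def is_accepted(include: str) -> bool:
    """Assumed rule: the extension is a supported type, or '*' for all types."""
    ext = include_extension(include)
    return ext == "*" or ext in SUPPORTED_TYPES


base = "s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset"
print(is_accepted(base))           # path from the failing recipe, no extension
print(is_accepted(base + "/*.*"))  # wildcard form, as in the error's hint
```

Under this assumed rule, a path ending in a bare folder name fails, while one ending in `*.*` or `*.parquet` passes.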
bland-sundown-49496
09/11/2022, 1:39 AM
bland-sundown-49496
09/12/2022, 4:19 PM
(base) HGOPU-MAC:datahub hgopu$ datahub --debug ingest -c s3-datahub.yaml
[2022-09-12 11:16:02,387] DEBUG {datahub.telemetry.telemetry:210} - Sending init Telemetry
[2022-09-12 11:16:02,750] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-12 11:16:02,916] INFO {datahub.cli.ingest_cli:183} - DataHub CLI version: 0.8.44.1
[2022-09-12 11:16:02,919] DEBUG {datahub.cli.ingest_cli:195} - Using config: {'source': {'type': 's3', 'config': {'platform': 's3', 'path_spec': {'include': 's3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*'}, 'profiling': {'enabled': False}}}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
[2022-09-12 11:16:02,951] DEBUG {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-12 11:16:02,951] DEBUG {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-12 11:16:02,951] DEBUG {datahub.ingestion.run.pipeline:174} - Sink type:datahub-rest,<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'> configured
[2022-09-12 11:16:02,951] INFO {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
[2022-09-12 11:16:02,957] DEBUG {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-12 11:16:02,957] DEBUG {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-12 11:16:02,958] DEBUG {datahub.ingestion.reporting.datahub_ingestion_run_summary_provider:120} - Ingestion source urn = urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5
[2022-09-12 11:16:02,958] DEBUG {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubIngestionSource", "entityUrn": "urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5", "changeType": "UPSERT", "aspectName": "dataHubIngestionSourceInfo", "aspect": {"value": "{\"name\": \"[CLI] s3\", \"type\": \"s3\", \"platform\": \"urn:li:dataPlatform:unknown\", \"config\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"s3\\\", \\\"config\\\": {\\\"platform\\\": \\\"s3\\\", \\\"path_spec\\\": {\\\"include\\\": \\\"s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*\\\"}, \\\"profiling\\\": {\\\"enabled\\\": false}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://localhost:8080\\\"}}}\", \"version\": \"0.8.44.1\", \"executorId\": \"__datahub_cli_\"}}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-12 11:16:02,973] DEBUG {datahub.ingestion.run.pipeline:269} - Reporter type:datahub,<class 'datahub.ingestion.reporting.datahub_ingestion_run_summary_provider.DatahubIngestionRunSummaryProvider'> configured.
[2022-09-12 11:16:03,136] INFO {numexpr.utils:159} - NumExpr defaulting to 8 threads.
[2022-09-12 11:16:03,304] ERROR {logger:26} - Please set env variable SPARK_VERSION
[2022-09-12 11:16:03,304] INFO {logger:27} - Using deequ: com.amazon.deequ:deequ:1.2.2-spark-3.0
[2022-09-12 11:16:03,539] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-12 11:16:03,714] DEBUG {datahub.ingestion.run.pipeline:199} - Source type:s3,<class 'datahub.ingestion.source.s3.source.S3Source'> configured
[2022-09-12 11:16:03,714] INFO {datahub.ingestion.run.pipeline:200} - Source configured successfully.
[2022-09-12 11:16:03,716] INFO {datahub.cli.ingest_cli:130} - Starting metadata ingestion
-[2022-09-12 11:16:03,720] DEBUG {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubExecutionRequest", "entityUrn": "urn:li:dataHubExecutionRequest:s3-2022_09_12-11_16_02", "changeType": "UPSERT", "aspectName": "dataHubExecutionRequestInput", "aspect": {"value": "{\"task\": \"CLI Ingestion\", \"args\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"s3\\\", \\\"config\\\": {\\\"platform\\\": \\\"s3\\\", \\\"path_spec\\\": {\\\"include\\\": \\\"s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*\\\"}, \\\"profiling\\\": {\\\"enabled\\\": false}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://localhost:8080\\\"}}}\", \"version\": \"0.8.44.1\"}, \"executorId\": \"__datahub_cli_\", \"source\": {\"type\": \"CLI_INGESTION_SOURCE\", \"ingestionSource\": \"urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5\"}, \"requestedAt\": 1662999363718}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-12 11:16:03,756] DEBUG {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubExecutionRequest", "entityUrn": "urn:li:dataHubExecutionRequest:s3-2022_09_12-11_16_02", "changeType": "UPSERT", "aspectName": "dataHubExecutionRequestResult", "aspect": {"value": "{\"status\": \"UNKNOWN\", \"report\": \"{\\n \\\"source\\\": {\\n \\\"type\\\": \\\"s3\\\",\\n \\\"report\\\": {\\n \\\"events_produced\\\": \\\"0\\\",\\n \\\"events_produced_per_sec\\\": \\\"0\\\",\\n \\\"event_ids\\\": [],\\n \\\"warnings\\\": {},\\n \\\"failures\\\": {},\\n \\\"filtered\\\": [],\\n \\\"start_time\\\": \\\"2022-09-12 11:16:03.538821 (now).\\\",\\n \\\"running_time\\\": \\\"0.22 seconds\\\"\\n }\\n },\\n \\\"sink\\\": {\\n \\\"type\\\": \\\"datahub-rest\\\",\\n \\\"report\\\": {\\n \\\"total_records_written\\\": \\\"0\\\",\\n \\\"records_written_per_second\\\": \\\"0\\\",\\n \\\"warnings\\\": [],\\n \\\"failures\\\": [],\\n \\\"start_time\\\": \\\"2022-09-12 11:16:02.065155 (1.69 seconds ago).\\\",\\n \\\"current_time\\\": \\\"2022-09-12 11:16:03.755768 (now).\\\",\\n \\\"total_duration_in_seconds\\\": \\\"1.69\\\",\\n \\\"gms_version\\\": \\\"v0.8.44\\\",\\n \\\"pending_requests\\\": \\\"0\\\"\\n }\\n }\\n}\", \"startTimeMs\": 1662999362958, \"durationMs\": 797}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-12 11:16:03,769] INFO {datahub.cli.ingest_cli:137} - Source (s3) report:
{'events_produced': '0',
'events_produced_per_sec': '0',
'event_ids': [],
'warnings': {},
'failures': {},
'filtered': [],
'start_time': '2022-09-12 11:16:03.538821 (now).',
'running_time': '0.23 seconds'}
[2022-09-12 11:16:03,769] INFO {datahub.cli.ingest_cli:140} - Sink (datahub-rest) report:
{'total_records_written': '0',
'records_written_per_second': '0',
'warnings': [],
'failures': [],
'start_time': '2022-09-12 11:16:02.065155 (1.7 seconds ago).',
'current_time': '2022-09-12 11:16:03.769114 (now).',
'total_duration_in_seconds': '1.7',
'gms_version': 'v0.8.44',
'pending_requests': '0'}
[2022-09-12 11:16:03,988] DEBUG {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-12 11:16:04,377] DEBUG {datahub.entrypoints:168} - File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 149, in main
146 def main(**kwargs):
147 # This wrapper prevents click from suppressing errors.
148 try:
--> 149 sys.exit(datahub(standalone_mode=False, **kwargs))
150 except click.exceptions.Abort:
..................................................
kwargs = {}
datahub = <Group datahub>
click.exceptions.Abort = <class 'click.exceptions.Abort'>
..................................................
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
1126 def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
(...)
--> 1128 return self.main(*args, **kwargs)
..................................................
self = <Group datahub>
args = ()
t.Any = typing.Any
kwargs = {'standalone_mode': False}
..................................................
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
290 def wrapper(*args: Any, **kwargs: Any) -> Any:
(...)
343 "status": "error",
344 "error": get_full_class_name(e),
345 },
346 )
--> 347 raise e
..................................................
args = (<click.core.Context object at 0x7f8ee3f4bee0>, )
Any = typing.Any
kwargs = {'config': 's3-datahub.yaml',
'dry_run': False,
'preview': False,
'preview_workunits': 10,
'strict_warnings': False,
'suppress_error_logs': False,
'test_source_connection': False,
'report_to': 'datahub',
'no_default_report': False,
'no_spinner': False}
..................................................
---- (full traceback above) ----
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 149, in main
sys.exit(datahub(standalone_mode=False, **kwargs))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
return f(get_current_context(), *args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
raise e
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
res = func(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 91, in wrapper
return func(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 211, in run
loop.run_until_complete(run_func_check_upgrade(pipeline))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 167, in run_func_check_upgrade
ret = await the_one_future
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 158, in run_pipeline_async
return await loop.run_in_executor(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_to_completion
raise e
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 135, in run_pipeline_to_completion
pipeline.run()
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 348, in run
for wu in itertools.islice(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/s3/source.py", line 728, in get_workunits
assert self.source_config.path_specs
AssertionError
[2022-09-12 11:16:04,380] DEBUG {datahub.entrypoints:198} - DataHub CLI version: 0.8.44.1 at /Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/__init__.py
[2022-09-12 11:16:04,380] DEBUG {datahub.entrypoints:201} - Python version: 3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ] at /Users/hgopu/opt/anaconda3/bin/python3 on macOS-10.16-x86_64-i386-64bit
[2022-09-12 11:16:04,380] DEBUG {datahub.entrypoints:204} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.44', 'commit': '2115d5bf1dc4dcfd73dbff6d41aaa08a279b62c0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
bland-sundown-49496
09/12/2022, 4:23 PM
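The second run gets past validation and then stops at `assert self.source_config.path_specs` in `get_workunits`, while the recipe only sets `path_spec` (singular). One possible reading, which is an assumption and not confirmed anywhere in this thread, is that this CLI version expects a non-empty `path_specs` list at that point. The sketch below rewrites a recipe's source config into that shape. `normalize_path_specs` is a hypothetical helper, not part of the DataHub CLI, and whether the rewritten recipe ingests successfully is untested here.

```python
from copy import deepcopy


def normalize_path_specs(source_config: dict) -> dict:
    """Return a copy of the config with `path_spec` moved into a `path_specs` list.

    Assumption: the source reads a list under `path_specs`. If a non-empty
    `path_specs` is already present, it is kept and the singular key is dropped.
    """
    cfg = deepcopy(source_config)
    single = cfg.pop("path_spec", None)
    if single is not None and not cfg.get("path_specs"):
        cfg["path_specs"] = [single]
    return cfg


# Source config as printed in the "Using config" debug line of the second run.
source_config = {
    "platform": "s3",
    "path_spec": {
        "include": "s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*"
    },
    "profiling": {"enabled": False},
}

normalized = normalize_path_specs(source_config)
print(normalized["path_specs"])
```

The helper works on a copy, so the original recipe dict is left untouched and the two forms can be compared side by side.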