# getting-started
b
Hello, I have started setting up DataHub on my local Mac and am trying to configure S3 as a data source. I am getting the error message below. I am able to list the bucket from the AWS CLI successfully. Would someone please help me? Here is my S3 source yaml:

source:
  type: "s3"
  config:
    platform: s3
    path_spec:
      include: "s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset"
    aws_config:
      aws_access_key_id: XXX
      aws_secret_access_key: XXX
      aws_region: us-east-1
    env: "PROD"
    profiling:
      enabled: false

# see https://datahubproject.io/docs/metadata-ingestion/sink_docs/file for complete documentation
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

ERROR MESSAGE WHEN I RAN: datahub --debug ingest -c s3-datahub.yaml

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 196, in run
    pipeline = Pipeline.create(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
    return cls(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
    self._record_initialization_failure(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
    raise PipelineInitError(msg) from e
PipelineInitError: Failed to configure source (s3)
[2022-09-09 14:21:50,735] DEBUG    {datahub.entrypoints:198} - DataHub CLI version: 0.8.44.1 at /Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/__init__.py
[2022-09-09 14:21:50,735] DEBUG    {datahub.entrypoints:201} - Python version: 3.8.8 (default, Apr 13 2021, 12:59:45) [Clang 10.0.0 ] at /Users/hgopu/opt/anaconda3/bin/python3 on macOS-10.16-x86_64-i386-64bit
[2022-09-09 14:21:50,735] DEBUG    {datahub.entrypoints:204} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.44', 'commit': '2115d5bf1dc4dcfd73dbff6d41aaa08a279b62c0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
b
Hey Harsha! Would you mind posting this in #ingestion to increase the chances of getting a good response? Also, when posting stack traces or code blocks, we ask that you post them as a thread under a top-level question to keep the channel a little cleaner and easier to read.
h
Hi @bland-sundown-49496, it looks like there is an issue with the
source
section of your yaml. Could you run it in debug mode:
datahub --debug ingest -c <your_recipe.yml>
and share the logs?
b
Thanks @helpful-optician-78938 for the response. Here is the debug log. Please note I have replaced the s3 keys in the log intentionally. Thanks, Harsha
(base) HGOPU-MAC:datahub hgopu$ datahub --debug  ingest -c s3-datahub.yaml
[2022-09-10 20:31:53,322] DEBUG    {datahub.telemetry.telemetry:210} - Sending init Telemetry
[2022-09-10 20:31:53,656] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-10 20:31:53,839] INFO     {datahub.cli.ingest_cli:183} - DataHub CLI version: 0.8.44.1
[2022-09-10 20:31:53,843] DEBUG    {datahub.cli.ingest_cli:195} - Using config: {'source': {'type': 's3', 'config': {'platform': 's3', 'path_spec': {'include': 's3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset'}, 'aws_config': {'aws_access_key_id': 'XXX', 'aws_secret_access_key': 'XXXX', 'aws_region': 'us-east-1'}, 'env': 'PROD', 'profiling': {'enabled': False}}}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
[2022-09-10 20:31:53,894] DEBUG    {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-10 20:31:53,894] DEBUG    {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-10 20:31:53,894] DEBUG    {datahub.ingestion.run.pipeline:174} - Sink type:datahub-rest,<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'> configured
[2022-09-10 20:31:53,894] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
[2022-09-10 20:31:53,905] DEBUG    {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-10 20:31:53,906] DEBUG    {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-10 20:31:53,906] DEBUG    {datahub.ingestion.reporting.datahub_ingestion_run_summary_provider:120} - Ingestion source urn = urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5
[2022-09-10 20:31:53,907] DEBUG    {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubIngestionSource", "entityUrn": "urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5", "changeType": "UPSERT", "aspectName": "dataHubIngestionSourceInfo", "aspect": {"value": "{\"name\": \"[CLI] s3\", \"type\": \"s3\", \"platform\": \"urn:li:dataPlatform:unknown\", \"config\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"s3\\\", \\\"config\\\": {\\\"platform\\\": \\\"s3\\\", \\\"path_spec\\\": {\\\"include\\\": \\\"s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset\\\"}, \\\"aws_config\\\": {\\\"aws_access_key_id\\\": \\\"XXXX\\\", \\\"aws_secret_access_key\\\": \\\"XXX\\\", \\\"aws_region\\\": \\\"us-east-1\\\"}, \\\"env\\\": \\\"PROD\\\", \\\"profiling\\\": {\\\"enabled\\\": false}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://localhost:8080\\\"}}}\", \"version\": \"0.8.44.1\", \"executorId\": \"__datahub_cli_\"}}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-10 20:31:53,930] DEBUG    {datahub.ingestion.run.pipeline:269} - Reporter type:datahub,<class 'datahub.ingestion.reporting.datahub_ingestion_run_summary_provider.DatahubIngestionRunSummaryProvider'> configured.
[2022-09-10 20:31:54,229] INFO     {numexpr.utils:159} - NumExpr defaulting to 8 threads.
[2022-09-10 20:31:54,501] ERROR    {logger:26} - Please set env variable SPARK_VERSION
[2022-09-10 20:31:54,501] INFO     {logger:27} - Using deequ: com.amazon.deequ:deequ:1.2.2-spark-3.0
[2022-09-10 20:31:54,814] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-10 20:31:55,180] DEBUG    {datahub.entrypoints:168} - File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 196, in __init__
    131  def __init__(
    132      self,
    133      config: PipelineConfig,
    134      dry_run: bool = False,
    135      preview_mode: bool = False,
    136      preview_workunits: int = 10,
    137      report_to: Optional[str] = None,
    138      no_default_report: bool = False,
    139  ):
 (...)
    192          self._record_initialization_failure(e, "Failed to create source")
    193          return
    194
    195      try:
--> 196          self.source: Source = source_class.create(
    197              self.config.source.dict().get("config", {}), self.ctx

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/s3/source.py", line 321, in create
    319  @classmethod
    320  def create(cls, config_dict, ctx):
--> 321      config = DataLakeSourceConfig.parse_obj(config_dict)
    322

File "pydantic/main.py", line 578, in pydantic.main.BaseModel.parse_obj

File "pydantic/main.py", line 406, in pydantic.main.BaseModel.__init__

ValidationError: 1 validation error for DataLakeSourceConfig
path_spec -> __root__
  file type specified () in path_spec.include is not in specified file types. Please select one from ['csv', 'tsv', 'json', 'parquet', 'avro'] or specify ".*" to allow all types (type=value_error)

The above exception was the direct cause of the following exception:

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 196, in run
    112  def run(
    113      ctx: click.Context,
    114      config: str,
    115      dry_run: bool,
    116      preview: bool,
    117      strict_warnings: bool,
    118      preview_workunits: int,
    119      suppress_error_logs: bool,
    120      test_source_connection: bool,
    121      report_to: str,
    122      no_default_report: bool,
    123      no_spinner: bool,
    124  ) -> None:
 (...)
    192          _test_source_connection(report_to, pipeline_config)
    193
    194      try:
    195          logger.debug(f"Using config: {pipeline_config}")
--> 196          pipeline = Pipeline.create(
    197              pipeline_config,

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
    306  def create(
    307      cls,
    308      config_dict: dict,
    309      dry_run: bool = False,
    310      preview_mode: bool = False,
    311      preview_workunits: int = 10,
    312      report_to: Optional[str] = None,
    313      no_default_report: bool = False,
    314      raw_config: Optional[dict] = None,
    315  ) -> "Pipeline":
    316      config = PipelineConfig.from_dict(config_dict, raw_config)
--> 317      return cls(
    318          config,

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
    131  def __init__(
    132      self,
    133      config: PipelineConfig,
    134      dry_run: bool = False,
    135      preview_mode: bool = False,
    136      preview_workunits: int = 10,
    137      report_to: Optional[str] = None,
    138      no_default_report: bool = False,
    139  ):
 (...)
    198          )
    199          logger.debug(f"Source type:{source_type},{source_class} configured")
    200          logger.info("Source configured successfully.")
    201      except Exception as e:
--> 202          self._record_initialization_failure(
    203              e, f"Failed to configure source ({source_type})"

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
    128  def _record_initialization_failure(self, e: Exception, msg: str) -> None:
--> 129      raise PipelineInitError(msg) from e

---- (full traceback above) ----
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 196, in run
    pipeline = Pipeline.create(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 317, in create
    return cls(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 202, in __init__
    self._record_initialization_failure(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 129, in _record_initialization_failure
    raise PipelineInitError(msg) from e

PipelineInitError: Failed to configure source (s3)
[2022-09-10 20:31:55,180] DEBUG    {datahub.entrypoints:198} - DataHub CLI version: 0.8.44.1 at /Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/__init__.py
[2022-09-10 20:31:55,180] DEBUG    {datahub.entrypoints:201} - Python version: 3.8.8 (default, Apr 13 2021, 12:59:45)
[Clang 10.0.0 ] at /Users/hgopu/opt/anaconda3/bin/python3 on macOS-10.16-x86_64-i386-64bit
[2022-09-10 20:31:55,180] DEBUG    {datahub.entrypoints:204} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.44', 'commit': '2115d5bf1dc4dcfd73dbff6d41aaa08a279b62c0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
#ingestion
Thanks @modern-artist-55754. I corrected the yaml file, but this time I'm getting a different error. Any help?
(base) HGOPU-MAC:datahub hgopu$ datahub --debug  ingest -c s3-datahub.yaml
[2022-09-12 11:16:02,387] DEBUG    {datahub.telemetry.telemetry:210} - Sending init Telemetry
[2022-09-12 11:16:02,750] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-12 11:16:02,916] INFO     {datahub.cli.ingest_cli:183} - DataHub CLI version: 0.8.44.1
[2022-09-12 11:16:02,919] DEBUG    {datahub.cli.ingest_cli:195} - Using config: {'source': {'type': 's3', 'config': {'platform': 's3', 'path_spec': {'include': 's3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*'}, 'profiling': {'enabled': False}}}, 'sink': {'type': 'datahub-rest', 'config': {'server': 'http://localhost:8080'}}}
[2022-09-12 11:16:02,951] DEBUG    {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-12 11:16:02,951] DEBUG    {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-12 11:16:02,951] DEBUG    {datahub.ingestion.run.pipeline:174} - Sink type:datahub-rest,<class 'datahub.ingestion.sink.datahub_rest.DatahubRestSink'> configured
[2022-09-12 11:16:02,951] INFO     {datahub.ingestion.run.pipeline:175} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://localhost:8080
[2022-09-12 11:16:02,957] DEBUG    {datahub.ingestion.sink.datahub_rest:125} - Setting env variables to override config
[2022-09-12 11:16:02,957] DEBUG    {datahub.ingestion.sink.datahub_rest:127} - Setting gms config
[2022-09-12 11:16:02,958] DEBUG    {datahub.ingestion.reporting.datahub_ingestion_run_summary_provider:120} - Ingestion source urn = urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5
[2022-09-12 11:16:02,958] DEBUG    {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubIngestionSource", "entityUrn": "urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5", "changeType": "UPSERT", "aspectName": "dataHubIngestionSourceInfo", "aspect": {"value": "{\"name\": \"[CLI] s3\", \"type\": \"s3\", \"platform\": \"urn:li:dataPlatform:unknown\", \"config\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"s3\\\", \\\"config\\\": {\\\"platform\\\": \\\"s3\\\", \\\"path_spec\\\": {\\\"include\\\": \\\"s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*\\\"}, \\\"profiling\\\": {\\\"enabled\\\": false}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://localhost:8080\\\"}}}\", \"version\": \"0.8.44.1\", \"executorId\": \"__datahub_cli_\"}}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-12 11:16:02,973] DEBUG    {datahub.ingestion.run.pipeline:269} - Reporter type:datahub,<class 'datahub.ingestion.reporting.datahub_ingestion_run_summary_provider.DatahubIngestionRunSummaryProvider'> configured.
[2022-09-12 11:16:03,136] INFO     {numexpr.utils:159} - NumExpr defaulting to 8 threads.
[2022-09-12 11:16:03,304] ERROR    {logger:26} - Please set env variable SPARK_VERSION
[2022-09-12 11:16:03,304] INFO     {logger:27} - Using deequ: com.amazon.deequ:deequ:1.2.2-spark-3.0
[2022-09-12 11:16:03,539] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-12 11:16:03,714] DEBUG    {datahub.ingestion.run.pipeline:199} - Source type:s3,<class 'datahub.ingestion.source.s3.source.S3Source'> configured
[2022-09-12 11:16:03,714] INFO     {datahub.ingestion.run.pipeline:200} - Source configured successfully.
[2022-09-12 11:16:03,716] INFO     {datahub.cli.ingest_cli:130} - Starting metadata ingestion
[2022-09-12 11:16:03,720] DEBUG    {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubExecutionRequest", "entityUrn": "urn:li:dataHubExecutionRequest:s3-2022_09_12-11_16_02", "changeType": "UPSERT", "aspectName": "dataHubExecutionRequestInput", "aspect": {"value": "{\"task\": \"CLI Ingestion\", \"args\": {\"recipe\": \"{\\\"source\\\": {\\\"type\\\": \\\"s3\\\", \\\"config\\\": {\\\"platform\\\": \\\"s3\\\", \\\"path_spec\\\": {\\\"include\\\": \\\"s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.*\\\"}, \\\"profiling\\\": {\\\"enabled\\\": false}}}, \\\"sink\\\": {\\\"type\\\": \\\"datahub-rest\\\", \\\"config\\\": {\\\"server\\\": \\\"http://localhost:8080\\\"}}}\", \"version\": \"0.8.44.1\"}, \"executorId\": \"__datahub_cli_\", \"source\": {\"type\": \"CLI_INGESTION_SOURCE\", \"ingestionSource\": \"urn:li:dataHubIngestionSource:cli-ac4e9c10b8fc815590c3d620ce80d9e5\"}, \"requestedAt\": 1662999363718}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-12 11:16:03,756] DEBUG    {datahub.emitter.rest_emitter:235} - Attempting to emit to DataHub GMS; using curl equivalent to:
curl -X POST -H 'User-Agent: python-requests/2.27.1' -H 'Accept-Encoding: gzip, deflate, br' -H 'Accept: */*' -H 'Connection: keep-alive' -H 'X-RestLi-Protocol-Version: 2.0.0' -H 'Content-Type: application/json' --data '{"proposal": {"entityType": "dataHubExecutionRequest", "entityUrn": "urn:li:dataHubExecutionRequest:s3-2022_09_12-11_16_02", "changeType": "UPSERT", "aspectName": "dataHubExecutionRequestResult", "aspect": {"value": "{\"status\": \"UNKNOWN\", \"report\": \"{\\n \\\"source\\\": {\\n \\\"type\\\": \\\"s3\\\",\\n \\\"report\\\": {\\n \\\"events_produced\\\": \\\"0\\\",\\n \\\"events_produced_per_sec\\\": \\\"0\\\",\\n \\\"event_ids\\\": [],\\n \\\"warnings\\\": {},\\n \\\"failures\\\": {},\\n \\\"filtered\\\": [],\\n \\\"start_time\\\": \\\"2022-09-12 11:16:03.538821 (now).\\\",\\n \\\"running_time\\\": \\\"0.22 seconds\\\"\\n }\\n },\\n \\\"sink\\\": {\\n \\\"type\\\": \\\"datahub-rest\\\",\\n \\\"report\\\": {\\n \\\"total_records_written\\\": \\\"0\\\",\\n \\\"records_written_per_second\\\": \\\"0\\\",\\n \\\"warnings\\\": [],\\n \\\"failures\\\": [],\\n \\\"start_time\\\": \\\"2022-09-12 11:16:02.065155 (1.69 seconds ago).\\\",\\n \\\"current_time\\\": \\\"2022-09-12 11:16:03.755768 (now).\\\",\\n \\\"total_duration_in_seconds\\\": \\\"1.69\\\",\\n \\\"gms_version\\\": \\\"v0.8.44\\\",\\n \\\"pending_requests\\\": \\\"0\\\"\\n }\\n }\\n}\", \"startTimeMs\": 1662999362958, \"durationMs\": 797}", "contentType": "application/json"}}}' 'http://localhost:8080/aspects?action=ingestProposal'
[2022-09-12 11:16:03,769] INFO     {datahub.cli.ingest_cli:137} - Source (s3) report: {'events_produced': '0', 'events_produced_per_sec': '0', 'event_ids': [], 'warnings': {}, 'failures': {}, 'filtered': [], 'start_time': '2022-09-12 11:16:03.538821 (now).', 'running_time': '0.23 seconds'}
[2022-09-12 11:16:03,769] INFO     {datahub.cli.ingest_cli:140} - Sink (datahub-rest) report: {'total_records_written': '0', 'records_written_per_second': '0', 'warnings': [], 'failures': [], 'start_time': '2022-09-12 11:16:02.065155 (1.7 seconds ago).', 'current_time': '2022-09-12 11:16:03.769114 (now).', 'total_duration_in_seconds': '1.7', 'gms_version': 'v0.8.44', 'pending_requests': '0'}
[2022-09-12 11:16:03,988] DEBUG    {datahub.telemetry.telemetry:243} - Sending Telemetry
[2022-09-12 11:16:04,377] DEBUG    {datahub.entrypoints:168} - File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 149, in main
    146  def main(**kwargs):
    147      # This wrapper prevents click from suppressing errors.
    148      try:
--> 149          sys.exit(datahub(standalone_mode=False, **kwargs))
    150      except click.exceptions.Abort:
    ..................................................
     kwargs = {}
     datahub = <Group datahub>
     click.exceptions.Abort = <class 'click.exceptions.Abort'>
    ..................................................

File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    1126  def __call__(self, *args: t.Any, **kwargs: t.Any) -> t.Any:
 (...)
--> 1128      return self.main(*args, **kwargs)
    ..................................................
     self = <Group datahub>
     args = ()
     t.Any = typing.Any
     kwargs = {'standalone_mode': False}
    ..................................................
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
    290  def wrapper(*args: Any, **kwargs: Any) -> Any:
 (...)
    343              "status": "error",
    344              "error": get_full_class_name(e),
    345          },
    346      )
--> 347      raise e
    ..................................................
     args = (<click.core.Context object at 0x7f8ee3f4bee0>, )
     Any = typing.Any
     kwargs = {'config': 's3-datahub.yaml', 'dry_run': False, 'preview': False, 'preview_workunits': 10, 'strict_warnings': False, 'suppress_error_logs': False, 'test_source_connection': False, 'report_to': 'datahub', 'no_default_report': False, 'no_spinner': False}
    ..................................................

---- (full traceback above) ----
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/entrypoints.py", line 149, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
    raise e
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
    res = func(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/utilities/memory_leak_detector.py", line 91, in wrapper
    return func(*args, **kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 211, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
File "/Users/hgopu/opt/anaconda3/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 167, in run_func_check_upgrade
    ret = await the_one_future
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 158, in run_pipeline_async
    return await loop.run_in_executor(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 149, in run_pipeline_to_completion
    raise e
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/cli/ingest_cli.py", line 135, in run_pipeline_to_completion
    pipeline.run()
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 348, in run
    for wu in itertools.islice(
File "/Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/ingestion/source/s3/source.py", line 728, in get_workunits
    assert self.source_config.path_specs
AssertionError
[2022-09-12 11:16:04,380] DEBUG    {datahub.entrypoints:198} - DataHub CLI version: 0.8.44.1 at /Users/hgopu/opt/anaconda3/lib/python3.8/site-packages/datahub/__init__.py
[2022-09-12 11:16:04,380] DEBUG    {datahub.entrypoints:201} - Python version: 3.8.8 (default, Apr 13 2021, 12:59:45) [Clang 10.0.0 ] at /Users/hgopu/opt/anaconda3/bin/python3 on macOS-10.16-x86_64-i386-64bit
[2022-09-12 11:16:04,380] DEBUG    {datahub.entrypoints:204} - GMS config {'models': {}, 'versions': {'linkedin/datahub': {'version': 'v0.8.44', 'commit': '2115d5bf1dc4dcfd73dbff6d41aaa08a279b62c0'}}, 'managedIngestion': {'defaultCliVersion': '0.8.42', 'enabled': True}, 'statefulIngestionCapable': True, 'supportsImpactAnalysis': True, 'telemetry': {'enabledCli': True, 'enabledIngestion': False}, 'datasetUrnNameCasing': False, 'retention': 'true', 'datahub': {'serverType': 'quickstart'}, 'noCode': 'true'}
Please note I chopped some of the trace in the middle (just function calls) due to Slack's message length limit.
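The second failure is different from the first: config validation now succeeds ("Source configured successfully."), but `get_workunits` hits `assert self.source_config.path_specs`, meaning the parsed list of path specs is empty at runtime even though the singular `path_spec` field was accepted. One hedged guess for this CLI version (0.8.44.x) is to spell out the plural `path_specs` list form explicitly instead of relying on the singular field being migrated; whether that resolves this particular assertion is an assumption, and the `*.parquet` file type is likewise a guess:

```yaml
source:
  type: "s3"
  config:
    platform: s3
    # path_specs (plural, a list) sketched as an alternative to the singular
    # path_spec field; the *.parquet suffix is an assumed file type
    path_specs:
      - include: "s3://imo-datalake-dev-gold20201022182214781400000004/rhubarb/2022/08/29/dataset/*.parquet"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"
```

Upgrading the CLI to a newer release may also be worth trying: a bare AssertionError instead of a clear validation message usually points at a bug in the installed version.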