# troubleshoot
b
Hello everyone, I am using DataHub CLI version 0.9.2 to ingest dbt lineage, using a recipe file like this:
source:
  type: "dbt"
  config:
    # Coordinates
    manifest_path: "/Users/asif/dbt_data/manifest.json"
    catalog_path: "/Users/asif/dbt_data/catalog.json"
    sources_path: "/Users/asif/dbt_data/sources.json"

    # Options
    target_platform: "snowflake" # e.g. bigquery/postgres/etc.
    platform_instance: "snowflake-1" # The instance of the platform that all assets produced by this recipe belong to
sink:
  type: datahub-rest # default datahub-rest
  config:
    server: "https://localhost:9002/api/gms"
    extra_headers:
    token: xxxxx

transformers:
  - type: "simple_add_dataset_properties"
    config:
      semantics: OVERWRITE
      properties:
        prop1: value1
        prop2: value2
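Indentation slips in a recipe, which are easy to introduce when copying YAML into chat, either stop the file from parsing or quietly change which section a key belongs to. As a quick sanity check, here is a minimal sketch that inspects a recipe after it has been loaded into a Python dict (for example with PyYAML's `yaml.safe_load`, not shown). `check_recipe` is a hypothetical helper, not part of the DataHub CLI, and it only checks the nesting of `source`, `sink`, and `transformers`.

```python
from typing import Any, Dict, List


def check_recipe(recipe: Dict[str, Any]) -> List[str]:
    """Return human-readable structural problems found in a parsed recipe dict.

    Hypothetical helper for illustration; not part of the DataHub CLI.
    """
    problems: List[str] = []

    # Every recipe needs a `source` section at the top level.
    if "source" not in recipe:
        problems.append("missing top-level key: 'source'")

    # A mis-indented recipe can end up with `sink` or `transformers`
    # nested inside `source` instead of sitting next to it.
    source = recipe.get("source")
    if isinstance(source, dict):
        for key in ("sink", "transformers"):
            if key in source:
                problems.append(f"{key!r} is nested under 'source'; it should be top-level")

    # Each transformer entry must be a mapping with a `type` field.
    for i, entry in enumerate(recipe.get("transformers") or []):
        if not isinstance(entry, dict) or "type" not in entry:
            problems.append(f"transformer #{i} has no 'type'")

    return problems
```

For instance, `check_recipe({"source": {"type": "dbt", "transformers": []}})` reports that `transformers` is nested under `source`.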
But I keep getting this error:
  File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 142, in run_pipeline_async
    return await loop.run_in_executor(
  File "/usr/local/Cellar/python@3.9/3.9.15/Frameworks/Python.framework/Versions/3.9/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 133, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.9/site-packages/datahub/cli/ingest_cli.py", line 125, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/run/pipeline.py", line 376, in run
    for record_envelope in self.transform(
  File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/transformer/base_transformer.py", line 217, in transform
    transformed_aspect = self.transform_aspect(
  File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/transformer/add_dataset_properties.py", line 95, in transform_aspect
    assert in_dataset_properties_aspect
AssertionError
Can someone help me figure out what could be causing this?
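The last frame shows the transformer asserting that an incoming dataset-properties aspect exists (`assert in_dataset_properties_aspect`), so the `AssertionError` suggests that at least one dataset reached it without such an aspect. A minimal sketch of the defensive pattern, assuming that is the cause: create the aspect when it is missing instead of asserting. `DatasetProperties` and `add_properties` are hypothetical stand-ins, not DataHub classes, and this is not the code from the eventual fix.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DatasetProperties:
    """Hypothetical stand-in for a dataset-properties aspect."""
    custom_properties: Dict[str, str] = field(default_factory=dict)


def add_properties(
    aspect: Optional[DatasetProperties], new_props: Dict[str, str]
) -> DatasetProperties:
    """Merge new_props into aspect, creating the aspect if it is missing.

    A version that starts with `assert aspect` raises AssertionError
    whenever a dataset arrives without an existing properties aspect.
    """
    if aspect is None:
        aspect = DatasetProperties()
    # Configured properties win over existing keys with the same name;
    # a new object is returned so the input aspect is left unchanged.
    merged = {**aspect.custom_properties, **new_props}
    return DatasetProperties(custom_properties=merged)
```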
a
Hi Asif, is this happening for every dbt lineage you ingest, or just one specific one?
b
Hi Paul, this happens whenever I put either of these two blocks under transformers:
- type: "simple_add_dataset_properties"
  config:
    properties:
      newprop1: newvalue1
      newprop2: newvalue2
Or this:
- type: "pattern_add_dataset_schema_terms"
  config:
    term_pattern:
      rules:
        ".*email.*": ["urn:li:glossaryTerm:Email"]
        ".*id.*": ["urn:li:glossaryTerm:PKey"]
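The `rules` in `pattern_add_dataset_schema_terms` map regular expressions to glossary term URNs. A minimal sketch of how such rules could be applied to column names, assuming each key is matched with Python's `re.match` against the field path and that terms from every matching rule are collected; the real transformer's matching details (for example first-match-only behavior or case sensitivity) may differ. `terms_for_field` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Rules from the transformer config: regex -> glossary term URNs.
RULES: Dict[str, List[str]] = {
    ".*email.*": ["urn:li:glossaryTerm:Email"],
    ".*id.*": ["urn:li:glossaryTerm:PKey"],
}


def terms_for_field(field_path: str, rules: Dict[str, List[str]]) -> List[str]:
    """Collect terms from every rule whose regex matches field_path (hypothetical helper)."""
    terms: List[str] = []
    for pattern, urns in rules.items():
        # re.match anchors at the start; the leading `.*` lets the match begin anywhere.
        if re.match(pattern, field_path):
            for urn in urns:
                if urn not in terms:
                    terms.append(urn)
    return terms
```

One thing this makes visible: `.*id.*` is broad and also tags columns such as `provider` or `valid_email`, so a narrower pattern like `.*_id$` may be closer to the intent.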
a
Huh, that’s interesting. Any idea what could be going on here, @gray-shoe-75895?
b
It works on DataHub CLI 0.8.39 but not on 0.8.45 and later.
g
This was fixed today by PR https://github.com/datahub-project/datahub/pull/6429 and will be included in the next release of acryl-datahub.
As a side note, 0.8.39 is a fairly old version. I’d recommend upgrading (after the release) to take advantage of the latest features and improvements.
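Before and after upgrading, it is worth confirming which `acryl-datahub` build is actually installed (the `datahub version` command reports this too). A minimal standard-library sketch; the release containing the fix is not named at this point in the thread, so `minimum` is a parameter you would fill in yourself. Versions are compared as integer tuples because plain string comparison ranks `0.8.9` above `0.8.45`. The helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a purely numeric dotted release like '0.8.45' into (0, 8, 45).

    Pre-release tags such as 'rc1' are not handled; packaging.version
    (a third-party package) is the usual tool for those.
    """
    return tuple(int(part) for part in text.split("."))


def installed_version(package: str = "acryl-datahub") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def needs_upgrade(current: str, minimum: str) -> bool:
    """True if `current` is an older release than `minimum`."""
    return parse_version(current) < parse_version(minimum)
```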
b
Thanks, Harshal and Paul.
@gray-shoe-75895, can you tell me which version will include the above fix? I am also seeing this exception:
[2022-11-15 18:13:38,168] ERROR    {datahub.entrypoints:206} - Command failed: 'NoneType' object has no attribute 'fields'
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/datahub/entrypoints.py", line 164, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 347, in wrapper
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 299, in wrapper
    res = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/utilities/memory_leak_detector.py", line 95, in wrapper
    return func(ctx, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 192, in run
    loop.run_until_complete(run_func_check_upgrade(pipeline))
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 151, in run_func_check_upgrade
    ret = await the_one_future
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 142, in run_pipeline_async
    return await loop.run_in_executor(
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 133, in run_pipeline_to_completion
    raise e
  File "/usr/local/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 125, in run_pipeline_to_completion
    pipeline.run()
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 376, in run
    for record_envelope in self.transform(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/transformer/base_transformer.py", line 194, in transform
    for envelope in record_envelopes:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/transformer/base_transformer.py", line 194, in transform
    for envelope in record_envelopes:
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/transformer/base_transformer.py", line 194, in transform
    for envelope in record_envelopes:
  [Previous line repeated 2 more times]
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/transformer/base_transformer.py", line 218, in transform
    transformed_aspect = self.transform_aspect(
  File "/usr/local/lib/python3.10/site-packages/datahub/ingestion/transformer/add_dataset_schema_terms.py", line 126, in transform_aspect
    for field in schema_metadata_aspect.fields
AttributeError: 'NoneType' object has no attribute 'fields'
when I add a transformer like this:
- type: "pattern_add_dataset_schema_terms"
  config:
    term_pattern:
      rules:
        ".*email.*": ["urn:li:glossaryTerm:Email"]
        ".*id.*": ["urn:li:glossaryTerm:PKey"]
g
Actually, based on that stack trace, this looks like a slightly different bug. It will be fixed by PR https://github.com/datahub-project/datahub/pull/6445.
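In this second trace the failing line iterates over `schema_metadata_aspect.fields`, and the `AttributeError` means that aspect was `None`, so some dataset reached the schema-terms transformer without any schema. A minimal sketch of a guard, assuming the sensible behavior is to leave such datasets untouched; `SchemaField`, `SchemaMetadata`, and `tag_schema_fields` are hypothetical stand-ins, not DataHub classes, and this is not the code in PR 6445. Terms are looked up by exact column name here to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaField:
    """Hypothetical stand-in for one column in a schema aspect."""
    field_path: str
    terms: List[str] = field(default_factory=list)


@dataclass
class SchemaMetadata:
    """Hypothetical stand-in for a dataset's schema aspect."""
    fields: List[SchemaField]


def tag_schema_fields(
    schema: Optional[SchemaMetadata], term_for: Dict[str, List[str]]
) -> Optional[SchemaMetadata]:
    """Attach terms to matching columns; pass datasets without a schema through."""
    if schema is None:
        # Without this check, `schema.fields` raises
        # AttributeError: 'NoneType' object has no attribute 'fields'.
        return None
    for col in schema.fields:
        for urn in term_for.get(col.field_path, []):
            if urn not in col.terms:
                col.terms.append(urn)
    return schema
```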