# troubleshoot
q
Hello, I found some issues when running version 0.8.40 of the dbt connector. To give some context: we have dbt workflows for Snowflake tables. The Snowflake tables are ingested independently by the Snowflake connector. We use only the catalog and manifest yaml files for the dbt connector. Now, the issues:

1. If I run with `disable_dbt_node_creation` set to True, I can see nice lineage between the pre-ingested Snowflake tables, but on the main page where all platforms are shown I can see the DBT platform with a count of several thousand elements. If I click on this platform to see the entities, I get an exception. After some examination of the MySQL database I could see there are objects with URNs like `urn:li:assertion:2c8a2605354d9b924c0f1b5d9f0dffd5` with a dataPlatformInstance aspect having `dbt` as the platform but nothing as an instance (I believe the exception was coming from that aspect missing the platform instance).
2. If I run with `disable_dbt_node_creation` set to False, I can see lineage and dbt objects combined with Snowflake tables (very cool). It seems I still have the above assertions, but they don't cause problems on platform search anymore.

In either case, if I run the connector with `stateful_ingestion` enabled, the connector ingests data but then throws an exception ending with code like below:
File "/usr/local/lib/python3.9/site-packages/datahub/ingestion/source/state/sql_common_state.py", line 35, in _get_lightweight_repr
    31   def _get_lightweight_repr(dataset_urn: str) -> str:
    32       """Reduces the amount of text in the URNs for smaller state footprint."""
    33       SEP = BaseSQLAlchemyCheckpointState._get_separator()
    34       key = dataset_urn_to_key(dataset_urn)
--> 35       assert key is not None
    36       return f"{key.platform}{SEP}{key.name}{SEP}{key.origin}"
    ..................................................
     dataset_urn = 'urn:li:assertion:2c8aaaa5354d9b924c0f1b5c9f09bf75'
     SEP = '||'
     key = None
This makes me think the URN representation function fails for assertion objects, which are somehow being treated as datasets. Is anyone having similar problems?
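To illustrate the suspected failure mode, here is a minimal sketch. It is not the actual DataHub code: `parse_dataset_urn`, `lightweight_repr`, `DatasetKey`, and the regex are hypothetical stand-ins for `dataset_urn_to_key` and `_get_lightweight_repr`, assuming the parser only recognizes `urn:li:dataset:(...)` URNs and returns `None` for anything else.

```python
import re
from typing import NamedTuple, Optional


class DatasetKey(NamedTuple):
    """Hypothetical stand-in for the key object seen in the traceback."""
    platform: str
    name: str
    origin: str


# Assumed shape of a dataset URN:
#   urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<origin>)
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,)]+)\)$"
)


def parse_dataset_urn(urn: str) -> Optional[DatasetKey]:
    """Return a key for dataset URNs, or None for any other entity type."""
    match = _DATASET_URN.match(urn)
    if match is None:
        return None
    return DatasetKey(*match.groups())


def lightweight_repr(urn: str, sep: str = "||") -> Optional[str]:
    """Compact a dataset URN; skip non-dataset URNs instead of asserting."""
    key = parse_dataset_urn(urn)
    if key is None:
        # An assertion URN ends up here. With `assert key is not None`
        # in place of this branch, it would raise AssertionError.
        return None
    return sep.join(key)


dataset = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.table,PROD)"
assertion = "urn:li:assertion:2c8aaaa5354d9b924c0f1b5c9f09bf75"
print(lightweight_repr(dataset))
print(lightweight_repr(assertion))
```

In this sketch the assertion URN is simply skipped. Whether the real fix skips such URNs or tracks them separately in the checkpoint state is not something this thread confirms.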
Also, I can open the validation tab of a dataset only if there is no data there. If there are some tests/assertions saved for a dataset, clicking on the validation tab leads to a blank page.
g
this second issue has been resolved in a recent PR Piotr
i would stick to `disable_dbt_node_creation: False` - that’s the recommended approach anyway
we’ll look into why assertions are causing problems for stateful ingestion and disabled dbt mode
m
Hi Piotr, we have a fix underway for the stateful ingestion problem
q
Thank you for fixing this issue, is there any PR I can watch to know once it is fixed?
l
Hi @quick-pizza-8906! Here’s the PR that is addressing dbt stateful ingestion - https://github.com/acryldata/datahub-fork/pull/769
q
@little-megabyte-1074 thank you
a
Hi @little-megabyte-1074 (and everyone), we have the same issue as described in this thread. Is the above fix included in the latest release? We are using version `0.8.41` but still getting the same `AssertionError` with `stateful_ingestion` enabled. Also, I cannot view the above PR in the fork repo (getting a 404). Would appreciate a solid reply from someone here 🙏 since making stateful ingestion work for the dbt source is an absolutely crucial topic and a game-changer for us. Big thanks beforehand. PS: happy to provide more input/details on our case.
l
Hi @adamant-van-21355 - it looks like the PR I mentioned above was closed in favor of this one. We have members of our team actively working on review today & we are aiming to get it merged in today/tomorrow morning. We have a release scheduled for tomorrow & this should be in it!
a
perfect, thanks a lot 🙏