# troubleshoot
a
Hi everyone 👋🏼 we are using Datahub on the latest version (v0.8.40) and currently ingesting data from Snowflake and DBT. We are having some issues using the stateful ingestion feature on DBT. Once we enable the stateful configuration, we get the following stack trace (in thread) with an assertion error, although the metadata is ingested successfully. This happens both on top of an old DBT ingestion config and on a new one, after enabling stateful ingestion with `"remove_stale_metadata": True`. I would appreciate any clues on how we can make this work properly so that any stale metadata is removed on future ingestion runs 🙏
...
"pipeline_name": "my-dbt-pipeline",

...

"stateful_ingestion": {
    "enabled": True,
    "remove_stale_metadata": True
},
...
Also, what is the actual role/usage of `ignore_old_state` and `ignore_new_state`? This is not clear from the docs.
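For reference, a minimal sketch of how a full recipe with these settings might look when built as a Python dict, as in the snippet above. The helper name `build_dbt_recipe`, the file paths, and the server URL are placeholders, not taken from this thread. The comments on `ignore_old_state` and `ignore_new_state` reflect one reading of the stateful ingestion docs and should be checked against the version in use.

```python
from typing import Any, Dict


def build_dbt_recipe(
    pipeline_name: str,
    manifest_path: str,
    catalog_path: str,
    server: str,
    *,
    ignore_old_state: bool = False,
    ignore_new_state: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a dbt recipe with stateful ingestion enabled."""
    return {
        # Checkpoints are keyed by pipeline name, so it should stay stable across runs.
        "pipeline_name": pipeline_name,
        "source": {
            "type": "dbt",
            "config": {
                "manifest_path": manifest_path,
                "catalog_path": catalog_path,
                "target_platform": "snowflake",
                "stateful_ingestion": {
                    "enabled": True,
                    "remove_stale_metadata": True,
                    # Assumption: True means the previous run's checkpoint is not read,
                    # so there is nothing to compare against and nothing is removed.
                    "ignore_old_state": ignore_old_state,
                    # Assumption: True means this run's checkpoint is not committed,
                    # so the next run still sees the older state.
                    "ignore_new_state": ignore_new_state,
                },
            },
        },
        # Assumption: state is stored through a DataHub REST sink.
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


recipe = build_dbt_recipe(
    pipeline_name="my-dbt-pipeline",
    manifest_path="./target/manifest.json",  # placeholder path
    catalog_path="./target/catalog.json",  # placeholder path
    server="http://localhost:8080",  # placeholder URL
)
# The dict can then be handed to the ingestion pipeline, e.g. Pipeline.create(recipe).run()
```

If that reading is right, the two flags only control whether the checkpoint is read and written, so both would stay at `False` for stale-metadata removal to take effect.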
Traceback (most recent call last):
  File "./scripts/extract_dbt_metadata_files.py", line 108, in <module>
    ingest_dbt_metadata(mart_name=mart)
  File "./scripts/extract_dbt_metadata_files.py", line 82, in ingest_dbt_metadata
    pipeline.run()
  File "/home/projects/datahub/.venv/lib/python3.8/site-packages/datahub/ingestion/run/pipeline.py", line 215, in run
    for wu in itertools.islice(
  File "/home/projects/datahub/.venv/lib/python3.8/site-packages/datahub/ingestion/source/dbt.py", line 1368, in get_workunits
    yield from self.create_test_entity_mcps(
  File "/home/projects/datahub/.venv/lib/python3.8/site-packages/datahub/ingestion/source/dbt.py", line 1138, in create_test_entity_mcps
    self.save_checkpoint(node_datahub_urn)
  File "/home/projects/datahub/.venv/lib/python3.8/site-packages/datahub/ingestion/source/dbt.py", line 1525, in save_checkpoint
    checkpoint_state.add_table_urn(node_datahub_urn)
  File "/home/projects/datahub/.venv/lib/python3.8/site-packages/datahub/ingestion/source/state/sql_common_state.py", line 86, in add_table_urn
    self.encoded_table_urns.append(self._get_lightweight_repr(table_urn))
  File "/home/projects/datahub/.venv/lib/python3.8/site-packages/datahub/ingestion/source/state/sql_common_state.py", line 35, in _get_lightweight_repr
    assert key is not None
AssertionError
l
@careful-pilot-86309 ^
w
FYI @quick-pizza-8906 ^
c
I am looking into this issue and will come back.
q
@careful-pilot-86309 I think this is the same as https://datahubspace.slack.com/archives/C029A3M079U/p1657807603087079, where I gave a bit more info. I think it's all about dbt assertions (not because the exception above is a Python assertion, but because of the other stack traces visible when this problem happens).
I can provide more info on this if needed
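To illustrate the point about dbt assertions, here is a minimal, self-contained sketch of how an `assert key is not None` like the one in the traceback could fail. It assumes the state helper parses every URN as a dataset URN and gets `None` back for anything else. `parse_dataset_urn`, `lightweight_repr`, the regex, the separator, and the example URNs are illustrative stand-ins, not DataHub's actual code.

```python
import re
from typing import Optional, Tuple

# Assumed shape of a dataset URN: urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>)
DATASET_URN_PATTERN = re.compile(
    r"urn:li:dataset:\(urn:li:dataPlatform:(.*),(.*),(.*)\)"
)


def parse_dataset_urn(urn: str) -> Optional[Tuple[str, str, str]]:
    """Return (platform, name, env) for a dataset URN, or None for any other URN."""
    match = DATASET_URN_PATTERN.fullmatch(urn)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


def lightweight_repr(urn: str, sep: str = "||") -> str:
    """Compact form of a dataset URN for checkpoint storage (illustrative only)."""
    key = parse_dataset_urn(urn)
    assert key is not None  # same kind of check as the one that fails in the traceback
    return sep.join(key)


dataset_urn = "urn:li:dataset:(urn:li:dataPlatform:dbt,analytics.orders,PROD)"
test_urn = "urn:li:assertion:0123456789abcdef"  # made-up id for a dbt test entity

print(lightweight_repr(dataset_urn))  # dbt||analytics.orders||PROD

try:
    lightweight_repr(test_urn)
except AssertionError:
    print("AssertionError: not a dataset URN")
```

Under that assumption, any dbt test saved to the checkpoint through the table-URN path would hit the assert, which would match the `create_test_entity_mcps` frame in the traceback.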
c
Yes. I have noticed the common part and have clubbed both issues together. I appreciate your feedback and help on this. I will get back with the specific information I need.
a
Hey there @careful-pilot-86309 👋🏼 regarding DBT and stateful ingestion: we are still getting the above `AssertionError` and are not able to make stateful ingestion work. Is that fixed in the latest release? We are using version 0.8.41 but still get the same `AssertionError` with stateful_ingestion enabled. I would appreciate a solid reply from someone on this please 🙏 since making stateful ingestion work for the DBT source is an absolutely crucial topic and a game-changer for us (and for many here, I guess). Big thanks in advance, and happy to provide more input/details on our case if needed.
c
@gentle-hamburger-31302 Could you please confirm whether the fix is merged? If yes, please let us know the version in which it will be available.
g
Nope, it is not merged, still in review.
a
Thanks for the update. Is there a PR we can follow, and is it expected in the next release?
h
@gentle-hamburger-31302, could you point me to the PR with the fix?
g
Hi @helpful-optician-78938, please find the PR here: https://github.com/datahub-project/datahub/pull/5540