# troubleshoot
b
Hey lovely DataHub team 👋 I wanted to bring to your attention an important breaking change that appears to be missing from the documentation. After upgrading DataHub from version 0.9.3 to 0.10.1, we discovered that certain nodes in the lineage UI have disappeared. These nodes were not entities themselves, but were connected to other entities as upstream/downstream dependencies. For example, in our use case (see the attached screenshot), we used the s3 lineage aspect to complete the flow of hive -> s3 -> redshift, but that flow is now broken because in 0.10.1 the lineage aspects seem to be missing from the lineage UI. I believe this is caused by the change that shows an error message when an entity is not found. IMHO, this shouldn't have affected the nodes in the lineage UI, since the original redshift ingestion still emits the related s3 upstream lineage aspect without the entity itself. TIA for looking into this, thank you!
fyi @delightful-sugar-63810
@bulky-soccer-26729 Sorry in advance for tagging you here directly, but it could be related to this PR of yours.
👆 There is something off about the resolution of the image above, so I'm uploading it again.
a
Hi @bulky-grass-52762, thanks for your patience here. We were out at a conference all last week. This is likely related to the ES reindexing that occurred during the upgrade. Have you re-ingested these sources since updating? CC: @echoing-airport-49548
d
@astonishing-answer-96712 I don't think that's the cause. These s3 nodes do not exist as entities themselves; they are only referenced as upstreams from the redshift ingestion and as downstreams of a datajob. On the old version, even though these entities did not exist on their own (e.g. you could not reach them via search), they still appeared in the lineage graph. After the update, they disappeared. We also tested this scenario by creating lineage with this helper between two brand-new entities, one upstream and one downstream. In the old version (0.9.3), you can see the lineage edge with both new entities, while in the new version (0.10.1), you can only see the downstream in the lineage visualization.
a
Ok, I think this may be a bug then. @hundreds-photographer-13496 may be able to speak to it as well.
This does not seem to be a change on the ingestion side, but rather on the frontend/GMS side, in how we handle non-existent entities. It might be related to this PR: https://github.com/datahub-project/datahub/pull/7374
a
Hi @delightful-sugar-63810, this is a new change. Previously we showed non-entities in the UI, but they caused errors when users interacted with them, so now we filter out everything that is not actually in DH. If you produce a DataSet key aspect for these nodes, which is the minimum footprint, that will reintroduce them.
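A rough sketch of what that minimum footprint might contain, using only the Python standard library. The dataset URN layout and the three key fields (platform, name, origin) follow DataHub's dataset key. The helper names, the example bucket path, and the dict shape are hypothetical illustrations, not the exact wire format; actually emitting the aspect would go through the DataHub SDK or REST API, which is not shown here.

```python
import json


def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN in DataHub's documented layout."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def dataset_key_proposal(platform: str, name: str, env: str = "PROD") -> dict:
    """Describe a key-only proposal for a dataset (hypothetical dict shape).

    Only the key aspect is included, so the node exists in DataHub
    without any schema, ownership, or other metadata attached.
    """
    return {
        "entityType": "dataset",
        "entityUrn": make_dataset_urn(platform, name, env),
        "changeType": "UPSERT",
        "aspectName": "datasetKey",
        "aspect": {
            "platform": f"urn:li:dataPlatform:{platform}",
            "name": name,
            "origin": env,
        },
    }


# Hypothetical s3 path that is referenced only as an upstream of a redshift table.
proposal = dataset_key_proposal("s3", "my-bucket/exports/orders")
print(json.dumps(proposal, indent=2))
```

An ingestion source that emits upstream lineage to s3 paths could emit one such key-only proposal per referenced path, so that each path exists as an entity and is no longer filtered out of the lineage view.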
d
👍🏻 We already forked the redshift ingestion on our side, but I think this is a breaking change that will require additions to many ingestion sources. One way forward, instead of adding this to every ingestion source, might be to keep these non-existent lineage vertices visible in the lineage display. I'm not arguing that it would be a better solution, but I guess it would be a much easier one 😄
a
I think it's a good idea. I will discuss it with the team and see whether we could add an option to show them. The issue before was the creation of empty entities that showed up in other places in the UI.
This would be a good feature request: https://feature-requests.datahubproject.io
CC: @big-carpet-38439
Hi folks - we've amended the release notes to include a disclaimer about this. We are working to figure out the best way to proceed here. I think one potential solution is to show these as "greyed out" in lineage, meaning that you cannot interact with entities that DataHub does not know about. The problem we faced when leaving those visible in lineage was that when you went to view the entity, you'd get the "entity does not exist" screen, which is also a pretty bad user experience. Please help us work towards the optimal solution here: what would you expect to see?
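To make the three behaviours discussed in this thread concrete, here is a toy model in plain Python. It is not DataHub code: the function name, the mode names, and the node labels are all hypothetical, and it only illustrates how showing every referenced node (as described for 0.9.3), filtering unknown nodes (as described for 0.10.1), and greying them out (the proposal above) would differ for the hive -> s3 -> redshift flow.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]  # (upstream, downstream)


def render_lineage(edges: List[Edge], known: Set[str], mode: str) -> Dict[str, str]:
    """Return the display state of every node that would be drawn.

    mode:
      "show_all" - draw every referenced node as interactive
      "filter"   - draw only nodes that exist as entities
      "grey_out" - draw unknown nodes, but mark them non-interactive
    """
    if mode not in {"show_all", "filter", "grey_out"}:
        raise ValueError(f"unknown mode: {mode}")

    nodes: Dict[str, str] = {}
    for upstream, downstream in edges:
        for node in (upstream, downstream):
            if node in known or mode == "show_all":
                nodes[node] = "active"
            elif mode == "grey_out":
                nodes[node] = "greyed"
            # mode == "filter": unknown nodes are dropped entirely
    return nodes


# The s3 path is referenced by lineage edges but was never ingested as an entity.
edges = [("hive.orders", "s3.orders"), ("s3.orders", "redshift.orders")]
known = {"hive.orders", "redshift.orders"}

for mode in ("show_all", "filter", "grey_out"):
    print(mode, render_lineage(edges, known, mode))
```

In this model, "filter" leaves hive and redshift with no visible connection between them, which matches the broken flow reported at the top of the thread, while "grey_out" keeps the path intact without offering a click-through to a missing entity page.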