# random
l
โ“ Question(s) of the Week: How difficult is it for you/your team to upgrade DataHub? What would make this process easier? @big-carpet-38439 and I are excited for your feedback -- leave your comments in the ๐Ÿงต and weโ€™ll pick a random swag winner next Monday, July 11th ๐Ÿฆ 
excited 2
m
I have been caught in the past by minor glitches when upgrading to the newest, fresh-out-the-oven release of the Helm chart (or upgrading to the newest DataHub when the Helm chart was not ready!). Glitches in a release that was cut only a few hours earlier are not that "unexpected" given a regular release cycle, but I always make sure to release to my staging env first. As for the Helm chart being in sync with DataHub, I know that in the past the chart was often updated after the fact, and I would like to see it updated BEFORE the release is even announced, if that isn't the case already. Another thing I'd like to see is a compatibility chart between the Helm chart and the k8s version, and the road ahead. When using the DataHub Helm chart, I see deprecation notices and have no clue when or whether you are planning to fix those, or which kubectl version I should be using. Having such a roadmap would help us plan the upgrades of our k8s infrastructure, since infra upgrades always require good planning and testing. As for the data store (neo4j, ES, RDBMS) upgrades, as long as those are handled by the Helm chart, I'm happy 😃
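The stage-first practice described above can be sketched as follows. This is an assumption-laden sketch, not an official DataHub procedure: the file name, release name, namespace, and pinned version are made up, and the `global.datahub.version` key should be double-checked against your datahub-helm chart version.

```shell
# Sketch: pin the DataHub app version in a values override so the chart
# upgrade and the app bump happen deliberately, staging before prod.
# (File name, version, release name, and namespace are assumptions.)
set -e
cat > values-staging.yaml <<'EOF'
global:
  datahub:
    version: v0.8.40   # pin; bump only after staging validation
EOF
# Roll out to staging first; promote to prod only after validation:
# helm repo update
# helm upgrade --install datahub datahub/datahub -n staging -f values-staging.yaml
grep version: values-staging.yaml
```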
teamwork 1
b
After rolling out an older version of DataHub to our users, which they have been trying out for a couple of months, I found myself having to test through the newer functions and explain to the users what has changed. Particularly the changes in the edit ACL part, where I had to undo some policies I had implemented previously and then explain to the users what they can and cannot do now. (And I expect my policies to change yet again with the new UI functionality for tags 🥴) A better "basic user" guide to DataHub would be appreciated; most of the documentation is targeted at developers and data engineers. For instance, the "Schema Blame" feature would be totally confusing to a normal user.
➕ 3
teamwork 1
i
The fork model means that when we decide to update, a bunch of conflicts emerge and need to be resolved. This comes from the fact that one usually takes the tagged git branch and merges it onto the forked master. There is no way (to the best of my knowledge) to reapply a conflict resolution when merging from a new branch (although this process is made extremely easy by `git rerere` when one merges from a branch where the conflict happened in the past; that feature of git resolves such conflicts automatically). Now, it might be useful for you Acryl folks to keep a release branch that points to the latest version tag. This would make life easier for maintainers of forks, because only the newest conflicts would need to be dealt with by hand; the previously encountered ones would be patched by git automatically. Finally, in our experience, tests that succeed on your GitHub Actions sometimes fail on ours rather haphazardly. This adds additional patching to be performed on each update, if only to mark some tests as expected to fail and ignore their outcome.
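The `git rerere` mechanism mentioned above can be demonstrated in miniature. This is an illustrative sketch; the temp directory, branch names, and file contents are all made up. Once `rerere.enabled` is set, git records the preimage of each conflict and your hand-made resolution, and replays that resolution automatically the next time the same conflict appears:

```shell
# Demo: record one hand-made conflict resolution so git rerere can
# replay it on later merges of the same conflict.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q repo
cd repo
git config user.email demo@example.com
git config user.name demo
git config rerere.enabled true                 # turn on resolution recording
base_branch=$(git symbolic-ref --short HEAD)   # 'master' or 'main'

echo base > file.txt
git add file.txt
git commit -qm base

git checkout -qb feature
echo feature > file.txt
git commit -qam feature

git checkout -q "$base_branch"
echo local > file.txt
git commit -qam local

# First merge conflicts; resolving it by hand records the resolution.
git merge feature >/dev/null 2>&1 || true
echo resolved > file.txt
git add file.txt
git commit -qm merged

ls .git/rr-cache   # the recorded resolutions live here
```

A later merge from a branch containing the same conflicting change would then be auto-resolved from that cache.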
teamwork 1
w
One of the problems is the all-at-once upgrade model for the backend and the ~6 different connector types we are using at the moment (of the connector instances, we have many in prod). That is, when we do an upgrade in dev, we upgrade all the connectors, and if there is an issue with any one of them, we block the upgrade for the next environments. That makes it very hard for us to scale in terms of connector types. Instead we would like to upgrade gradually: first the backend, then e.g. the hive connectors, then the kafka connectors, and so on. Is that possible? Is there a compatibility matrix for the connectors and the backend? Also related is the fixing strategy. If we upgrade to e.g. version 0.8.30 and then find an issue that is fixed in version 0.8.40, we need to upgrade to 0.8.40 to get the fix, which could potentially bring new issues, and so on in an infinite loop 😅 Ideally, there would be a fix in a version 0.8.30.1. The alternative is to maintain our forked repo and apply the fix there, but then we have the complexity of conflicts in later rebases. Another minor issue we have found in the past is connectors shipping new features enabled by default. IMO connectors should keep the same behaviour across versions, and new features should be enabled only if explicitly set.
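One way to approximate the gradual connector upgrades asked about above (a sketch under assumed paths and versions, not an official DataHub mechanism) is to give each connector type its own virtualenv with its own pinned `acryl-datahub` version, so hive and kafka ingestion can be upgraded or rolled back independently of each other and of the backend:

```shell
# Sketch: one virtualenv per connector type, each with its own pin.
# All paths and version numbers here are made-up examples.
set -e
base=$(mktemp -d)

for spec in "hive==0.8.40" "kafka==0.8.30"; do
  name=${spec%%=*}    # connector name, e.g. "hive"
  ver=${spec#*==}     # pinned version, e.g. "0.8.40"
  python3 -m venv "$base/$name"
  # Record the pin per connector; run the install where network is available:
  echo "acryl-datahub[$name]==$ver" > "$base/$name/requirements.txt"
  # "$base/$name/bin/pip" install -r "$base/$name/requirements.txt"
done

cat "$base/hive/requirements.txt"
```

Whether a given connector version is compatible with an older backend still needs checking per release, which is exactly where the compatibility matrix requested above would help.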
teamwork 1
plus1 2
l
I am LOVING this feedback!! Thank you all so much for such thoughtful & detailed responses -- there are so many opportunities for us to tackle here!! teamwork
h
We have the same issues as Eric stated in the replies above. It would be really helpful to know the roadmap and release schedule, so we can plan ahead for such upgrades.
l
drumroll SWAG TIME!!! @modern-monitor-81461 you're the lucky winner this week! I'll send you a DM 🙂
🎉 2
🥳 1
b
Something actionable from this feedback -- we are working to change the language around "Blame" for schema history as a direct result of the feedback here. cc @echoing-airport-49548!
๐Ÿ‘ 1
thank you 1
Finally, in our experience, tests that succeed on your GitHub Actions sometimes fail on ours rather haphazardly. This adds additional patching to be performed on each update, if only to mark some tests as expected to fail and ignore their outcome.
Flaky tests have unfortunately been an ongoing challenge we've been battling over the past few months in particular. We hear you loud and clear and are working continuously to improve the stability of the tests.
b
it won't be easy to make Blame user-friendly, but please try to make it easier for non-data-engineers to understand what schema version history is 😛
g
Hopefully it's not too late, but one thing that tends to bite us is doing git merges with upstream, since we maintain a forked version of DataHub. Usually we don't touch 99% of the server side of the platform, but one core workflow is wiring up new entity types that are unique to our company. And some of the changes to the graphql core lead to weird errors that leave us scratching our heads for a bit until we do a more detailed diff on those specific files. We now do more careful reviews of the GMS graphql-specific files, but it would be really nice to automate that in some way, especially as the code is mainly around wiring the graphql configs into the server.
b
Yeah, this I 100% agree with. The GmsGraphQLEngine is a dangerous place.