Hello guys, is it possible to run datahub without ...
# getting-started
b
Hello guys, is it possible to run DataHub without Elasticsearch, or is it part of the core functionality? I know running without Elasticsearch is likely not officially supported; I'm just wondering how dependent DataHub is on it. When going the neo4j route, is Elasticsearch only used for the search functionality? If so, would it be possible, with some code modifications, to disable search and thus eliminate the Elasticsearch requirement? Maybe someone familiar with DataHub's internal workings can speed things up. I have been going over the code but am still struggling to see the full picture.
🔍 1
📖 1
l
Hey there 👋 I'm The DataHub Community Support bot. I'm here to help make sure the community can best support you with your request. Let's double check a few things first: ✅ There's a lot of good information on our docs site: www.datahubproject.io/docs. Have you searched there for a solution? ✅ It's not uncommon that someone has run into your exact problem before in the community. Have you searched Slack for similar issues? Did you find a solution to your issue? ❌ Sorry you weren't able to find a solution. I'm sending you some tips on info you can provide to help the community troubleshoot. Whenever you feel your issue is solved, please react ✅ to your original message to let us know!
b
From this diagram, it seems that elasticsearch is only required for search, but it's not an official diagram (found it on google):
a
Hi, it’s definitely possible to run with neo4j instead, although it isn’t our recommended implementation
I believe that GMS/MAE do rely on the ES pod to some extent to do things like resolving lineage etc, but @dazzling-yak-93039 might be able to provide more insight
b
But is it possible to run entirely without ES? When I simply disable ES (docker-compose), GMS stops working.
I might be able to work around it, so I'm asking whether any core functionality besides search depends on ES (in the neo4j implementation).
The main issue currently is that there's a critical vulnerability in ES (7 and 8). While it is not exploitable, it prevents deployments to Google Marketplace. It seems to be possible to fix this vulnerability in ES8, but I doubt it will be possible in ES7 (it's approaching EOL). Additionally, are you planning to migrate to ES8 anytime soon?
Thanks for your help though. I'll wait for @dazzling-yak-93039 to reply; it would help me a lot in understanding whether what I'm trying to achieve is possible. I need to eliminate ES entirely, even at the expense of losing the search functionality. But from what I understand, there might be other functionality dependent on ES, so it may not be possible to disable it (even with some code changes).
d
Hi, I believe there is some other functionality dependent on ES (I don't remember exactly what it is right now but I can find out if needed!). So I believe even the implementation that uses neo4j as the graph service still uses ES. So, unfortunately it is needed right now.
I'm not sure about the upgrade timeline but I can also find that out if needed!
b
It would be nice to know what exactly is dependent on ES, but I don’t want to waste your time; I think this information is already enough for me. If you happen to remember what depends on it besides search, let me know.
@dazzling-yak-93039 someone on the team would like to know what exactly is depending on ES besides search (in neo4j version), so they can evaluate next steps, it would be much appreciated if you could find out.
Also, it seems to be possible to run ES7 client with ES8 server, so it might be possible to make this work with ES8 instead: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-compatibility.html
@astonishing-answer-96712 @dazzling-yak-93039 I'm currently trying to get DataHub running with ES8, which should theoretically work according to the ES documentation. I've been looking at the DataHub code, trying to understand how to enable the compatibility mode as described here: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-compatibility.html, but there is no line like `RestHighLevelClient esClient = new RestHighLevelClientBuilder(restClient)` to be found anywhere in the code. In fact, there is no reference to `RestHighLevelClientBuilder` at all, so I'm not sure where to set `.setApiCompatibilityMode(true)`. I'd highly appreciate it if someone could point me to the correct location, or tell me if this is simply not possible.
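For context, as far as I understand Elastic's REST API compatibility docs, the compatibility mode mostly comes down to sending versioned `Accept` and `Content-Type` media types, so that an 8.x server accepts and answers in the 7.x format. Below is a minimal sketch of such a request using only the JDK's `java.net.http`. It is not DataHub code and does not use `RestHighLevelClient`; the class name, host, and index name are hypothetical, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class Es7CompatRequest {
    // Media type that asks an Elasticsearch 8 server to use the 7.x request/response format.
    static final String COMPAT_TYPE = "application/vnd.elasticsearch+json; compatible-with=7";

    // Builds (but does not send) a search request carrying the compatibility headers.
    static HttpRequest searchRequest(String baseUrl, String index, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Accept", COMPAT_TYPE)
                .header("Content-Type", COMPAT_TYPE)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host and index name, for illustration only.
        HttpRequest req = searchRequest("http://localhost:9200", "example_index",
                "{\"query\":{\"match_all\":{}}}");
        System.out.println(req.method() + " " + req.uri());
        // Print the headers that would go out with the request.
        req.headers().map().forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

Sending a request like this by hand against the patched ES 8 image could be a quick way to check whether the server honors the 7.x compatibility headers, independent of how the client inside DataHub is constructed.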
m
hey @dazzling-yak-93039 @astonishing-answer-96712 this matters to me as well. Can you provide support for running without ES altogether, or an ETA for ES8?
b
While some functionality seems to be broken (there are errors in the logs regarding ES8), in general it seems to work with a patched ES 8.7.1 image (where the snakeyaml vulnerability has been patched). If we could somehow get the API compatibility mode running, it would be perfect, but I don't see how this will be possible. I have been able to run ingestions on both the neo4j version and the version without neo4j (oddly enough only on an Intel Mac; it didn't work on an M2, but that might be an issue on my end, and I didn't investigate further due to time constraints).
m
I would also like an option to disable the Elasticsearch dependency, since DataHub can be used just for fetching data from databases (running ingestions).
b
What is required to enable the compatibility mode is the RestHighLevelClientBuilder, unless there is a way to set it afterwards. I couldn’t find a way to do that.
m
Curious, are you planning to deploy the whole stack to Marketplace? Is there an option to use an external ES cluster instead of the image in the Helm chart?
m
@modern-artist-55754 elasticsearch must be provided in the stack
m
@mysterious-table-75773 i see the elastic jar.
a
CC: @brainy-tent-14503 here