Hi team the standard helm chart which is great btw includes DataHub #all-things-deployment

wonderful-dream-38059

06/07/2022, 2:46 PM

Hi team - the standard helm chart (which is great btw) includes ingress for the REST endpoint and web frontend, but not Kafka. I'm getting some throughput issues with the REST endpoint for ingestion, so I wanted to try using Kafka as the sink for ingestion. I tried setting up ingress for Kafka but rapidly got to a stage where I couldn't get things to work. Does anyone have a working setup where the Kafka ingestion endpoint is available for use while deployed in a Kubernetes cluster? If you do - how did you set things up and how well does it work?
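
For context, a recipe that targets the Kafka sink looks roughly like the sketch below. The sink connects directly to the brokers and the schema registry, so both need to be reachable from wherever ingestion runs, which is why the missing ingress matters here. The hostnames and the Tableau connect_uri are placeholders, and authentication settings are left out; none of these values come from this thread.

```yaml
# Sketch of an ingestion recipe that writes to Kafka instead of the REST endpoint.
# All hostnames below are placeholders (assumptions), not real endpoints.
source:
  type: tableau
  config:
    connect_uri: "https://tableau.example.com"   # placeholder Tableau server
    # authentication and site/project settings omitted
sink:
  type: datahub-kafka
  config:
    connection:
      bootstrap: "kafka.example.com:9092"                              # placeholder broker address
      schema_registry_url: "http://schema-registry.example.com:8081"   # placeholder schema registry
```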

big-carpet-38439

06/08/2022, 2:15 PM

Hi Alan! I'm not aware of folks doing this yet, but that's an exciting thing - you're breaking ground! We are happy to provide support / answer questions as required. Typically I've seen Kafka REST Proxy used in cases where one needs to front a Kafka cluster with a more accessible REST interface.

wonderful-dream-38059

06/09/2022, 9:33 AM

Thanks @big-carpet-38439 - if I were to do that, does the Kafka sink support that kind of setup? If it does, do you have an outline of the configuration you'd expect as part of a recipe to make that work? I'm not expecting everything to work first time, but if you've got an idea of a good place to start, that would make any trial-and-error a little more efficient. I'm still new to the integration setup, so there's a lot I might be assuming wrongly.

big-carpet-38439

06/09/2022, 3:06 PM

Unfortunately I think we'd need to build a new Ingestion sink to point towards a proxy - this is doable but we don't currently have it prioritized

wonderful-dream-38059

06/09/2022, 3:06 PM

understood.

big-carpet-38439

06/09/2022, 3:06 PM

Have you attempted to scale out your existing instance? What type of volume do you have coming at the service?

wonderful-dream-38059

06/09/2022, 3:07 PM

The Tableau source is pushing the REST endpoint hard enough that one or the other either maxes out its memory or times out.

wonderful-dream-38059

06/09/2022, 3:07 PM

We have a large Tableau instance, and it's not a stateful sync so we're firing a lot of data at the REST API.

wonderful-dream-38059

06/09/2022, 3:08 PM

I take your point that horizontal scaling of the rest endpoint might be wise though. Would I just scale the GMS container?

big-carpet-38439

06/09/2022, 3:17 PM

Yeah typically we recommend this:
• Scale out GMS pods (say to 3)
• Extract the Metadata Change Consumer Job (standalone consumers) into a separate pod (easily done via the Helm charts)
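
The first recommendation can be sketched as a Helm values override. This assumes the GMS subchart exposes a standard replicaCount key under datahub-gms; that key name is an assumption here, so check it against the values.yaml of the chart version you have deployed.

```yaml
# Values override sketch: run three GMS pods instead of one.
# The key path is assumed from the usual Helm chart layout; verify it for your chart version.
datahub-gms:
  replicaCount: 3   # "say to 3", as suggested above
```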

wonderful-dream-38059

06/09/2022, 3:18 PM

I'm not sure I know what you mean by the latter. Have you got a link to the docs (or code) for the thing you're talking about?

big-carpet-38439

06/09/2022, 3:34 PM

https://datahubproject.io/docs/metadata-ingestion/schedule_docs/intro

big-carpet-38439

06/09/2022, 3:35 PM

So in helm - you can simply flip this value, and it should deploy a standalone consumer pod

big-carpet-38439

06/09/2022, 3:35 PM

Instead of as part of GMS

wonderful-dream-38059

06/09/2022, 3:35 PM

ahhh - yes I think I've effectively already done that by offloading the ingestion tasks off to airflow

big-carpet-38439

06/09/2022, 3:36 PM

This will deploy the consumer of messages, which keeps our indexes updated

big-carpet-38439

06/09/2022, 3:36 PM

Somewhat tangential to ingestion

wonderful-dream-38059

06/09/2022, 3:40 PM

oh I get you 👍

wonderful-dream-38059

06/09/2022, 3:43 PM

@big-carpet-38439 - do I understand correctly that you mean this value in the helm chart: https://github.com/acryldata/datahub-helm/blob/master/charts/datahub/values.yaml#L41 - the one for datahub-ingestion-cron?

big-carpet-38439

06/09/2022, 3:43 PM

the one for standalone consumers enabled

big-carpet-38439

06/09/2022, 3:44 PM

L101

wonderful-dream-38059

06/09/2022, 3:44 PM

gotcha

big-carpet-38439

06/09/2022, 3:44 PM

this will effectively split the consumer we need in DataHub into a separate k8s deployment
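
As a rough sketch, the flag being discussed would be flipped in a values override like the one below. The key name global.datahub_standalone_consumers_enabled is an assumption based on the chart's values.yaml around the line referenced above; confirm it in your chart version before applying.

```yaml
# Values override sketch: run the metadata change consumers as their own
# deployment rather than inside the GMS pods.
# Key name assumed; check values.yaml in your chart version.
global:
  datahub_standalone_consumers_enabled: true
```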

wonderful-dream-38059

06/09/2022, 3:44 PM

I'll enable that and see if it helps. Otherwise I'll focus on the tableau source. Thanks for being so responsive!

big-carpet-38439

06/09/2022, 3:45 PM

Okay wonderful! Definitely want to make sure you can scale this thing - we've actually had to do this ourselves quite a bit, so we can find some more pointers if you need
