# ingestion
b
Hi, did anyone manage to resolve the AWS MSK managed Kafka timeout issue when ingesting? I'm getting the same error.
m
@blue-holiday-20644: this one is still open... we haven't gotten to the bottom of it yet.
e
cc @curved-jordan-15657 @adventurous-scooter-52064 After experimentation, we realized that the replication factor was the issue. If the replication factor for the MCE topic is 1, ingestion times out, while if we set it to 2 or more it doesn’t. Created a quick PR to change the replication factor during kafka-setup, but modifying it for an existing topic is not trivial: https://docs.confluent.io/platform/current/kafka/post-deployment.html#increasing-replication-factor
You can also try deleting the topics and rerunning kafka-setup with replication set to a higher number
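For reference, a rough sketch of what increasing the replication factor on an existing topic looks like with the stock Kafka tooling, following the Confluent doc linked above. The topic name, partition list, broker IDs and bootstrap address are placeholders, and older Kafka releases take --zookeeper instead of --bootstrap-server for the reassignment tool:
```
# 1. Check the current partition -> replica assignment
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --describe --topic MetadataChangeEvent_v4

# 2. Write a reassignment plan listing the desired replicas per partition
cat > increase-replication.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "MetadataChangeEvent_v4", "partition": 0, "replicas": [1, 2] }
  ]
}
EOF

# 3. Apply the plan, then verify it completed
kafka-reassign-partitions.sh --bootstrap-server "$BOOTSTRAP" \
  --reassignment-json-file increase-replication.json --execute
kafka-reassign-partitions.sh --bootstrap-server "$BOOTSTRAP" \
  --reassignment-json-file increase-replication.json --verify
```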
c
Hi @early-lamp-41924! I see, but according to the AWS docs (https://docs.aws.amazon.com/msk/latest/developerguide/msk-default-configuration.html), the default.replication.factor property is “3 for 3-AZ clusters, 2 for 2-AZ clusters”. In my case, I have 3 brokers across 3 AZs, so my default.replication.factor should be 3. I didn’t set this in a configuration file; I’m using the default MSK configuration. But you were talking about the MCE topic… Do I need to change it manually for that specific topic?
b
Thanks for the update. I'm running a 2-zone MSK cluster with the default replication config, but I'll look into what I can adjust in that area.
Also, is it possible to configure the AWS Glue schema registry in the dockerised DataHub, similar to the Helm version?
I managed to get my 2-node MSK cluster running with the dockerised DataHub using these MSK settings. Setting them to 2 didn't resolve the timeout. I also had my kafka-setup.sh set to : ${PARTITIONS:=2} and : ${REPLICATION_FACTOR:=2}, which I might need to tune down again.
I'll have to see if I can increase replication for production, but just getting ingests to run is progress.
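In case it helps anyone, the : ${PARTITIONS:=2} lines are just bash default-value assignments, so the values can also be overridden from the environment instead of editing kafka-setup.sh. A minimal sketch, assuming the stock script and a docker-compose service for kafka-setup (names may differ in your deployment):
```
# kafka-setup.sh only applies these as defaults, so an existing env var wins:
#   : ${PARTITIONS:=1}
#   : ${REPLICATION_FACTOR:=1}
# e.g. in the docker-compose service for kafka-setup:
#   environment:
#     - PARTITIONS=2
#     - REPLICATION_FACTOR=2
# or when running the script directly:
PARTITIONS=2 REPLICATION_FACTOR=2 ./kafka-setup.sh
```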
m
@blue-holiday-20644: I’m not able to fully follow the end state. Were you able to run ingestion successfully? Were you able to increase the replication factor to a number greater than 1?
@curved-jordan-15657: yes, you have to set this manually for the existing Kafka topics created by the setup script, since they have already been set up with a replication factor of 1. It seems like there is some interaction between that setting and the client not being able to produce to these topics.
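A quick way to check what the setup script actually created is to describe the topics; the bootstrap address and topic name below are just examples:
```
# Look at the ReplicationFactor / Replicas columns in the output
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --describe --topic MetadataChangeEvent_v4
# If it shows ReplicationFactor: 1, the topic needs to be reassigned (see the
# sketch above) or deleted and recreated by kafka-setup with a higher factor.
```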
b
@mammoth-bear-12532 Yes, I managed to get ingestion recipes using the Kafka sink to run without the timeout issue. I initially tried setting my replication settings to 2 in the MSK configs, which didn't resolve the timeout. For a 2-node MSK topology it seems to work by setting them to 1 as above; maybe this is an N-1 situation for replicating across N nodes?
MSK configurations seem to be static and not derived from the size of your cluster, so you have to create and apply specific configs to change the default values, which may resolve issues for cluster sizes other than the default of 3 nodes.
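If anyone needs to do that, here is a rough sketch with the AWS CLI; the configuration name, property values, cluster ARN and version are placeholders, so double-check against the MSK docs:
```
# 1. Put the broker properties you want in a plain text file
cat > msk-2node.properties <<'EOF'
default.replication.factor=2
min.insync.replicas=1
EOF

# 2. Create an MSK configuration from it
aws kafka create-configuration \
  --name "two-node-defaults" \
  --server-properties fileb://msk-2node.properties

# 3. Apply it to the cluster (needs the configuration ARN from step 2,
#    the cluster ARN and the cluster's current version)
aws kafka update-cluster-configuration \
  --cluster-arn "<cluster-arn>" \
  --configuration-info Arn=<configuration-arn>,Revision=1 \
  --current-version "<current-cluster-version>"
```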
m
Oh interesting. For our three node cluster, replication factor 2 and 3 both seem to work.
b
I changed it, cleared out all the topics and it still gave me the timeout.
e
On the Glue side, yes you can, but unfortunately you can’t use it for Kafka-based ingestion 😞 it doesn’t support the Python Kafka APIs.
Did you delete the topic and recreate it with the new replication factor?
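For reference, deleting the topics so kafka-setup can recreate them looks roughly like this; the topic names and bootstrap address are examples only, and delete.topic.enable must be enabled on the brokers:
```
# List the existing topics first, then delete the DataHub ones
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --list
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --delete --topic MetadataChangeEvent_v4
kafka-topics.sh --bootstrap-server "$BOOTSTRAP" --delete --topic MetadataAuditEvent_v4
# Re-run kafka-setup with the higher REPLICATION_FACTOR so they are recreated correctly
```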
b
I manually deleted the topics and kafka-setup.sh recreated them with 2 partitions and a replication factor of 2. I'm just testing now with both reverted back to 1.
e
So that needs to be 2 or more in a regular MSK setup
b
: ${PARTITIONS:=1} and : ${REPLICATION_FACTOR:=1} - just tested with these original kafka-setup settings and it still worked. So I guess my MSK configs were causing the issues...?
e
Interesting, hmm
b
My 2-node MSK config looks like this currently
I wonder if min.insync.replicas set to 2 would never succeed for a 2-node setup?
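For what it's worth, min.insync.replicas=2 on a 2-broker cluster can work while both brokers are in sync, but any produce with acks=all will fail as soon as one broker drops out of the ISR, which can surface as timeouts. A sketch of checking and, if needed, relaxing it per topic; the topic name and bootstrap address are placeholders:
```
# Show the topic-level overrides currently in effect
kafka-configs.sh --bootstrap-server "$BOOTSTRAP" --entity-type topics \
  --entity-name MetadataChangeEvent_v4 --describe
# Relax the override for this topic if it is set too high for the cluster size
kafka-configs.sh --bootstrap-server "$BOOTSTRAP" --entity-type topics \
  --entity-name MetadataChangeEvent_v4 --alter --add-config min.insync.replicas=1
```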