# getting-started
s
Hello - I'm just getting set up and would like to get some advice. I have added Pinot via Helm on EKS, and I want to (a) ingest Parquet from S3 and (b) stream data from MSK (Kafka 2.2.1 on the same VPC), probably requiring SSL.
1. I added a simple schema and table spec - they look OK.
2. I (think) I configured deep storage for S3.
3. Batch ingestion - really I am interested in any recommended (standalone) way to ingest data from S3 (watching folders on some interval), and I'm not sure the docs cover what I need. Is there an example of posting directly to the controller?
4. I would like to try Kafka too; in this case I'm just wondering about the configs (in thread). I'm feeling my way through, and a sample config for this MSK setup would be nice to see, but I expect I'll get there with some trial and error.
    a. I was a little put off by the mention of needing to update the pom for Kafka version 2.2.1, and I was not really sure whether that is indeed needed or how I would do that via Helm.
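For the batch side, the standalone route is usually a `LaunchDataIngestionJob` run against a job spec file. A minimal sketch for Parquet on S3, where the bucket, prefix, region, table name, and controller URI are all placeholder assumptions for this setup, might look like:

```yaml
# Hypothetical job spec - bucket, region, table name, and controller URI are placeholders.
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: 's3://my-bucket/input/'            # assumed bucket/prefix
includeFileNamePattern: 'glob:**/*.parquet'
outputDirURI: 's3://my-bucket/segments/'
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: 'us-east-1'                       # assumed region
recordReaderSpec:
  dataFormat: 'parquet'
  className: 'org.apache.pinot.plugin.inputformat.parquet.ParquetRecordReader'
tableSpec:
  tableName: 'myTable'                          # assumed table name
pinotClusterSpecs:
  - controllerURI: 'http://pinot-controller:9000'
```

This would be submitted with something like `pinot-admin.sh LaunchDataIngestionJob -jobSpecFile /path/to/job-spec.yaml`; as far as I know there is no built-in folder watcher for the standalone runner, so "watching on an interval" would be an external scheduler (e.g. a Kubernetes CronJob) re-running the job.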
also, are you building the Docker image from source? If not, the latest image should just package Kafka 2.0
s
Thanks for this - in fact I read through this documentation, but given that I already had MSK configured, I got a little stuck with my particular setup. My setup is Pinot installed on EKS, managed with ArgoCD/Helm, and MSK on the same VPC, so I am using the local Pinot CLI to submit configs etc. Currently I can create/submit the table spec and see the table created, but the table has a "Bad" status. I was a little unsure whether I was able to connect to Kafka at all, and wondered if SSL or something like that was causing issues. Here is my config:
```json
"tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
        "streamType": "kafka",
        "security.protocol": "SSL",
        "stream.kafka.topic.name": "TOPIC",
        "stream.kafka.consumer.type": "lowlevel",
        "stream.kafka.consumer.prop.auto.offset.reset": "largest",
        "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
        "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.inputformat.avro.confluent.KafkaConfluentSchemaRegistryAvroMessageDecoder",
        "realtime.segment.flush.threshold.rows": "0",
        "realtime.segment.flush.threshold.time": "24h",
        "realtime.segment.flush.segment.size": "100M",
        "stream.kafka.zk.broker.url": "z-1.SERVER.com:2181",
        "stream.kafka.broker.list": "b-3.SERVER.com:9094,b-2.SERVER.com:9094,b-1.SERVER.com:9094",
        "schema.registry.url": "kafka-schema-registry-cp-schema-registry.kafka-schema-registry.svc.cluster.local:8081"
    }
}
```
I am not sure how to troubleshoot, as I do not see anything in the controller logs that is an obvious problem
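For MSK over TLS, the Kafka client SSL properties generally need to sit inside `streamConfigs` alongside the keys above, and the truststore path must exist inside the Pinot server pods. A hedged sketch of the extra keys only (paths, password, and filename are placeholder assumptions, and these would be merged into the existing `streamConfigs` block):

```json
"security.protocol": "SSL",
"ssl.truststore.location": "/opt/pinot/certs/kafka.client.truststore.jks",
"ssl.truststore.password": "changeit"
```

If client authentication (mTLS) is enabled on MSK, the corresponding `ssl.keystore.*` properties would be needed as well; if not, the truststore alone (containing the broker CA) is typically sufficient.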
x
I think the issue is that Pinot cannot connect to MSK
can you try entering the Pinot container and see if you can ping that host?
kubectl exec -it pod/pinot-controller-0 -n pinot -- bash
s
ok, thanks - I will take a look, but I am pretty confident that if it's on that cluster it will see MSK, as I have a great many K8s apps on that cluster talking to Kafka. I suspect my configuration for SSL is wrong - I know that when I set up Druid I needed to specify an SSL cert location and other things to make that work, so I'll try to do something similar here. I was just hoping that if there was a ready-made config for this case I could try it
x
ok, then you can try out the SSL first
s
I wonder, can someone help me with this - I tried to set some SSL settings using the examples in the docs, but I cannot understand how this is supposed to work (and I guess that's down to my lack of understanding of certs etc., independent of Pinot):
```json
"security.protocol": "SSL",
"ssl.truststore.location": "/opt/pinot/kafka.client.truststore.jks",
```
If I do something like this, then this truststore is a file that is not found - but how do I know what this path should be, or what setup steps I need to take? My naive understanding is that the Pinot controller or server (or whatever) should have a place that determines the truststore location. I deployed Pinot via Helm, so as the person who deployed it I guess I need to configure where to look - but as a client submitting a table spec, I should not need to know. Confused.
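One way to make that path exist: the truststore has to be mounted into the server pods by whoever deploys Pinot, e.g. from a Kubernetes Secret via the Helm chart's extra-volume hooks. A sketch under those assumptions - the secret name and mount path are placeholders, and the exact values keys depend on the chart version:

```yaml
# Hypothetical values.yaml fragment - key names depend on the Pinot chart version.
server:
  extraVolumes:
    - name: kafka-truststore
      secret:
        secretName: kafka-client-truststore   # created beforehand from the JKS file
  extraVolumeMounts:
    - name: kafka-truststore
      mountPath: /opt/pinot/certs             # table spec then points at a file under this path
      readOnly: true
```

The secret would be created up front with something like `kubectl create secret generic kafka-client-truststore --from-file=kafka.client.truststore.jks -n pinot`, and the table spec's `ssl.truststore.location` would then reference the mounted file, e.g. `/opt/pinot/certs/kafka.client.truststore.jks`. So the split is roughly: the deployer mounts the file once, and table specs just reference the agreed path.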